Data_Sheet_1_A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon.CSV
Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequ...
Main Authors: | Sigmund Ramberg, Bjørn Høyheim, Tone-Kari Knutsdatter Østbye, Rune Andreassen |
---|---|
Format: | Dataset |
Language: | unknown |
Published: | 2021 |
Subjects: | |
Online Access: | https://doi.org/10.3389/fgene.2021.656334.s001 https://figshare.com/articles/dataset/Data_Sheet_1_A_de_novo_Full-Length_mRNA_Transcriptome_Generated_From_Hybrid-Corrected_PacBio_Long-Reads_Improves_the_Transcript_Annotation_and_Identifies_Thousands_of_Novel_Splice_Variants_in_Atlantic_Salmon_CSV/14492682 |
id | ftfrontimediafig:oai:figshare.com:article/14492682 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | Frontiers: Figshare |
op_collection_id | ftfrontimediafig |
language | unknown |
topic | Genetics Genetic Engineering Biomarkers Developmental Genetics (incl. Sex Determination) Epigenetics (incl. Genome Methylation and Epigenomics) Gene Expression (incl. Microarray and other genome-wide approaches) Genome Structure and Regulation Genomics Genetically Modified Animals Livestock Cloning Gene and Molecular Therapy Atlantic salmon transcriptome full-length mRNA hybrid error correction PacBio Iso-seq Illumina sequencing |
description | Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying rediploidization following whole-genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome-sequence-based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows full-length transcripts to be sequenced from single RNA molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill tissue. A pipeline was developed in which long reads were generated by Iso-Seq sequencing on the PacBio platform (high-quality, or HQ, reads) and then error-corrected with short reads from the Illumina platform. The pipeline processed more than 1.5 million long reads and more than 900 million short reads into error-corrected HQ reads. Surprisingly, a high percentage of these (32%) represented expressed interspersed repeats, while the remainder were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short reads, ensuring high sequence accuracy. On average, each gene was represented by three isoforms. Comparison with the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts; the remaining full-length transcripts were novel isoforms, though few came from novel genes. A comparison with the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation and may also provide long-read linkage information useful for improving the genome assembly. More than 80% of transcripts were assigned GO terms, and thousands of transcripts came from genes or splice variants expressed in an organ-specific manner, demonstrating that hybrid error-corrected long-read transcriptomes may be applied to ... |
format | Dataset |
author | Sigmund Ramberg; Bjørn Høyheim; Tone-Kari Knutsdatter Østbye; Rune Andreassen |
author_sort | Sigmund Ramberg |
publishDate | 2021 |
genre | Atlantic salmon Salmo salar |
op_rights | CC BY 4.0 |
op_rightsnorm | CC-BY |