Data_Sheet_3_A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon.CSV

Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequ...

Bibliographic Details
Main Authors: Sigmund Ramberg, Bjørn Høyheim, Tone-Kari Knutsdatter Østbye, Rune Andreassen
Format: Dataset
Language: unknown
Published: 2021
Subjects:
Online Access: https://doi.org/10.3389/fgene.2021.656334.s003
https://figshare.com/articles/dataset/Data_Sheet_3_A_de_novo_Full-Length_mRNA_Transcriptome_Generated_From_Hybrid-Corrected_PacBio_Long-Reads_Improves_the_Transcript_Annotation_and_Identifies_Thousands_of_Novel_Splice_Variants_in_Atlantic_Salmon_CSV/14492688
id ftfrontimediafig:oai:figshare.com:article/14492688
record_format openpolar
spelling ftfrontimediafig:oai:figshare.com:article/14492688 2023-05-15T15:31:37+02:00 2021-04-27T04:52:46Z 2021-04-28T22:58:18Z
institution Open Polar
collection Frontiers: Figshare
op_collection_id ftfrontimediafig
language unknown
topic Genetics
Genetic Engineering
Biomarkers
Developmental Genetics (incl. Sex Determination)
Epigenetics (incl. Genome Methylation and Epigenomics)
Gene Expression (incl. Microarray and other genome-wide approaches)
Genome Structure and Regulation
Genomics
Genetically Modified Animals
Livestock Cloning
Gene and Molecular Therapy
Atlantic salmon
transcriptome
full-length mRNA
hybrid error correction
PacBio Iso-seq
Illumina sequencing
description Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequence based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows for full-length transcript sequencing from single RNA-molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads) followed by error-correction of the HQ reads by short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remaining were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, assuring a high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms, but few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation as well as provide long-read linkage information useful for improving the genome assembly. More than 80% of transcripts were assigned GO terms and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner demonstrating that hybrid error-corrected long-read transcriptomes may be applied to ...
format Dataset
author Sigmund Ramberg
Bjørn Høyheim
Tone-Kari Knutsdatter Østbye
Rune Andreassen
author_sort Sigmund Ramberg
title Data_Sheet_3_A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon.CSV
publishDate 2021
url https://doi.org/10.3389/fgene.2021.656334.s003
https://figshare.com/articles/dataset/Data_Sheet_3_A_de_novo_Full-Length_mRNA_Transcriptome_Generated_From_Hybrid-Corrected_PacBio_Long-Reads_Improves_the_Transcript_Annotation_and_Identifies_Thousands_of_Novel_Splice_Variants_in_Atlantic_Salmon_CSV/14492688
genre Atlantic salmon
Salmo salar
op_relation doi:10.3389/fgene.2021.656334.s003
https://figshare.com/articles/dataset/Data_Sheet_3_A_de_novo_Full-Length_mRNA_Transcriptome_Generated_From_Hybrid-Corrected_PacBio_Long-Reads_Improves_the_Transcript_Annotation_and_Identifies_Thousands_of_Novel_Splice_Variants_in_Atlantic_Salmon_CSV/14492688
op_rights CC BY 4.0
op_rightsnorm CC-BY
op_doi https://doi.org/10.3389/fgene.2021.656334.s003
_version_ 1766362152201879552