Transcript assembly and peptide sequences of Atlantic cod

Bibliographic Details
Main Authors: Zhang, Xiaokang; Furmanek, Tomasz; Jonassen, Inge; Goksøyr, Anders
Format: Article in Journal/Newspaper
Language: unknown
Published: figshare 2020
Subjects:
Online Access: https://dx.doi.org/10.6084/m9.figshare.c.5168303
https://figshare.com/collections/Transcript_assembly_and_peptide_sequences_of_Atlantic_cod/5168303
Description
Summary: The reference is based on a de novo Trinity [1,2] assembly because the official gene model versions did not contain full-length sequences for all genes. The assembly consists of sequences from the following RNA-Seq data:

Gadus morhua Transcriptome or Gene expression
NCBI project: PRJNA277848
https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA277848
Developmental stages: 10-dph, 20-dph, 30-dph, 45-dph, 60-dph, 90-dph
SRR2045416 brain, SRR2045417 gills, SRR2045418 heart, SRR2045419 muscle, SRR2045420 liver, SRR2045421 kidney, SRR2045422 bones, SRR2045423 intestine, SRR2045425 embryo, SRR2045415 ovary

Three cod liver samples
GEO accession: GSE106968 [3]
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106968
Samples: dcod_12_S3, dcod_1_S1, dcod_25_S25

Reads were assembled using Trinity through the Agalma pipeline version 0.5.0 [4]. The transcript assemblies from the different stages and tissue samples were mapped to both cod genomes (gadMor 1 and 2) [5,6], because each genome is missing some genes. Transcripts were annotated with UniProtKB, the Zebrafish ENSEMBL gene model and the Medaka ENSEMBL gene model. The transcripts were grouped into clusters based on similarity. From each cluster we chose the transcript with the longest match to the genomes and the transcript with the best annotation BLAST score. Where the same transcript had both the longest match to the genome and the best annotation BLAST score, only one transcript was added. Because the transcripts are assembled from many samples and RNA-Seq assemblies may contain errors, we do not know whether the differences between transcripts in a cluster are splice variants or assembly errors. The aim was to make a more complete reference with full-length transcripts.

Bibliography
1 Grabherr, M.G. et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652
2 Haas, B.J. et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512
3 Yadetie, F. et al. (2018) RNA-Seq analysis of transcriptome responses in Atlantic cod (Gadus morhua) precision-cut liver slices exposed to benzo[a]pyrene and 17α-ethynylestradiol. Aquat. Toxicol. 201, 174–186
4 Dunn, C.W. et al. (2013) Agalma: an automated phylogenomics workflow. BMC Bioinformatics 14, 330
5 Star, B. et al. (2011) The genome sequence of Atlantic cod reveals a unique immune system. Nature 477, 207–210
6 Tørresen, O.K. et al. (2017) An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 18, 95
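
Illustrative sketch of the selection rule: the summary states that, for clustered transcripts, the one with the longest genome match and the one with the best annotation BLAST score were chosen, and that only one was added when both criteria picked the same transcript. The Python below is a minimal sketch of that rule and not the authors' code. The `Transcript` record, its field names, the function name, and the tie-breaking behaviour (first transcript encountered wins) are all assumptions made for illustration; the summary does not say how match length or BLAST score were computed or how ties were resolved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # All field names are hypothetical; the record does not describe a data format.
    transcript_id: str
    cluster_id: str
    genome_match_length: int    # assumed: longest match to either cod genome
    annotation_score: float     # assumed: best BLAST score against the annotation sources


def select_representatives(transcripts):
    """Return {cluster_id: [chosen transcripts]} with one or two transcripts per cluster."""
    clusters = defaultdict(list)
    for t in transcripts:
        clusters[t.cluster_id].append(t)

    selected = {}
    for cluster_id, members in clusters.items():
        # max() keeps the first member on ties; this tie-break is an assumption.
        longest = max(members, key=lambda t: t.genome_match_length)
        best_annotated = max(members, key=lambda t: t.annotation_score)

        chosen = [longest]
        # Add the best-annotated transcript only if it is a different transcript.
        if best_annotated.transcript_id != longest.transcript_id:
            chosen.append(best_annotated)
        selected[cluster_id] = chosen
    return selected


if __name__ == "__main__":
    # Made-up identifiers and values, purely for demonstration.
    demo = [
        Transcript("tA", "cluster1", 900, 50.0),
        Transcript("tB", "cluster1", 700, 80.0),
        Transcript("tC", "cluster2", 500, 60.0),
    ]
    for cluster_id, chosen in select_representatives(demo).items():
        print(cluster_id, [t.transcript_id for t in chosen])
```

Under this reading, a cluster contributes two transcripts when the two criteria disagree and one when they agree, which is consistent with the stated uncertainty about whether within-cluster differences are splice variants or assembly errors.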