Arctic char transcriptome annotations


Bibliographic Details
Main Author: Prokkola, Jenni
Format: Dataset
Language: unknown
Published: figshare 2017
Subjects:
Online Access: https://dx.doi.org/10.6084/m9.figshare.5662741
https://figshare.com/articles/Arctic_char_transcriptome_annotations/5662741
Description
Summary: Two annotation tables for a liver transcriptome of Arctic char (Salvelinus alpinus) (Prokkola et al). The de novo assembly is available at DDBJ/EMBL/GenBank under the accession GEKT00000000. Open reading frame (ORF) peptide sequences were obtained for transcripts in the final assembly using TransDecoder. The predicted ORFs were annotated with four databases using the Basic Local Alignment Search Tool for proteins (BLASTp v.2.2.31): predicted zebrafish (downloaded Oct 24 2015 from Ensembl) and salmon (NCBI Salmo salar Annotation Release 100) proteins, using a reciprocal best hits approach and an e-value cutoff of 1 × 10^-5. Additionally, the ORFs were annotated with the NCBI non-redundant (nr) protein database (downloaded Nov 25th 2015), with an e-value cutoff of 1 × 10^-5 and when the query sequence matched the target sequence at >50% of protein length, and with human peptides. The first file ("Annotations_Salp...") contains results that were prioritized in the order zebrafish > salmon > NCBI nr. When available, gene descriptions were retrieved from Ensembl using biomaRt in R. Gene symbols were retrieved for zebrafish Ensembl IDs, salmon RefSeq IDs and NCBI gene names using the Biological DataBase Network (https://biodbnet-abcc.ncifcrf.gov). Annotations were retrieved for 9,491, 4,037 and 4,117 genes with zebrafish peptides, Atlantic salmon predicted peptides and the NCBI nr database, respectively. In total, 20,394 out of 44,784 ORFs in the assembly (45.5%) were annotated with 18,013 unique protein IDs. The second table contains all gene symbols found for the above-mentioned fish or for human peptide sequences using BLASTp v.2.4.0 with an e-value threshold of 10^-5. After identifying human orthologs, we supplemented the annotation with the previously obtained gene symbols for genes that were missing an annotation. In total, 18,232 genes were annotated using this approach, with 9,577 unique gene symbols.
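
The reciprocal best hits step and the zebrafish > salmon > NCBI nr prioritization described in the summary can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The hit tuple layout (query, subject, e-value, bit score), the use of bit score to pick the best hit, the inclusive treatment of the 1 × 10^-5 cutoff, and the function names `best_hits`, `reciprocal_best_hits` and `prioritize` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# A hit is (query_id, subject_id, evalue, bitscore). This layout is an
# assumption; such fields could be parsed from tabular BLAST output.
Hit = Tuple[str, str, float, float]

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the summary


def best_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, str]:
    """Map each query to its highest-bit-score subject among hits passing the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > cutoff:  # assumption: hits exactly at the cutoff are kept
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[Hit], reverse: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF
) -> Dict[str, str]:
    """Keep query -> subject pairs where each is the other's best hit."""
    fwd = best_hits(forward, cutoff)
    rev = best_hits(reverse, cutoff)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


def prioritize(
    orf_id: str,
    zebrafish: Dict[str, str],
    salmon: Dict[str, str],
    nr: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (source, protein_id) from the first source that annotates the ORF."""
    for source, table in (("zebrafish", zebrafish), ("salmon", salmon), ("nr", nr)):
        if orf_id in table:
            return source, table[orf_id]
    return None


# Toy data with hypothetical identifiers (not taken from the dataset).
forward = [
    ("orf1", "zf_A", 1e-50, 200.0),
    ("orf1", "zf_B", 1e-20, 90.0),
    ("orf2", "zf_B", 1e-3, 30.0),  # fails the e-value cutoff
]
reverse = [
    ("zf_A", "orf1", 1e-48, 195.0),
    ("zf_B", "orf3", 1e-30, 120.0),
]
zebrafish_rbh = reciprocal_best_hits(forward, reverse)
salmon_rbh = {"orf1": "ss_X", "orf2": "ss_Y"}
nr_hits = {"orf4": "nr_Z"}
```

With this toy input, `orf1` is annotated from zebrafish even though a salmon hit also exists, `orf2` falls back to salmon, and `orf4` falls back to NCBI nr, which mirrors the priority order stated for the first table.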