The transcriptome of Gadus morhua and its discordance with the genome

Bibliographic Details
Main Author: Ramsøy, Kari Jerve
Format: Master Thesis
Language: English
Published: 2020
Subjects:
Online Access: http://hdl.handle.net/10852/81842
Description
Summary: The transcriptome assembly of Atlantic cod was previously obtained in order to investigate the quality of the genome assembly and to uncover genes that have not yet been annotated. However, a considerable number of transcripts aligned only partially to the genome assembly, and others did not align at all. Further, the Atlantic cod genome is highly enriched in short tandem repeats (STRs) compared to most other vertebrates. STRs are known to be challenging to sequence correctly and difficult for mapping tools to align. The aim of this thesis was to analyze transcripts in the Atlantic cod transcriptome that were partially mapped and transcripts that showed no sequence similarity to the reference genome. Two mapping tools, minimap2 and GMAP, were used in order to investigate which performed better in aligning genomic data highly enriched in STRs. The density of STRs in partially mapped transcripts was determined, and enrichment of specific biological processes among these transcripts was analyzed. The alignment of STRs by the two mapping tools was compared. Some non-mapped transcripts were searched for sequence similarity against all sequenced species available, and specifically against other species of fish. Further, syntenic regions in Atlantic herring and zebrafish were examined, with the aim of localizing unannotated genes in the Atlantic cod genome with the help of unmapped transcripts. The results show that partially mapped transcripts had a significantly higher density of STRs than transcripts that had an almost complete alignment to the genome assembly. The number of base pairs in STRs likely affected whether a transcript was aligned. Generally, both tools showed limitations in aligning repetitive segments, but minimap2 was considered to be the mapping tool most capable of aligning to genomic data enriched in STRs.
Additionally, some transcripts with no sequence similarity to Atlantic cod showed sequence similarity to species of zooplankton. These could be contaminants that were present in the sample before sequencing. Other transcripts showed sequence similarity to other species of fish but not to Atlantic cod, which may indicate that these genes are missing from the Atlantic cod genome assembly. The nup155 gene was annotated in zebrafish and Atlantic herring, but not in Atlantic cod. Synteny was found among the genes surrounding nup155 in the three genomes; the annotation of this gene is therefore suggested to be on chromosome 4, region 4:11,929,091 - 12,047,953, in the gadMor3 genome assembly of Atlantic cod. These findings are important when the transcriptome is used as a tool for further research on the genes of Atlantic cod. Furthermore, it is vital to be aware of the weaknesses these mapping tools show when aligning STRs, and this should be taken into account when using the transcriptome for annotating genes.
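
The STR density mentioned in the summary can be illustrated with a minimal Python sketch. The record does not say how the thesis defined or detected STRs, so everything here is an assumption made for illustration: the function name `str_density`, the motif length limit of 6 bp, the minimum of 3 tandem copies, and the minimum repeat length of 10 bp are all hypothetical. Density is taken to be the fraction of bases in a transcript that fall inside detected STRs.

```python
import re


def str_density(seq: str, max_unit: int = 6, min_copies: int = 3,
                min_len: int = 10) -> float:
    """Return the fraction of bases in `seq` that lie inside STRs.

    Assumed (hypothetical) STR definition: a motif of 1..max_unit bp
    repeated in tandem at least `min_copies` times and spanning at
    least `min_len` bp in total. These thresholds are illustrative
    and are not taken from the thesis.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    # Lazy motif group so the shortest repeating unit is tried first,
    # followed by at least (min_copies - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    str_bases = 0
    # finditer yields non-overlapping matches, so lengths can be summed.
    for match in pattern.finditer(seq):
        length = match.end() - match.start()
        if length >= min_len:
            str_bases += length
    return str_bases / len(seq)


if __name__ == "__main__":
    # Toy transcript: a 12 bp (AC)n repeat followed by 14 bp of
    # non-repetitive sequence.
    transcript = "ACACACACACAC" + "GATTCGATCCGTAG"
    print(round(str_density(transcript), 3))
```

Comparing this value between partially mapped and almost fully mapped transcripts would be one simple way to look at the relationship described in the summary. A dedicated repeat finder would normally be preferred over a regular expression for real data.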