De novo assembly and comparative genomics of teleosts

Bibliographic Details
Main Author: Tørresen, Ole Kristian
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10852/57728
http://urn.nb.no/URN:NBN:no-60406
Description
Summary: During the last 20 years, genome sequencing and assembly projects have changed from undertakings requiring large international collaborations to tasks that a handful of people can plan and conduct. This has been driven by improvements in sequencing technology and computational methods. More and more sequencing and assembly projects are being conducted, with older assemblies being updated and improved, resulting in a deeper understanding of the biology of a large and steadily growing number of species. The projects described in this thesis focus on genome assemblies created from species of the order Gadiformes, an order containing commercially and ecologically important fishes. Here, these assemblies are investigated in detail and compared to other teleost genome assemblies, with special attention to immune genes and short tandem repeats (STRs).

We have updated and substantially improved the Atlantic cod (Gadus morhua) genome assembly with the use of different sequencing technologies and computational approaches. A major finding was that the presence of STRs is the main factor that led to the fragmentation of the previous assembly. STRs are hypermutating loci that occur at high frequency (loci/Mbp) and high density (bp/Mbp) in the cod genome, surpassing those of other published genome assemblies. The STRs likely contribute to substantial genetic variation in natural cod populations.

The Atlantic cod lacks genes involved in the major histocompatibility complex (MHC) II pathway, which normally detects bacterial pathogens and initiates a response against them, and thus is a crucial part of the adaptive immune system. To infer when in the ancestry of cod these genes were lost, we sequenced and assembled the genomes of 66 teleost species. We found that the loss is shared by all species in the order Gadiformes, and that there is an expanded repertoire of MHC I genes in the Gadiformes, which is likely connected with the large number of species in this order.

Since the 66 new teleost (including gadiform) genome assemblies are fragmented, the properties of STRs and multi-copy immune genes are not easily investigated in these assemblies. To further elucidate their role in Gadiformes, we sequenced and assembled the genome of haddock (Melanogrammus aeglefinus), a relative of cod. Our results show that the high density and frequency of STRs is a feature likely shared by all codfishes (a family inside Gadiformes), and possibly all Gadiformes. Cod and haddock share a similar repertoire of the innate immune Toll-like receptor (TLR) genes, with both losses and expansions. The expansions might be part of a compensatory mechanism for the absence of MHC II. Another class of genes, the NOD-like receptors (NLRs), has been reported in large numbers in species without an adaptive immune system. We find that cod and haddock, as well as most other teleosts, generally have a high number of NLRs, with a likely expansion at the root of this clade. Thus, a high number of NLRs in teleosts does not seem to be connected with the presence or absence of MHC II.

This thesis shows what kinds of questions genome assemblies created for different purposes can answer. Ideally, genome assemblies for all kinds of species should be created, upgraded and updated based on the best available technologies, but this is costly. With the right planning and set-up, assemblies based on low-coverage sequencing can be very powerful for questions such as the presence or absence of genes and for phylogeny. Also, even with moderate amounts of long-read PacBio sequencing, it is possible to create highly contiguous genome assemblies addressing issues that are impossible to elucidate with fragmented assemblies, such as the number of multi-copy immune genes.
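The summary reports STR abundance in two units: frequency (loci/Mbp) and density (bp/Mbp). As a minimal sketch of how these two figures relate to a set of annotated repeat loci, the Python snippet below computes both from a list of intervals. The function name, the half-open interval convention and the toy coordinates are assumptions made for illustration; they are not taken from the thesis and do not reproduce its pipeline or its data.

```python
from typing import Iterable, Tuple


def str_frequency_and_density(
    loci: Iterable[Tuple[int, int]], assembly_length_bp: int
) -> Tuple[float, float]:
    """Return (loci per Mbp, STR bp per Mbp).

    Assumption: each locus is a half-open interval [start, end) in bp,
    and the loci do not overlap one another.
    """
    if assembly_length_bp <= 0:
        raise ValueError("assembly_length_bp must be positive")
    loci = list(loci)
    n_loci = len(loci)                                  # number of STR loci
    str_bp = sum(end - start for start, end in loci)    # bases covered by STRs
    assembly_mbp = assembly_length_bp / 1_000_000       # assembly size in Mbp
    return n_loci / assembly_mbp, str_bp / assembly_mbp


# Hypothetical toy data: three STR loci in a 2 Mbp assembly.
toy_loci = [(100, 130), (5_000, 5_024), (90_000, 90_046)]
frequency, density = str_frequency_and_density(toy_loci, 2_000_000)
print(f"frequency = {frequency} loci/Mbp, density = {density} bp/Mbp")
```

Frequency counts how many loci there are per unit of sequence, while density measures how much sequence they occupy, so two assemblies with the same frequency can differ in density if their repeats differ in length.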