Development and application of computational methods for NGS-based microbiome research

Bibliographic Details
Published in: ERJ Open Research
Main Author: Xue, Yaxin
Other Authors: orcid:0000-0001-9516-286X
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: The University of Bergen 2020
Subjects:
Online Access: https://hdl.handle.net/1956/24453
Description
Summary: Advances in DNA sequencing technologies have dramatically expanded our knowledge of the composition and functions of microbial communities from diverse environments. The most common Next Generation Sequencing (NGS)-based methods used for this purpose are marker gene sequencing (16S ribosomal RNA (rRNA), 18S rRNA and internal transcribed spacer (ITS)), metagenomics, and metatranscriptomics, all of which are widely applied, though with differing prominence. Meanwhile, numerous bioinformatic tools and workflows have been developed for complete and comprehensive analysis of the above approaches, which makes it relatively easy to obtain basic results with standard procedures. However, current workflows can only provide generic analyses for well-studied environments, and the choice of methods affects results significantly. In this thesis, I explore best analytical practices and address bioinformatic challenges in NGS-based microbiome research, with emphasis on low-biomass and poorly characterized environments. My work combines advanced applied bioinformatics with sophisticated method development. Papers I and II investigated microbial community composition in human obstructive lung diseases through marker gene sequencing, exploring the best sampling procedures to avoid contamination of airway microbiomes. Papers III and IV contributed to the understanding of microbial composition and functional potential in permafrost metagenomic samples, applying novel bioinformatic methods in a metagenome-assembled genome (MAG)-centric view. Paper V described MetaRib, a new tool for reconstructing full-length ribosomal gene sequences from large-scale metatranscriptomic datasets. MetaRib was able to perform fast rRNA reconstruction across multiple samples with a low false positive rate, even in very large datasets. In addition, it provides accurate taxonomy-independent relative abundance estimation. Doctoral thesis