From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data

The metagenome of a microbial community contains a large quantity of information about the inter-strain genetic variation present in that community. Genome assemblers using algorithms designed for use with isolate genomes obscure the inter-strain variation within metagenomic data. Analysing this var...

Bibliographic Details
Main Author: Amos, Timothy
Format: Master Thesis
Language: unknown
Published: UNSW Sydney 2011
Subjects:
Online Access: https://dx.doi.org/10.26190/unsworks/15386
http://hdl.handle.net/1959.4/51820
id ftdatacite:10.26190/unsworks/15386
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Diversity
Metagenomics
Bioinformatics
FOS Computer and information sciences
Strains
description The metagenome of a microbial community contains a large quantity of information about the inter-strain genetic variation present in that community. Genome assemblers using algorithms designed for use with isolate genomes obscure the inter-strain variation within metagenomic data. Analysing this variation in metagenomic data is further complicated by sequencing errors that add noise to the system by making base assignments ambiguous. In order to develop improved computational methods for metagenome analysis, simulations were performed using genome data of individual species. A software program, MetaSim, was used to generate simulated reads. Assemblies of these reads were used to investigate the development of an error model to confidently identify SNPs (Single Nucleotide Polymorphisms). This approach proved limited due to the nature of the MetaSim software and the insufficient availability of consistent, well-documented data. As an alternative approach, a graphical analysis of unitigs (high confidence contigs) was developed. This approach provided accurate predictions of whether each unitig in an assembly of simulated reads consisted of only one strain, or more. The approach included developing a system of rules describing the relationship between the number and proportions of strains in an assembly and the positioning of clusters in scatter plots. The differences in densities of clusters were used to help distinguish between ambiguous cluster patterns. Idealised assemblies of simulated reads without sequencing errors were produced, to examine how sequence quality affects the ability to make inferences about inter-strain variation. Computational clustering was investigated as a means of automating the analysis. Having established an approach to analyse unitigs, environmental metagenome data was analysed. This graphical analysis provided a well-supported and parsimonious interpretation of the number of strains present in metagenome data of an Antarctic lake community, and their proportions.
format Master Thesis
author Amos, Timothy
title From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data
publisher UNSW Sydney
publishDate 2011
url https://dx.doi.org/10.26190/unsworks/15386
http://hdl.handle.net/1959.4/51820
geographic Antarctic
genre Antarc*
Antarctic
op_rights https://creativecommons.org/licenses/by-nc-nd/3.0/au/
cc by-nc-nd 3.0
op_rightsnorm CC-BY-NC-ND
op_doi https://doi.org/10.26190/unsworks/15386