From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data

Bibliographic Details
Main Author: Amos, Timothy
Format: Master Thesis
Language: English
Published: UNSW, Sydney 2011
Subjects: Diversity; Metagenomics; Bioinformatics; Strains
Online Access: http://hdl.handle.net/1959.4/51820
https://unsworks.unsw.edu.au/bitstreams/d8c01be4-83fe-4b2e-88a2-78f5d4b12394/download
https://doi.org/10.26190/unsworks/15386
id ftunswworks:oai:unsworks.library.unsw.edu.au:1959.4/51820
record_format openpolar
institution Open Polar
collection UNSW Sydney (The University of New South Wales): UNSWorks
op_collection_id ftunswworks
language English
topic Diversity
Metagenomics
Bioinformatics
Strains
description The metagenome of a microbial community contains a large quantity of information about the inter-strain genetic variation present in that community. Genome assemblers built on algorithms designed for isolate genomes obscure the inter-strain variation within metagenomic data. Analysing this variation is further complicated by sequencing errors, which add noise by making base assignments ambiguous. To develop improved computational methods for metagenome analysis, simulations were performed using genome data of individual species. A software program, MetaSim, was used to generate simulated reads. Assemblies of these reads were used to investigate the development of an error model to confidently identify SNPs (Single Nucleotide Polymorphisms). This approach proved limited because of the nature of the MetaSim software and the limited availability of consistent, well-documented data. As an alternative approach, a graphical analysis of unitigs (high-confidence contigs) was developed. This approach accurately predicted whether each unitig in an assembly of simulated reads consisted of a single strain or of more than one. It included developing a system of rules describing the relationship between the number and proportions of strains in an assembly and the positioning of clusters in scatter plots. Differences in cluster density were used to help distinguish between ambiguous cluster patterns. Idealised assemblies of simulated reads without sequencing errors were produced to examine how sequence quality affects the ability to make inferences about inter-strain variation. Computational clustering was investigated as a means of automating the analysis. With an approach to analysing unitigs established, environmental metagenome data were then analysed.
This graphical analysis provided a well-supported and parsimonious interpretation of the number of strains present in metagenome data of an Antarctic lake community, and of their proportions.
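The abstract does not describe the error model the thesis explored. The following is a purely illustrative sketch of the general idea. It is not the author's method. One common way to ask whether a minority base at an alignment column reflects a genuine SNP rather than sequencing error is a one-sided binomial test. Under an error-only hypothesis, each of the n reads covering the column reports a non-consensus base with probability e, so the observed minor-base count k is compared against Binomial(n, e). The function names, the default e = 0.01 and the threshold alpha = 0.001 are hypothetical choices made for this example.

```python
from math import comb


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def snp_pvalue(base_counts, error_rate=0.01):
    """p-value that the second most common base in one alignment column
    is explained by sequencing error alone.

    base_counts: read counts per base, e.g. {"A": 40, "G": 10}.
    error_rate: hypothetical per-read probability of reporting any
    non-consensus base (using the total rate makes the test conservative).
    """
    depth = sum(base_counts.values())
    counts = sorted(base_counts.values(), reverse=True)
    minor = counts[1] if len(counts) > 1 else 0
    if minor == 0:
        return 1.0  # no minority base, so nothing to test
    return binomial_sf(minor, depth, error_rate)


def is_confident_snp(base_counts, error_rate=0.01, alpha=1e-3):
    """Flag the column as a likely SNP if error alone is an implausible explanation."""
    return snp_pvalue(base_counts, error_rate) < alpha


if __name__ == "__main__":
    print(is_confident_snp({"A": 40, "G": 10}))  # 10 of 50 reads disagree
    print(is_confident_snp({"A": 49, "C": 1}))   # 1 of 50 reads disagrees
```

With these assumed parameters, a column with 40 A and 10 G reads is flagged. A column with 49 A and 1 C is not, because a single discordant read out of 50 is readily explained by a 1% error rate. A fuller error model would typically also use per-base quality scores and position-dependent error rates.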
format Master Thesis
author Amos, Timothy
title From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data
publisher UNSW, Sydney
publishDate 2011
url http://hdl.handle.net/1959.4/51820
https://unsworks.unsw.edu.au/bitstreams/d8c01be4-83fe-4b2e-88a2-78f5d4b12394/download
https://doi.org/10.26190/unsworks/15386
geographic Antarctic
genre Antarc*
Antarctic
op_rights open access
https://purl.org/coar/access_right/c_abf2
CC BY-NC-ND 3.0
https://creativecommons.org/licenses/by-nc-nd/3.0/au/
free_to_read
op_rightsnorm CC-BY-NC-ND