From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data
The metagenome of a microbial community contains a large quantity of information about the inter-strain genetic variation present in that community. Genome assemblers using algorithms designed for use with isolate genomes obscure the inter-strain variation within metagenomic data. Analysing this var...
Main Author: | Amos, Timothy |
---|---|
Format: | Master Thesis |
Language: | unknown |
Published: | UNSW Sydney, 2011 |
Subjects: | Diversity; Metagenomics; Bioinformatics; FOS Computer and information sciences; Strains |
Online Access: | https://dx.doi.org/10.26190/unsworks/15386 http://hdl.handle.net/1959.4/51820 |
id |
ftdatacite:10.26190/unsworks/15386 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
topic |
Diversity Metagenomics Bioinformatics FOS Computer and information sciences Strains |
description |
The metagenome of a microbial community contains a large quantity of information about the inter-strain genetic variation present in that community. Genome assemblers that use algorithms designed for isolate genomes obscure this inter-strain variation within metagenomic data. Analysing the variation is further complicated by sequencing errors, which add noise by making base assignments ambiguous. To develop improved computational methods for metagenome analysis, simulations were performed using genome data of individual species. A software program, MetaSim, was used to generate simulated reads. Assemblies of these reads were used to investigate the development of an error model to confidently identify SNPs (single nucleotide polymorphisms). This approach proved limited due to the nature of the MetaSim software and the insufficient availability of consistent, well-documented data. As an alternative approach, a graphical analysis of unitigs (high-confidence contigs) was developed. This approach provided accurate predictions of whether each unitig in an assembly of simulated reads consisted of only one strain or of more than one. The approach included developing a system of rules describing the relationship between the number and proportions of strains in an assembly and the positioning of clusters in scatter plots. The differences in densities of clusters were used to help distinguish between ambiguous cluster patterns. Idealised assemblies of simulated reads without sequencing errors were produced to examine how sequence quality affects the ability to make inferences about inter-strain variation. Computational clustering was investigated as a means of automating the analysis. With an approach to analysing unitigs established, environmental metagenome data was then analysed. This graphical analysis provided a well-supported and parsimonious interpretation of the number of strains present in metagenome data of an Antarctic lake community, and their proportions. |
format |
Master Thesis |
author |
Amos, Timothy |
title |
From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data |
publisher |
UNSW Sydney |
publishDate |
2011 |
url |
https://dx.doi.org/10.26190/unsworks/15386 http://hdl.handle.net/1959.4/51820 |
geographic |
Antarctic |
genre |
Antarc* Antarctic |
op_rights |
https://creativecommons.org/licenses/by-nc-nd/3.0/au/ cc by-nc-nd 3.0 |
op_rightsnorm |
CC-BY-NC-ND |
op_doi |
https://doi.org/10.26190/unsworks/15386 |