Novel methods for comparing and evaluating single and metagenomic assemblies

The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome...


Bibliographic Details
Main Author: Hill, Christopher Michael
Other Authors: Pop, Mihai, Digital Repository at the University of Maryland, University of Maryland (College Park, Md.), Computer Science
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: 2015
Subjects: Bioinformatics, Computer science, Assembly, Genome
Online Access: http://hdl.handle.net/1903/17100
https://doi.org/10.13016/M28K9D
id ftunivmaryland:oai:drum.lib.umd.edu:1903/17100
record_format openpolar
institution Open Polar
collection University of Maryland: Digital Repository (DRUM)
op_collection_id ftunivmaryland
language English
topic Bioinformatics
Computer science
Assembly
Genome
spellingShingle Bioinformatics
Computer science
Assembly
Genome
Hill, Christopher Michael
Novel methods for comparing and evaluating single and metagenomic assemblies
topic_facet Bioinformatics
Computer science
Assembly
Genome
description The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies heavily on the availability of independently determined standards, such as manually curated genome sequences or independently produced mapping data. The focus of this work is to develop reference-free computational methods to accurately compare and evaluate genome assemblies. We introduce a reference-free, likelihood-based measure of assembly quality that allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics. Despite the unresolved challenges of single genome assembly, the decreasing costs of sequencing technology have led to a sharp increase in metagenomics projects over the past decade. These projects allow us to better understand the diversity and function of microbial communities found in the environment, including the ocean, Arctic regions, other living organisms, and the human body. We extend our likelihood-based framework and show that we can accurately compare assemblies of these complex bacterial communities. After an assembly has been produced, it is not easy to determine which parts of the underlying genome are missing, which parts are mistakes, and which parts are due to experimental artifacts from the sequencing machine. Here we introduce VALET, the first reference-free pipeline that flags regions in metagenomic assemblies that are statistically inconsistent with the data generation process. VALET ...
author2 Pop, Mihai
Digital Repository at the University of Maryland
University of Maryland (College Park, Md.)
Computer Science
format Doctoral or Postdoctoral Thesis
author Hill, Christopher Michael
author_facet Hill, Christopher Michael
author_sort Hill, Christopher Michael
title Novel methods for comparing and evaluating single and metagenomic assemblies
title_short Novel methods for comparing and evaluating single and metagenomic assemblies
title_full Novel methods for comparing and evaluating single and metagenomic assemblies
title_fullStr Novel methods for comparing and evaluating single and metagenomic assemblies
title_full_unstemmed Novel methods for comparing and evaluating single and metagenomic assemblies
title_sort novel methods for comparing and evaluating single and metagenomic assemblies
publishDate 2015
url http://hdl.handle.net/1903/17100
https://doi.org/10.13016/M28K9D
long_lat ENVELOPE(151.050,151.050,61.917,61.917)
geographic Arctic
Valet
geographic_facet Arctic
Valet
genre Arctic
genre_facet Arctic
op_relation doi:10.13016/M28K9D
http://hdl.handle.net/1903/17100
op_doi https://doi.org/10.13016/M28K9D
_version_ 1766346800400171008
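
The abstract defines assembly quality as the conditional probability of observing the sequenced reads given the assembled sequence. The following is a minimal sketch of that idea, not the model used in the thesis. It assumes reads start uniformly at random along a single-stranded assembly and that each base is misread independently with a fixed error rate. The names read_probability and log_likelihood, the error_rate parameter, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math


def read_probability(read, assembly, error_rate=0.01):
    """Probability of observing `read` given `assembly` (toy model).

    Assumption: the read starts at a uniformly random position, and each
    base is miscalled independently with probability `error_rate`, with
    the three wrong bases equally likely.
    """
    positions = len(assembly) - len(read) + 1
    if positions <= 0:
        return 0.0  # the read cannot fit anywhere in the assembly
    total = 0.0
    for start in range(positions):
        window = assembly[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        matches = len(read) - mismatches
        total += (1 - error_rate) ** matches * (error_rate / 3) ** mismatches
    return total / positions


def log_likelihood(reads, assembly, error_rate=0.01):
    """Sum of log P(read | assembly) over all reads, treated as independent."""
    score = 0.0
    for read in reads:
        p = read_probability(read, assembly, error_rate)
        if p == 0.0:
            return float("-inf")
        score += math.log(p)
    return score


# Toy data (hypothetical): error-free reads of length 8 from a short "genome".
genome = "ACGTTGCAAGGCTTAACCGGTATCGA"
read_length = 8
reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]

# A rearranged candidate of the same length: two blocks swapped.
misassembled = genome[:8] + genome[16:] + genome[8:16]

true_score = log_likelihood(reads, genome)
misassembled_score = log_likelihood(reads, misassembled)
```

On this toy data the reads that span the swapped blocks no longer match the rearranged candidate exactly, so true_score comes out higher than misassembled_score. A realistic scorer would also need to handle reverse complements, paired reads, and sequencing depth, none of which this sketch attempts.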