A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome

Bibliographic Details
Main Authors: Wang, Zhong; Ho, Harrison; Egan, Rob; Yao, Shijie; Kang, Dongwan; Froula, Jeff; Sevim, Volkan; Schulz, Frederik; Shay, Jackie; Macklin, Derek; McCue, Kayla; Orsini, Rachel; Barich, Daniel; Sedlacek, Christopher; Li, Wei; Morgan-Kiss, Rachael; Woyke, Tanja; Slonczewski, Joan
Format: Article in Journal/Newspaper
Language: unknown
Published: eScholarship, University of California 2019
Online Access: https://escholarship.org/uc/item/5jx0c4hg
Description
Summary: Current supervised phylogeny-based methods fall short in recognizing species assembled from metagenomic datasets of under-investigated habitats, because such genomes are often incomplete or lack closely related known relatives. Here, we report an efficient software suite, “Genome Constellation”, that estimates similarities between genomes based on their k-mer matches and subsequently uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while also revealing unexpected connections. In a dataset of 1,693 draft genomes assembled from Antarctic lake communities, of which only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. It revealed six clusters derived from new bacterial phyla and 63 new giant viruses, 3 of which were missed by the traditional marker-based approach. In summary, we demonstrate that Genome Constellation can tackle the computational and algorithmic challenges of large-scale taxonomy analyses in metagenomics.
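The summary states only that Genome Constellation scores genome pairs by their k-mer matches and then clusters on those scores; the exact scoring function and clustering procedure are not given in this record. As a rough illustration of the general idea, the Python sketch below uses Jaccard similarity of exact k-mer sets (a common stand-in, not necessarily the tool's metric) followed by single-linkage clustering at a fixed threshold. The function names `kmer_set`, `jaccard`, and `cluster_genomes`, the default `k=21`, and the `threshold` value are hypothetical choices for illustration only.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_genomes(genomes: dict[str, str], k: int = 21,
                    threshold: float = 0.1) -> list[set[str]]:
    """Single-linkage clustering (hypothetical parameters, not the tool's).

    Two genomes end up in the same cluster if they are connected by a chain
    of pairs whose k-mer Jaccard similarity is >= threshold.
    """
    kmers = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    parent = {name: name for name in genomes}  # union-find forest

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair once and merge sufficiently similar genomes.
    for a, b in combinations(genomes, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect genomes by their root to form clusters.
    groups: dict[str, set[str]] = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

For example, `cluster_genomes({"A": "ACGTACGTGGCC", "B": "ACGTACGTGGCA", "C": "TTTTGGGGAAAA"}, k=4, threshold=0.5)` groups A and B together (they share 7 of 9 distinct 4-mers) and leaves C on its own. At the scale described above (1,693 genomes), comparing exact k-mer sets for every pair would be costly; sketching approaches such as MinHash are a common way to approximate these similarities.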