A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome

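ABSTRACT Current supervised phylogeny-based methods fall short in recognizing species assembled from metagenomic datasets from under-investigated habitats, as the assembled genomes are often incomplete or lack closely related known relatives. Here, we report an efficient software suite, “Genome Constellation”, that estimates similarities between genomes based on their k-mer matches, and subsequently uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while simultaneously revealing unexpected connections. In a dataset containing 1,693 draft genomes assembled from Antarctic lake communities, where only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. It revealed six clusters derived from new bacterial phyla and 63 new giant viruses, 3 of which were missed by the traditional marker-based approach. In summary, we demonstrate that Genome Constellation can tackle the computational and algorithmic challenges in large-scale taxonomy analyses in metagenomics.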
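The abstract for this record describes Genome Constellation as estimating genome similarity from k-mer matches and then using those similarities for classification. The record does not say how matches are scored, so the following is only a minimal sketch of the general idea, assuming a Jaccard index over k-mer sets and a nearest-reference rule. The function names (`kmer_set`, `jaccard`, `classify`), the value of `k`, and the `min_sim` threshold are all hypothetical and are not taken from the software itself.

```python
def kmer_set(seq, k=4):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(query, references, k=4, min_sim=0.5):
    """Assign the query to its most similar reference, or None if
    no reference reaches the (hypothetical) min_sim threshold."""
    q = kmer_set(query, k)
    best_name, best_sim = None, 0.0
    for name, seq in references.items():
        sim = jaccard(q, kmer_set(seq, k))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= min_sim else None


# Toy sequences, invented purely for illustration.
references = {"ref1": "ACGTACGTAC", "ref2": "TTTTGGGGCC"}
print(classify("ACGTACGTAA", references))  # closest to ref1
print(classify("CCCCCCCC", references))    # no reference is similar enough
```

A query with no sufficiently similar reference is left unassigned here, which loosely mirrors the abstract's point that many metagenome-assembled genomes lack close known relatives.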


Bibliographic Details
Main Authors: Wang, Zhong; Ho, Harrison; Egan, Rob; Yao, Shijie; Kang, Dongwan; Froula, Jeff; Sevim, Volkan; Schulz, Frederik; Shay, Jackie; Macklin, Derek; McCue, Kayla; Orsini, Rachel; Barich, Daniel; Sedlacek, Christopher; Li, Wei; Morgan-Kiss, Rachael; Woyke, Tanja; Slonczewski, Joan
Format: Article in Journal/Newspaper
Language: unknown
Published: eScholarship, University of California 2019
Subjects: Genetics; Networking and Information Technology R&D (NITRD); Human Genome
Online Access: https://escholarship.org/uc/item/5jx0c4hg
id ftcdlib:oai:escholarship.org:ark:/13030/qt5jx0c4hg
record_format openpolar
institution Open Polar
collection University of California: eScholarship
op_collection_id ftcdlib
language unknown
topic Genetics
Networking and Information Technology R&D (NITRD)
Human Genome
description ABSTRACT Current supervised phylogeny-based methods fall short in recognizing species assembled from metagenomic datasets from under-investigated habitats, as the assembled genomes are often incomplete or lack closely related known relatives. Here, we report an efficient software suite, “Genome Constellation”, that estimates similarities between genomes based on their k-mer matches, and subsequently uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while simultaneously revealing unexpected connections. In a dataset containing 1,693 draft genomes assembled from Antarctic lake communities, where only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. It revealed six clusters derived from new bacterial phyla and 63 new giant viruses, 3 of which were missed by the traditional marker-based approach. In summary, we demonstrate that Genome Constellation can tackle the computational and algorithmic challenges in large-scale taxonomy analyses in metagenomics.
format Article in Journal/Newspaper
author Wang, Zhong
Ho, Harrison
Egan, Rob
Yao, Shijie
Kang, Dongwan
Froula, Jeff
Sevim, Volkan
Schulz, Frederik
Shay, Jackie
Macklin, Derek
McCue, Kayla
Orsini, Rachel
Barich, Daniel
Sedlacek, Christopher
Li, Wei
Morgan-Kiss, Rachael
Woyke, Tanja
Slonczewski, Joan
title A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome
publisher eScholarship, University of California
publishDate 2019
url https://escholarship.org/uc/item/5jx0c4hg
geographic Antarctic
The Antarctic
genre Antarc*
Antarctic
op_relation qt5jx0c4hg
https://escholarship.org/uc/item/5jx0c4hg
op_rights public