A novel bioinformatics tool for phylogenetic classification of genomic sequence fragments derived from mixed genomes of uncultured environmental microbes

Bibliographic Details
Main Authors: Abe, Takashi; Sugawara, Hideaki; Kanaya, Shigehiko; Ikemura, Toshimichi
Format: Report
Language: English
Published: Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, and The Graduate University for Advanced Studies (Sokendai); Department of Bioinformatics and Genomes, Graduate School of Information Science, Nara Institute of Science and Technology; The Graduate University for Advanced Studies (Sokendai), Hayama Center for Advanced Research, 2006
Online Access: https://nipr.repo.nii.ac.jp/?action=repository_uri&item_id=6264
http://id.nii.ac.jp/1291/00006264/
https://nipr.repo.nii.ac.jp/?action=repository_action_common_download&item_id=6264&item_no=1&attribute_id=18&file_no=1
Description
Summary: A Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional, complex data on a two-dimensional map. We adapted the conventional SOM for genome informatics, making the learning process and the resulting map independent of the order of data input, and developed a novel bioinformatics tool for the phylogenetic classification of sequence fragments obtained from pooled genome samples of microorganisms in environmental samples. The tool allows microbial diversity and the relative abundance of microorganisms to be visualized on a map. First, we constructed SOMs of tri- and tetranucleotide frequencies from a total of 3.3 Gb of sequences derived from 113 prokaryotic and 13 eukaryotic genomes for which complete genome sequences are available. The SOMs classified the 330,000 10-kb sequences from these genomes mainly according to species, without being given any species information. Importantly, classification was possible without orthologous sequence sets, which makes the method useful for studying novel sequences from poorly characterized species, such as those that live only under extreme conditions and have attracted wide scientific and industrial attention. Using the SOM method, sequences that were derived from a single genome but cloned independently in a metagenome library could be reassociated in silico. The usefulness of SOMs in metagenome studies is also discussed.
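
The two ideas in the summary, oligonucleotide-frequency vectors for fixed-length fragments and a map update that does not depend on input order, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`fragments`, `kmer_frequencies`, `batch_som_epoch`), the one-dimensional lattice, the flat neighbourhood, and the toy values are all assumptions made for illustration. The report describes a two-dimensional map and does not give these details here.

```python
from itertools import product


def fragments(seq, size):
    """Split seq into non-overlapping windows of length `size`.

    A trailing piece shorter than `size` is dropped (an assumption).
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def kmer_frequencies(fragment, k):
    """Relative frequencies of all 4**k oligonucleotides, in a fixed order.

    k=3 gives trinucleotide and k=4 tetranucleotide frequencies.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(fragment) - k + 1):
        word = fragment[i:i + k]
        if word in counts:  # skip windows with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def batch_som_epoch(weights, data, radius):
    """One batch update of a SOM on a 1-D lattice (hypothetical simplification).

    Every input is assigned to its best-matching node using the weights
    from the start of the epoch; node weights are then replaced by the
    mean of the inputs assigned to the node or to its lattice neighbours
    within `radius`. Because assignments use fixed weights and the update
    is a mean, the result does not depend on the order of `data`
    (up to floating-point rounding).
    """
    n, dim = len(weights), len(weights[0])
    sums = [[0.0] * dim for _ in range(n)]
    hits = [0] * n
    for x in data:
        # Best-matching node by squared Euclidean distance.
        bmu = min(
            range(n),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)),
        )
        for j in range(max(0, bmu - radius), min(n, bmu + radius + 1)):
            hits[j] += 1
            for d in range(dim):
                sums[j][d] += x[d]
    # Nodes that received no input keep their previous weights.
    return [
        [s / hits[j] for s in sums[j]] if hits[j] else list(weights[j])
        for j in range(n)
    ]


if __name__ == "__main__":
    # Toy sequence; the report uses 10-kb fragments of real genomes.
    toy = "ACGTTGCAACGGTTAACCGG" * 50
    vectors = [kmer_frequencies(f, 4) for f in fragments(toy, 100)]
    print(len(vectors), "fragments,", len(vectors[0]), "dimensions each")
```

In a conventional (online) SOM, weights change after each input, so the final map depends on presentation order. The batch form above defers the update until all inputs have been seen, which is one common way to obtain the order independence the summary mentions.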