A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA
Summary: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ∼10 000 times faster than BLASTX, while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that ar...
Published in: | Bioinformatics |
---|---|
Main Authors: | Huson, Daniel H., Xie, Chao |
Format: | Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | Hitseq Papers |
Online Access: | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866550 http://www.ncbi.nlm.nih.gov/pubmed/23658416 https://doi.org/10.1093/bioinformatics/btt254 |
_version_ | 1821682214989987840 |
---|---|
author | Huson, Daniel H. Xie, Chao |
author_facet | Huson, Daniel H. Xie, Chao |
author_sort | Huson, Daniel H. |
collection | PubMed Central (PMC) |
container_issue | 1 |
container_start_page | 38 |
container_title | Bioinformatics |
container_volume | 30 |
description | Summary: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ∼10 000 times faster than BLASTX, while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that are highly correlated to those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800 000 CPU hours, leading to the same clustering of samples by functional profiles. |
format | Text |
genre | permafrost |
genre_facet | permafrost |
id | ftpubmed:oai:pubmedcentral.nih.gov:3866550 |
institution | Open Polar |
language | English |
op_collection_id | ftpubmed |
op_container_end_page | 39 |
op_doi | https://doi.org/10.1093/bioinformatics/btt254 |
op_relation | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866550 http://www.ncbi.nlm.nih.gov/pubmed/23658416 http://dx.doi.org/10.1093/bioinformatics/btt254 |
op_rights | © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
op_rightsnorm | CC-BY |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | openpolar |
title | A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA |
topic | Hitseq Papers |
topic_facet | Hitseq Papers |
url | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866550 http://www.ncbi.nlm.nih.gov/pubmed/23658416 https://doi.org/10.1093/bioinformatics/btt254 |