Fast-HBR: Fast hash based duplicate read remover
Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient duplicate rea...
Published in: | Bioinformation |
---|---|
Main Authors: | Altayyar, Sami; Artoli, Abdel Monim |
Format: | Text |
Language: | English |
Published: | Biomedical Informatics, 2022 |
Subjects: | |
Online Access: | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://doi.org/10.6026/97320630018036 |
id | ftpubmed:oai:pubmedcentral.nih.gov:9200608 |
---|---|
record_format | openpolar |
spelling | ftpubmed:oai:pubmedcentral.nih.gov:9200608 2023-05-15T18:11:46+02:00 Fast-HBR: Fast hash based duplicate read remover Altayyar, Sami Artoli, Abdel Monim 2022-01-31 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://doi.org/10.6026/97320630018036 en eng Biomedical Informatics http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ http://dx.doi.org/10.6026/97320630018036 © 2022 Biomedical Informatics https://creativecommons.org/licenses/by/3.0/ This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License. CC-BY Bioinformation Research Article Text 2022 ftpubmed https://doi.org/10.6026/97320630018036 2022-07-10T00:29:45Z Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads without a reference genome, using de novo principles. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate-removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR. Text sami PubMed Central (PMC) Bioinformation 18 1 36 40 |
institution | Open Polar |
collection | PubMed Central (PMC) |
op_collection_id | ftpubmed |
language | English |
topic | Research Article |
spellingShingle | Research Article Altayyar, Sami Artoli, Abdel Monim Fast-HBR: Fast hash based duplicate read remover |
topic_facet | Research Article |
description | Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads without a reference genome, using de novo principles. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate-removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR. |
format | Text |
author | Altayyar, Sami Artoli, Abdel Monim |
author_facet | Altayyar, Sami Artoli, Abdel Monim |
author_sort | Altayyar, Sami |
title | Fast-HBR: Fast hash based duplicate read remover |
title_short | Fast-HBR: Fast hash based duplicate read remover |
title_full | Fast-HBR: Fast hash based duplicate read remover |
title_fullStr | Fast-HBR: Fast hash based duplicate read remover |
title_full_unstemmed | Fast-HBR: Fast hash based duplicate read remover |
title_sort | fast-hbr: fast hash based duplicate read remover |
publisher | Biomedical Informatics |
publishDate | 2022 |
url | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://doi.org/10.6026/97320630018036 |
genre | sami |
genre_facet | sami |
op_source | Bioinformation |
op_relation | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ http://dx.doi.org/10.6026/97320630018036 |
op_rights | © 2022 Biomedical Informatics https://creativecommons.org/licenses/by/3.0/ This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License. |
op_rightsnorm | CC-BY |
op_doi | https://doi.org/10.6026/97320630018036 |
container_title | Bioinformation |
container_volume | 18 |
container_issue | 1 |
container_start_page | 36 |
op_container_end_page | 40 |
_version_ | 1766184398085947392 |
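
The description field above says that reads are represented as integer values in hash tables so that duplicates can be found without a reference genome. The following is a minimal sketch of that general idea only, not Fast-HBR's actual implementation: the function names `read_key` and `remove_exact_duplicates`, the choice of a 64-bit BLAKE2b digest, and the in-memory list of reads are all assumptions made for illustration. It handles exact duplicates only and does no FASTA/FASTQ parsing.

```python
import hashlib
from typing import Iterable, Iterator


def read_key(read: str) -> int:
    """Map a read to a 64-bit integer (hash choice is an illustrative assumption)."""
    digest = hashlib.blake2b(read.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def remove_exact_duplicates(reads: Iterable[str]) -> Iterator[str]:
    """Yield each distinct read once, in first-seen order.

    Only the integer keys are kept in memory, not the read strings,
    which is the memory-saving idea the record's description refers to.
    """
    seen: set[int] = set()
    for read in reads:
        key = read_key(read)
        if key not in seen:
            seen.add(key)
            yield read


# Hypothetical toy input; real reads would come from a sequencing file.
reads = ["ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
unique_reads = list(remove_exact_duplicates(reads))
print(unique_reads)
```

Storing a fixed-size integer per read keeps memory roughly independent of read length, at the cost of a very small chance that two different reads share a key. A tool that must never drop a distinct read would need to confirm matches against the original sequence.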