Fast-HBR: Fast hash based duplicate read remover

Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient duplicate rea...

Bibliographic Details
Published in: Bioinformation
Main Authors: Altayyar, Sami; Artoli, Abdel Monim
Format: Text
Language: English
Published: Biomedical Informatics 2022
Online Access: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/
https://doi.org/10.6026/97320630018036
id ftpubmed:oai:pubmedcentral.nih.gov:9200608
record_format openpolar
spelling ftpubmed:oai:pubmedcentral.nih.gov:9200608 2023-05-15T18:11:46+02:00 Fast-HBR: Fast hash based duplicate read remover Altayyar, Sami Artoli, Abdel Monim 2022-01-31 2022-07-10T00:29:45Z
institution Open Polar
collection PubMed Central (PMC)
op_collection_id ftpubmed
language English
topic Research Article
description Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate-removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
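The hashing idea in the description can be illustrated with a minimal sketch. This is not the Fast-HBR implementation: the helper names `read_key` and `remove_duplicates`, the choice of a 64-bit BLAKE2b digest, and the case-insensitive comparison are all assumptions made for illustration. Each read is reduced to an integer, and only the integers are kept in a hash-based set, so the full sequences of already-seen reads need not stay in memory.

```python
import hashlib
from typing import Iterable, Iterator


def read_key(sequence: str) -> int:
    """Map a read sequence to a 64-bit integer (hypothetical helper)."""
    digest = hashlib.blake2b(sequence.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def remove_duplicates(
    reads: Iterable[tuple[str, str]],
) -> Iterator[tuple[str, str]]:
    """Yield (read_id, sequence) pairs, keeping the first copy of each sequence."""
    seen: set[int] = set()  # integer keys only, not the sequences themselves
    for read_id, sequence in reads:
        # Assumption: reads differing only in letter case count as duplicates.
        key = read_key(sequence.upper())
        if key not in seen:
            seen.add(key)
            yield read_id, sequence


# Toy input: r3 and r4 repeat the sequence of r1.
reads = [
    ("r1", "ACGTACGT"),
    ("r2", "TTGACCA"),
    ("r3", "ACGTACGT"),
    ("r4", "acgtacgt"),
]
unique_reads = list(remove_duplicates(reads))
```

Storing fixed-size integers trades a very small collision risk for lower memory use: two different reads that happen to share a 64-bit key would be treated as duplicates, so a production tool would need to decide whether that risk is acceptable.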
format Text
author Altayyar, Sami
Artoli, Abdel Monim
title Fast-HBR: Fast hash based duplicate read remover
publisher Biomedical Informatics
publishDate 2022
url http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/
https://doi.org/10.6026/97320630018036
genre sami
op_source Bioinformation
op_relation http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/
http://dx.doi.org/10.6026/97320630018036
op_rights © 2022 Biomedical Informatics
https://creativecommons.org/licenses/by/3.0/ This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
op_rightsnorm CC-BY
op_doi https://doi.org/10.6026/97320630018036
container_title Bioinformation
container_volume 18
container_issue 1
container_start_page 36
op_container_end_page 40
_version_ 1766184398085947392