Fast-HBR: Fast hash based duplicate read remover

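Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate-removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.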
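The hash-based idea named in the title can be illustrated with a minimal Python sketch. This is not the Fast-HBR implementation: the function names `parse_fasta` and `remove_exact_duplicates`, the FASTA input format, the case-insensitive comparison, and the use of Python's built-in `hash` are all assumptions made for illustration. The sketch keeps only an integer hash per read rather than the read itself, so memory grows with the number of distinct reads and not with their length. Because only hashes are stored, two different reads that collide would be treated as duplicates; with 64-bit hashes this is rare but possible.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines (assumed input format)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def remove_exact_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first occurrence of each read, comparing reads by integer hash.

    Hypothetical helper: only the integer hash of each sequence is stored,
    so a hash collision would drop a non-duplicate read.
    """
    seen: Set[int] = set()
    for header, sequence in records:
        key = hash(sequence.upper())  # integer stand-in for the read
        if key in seen:
            continue
        seen.add(key)
        yield header, sequence


# Usage with a small in-memory example (made-up reads).
example_lines = [">r1", "ACGT", ">r2", "acgt", ">r3", "TTGA", ">r4", "ACGT"]
unique_reads = list(remove_exact_duplicates(parse_fasta(example_lines)))
```

In this example `r2` and `r4` match `r1` after upper-casing, so only `r1` and `r3` are kept. The sketch detects exact duplicates only; whether Fast-HBR also handles reverse complements or near-duplicates is not stated in this record.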

Bibliographic Details
Published in:	Bioinformation
Main Authors:	Altayyar, Sami; Artoli, Abdel Monim
Format:	Text
Language:	English
Published:	Biomedical Informatics 2022
Subjects:	Research Article sami
Online Access:	http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://doi.org/10.6026/97320630018036

id	ftpubmed:oai:pubmedcentral.nih.gov:9200608
record_format	openpolar
spelling	ftpubmed:oai:pubmedcentral.nih.gov:9200608 2023-05-15T18:11:46+02:00 Fast-HBR: Fast hash based duplicate read remover Altayyar, Sami Artoli, Abdel Monim 2022-01-31 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://doi.org/10.6026/97320630018036 en eng Biomedical Informatics http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ http://dx.doi.org/10.6026/97320630018036 © 2022 Biomedical Informatics https://creativecommons.org/licenses/by/3.0/This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License. CC-BY Bioinformation Research Article Text 2022 ftpubmed https://doi.org/10.6026/97320630018036 2022-07-10T00:29:45Z The Next-Generation Sequencing (NGS) platforms produce massive amounts of data to analyze various features in environmental samples. These data contain multiple duplicate reads which impact the analyzing process efficiency and accuracy. We describe Fast-HBR, a fast and memory-efficient duplicate reads removing tool without a reference genome using de-novo principles. It uses hash tables to represent reads in integer value to minimize memory usage for faster manipulation. Fast-HBR is faster and has less memory footprint when compared with the state of the art De-novo duplicate removing tools. Fast-HBR implemented in Python 3 is available at https://github.com/Sami-Altayyar/Fast-HBR. Text sami PubMed Central (PMC) Bioinformation 18 1 36 40
institution	Open Polar
collection	PubMed Central (PMC)
op_collection_id	ftpubmed
language	English
topic	Research Article
spellingShingle	Research Article Altayyar, Sami Artoli, Abdel Monim Fast-HBR: Fast hash based duplicate read remover
topic_facet	Research Article
description	Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain many duplicate reads, which reduce the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate-removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
format	Text
author	Altayyar, Sami Artoli, Abdel Monim
author_facet	Altayyar, Sami Artoli, Abdel Monim
author_sort	Altayyar, Sami
title	Fast-HBR: Fast hash based duplicate read remover
title_short	Fast-HBR: Fast hash based duplicate read remover
title_full	Fast-HBR: Fast hash based duplicate read remover
title_fullStr	Fast-HBR: Fast hash based duplicate read remover
title_full_unstemmed	Fast-HBR: Fast hash based duplicate read remover
title_sort	fast-hbr: fast hash based duplicate read remover
publisher	Biomedical Informatics
publishDate	2022
url	http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://doi.org/10.6026/97320630018036
genre	sami
genre_facet	sami
op_source	Bioinformation
op_relation	http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ http://dx.doi.org/10.6026/97320630018036
op_rights	© 2022 Biomedical Informatics https://creativecommons.org/licenses/by/3.0/ This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
op_rightsnorm	CC-BY
op_doi	https://doi.org/10.6026/97320630018036
container_title	Bioinformation
container_volume	18
container_issue	1
container_start_page	36
op_container_end_page	40
_version_	1766184398085947392
