DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets

Bibliographic Details
Published in: PeerJ
Main Authors: Antich, Adrià; Palacín, Cruz; Turon, Xavier; Wangensteen, Owen S.
Format: Article in Journal/Newspaper
Language: English
Published: PeerJ 2022
Online Access: http://hdl.handle.net/10261/258251
institution Open Polar
collection Digital.CSIC (Spanish National Research Council)
topic Metabarcoding
Bioinformatic pipelines
Metaphylogeography
Entropy correction
Denoising algorithms
Coding markers
description This article contains 16 pages and 5 figures. DNA metabarcoding is broadly used in biodiversity studies encompassing a wide range of organisms. Erroneous amplicons, generated during amplification and sequencing, are one of the major sources of concern when interpreting metabarcoding results. Several denoising programs have been implemented to detect and eliminate these errors. However, almost all currently available denoising software was designed to process non-coding ribosomal sequences, most notably prokaryotic 16S rDNA. The growing number of metabarcoding studies using coding markers such as COI or RuBisCO demands a re-assessment and calibration of denoising algorithms. Here we present DnoisE, the first denoising program designed to detect erroneous reads and merge them with the correct ones using information from the natural variability (entropy) associated with each codon position in coding barcodes. We have developed open-source software based on a modified version of the UNOISE algorithm. DnoisE implements different merging procedures as options, and can incorporate codon entropy information either retrieved from the data or supplied by the user. In addition, the DnoisE algorithm is parallelizable, greatly reducing runtimes on computer clusters. Our program also accepts different input file formats, so it can be readily incorporated into existing metabarcoding pipelines. This research was funded by the projects PopCOmics (CTM2017-88080, MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe", EU), MARGECH (PID2020-118550RB, MCIN/AEI/10.13039/501100011033), and BigPark (OAPN, 2462/2017) from the Spanish Government. The publication charges for this article have been funded by a grant from the publication fund of UiT The Arctic University of Norway. Peer reviewed
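The description says DnoisE is a modified UNOISE that uses the entropy of each codon position, taken from the data or supplied by the user. The Python below is a minimal sketch of that idea under stated assumptions. It is not DnoisE's code or command-line interface. It assumes equal-length, in-frame sequences, so it uses Hamming distance instead of UNOISE's alignment-based distance. It uses the UNOISE abundance-skew threshold β(d) = 1/2^(αd+1). It also assumes a hypothetical weighting in which a mismatch at codon position k counts H_k divided by the mean of the three codon-position entropies, and it applies a single "merge into the closest qualifying centroid" rule, although DnoisE offers several merging procedures. The function names, the `alpha` default and the `entropies` argument are illustrative only.

```python
import math
from collections import Counter

# Illustrative sketch only: these names and the entropy weighting are
# assumptions for explanation, not DnoisE's actual code or interface.


def codon_entropies(seqs, abundances):
    """Mean Shannon entropy (bits) of the columns at codon positions 1, 2, 3.

    Assumes equal-length, aligned sequences whose reading frame starts at
    index 0; column frequencies are weighted by read abundance.
    """
    sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
    for col in range(len(seqs[0])):
        freq = Counter()
        for seq, ab in zip(seqs, abundances):
            freq[seq[col]] += ab
        total = sum(freq.values())
        h = -sum((n / total) * math.log2(n / total) for n in freq.values() if n > 0)
        sums[col % 3] += h
        counts[col % 3] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def corrected_distance(a, b, entropies):
    """Hamming distance in which a mismatch at codon position k counts
    entropies[k] / mean(entropies) (assumed weighting). Falls back to plain
    Hamming distance when all entropies are zero."""
    mean_h = sum(entropies) / 3
    d = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            d += entropies[i % 3] / mean_h if mean_h > 0 else 1.0
    return d


def beta(d, alpha):
    """UNOISE abundance-skew threshold: 1 / 2**(alpha * d + 1)."""
    return 1.0 / 2 ** (alpha * d + 1)


def denoise(seq_abund, alpha=2.0, entropies=None):
    """Greedy UNOISE-like pass that returns {centroid: merged abundance}.

    Sequences are visited from most to least abundant. A sequence is merged
    into the closest existing centroid for which
    abundance / centroid_abundance <= beta(d). Otherwise it becomes a new
    centroid. Entropies are estimated from the data unless supplied.
    """
    ordered = sorted(seq_abund.items(), key=lambda kv: kv[1], reverse=True)
    if entropies is None:
        entropies = codon_entropies([s for s, _ in ordered], [a for _, a in ordered])
    centroids = {}
    for seq, ab in ordered:
        best, best_d = None, math.inf
        for cent in centroids:
            d = corrected_distance(seq, cent, entropies)
            # Skew is computed against the centroid's original (pre-merge) abundance.
            if ab / seq_abund[cent] <= beta(d, alpha) and d < best_d:
                best, best_d = cent, d
        if best is None:
            centroids[seq] = ab
        else:
            centroids[best] += ab
    return centroids


if __name__ == "__main__":
    reads = {
        "ATGAAACCC": 1000,  # abundant "true" sequence
        "ATGAAACCT": 30,    # differs at a third codon position
        "ACGAAACCC": 30,    # differs at a second codon position
    }
    # Hypothetical entropies: third positions far more variable than second.
    print(denoise(reads, entropies=[1.0, 0.5, 4.5]))
    # {'ATGAAACCC': 1030, 'ATGAAACCT': 30}
    print(denoise(reads, entropies=[1.0, 1.0, 1.0]))  # no codon weighting
    # {'ATGAAACCC': 1060}
```

With uniform entropies, both rare variants lie at distance 1 and are absorbed as errors. Under the hypothetical entropies, the third-position variant gets a weighted distance of 2.25. That lowers β enough that its skew (0.03) no longer qualifies, so it is kept. The second-position variant (distance 0.25) is still merged. In this sketch, a change at a highly variable codon position is therefore treated as more likely to be a genuine variant.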
op_relation Publisher's version: http://doi.org/10.7717/peerj.12758
PeerJ 10: e12758 (2022)
http://hdl.handle.net/10261/258251
ISSN 2167-8359
op_rights open
op_doi https://doi.org/10.7717/peerj.12758