Dataset complexity impacts both MOTU delimitation and biodiversity estimates in eukaryotic 18S rRNA metabarcoding studies

Abstract How does the evolution of bioinformatics tools impact the biological interpretation of high‐throughput sequencing datasets? For eukaryotic metabarcoding studies, in particular, researchers often rely on tools originally developed for the analysis of 16S ribosomal RNA (rRNA) datasets. Such t...


Bibliographic Details
Published in: Environmental DNA
Main Authors: De Santiago, Alejandro, Pereira, Tiago José, Mincks, Sarah L., Bik, Holly M.
Other Authors: North Pacific Research Board, Gulf of Mexico Research Initiative
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2021
Online Access: http://dx.doi.org/10.1002/edn3.255
https://onlinelibrary.wiley.com/doi/pdf/10.1002/edn3.255
https://onlinelibrary.wiley.com/doi/full-xml/10.1002/edn3.255
id crwiley:10.1002/edn3.255
record_format openpolar
institution Open Polar
collection Wiley Online Library
op_collection_id crwiley
language English
description Abstract How does the evolution of bioinformatics tools impact the biological interpretation of high‐throughput sequencing datasets? For eukaryotic metabarcoding studies, in particular, researchers often rely on tools originally developed for the analysis of 16S ribosomal RNA (rRNA) datasets. Such tools do not adequately account for the complexity of eukaryotic genomes, the ubiquity of intragenomic variation in eukaryotic metabarcoding loci, or the differential evolutionary rates observed across eukaryotic genes and taxa. Recently, metabarcoding workflows have shifted away from the use of operational taxonomic units (OTUs) toward delimitation of amplicon sequence variants (ASVs). We assessed how the choice of bioinformatics algorithm impacts the downstream biological conclusions that are drawn from eukaryotic 18S rRNA metabarcoding studies. We focused on four workflows including UCLUST and VSearch algorithms for OTU clustering, and DADA2 and Deblur algorithms for ASV delimitation. We used two 18S rRNA datasets to further evaluate whether dataset complexity had a major impact on the statistical trends and ecological metrics: a “high complexity” (HC) environmental dataset generated from community DNA in Arctic marine sediments, and a “low complexity” (LC) dataset representing individually barcoded nematodes. Our results indicate that ASV algorithms produce more biologically realistic metabarcoding outputs, with DADA2 being the most consistent and accurate pipeline regardless of dataset complexity. In contrast, OTU clustering algorithms inflate the metabarcoding‐derived estimates of biodiversity, consistently returning a high proportion of “rare” molecular operational taxonomic units (MOTUs) that appear to represent computational artifacts and sequencing errors. However, species‐specific MOTUs with high relative abundance are often recovered regardless of the bioinformatics approach. 
We also found high concordance across pipelines for downstream ecological analysis based on beta‐diversity and alpha‐diversity ...
author2 North Pacific Research Board
Gulf of Mexico Research Initiative
format Article in Journal/Newspaper
author De Santiago, Alejandro
Pereira, Tiago José
Mincks, Sarah L.
Bik, Holly M.
title Dataset complexity impacts both MOTU delimitation and biodiversity estimates in eukaryotic 18S rRNA metabarcoding studies
publisher Wiley
publishDate 2021
url http://dx.doi.org/10.1002/edn3.255
https://onlinelibrary.wiley.com/doi/pdf/10.1002/edn3.255
https://onlinelibrary.wiley.com/doi/full-xml/10.1002/edn3.255
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_source Environmental DNA
volume 4, issue 2, page 363-384
ISSN 2637-4943
op_rights http://creativecommons.org/licenses/by-nc/4.0/
op_doi https://doi.org/10.1002/edn3.255
container_title Environmental DNA
container_volume 4
container_issue 2
container_start_page 363
op_container_end_page 384
_version_ 1802641621620097024