An automated workflow to assess completeness and curate GenBank for environmental DNA metabarcoding: The marine fish assemblage as case study

Abstract To successfully implement environmental DNA‐based (eDNA) diversity monitoring, the completeness and accuracy of reference databases used for taxonomic assignment of eDNA sequences are among the challenges to be tackled. Here, we have developed a workflow that evaluates the current status of...


Bibliographic Details
Published in: Environmental DNA
Main Authors: Claver, Cristina; Canals, Oriol; de Amézaga, Leire G.; Mendibil, Iñaki; Rodriguez‐Ezpeleta, Naiara
Other Authors: Eusko Jaurlaritza; H2020 European Institute of Innovation and Technology; Hezkuntza, Hizkuntza Politika Eta Kultura Saila, Eusko Jaurlaritza
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2023
Online Access: http://dx.doi.org/10.1002/edn3.433
https://onlinelibrary.wiley.com/doi/pdf/10.1002/edn3.433
id crwiley:10.1002/edn3.433
record_format openpolar
institution Open Polar
collection Wiley Online Library
op_collection_id crwiley
language English
description Abstract: To successfully implement environmental DNA‐based (eDNA) diversity monitoring, the completeness and accuracy of reference databases used for taxonomic assignment of eDNA sequences are among the challenges to be tackled. Here, we have developed a workflow that evaluates the current status of GenBank for marine fishes. For a given combination of species and barcodes, a gap analysis is performed and potentially erroneous sequences are identified. Our gap analysis based on the four most used genes (cytochrome c oxidase subunit 1, 12S rRNA, 16S rRNA, and cytochrome b) for fish eDNA metabarcoding found that COI, the universal choice for metazoans, is the gene covering the highest number of Northeast Atlantic marine fishes (70%), while 12S rRNA, the preferred region for fish‐targeting studies, only covers about 50% of the species. The presence of too close and too distant barcode sequences as expected by their taxonomic classification confirms the existence of erroneous sequences in GenBank that our workflow can detect and eliminate. Comparing taxonomic assignments of real marine eDNA samples with raw and clean reference databases for the most used 12S rRNA barcodes (teleo and MiFish), we confirmed that both barcodes perform differently and demonstrated that the application of the database cleaning workflow can result in drastic changes in community composition. Besides providing a tool for reference database curation, this study confirms the need to increase 12S rRNA reference sequences for European marine fishes and evidences the dangers of taxonomic assignments by directly querying GenBank.
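The gap analysis summarised in the description (for a given combination of species and barcodes, which species have at least one reference sequence) might be illustrated with the minimal Python sketch below. It is not the authors' workflow: the function name `gap_analysis`, the `(species, barcode)` record layout, and the toy checklist are all hypothetical, and a real analysis would read records retrieved from GenBank rather than a hand-written list.

```python
from collections import defaultdict


def gap_analysis(species, records, barcodes):
    """Return, per barcode, the fraction of target species that have
    at least one reference sequence (hypothetical helper)."""
    targets = set(species)
    if not targets:
        raise ValueError("species checklist is empty")
    covered = defaultdict(set)
    for name, barcode in records:
        # Ignore records for species or barcodes outside the requested combination.
        if name in targets and barcode in barcodes:
            covered[barcode].add(name)
    return {b: len(covered[b]) / len(targets) for b in barcodes}


# Toy checklist and reference records, invented for illustration only.
species = [
    "Gadus morhua",
    "Solea solea",
    "Merluccius merluccius",
    "Sardina pilchardus",
]
records = [
    ("Gadus morhua", "COI"),
    ("Gadus morhua", "12S"),
    ("Solea solea", "COI"),
    ("Merluccius merluccius", "COI"),
    ("Sardina pilchardus", "12S"),
    ("Thunnus thynnus", "COI"),  # not on the checklist, so it is ignored
]

coverage = gap_analysis(species, records, ["COI", "12S", "16S", "cytb"])
for barcode, fraction in coverage.items():
    print(f"{barcode}: {fraction:.0%} of checklist species covered")
```

Species missing for a barcode are the "gaps"; the same per-barcode sets could be subtracted from the checklist to list them by name.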
author2 Eusko Jaurlaritza
H2020 European Institute of Innovation and Technology
Hezkuntza, Hizkuntza Politika Eta Kultura Saila, Eusko Jaurlaritza
format Article in Journal/Newspaper
author Claver, Cristina
Canals, Oriol
de Amézaga, Leire G.
Mendibil, Iñaki
Rodriguez‐Ezpeleta, Naiara
author_sort Claver, Cristina
title An automated workflow to assess completeness and curate GenBank for environmental DNA metabarcoding: The marine fish assemblage as case study
title_sort automated workflow to assess completeness and curate genbank for environmental dna metabarcoding: the marine fish assemblage as case study
publisher Wiley
publishDate 2023
url http://dx.doi.org/10.1002/edn3.433
https://onlinelibrary.wiley.com/doi/pdf/10.1002/edn3.433
genre Northeast Atlantic
op_source Environmental DNA
volume 5, issue 4, page 634-647
ISSN 2637-4943 2637-4943
op_rights http://creativecommons.org/licenses/by-nc-nd/4.0/
op_doi https://doi.org/10.1002/edn3.433
container_title Environmental DNA
container_volume 5
container_issue 4
container_start_page 634
op_container_end_page 647
_version_ 1810465977957089280