Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes

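Background: As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As these tools measure different biological signals and use different algorithms, training data, and reference databases, independent benchmarking is needed to give users objective guidance.

Results: We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0–97%) and false positive rates (0–30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data, and all tools except Sourmash find unique viral contigs. Tool performance improved with adjusted parameter cutoffs, indicating that users should consider adjusting cutoffs before use.

Conclusions: Together, our independent benchmarking helps viromics researchers choose among bioinformatic virus identification tools and offers suggestions for parameter adjustments.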
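The abstract of this record reports true positive and false positive rates on paired viral and microbial datasets and notes that performance changed with adjusted cutoffs. A minimal sketch of how such rates could be computed at a given score cutoff is shown here. It assumes that every contig from the viral set counts as truly viral and every contig from the microbial set as non-viral, and that a tool emits one score per contig; the function name, contig IDs, and scores are hypothetical and are not taken from the paper or from any tool's output format.

```python
from typing import Dict, Tuple


def rates_at_cutoff(
    viral_scores: Dict[str, float],
    microbial_scores: Dict[str, float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (TPR, FPR) when contigs scoring >= cutoff are called viral.

    Assumption: viral_scores holds contigs treated as true viruses and
    microbial_scores holds contigs treated as non-viral.
    """
    tp = sum(score >= cutoff for score in viral_scores.values())
    fp = sum(score >= cutoff for score in microbial_scores.values())
    tpr = tp / len(viral_scores) if viral_scores else 0.0
    fpr = fp / len(microbial_scores) if microbial_scores else 0.0
    return tpr, fpr


# Made-up scores for illustration only (hypothetical contig IDs).
viral = {"v1": 0.95, "v2": 0.80, "v3": 0.40, "v4": 0.65}
microbial = {"m1": 0.10, "m2": 0.55, "m3": 0.30, "m4": 0.72}

for cutoff in (0.5, 0.7, 0.9):
    tpr, fpr = rates_at_cutoff(viral, microbial, cutoff)
    print(f"cutoff={cutoff:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

With these toy numbers, raising the cutoff lowers both rates, which illustrates the trade-off a user faces when adjusting a tool's cutoff.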

Bibliographic Details
Main Authors: Wu, Ling Yi; Wijesekara, Yasas; Piedade, Gonçalo J.; Pappas, Nikolaos; Brussaard, Corina P.D.; Dutilh, Bas E.
Other Authors: Sub Bioinformatics; Theoretical Biology and Bioinformatics
Format: Article in Journal/Newspaper
Language: English
Published: 2024-04-15
Subjects: Ecology, Evolution, Behavior and Systematics; Genetics; Cell Biology
Online Access: https://dspace.library.uu.nl/handle/1874/438498
id ftunivutrecht:oai:dspace.library.uu.nl:1874/438498
record_format openpolar
institution Open Polar
collection Utrecht University Repository
op_collection_id ftunivutrecht
language English
topic Ecology, Evolution, Behavior and Systematics
Genetics
Cell Biology
description Background: As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct an independent benchmarking to give users objective guidance. Results: We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0–97%) and false positive rates (0–30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered. Conclusions: Together, our independent benchmarking facilitates selecting choices of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.
author2 Sub Bioinformatics
Theoretical Biology and Bioinformatics
format Article in Journal/Newspaper
author Wu, Ling Yi
Wijesekara, Yasas
Piedade, Gonçalo J.
Pappas, Nikolaos
Brussaard, Corina P.D.
Dutilh, Bas E.
title Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes
publishDate 2024
url https://dspace.library.uu.nl/handle/1874/438498
genre Antarc*
Antarctic
op_relation 1474-7596
https://dspace.library.uu.nl/handle/1874/438498
op_rights info:eu-repo/semantics/OpenAccess