Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes

Bibliographic Details
Published in: Genome Biology
Main Authors: Wu, L.-Y.; Wijesekara, Y.; Piedade, G.; Pappas, N.; Brussaard, C.P.D.; Dutilh, B.E.
Format: Article in Journal/Newspaper
Language: English
Published: 2024
Online Access: https://www.vliz.be/imisdocs/publications/92/408192.pdf
DOI: https://doi.org/10.1186/s13059-024-03236-4
Citation: Genome Biology 25(1): 97
Collection: NIOZ Repository (Royal Netherlands Institute for Sea Research)
Access: Open access

Description

Background: As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct an independent benchmarking to give users objective guidance.

Results: We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0–97%) and false positive rates (0–30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered.

Conclusions: Together, our independent benchmarking facilitates selecting choices of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.