High-resolution shotgun metagenomics: the more data, the better?

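Abstract
In shotgun metagenomics (SM), the state-of-the-art bioinformatic workflows are referred to as high-resolution shotgun metagenomics (HRSM) and require intensive computing and disk storage resources. While the increase in data output of the latest iteration of high-throughput DNA sequencing systems can allow for unprecedented sequencing depth at a minimal cost, adjustments in HRSM workflows will be needed to properly process these ever-increasing sequence datasets. One potential adaptation is to generate so-called shallow SM datasets that contain less sequencing data per sample than the more classic high-coverage sequencing. While shallow sequencing is a promising avenue for SM data analysis, detailed benchmarks using real data are lacking. In this case study, we took four public SM datasets, one massive and the others moderate in size, and subsampled each dataset at various levels to mimic shallow sequencing datasets of various sequencing depths. Our results suggest that shallow SM sequencing is a viable avenue to obtain sound results regarding microbial community structures and that high-depth sequencing does not bring additional elements for ecological interpretation. More specifically, results obtained by subsampling as little as 0.5 M sequencing clusters per sample were similar to the results obtained with the largest subsampled dataset for the human gut and agricultural soil datasets. For an Antarctic dataset, which contained only a few samples, 4 M sequencing clusters per sample was found to generate results comparable to those of the full dataset. One area where ultra-deep sequencing and maximizing the usage of all data were undeniably beneficial was the generation of metagenome-assembled genomes.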
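The abstract describes subsampling each dataset to fixed numbers of sequencing clusters per sample (as little as 0.5 M, or 4 M for the Antarctic dataset) to mimic shallow sequencing. This record does not say how the subsampling was implemented, so the Python sketch below is only an illustration under stated assumptions: the input is paired-end FASTQ in which one cluster is one R1/R2 read pair with mates listed in the same order, and clusters are drawn uniformly without replacement by reservoir sampling (Algorithm R). The helper names `fastq_records`, `paired_clusters` and `subsample_clusters` are hypothetical.

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple, TypeVar

T = TypeVar("T")
FastqRecord = Tuple[str, str, str, str]  # (header, sequence, '+' line, qualities)


def fastq_records(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; assumes well-formed, unwrapped input (hypothetical helper)."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n"))


def paired_clusters(r1: TextIO, r2: TextIO) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Pair R1 and R2 records so that each item is one sequencing cluster (hypothetical helper)."""
    # Assumption: both files list mates in the same order.
    return zip(fastq_records(r1), fastq_records(r2))


def subsample_clusters(clusters: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Draw up to n items uniformly at random, without replacement, in one pass.

    Reservoir sampling (Algorithm R): memory grows with n, not with input size.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    reservoir: List[T] = []
    for i, item in enumerate(clusters):
        if i < n:
            reservoir.append(item)  # fill the reservoir with the first n items
        else:
            j = rng.randint(0, i)  # uniform over positions 0..i inclusive
            if j < n:
                reservoir[j] = item  # item i is kept with probability n / (i + 1)
    return reservoir


if __name__ == "__main__":
    # Toy paired-end input standing in for one sample (made-up data).
    r1 = io.StringIO("".join(f"@r{i}/1\nACGT\n+\nIIII\n" for i in range(1000)))
    r2 = io.StringIO("".join(f"@r{i}/2\nTGCA\n+\nIIII\n" for i in range(1000)))
    sub = subsample_clusters(paired_clusters(r1, r2), n=50, seed=42)
    print(len(sub), sub[0][0][0], sub[0][1][0])
```

Running `subsample_clusters` once per sample and per target depth (for example n=500_000 and n=4_000_000) would produce datasets of the kind the abstract describes; a sample with fewer clusters than n is returned whole. In practice, a dedicated tool such as `seqtk sample`, run with the same `-s` seed on both mate files, is the more common choice.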

Bibliographic Details
Published in:	Briefings in Bioinformatics
Main Authors:	Tremblay, Julien; Schreiber, Lars; Greer, Charles W
Format:	Article in Journal/Newspaper
Language:	English
Published:	Oxford University Press (OUP) 2022
Subjects:	Antarctic Antarc*
Online Access:	http://dx.doi.org/10.1093/bib/bbac443 https://academic.oup.com/bib/article-pdf/23/6/bbac443/47244300/bbac443.pdf

id	croxfordunivpr:10.1093/bib/bbac443
record_format	openpolar
institution	Open Polar
collection	Oxford University Press
op_collection_id	croxfordunivpr
language	English
description	In shotgun metagenomics (SM), the state-of-the-art bioinformatic workflows are referred to as high-resolution shotgun metagenomics (HRSM) and require intensive computing and disk storage resources. While the increase in data output of the latest iteration of high-throughput DNA sequencing systems can allow for unprecedented sequencing depth at a minimal cost, adjustments in HRSM workflows will be needed to properly process these ever-increasing sequence datasets. One potential adaptation is to generate so-called shallow SM datasets that contain less sequencing data per sample than the more classic high-coverage sequencing. While shallow sequencing is a promising avenue for SM data analysis, detailed benchmarks using real data are lacking. In this case study, we took four public SM datasets, one massive and the others moderate in size, and subsampled each dataset at various levels to mimic shallow sequencing datasets of various sequencing depths. Our results suggest that shallow SM sequencing is a viable avenue to obtain sound results regarding microbial community structures and that high-depth sequencing does not bring additional elements for ecological interpretation. More specifically, results obtained by subsampling as little as 0.5 M sequencing clusters per sample were similar to the results obtained with the largest subsampled dataset for the human gut and agricultural soil datasets. For an Antarctic dataset, which contained only a few samples, 4 M sequencing clusters per sample was found to generate results comparable to those of the full dataset. One area where ultra-deep sequencing and maximizing the usage of all data were undeniably beneficial was the generation of metagenome-assembled genomes.
format	Article in Journal/Newspaper
author	Tremblay, Julien; Schreiber, Lars; Greer, Charles W
title	High-resolution shotgun metagenomics: the more data, the better?
publisher	Oxford University Press (OUP)
publishDate	2022
url	http://dx.doi.org/10.1093/bib/bbac443 https://academic.oup.com/bib/article-pdf/23/6/bbac443/47244300/bbac443.pdf
geographic	Antarctic
genre	Antarc* Antarctic
op_source	Briefings in Bioinformatics volume 23, issue 6 ISSN 1467-5463 1477-4054
op_rights	https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
op_doi	https://doi.org/10.1093/bib/bbac443
container_title	Briefings in Bioinformatics
container_volume	23
container_issue	6
