Supporting data for "To assemble or not to resemble – A validated Comparative Metatranscriptomics Workflow (CoMW)"

Metatranscriptomics has been used widely for investigation and quantification of microbial communities’ activity in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the envir...

Bibliographic Details
Main Authors:	Anders Lanzen; Carsten Suhr Jacobsen; Muhammad Zohaib Anwar; Toke Bang-Andreasen
Format:	Dataset
Language:	English
Published:	GigaScience Database 2019
Subjects:	Metagenomic; Transcriptomic; Workflow; Software; metatranscriptomics; benchmarking; assembly; alignment; precision; recall; false positives; Arctic
Online Access:	https://dx.doi.org/10.5524/100630 http://gigadb.org/dataset/100630

id	ftdatacite:10.5524/100630
record_format	openpolar
institution	Open Polar
collection	DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id	ftdatacite
language	English
topic	Metagenomic; Transcriptomic; Workflow; Software; metatranscriptomics; benchmarking; assembly; alignment; precision; recall; false positives
topic_facet	Metagenomic; Transcriptomic; Workflow; Software; metatranscriptomics; benchmarking; assembly; alignment; precision; recall; false positives
description	Metatranscriptomics has been used widely for investigation and quantification of microbial communities’ activity in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the environment. Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW) implemented in a modular, reproducible structure, significantly improving the annotation and quantification of metatranscriptomes. Metatranscriptomics typically utilizes short sequence reads, which can either be directly aligned to external reference databases (“assembly-free approach”) or first assembled into contigs before alignment (“assembly-based approach”). We also compare CoMW (assembly-based implementation) with an alternative assembly-free workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate their accuracy in terms of precision and recall using generic and specialized hierarchical protein databases. CoMW provided significantly fewer false positives, resulting in more precise identification and quantification of functional genes in metatranscriptomes. Using the comprehensive database M5nr, the assembly-based approach identified genes with only 0.6% false positives at thresholds ranging from inclusive to stringent, compared to the assembly-free approach, which yielded up to 15% false positives. Using specialized databases (Carbohydrate Active-enzyme and Nitrogen Cycle), the assembly-based approach identified and quantified genes with 3-5x fewer false positives. We also evaluated the impact of both approaches on real-world datasets. We present an open source de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW). Our benchmarking findings support the argument for assembling short reads into contigs before alignment to a reference database, since this provides higher precision and minimizes false positives.
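The description reports accuracy as precision, recall, and a percentage of false positives. As a rough illustration only, the sketch below shows how such figures could be derived from a set of annotated genes and a known ground truth, as is possible with simulated metatranscriptomes. The function name `precision_recall`, the gene identifiers, and the reading of "false positives" as the share of identified genes absent from the ground truth are all assumptions made for this example; none of them is taken from CoMW or from the dataset.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for two collections of gene identifiers.

    Hypothetical helper for illustration; not part of CoMW.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: fraction of identified genes that are really present.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of genes really present that were identified.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data invented for this example, not drawn from the dataset.
truth = {"geneA", "geneB", "geneC", "geneD"}
predicted = {"geneA", "geneB", "geneC", "geneX"}

precision, recall = precision_recall(predicted, truth)
# Assumed definition: share of identified genes that are false positives.
false_positive_share = 1.0 - precision
```

Under this assumed definition, a workflow reporting 0.6% false positives would correspond to a precision of 99.4%, although the publication itself should be consulted for the exact metric used.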
format	Dataset
author	Anders Lanzen; Carsten Suhr Jacobsen; Muhammad Zohaib Anwar; Toke Bang-Andreasen
author_facet	Anders Lanzen; Carsten Suhr Jacobsen; Muhammad Zohaib Anwar; Toke Bang-Andreasen
author_sort	Lanzen Anders
title	Supporting data for "To assemble or not to resemble – A validated Comparative Metatranscriptomics Workflow (CoMW)"
title_sort	supporting data for "to assemble or not to resemble – a validated comparative metatranscriptomics workflow (comw)"
publisher	GigaScience Database
publishDate	2019
url	https://dx.doi.org/10.5524/100630 http://gigadb.org/dataset/100630
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_rights	CC0 1.0 Universal http://creativecommons.org/publicdomain/zero/1.0
op_rightsnorm	CC0
op_doi	https://doi.org/10.5524/100630
_version_	1766344412223242240

Similar Items