To assemble or not to resemble: A validated Comparative Metatranscriptomics Workflow (CoMW)

Background: Metatranscriptomics has been widely used to investigate and quantify the activity of microbial communities in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds an...


Bibliographic Details
Main Authors: Anwar, Muhammad Zohaib, Lanzen, Anders, Bang-Andreasen, Toke, Jacobsen, Carsten Suhr
Format: Report
Language: English
Published: 2019
Subjects:
Online Access: https://pure.au.dk/portal/da/publications/to-assemble-or-not-to-resemble(962d057c-aa44-474d-8705-8e4f48d42208).html
https://doi.org/10.1101/642348
id ftuniaarhuspubl:oai:pure.atira.dk:publications/962d057c-aa44-474d-8705-8e4f48d42208
record_format openpolar
institution Open Polar
collection Aarhus University: Research
op_collection_id ftuniaarhuspubl
language English
description Background: Metatranscriptomics has been widely used to investigate and quantify the activity of microbial communities in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the environment. Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW), implemented in a modular, reproducible structure, that significantly improves the annotation and quantification of metatranscriptomes. Metatranscriptomics typically uses short sequence reads, which can either be aligned directly to external reference databases (“assembly-free approach”) or first assembled into contigs before alignment (“assembly-based approach”). We also compare CoMW (an assembly-based implementation) with an alternative assembly-free workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate the accuracy of both approaches in terms of precision and recall, using generic and specialized hierarchical protein databases.
Results: CoMW produced significantly fewer false positives, resulting in more precise identification and quantification of functional genes in metatranscriptomes. Using the comprehensive database M5nr, the assembly-based approach identified genes with only 0.6% false positives at thresholds ranging from inclusive to stringent, whereas the assembly-free approach yielded up to 15% false positives. Using specialized databases (Carbohydrate Active-enzyme and Nitrogen Cycle), the assembly-based approach identified and quantified genes with 3-5 times fewer false positives. We also evaluated the impact of both approaches on real-world datasets.
Conclusions: We present an open-source, de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW). Our benchmarking supports assembling short reads into contigs before alignment to a reference database, since this provides higher precision and minimizes false positives.
format Report
author Anwar, Muhammad Zohaib
Lanzen, Anders
Bang-Andreasen, Toke
Jacobsen, Carsten Suhr
title To assemble or not to resemble: A validated Comparative Metatranscriptomics Workflow (CoMW)
publishDate 2019
url https://pure.au.dk/portal/da/publications/to-assemble-or-not-to-resemble(962d057c-aa44-474d-8705-8e4f48d42208).html
https://doi.org/10.1101/642348
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_source Anwar, M Z, Lanzen, A, Bang-Andreasen, T & Jacobsen, C S 2019 'To assemble or not to resemble: A validated Comparative Metatranscriptomics Workflow (CoMW)'. https://doi.org/10.1101/642348
op_rights info:eu-repo/semantics/openAccess
op_doi https://doi.org/10.1101/642348