Eukaryotic metabarcoding pipelines for biodiversity assessment of marine benthic communities affected by ocean acidification

Bibliographic Details
Main Author: Soto Valdés, Ana Zaida
Other Authors: Rodrigues, Américo do Patrocínio; Wangensteen, Owen S.
Format: Master Thesis
Language: English
Published: 2017
Subjects: COI
Online Access: http://hdl.handle.net/10400.8/2854
Description
Summary: The development of high-throughput sequencing technologies has provided ecologists with an efficient approach to assessing biodiversity in benthic communities, particularly with recent advances in metabarcoding using universal primers. However, analyzing such high-throughput data poses important computational challenges and requires specialized bioinformatics solutions at different stages of the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and clustering of the resulting sequences into Molecular Operational Taxonomic Units (MOTUs). The inferred MOTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into MOTUs, little guidance is available on their relative performance. We focused our study on the benthic community of a natural CO2 vent in the Canary Islands, which can serve as a natural laboratory for investigating the impacts of chronic ocean acidification. Here, we propose a pipeline for studying this community using a fragment of the mitochondrial cytochrome c oxidase I (COI) sequence. We compared two DNA extraction methods and two clustering methods, and validated a robust method for eliminating false positives. We found that optimal results can be obtained by purifying DNA from 0.3 g of sample. Clustering with the step-by-step aggregation algorithm implemented in SWARM yields results similar to those of the Bayesian clustering method of CROP, in much less time. We introduced a new algorithm, MINT (Multiple Intersection of N Tags), to eliminate false positives due to random errors produced before or after sequencing. Our results show that a fully automated analysis pipeline can be used to assess the biodiversity of marine benthic communities using COI as a metabarcoding marker in an objective, accurate and affordable manner.