Data from: Sequence clustering threshold has little effect on the recovery of microbial community structure ...


Bibliographic Details
Main Authors: Botnen, Synnøve Smebye; Davey, Marie Louise; Halvorsen, Rune; Kauserud, Håvard
Format: Dataset
Language: English
Published: Dryad 2018
Online Access: https://dx.doi.org/10.5061/dryad.jb79430
https://datadryad.org/stash/dataset/doi:10.5061/dryad.jb79430
Description
Summary: Analysis of microbial community structure by multivariate ordination methods, using data obtained by high throughput sequencing of amplified markers (i.e., DNA metabarcoding), often requires clustering of DNA sequences into operational taxonomic units (OTUs). Parameters for the clustering procedure tend not to be justified but are set by tradition rather than being based on explicit knowledge. In this study, we explore the extent to which ordination results are affected by variation in parameter settings for the clustering procedure. Amplicon sequence data from nine microbial community studies, representing different sampling designs, spatial scales and ecosystems, were subjected to clustering into OTUs at seven different similarity thresholds (clustering thresholds) ranging from 87% to 99% sequence similarity. The 63 data sets thus obtained were subjected to parallel DCA and GNMDS ordinations. The resulting community structures were highly similar across all clustering thresholds. We explain this pattern by ...

Dataset4_RawData: Tarball containing raw data in the form of 5 .sff.txt files. Corresponding mapping files for demultiplexing of each raw file are provided, in addition to a combined mapping file with treatment information. (Dataset4_Dryad.tar.gz)
Dataset7_Dryad: Tar archive containing raw data for Dataset 7 in the form of 4 .sff files. Corresponding mapping files for demultiplexing are provided for each data file.
Dataset3_Dryad: Tar archive consisting of 20 .fastq files representing raw, demultiplexed data.
Dataset9_Dryad: Tar archive containing 20 .fastq files representing raw, demultiplexed Illumina sequencing data.
Sequence_Data_For_Clustering: Fasta files containing the quality filtered sequences used as the basis for all clustering analyses. (sequence_data_for_clustering.tar.gz)
...
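
The design described in the summary (nine studies, each clustered at seven similarity thresholds, giving 63 data sets) can be illustrated with a rough Python sketch. This is not the authors' pipeline: the evenly spaced threshold values, the `Dataset1` to `Dataset9` labels, the toy sequences, and the helper functions `identity`, `greedy_cluster` and `mutate` are all assumptions made for illustration, and the greedy centroid clustering shown here is a simplified stand-in for a real OTU clustering tool.

```python
from itertools import product


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Toy greedy centroid clustering.

    Each sequence joins the first centroid it matches at or above the
    threshold; otherwise it starts a new cluster (OTU) as its centroid.
    """
    centroids, clusters = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


def mutate(seq, n):
    """Substitute the first n bases so that exactly n positions differ."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[ch] for ch in seq[:n]) + seq[n:]


# Assumption: evenly spaced thresholds. The record only states that seven
# thresholds between 87% and 99% similarity were used.
THRESHOLDS = [0.87, 0.89, 0.91, 0.93, 0.95, 0.97, 0.99]
# Assumption: placeholder labels for the nine studies.
DATASETS = [f"Dataset{i}" for i in range(1, 10)]

# One clustering run per (study, threshold) pair: 9 x 7 combinations.
runs = list(product(DATASETS, THRESHOLDS))

# Toy data: a 100-base reference plus variants at 95% and 88% identity to it.
base = "ACGT" * 25
toy_seqs = [base, mutate(base, 5), mutate(base, 12)]

otu_counts = {t: len(greedy_cluster(toy_seqs, t)) for t in THRESHOLDS}
```

In this toy case the number of OTUs grows as the threshold tightens, which is the kind of variation in clustering output whose effect on ordination results the study examines.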