Remediation of petroleum contaminants in the Antarctic and subantarctic - pyrosequencing genomic DNA extracts from soil


Bibliographic Details
Other Authors: SNAPE, IAN (hasPrincipalInvestigator), SICILIANO, STEVEN (hasPrincipalInvestigator), LAGEREWSKIJ, GREG (processor), Australian Antarctic Data Centre (publisher)
Format: Dataset
Language: unknown
Published: Australian Antarctic Data Centre
Subjects:
Online Access: https://researchdata.ands.org.au/remediation-petroleum-contaminants-extracts-soil/699480
https://data.aad.gov.au/metadata/records/ASAC_1163_pyrosequencing
http://nla.gov.au/nla.party-617536
Description
Summary: This dataset contains information obtained by pyrosequencing genomic DNA extracts from soil with PCR primers targeting the bacterial 16S gene (27F/519R) and fungal ITS region (ITS1/ITS4-B). The data were processed in a pipeline using the freely available 'mothur' software (v1.24.1). The reads were processed in four ways, given by the combinations of two choices: whether to subsample the data (to a number that normalised all but the 10 lowest samples) and whether to exclude operational taxonomic units (OTUs) that occurred only once in the entire dataset. The resulting designations are FULL_READS for the unsubsampled analyses and SUBSAMPLED for the subsampled ones, combined with SINGLETONS_INCLUDED (or SING_INC) for analyses where singleton OTUs were retained and SINGLETONS_EXCLUDED (or SING_EXC) for those where dataset-wide singletons were removed.

For each analysis, the pipeline produced a .fasta file (sequence information), a .names file (sequence redundancies) and a .groups file (sequence-to-sample assignment); these are in the chimera-checked data folders. For each of the four combinations, an OTU abundance matrix was generated with a .shared extension: a table of OTUs by samples giving the abundance of each OTU in each sample.

Various alpha and beta diversity measures were calculated for each analysis: diversity indices (extension .groups.summary), catchall output (various .csv files), rarefaction data for each sample (extension .rarefaction), rank and species abundance data (extensions .rabund and .sabund), and UniFrac community similarity measures. The unifrac folder contains a distance matrix of sample-by-sample dissimilarity, a .summary list of the dissimilarities, and the neighbour-joining tree of the entire dataset from which the UniFrac measures were calculated.

In addition to these diversity measures, taxonomy was assigned by Bayesian searching of each OTU sequence against the GreenGenes database (2011 version; McDonald et al. 2011, ISME J). This is provided as a .taxonomy and a .summary file in the taxonomy folders. Representative sequences for each OTU are in the OTU_rep folders as .fasta and .names files. The raw data are provided in the preprocessing files folders.
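
The dataset-wide singleton exclusion described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build this dataset (that was mothur); it only shows, under stated assumptions, how a .shared OTU-by-sample table might be read and how OTUs with a total abundance of one across all samples could be dropped. The sketch assumes a tab-delimited file whose header row begins with label, Group and numOtus followed by one column per OTU, and a single label per file; files from this record may differ. The function names, sample names and OTU names are hypothetical.

```python
import csv
import io


def read_shared(handle):
    """Parse a mothur-style .shared table into {sample: {otu: count}}.

    Assumption: tab-delimited, header row of label, Group, numOtus,
    then one column per OTU; one label (distance cutoff) per file.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Ignore empty header cells caused by a trailing tab.
    otu_names = [name for name in header[3:] if name]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample = row[1]
        counts = [int(x) for x in row[3:3 + len(otu_names)]]
        table[sample] = dict(zip(otu_names, counts))
    return table


def drop_dataset_singletons(table):
    """Remove OTUs whose abundance summed over all samples is exactly 1."""
    totals = {}
    for counts in table.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    keep = {otu for otu, total in totals.items() if total > 1}
    return {
        sample: {otu: n for otu, n in counts.items() if otu in keep}
        for sample, counts in table.items()
    }


# Hypothetical three-sample table; names and counts are invented.
EXAMPLE = (
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
    "0.03\tsoilA\t3\t5\t0\t1\n"
    "0.03\tsoilB\t3\t2\t4\t0\n"
    "0.03\tsoilC\t3\t0\t7\t0\n"
)

full = read_shared(io.StringIO(EXAMPLE))      # analogous to SING_INC
filtered = drop_dataset_singletons(full)      # analogous to SING_EXC
print(sorted(filtered["soilA"]))
```

In this invented table, Otu3 occurs once in the whole dataset and is therefore removed from every sample, while OTUs that are absent from one sample but present elsewhere are kept. To use the sketch on a real file, pass an open file handle to read_shared instead of the in-memory example.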