Bibliographic Details
Main Author: Zeng, Yonghui
Format: Dataset
Language: unknown
Published: University of Copenhagen 2021
Subjects:
Ice
Online Access: https://dx.doi.org/10.17894/ucph.4cc53df8-4f2d-4b22-bc9e-b2c004764cac
https://erda.ku.dk/public/archives/06cc144c61cfb4edd33a317c562c7619/published-archive.html
Description
Summary: This dataset is associated with the following publication: Potential Rhodopsin- and Bacteriochlorophyll-Based Dual Phototrophy in a High Arctic Glacier, by Yonghui Zeng, Xihan Chen, Anne Mette Madsen, Athanasios Zervas, Tue Kjærgaard Nielsen, Adrian-Stefan Andrei, Lars Chresten Lund-Hansen, Yongqin Liu, and Lars Hestbjerg Hansen, published in mBio in Nov 2020, 11(6): e02641-20; DOI: 10.1128/mBio.02641-20

The dataset consists of:
1. Assembled contigs from the metagenomes VRS.soil (exposed soil sample) and VRS.ice ("Lille Firn" glacial ice sample). Files: VRS.ice.contigs.fa.tar.gz and VRS.soil.contigs.fa.tar.gz
2. Metagenome-assembled genomes (MAGs) of the VRS exposed soil sample (ES) and the "Lille Firn" glacial ice sample (LF). File: all.MAGs.tar.gz
3. Table S2 of the original article, which contains detailed information about each MAG. File: Table S2 - MAG summary.xlsx

For the original reads, please refer to the NCBI BioProjects PRJNA548505 and PRJNA552582.

The following sampling and sequencing details related to the generation of these datasets were extracted from the original publication (Zeng et al., mBio, 2020):

The sampling site was the "Lille Firn" (LF) glacier (81.566° N, 16.363° W) in the Knuths Fjeld area of northeast Greenland, 5.6 km from the Villum Research Station (VRS). The LF glacier is an independent glacier that formed on the lee side of a small hill and is surrounded by ~160 km² of permafrost land. Surface ice was collected on 2 July 2018 using a sharp spear after removal of the ~2 m thick snow cover. The sampled ice was processed within 24 h in the VRS laboratory. The ice surface was cleaned with running, pre-sterilized, cooled water before the ice was melted at 4 °C in sterile Whirl-Pak sampling bags (Nasco) for 24-48 h. During the fieldwork, the melt season had just begun and most of the area was still heavily covered by snow. A small patch of exposed soil (designated ES) lay ca. 50 m north of the glacier sampling site; there, a few kilograms of surface soil (top layer, a few centimeters thick) were collected into a sterile Whirl-Pak bag and kept at 4 °C as a comparison sample for the subsequent cultivation and metagenomic analyses.

For amplicon and metagenomic analyses, cells in ~20 L of 3.0 μm-prefiltered melted ice were collected onto 0.2 μm filters, and total DNA was extracted using the DNeasy PowerWater DNA extraction kit (Qiagen, Germany). Amplicon sequencing of the 16S rRNA gene, targeting the V3-V4 region, was conducted at BGI Hong Kong, and the data were analyzed using the 16S pipeline embedded in Geneious Prime (Biomatters, New Zealand). The paired-end (PE) reads were first end-trimmed at a quality threshold of Q>20 and then merged. Non-merged reads were discarded, and only the high-quality merged reads were used for the community structure analysis.

Total environmental DNA was sequenced on an Illumina NovaSeq platform (BGI Hong Kong). The resulting ~220 G bases of PE reads (150 bp) were end-trimmed (>Q20, >50 bases long) and assembled using Megahit (ver. 1.1.x; Li et al., 2015) with a minimum contig length of 500 bp. The assembly yielded 948,300 contigs (≥1 kb; total length, 2.69 G bases) for the LF glacial sample and 2,834,721 contigs (≥1 kb; total length, 5.63 G bases) for the ES soil sample. Binning of the assembled contigs was performed using MetaBAT2 with default settings (Kang et al., 2019). Genomic bins were de-replicated using dRep (Olm et al., 2017) and quality-checked with CheckM following the lineage-specific workflow (Parks et al., 2015). Only bins of good quality (>50% completeness, <10% contamination; as recommended by Bowers et al., 2017) were included in further analysis. Each bin was taxonomically classified using the GTDB-Tk tool (https://github.com/Ecogenomics/GTDBTk; Parks et al., 2018). Based on the GTDB-Tk classification results, genomes in which more than 10% of markers had multiple hits were discarded.
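The bin-selection criteria described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `BinStats` record, its field names, and the example values are hypothetical and chosen only to show how the three stated thresholds (>50% completeness, <10% contamination, no more than 10% of markers with multiple hits) combine. In practice the values would be read from the CheckM and GTDB-Tk outputs.

```python
from dataclasses import dataclass


@dataclass
class BinStats:
    """Per-bin quality summary (hypothetical field names, all in percent)."""
    name: str
    completeness: float        # CheckM completeness estimate
    contamination: float       # CheckM contamination estimate
    multi_hit_markers: float   # share of GTDB-Tk markers with multiple hits


def passes_filters(b, min_completeness=50.0, max_contamination=10.0,
                   max_multi_hit=10.0):
    """Return True if a bin meets all three thresholds stated in the summary."""
    return (b.completeness > min_completeness
            and b.contamination < max_contamination
            and b.multi_hit_markers <= max_multi_hit)


def filter_bins(bins):
    """Keep only the bins that pass every quality criterion."""
    return [b for b in bins if passes_filters(b)]


# Invented example values, for illustration only.
example = [
    BinStats("bin_a", 92.3, 1.8, 2.0),    # passes all criteria
    BinStats("bin_b", 48.0, 0.5, 1.0),    # completeness too low
    BinStats("bin_c", 75.0, 12.4, 3.0),   # contamination too high
    BinStats("bin_d", 81.0, 4.0, 15.5),   # too many multi-hit markers
]

kept = [b.name for b in filter_bins(example)]
print(kept)  # ['bin_a']
```

The thresholds are passed as keyword arguments so that a stricter cutoff can be tried without changing the filter logic.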