Summary: | Dataset: Salp genome and transcriptome. A preliminary genome sequence and complete reference transcriptome have been assembled for the Southern Ocean salp, Salpa thompsoni (Urochordata, Thaliacea). The reference transcriptome contains 216,931 sequences; 41,210 (18%) were associated with predicted, hypothetical, or known proteins; 13,058 (6%) were mapped and annotated. Whole-transcriptome (RNA-seq) analysis of 39 samples collected during austral spring and summer 2011 in the Western Antarctic Peninsula (WAP) region, and in summer 2009 in the Indian Sector, revealed clustering of samples by region, season, and area (Bray-Curtis similarity). Spring versus summer samples showed significant differential expression of 77 genes associated with environmental stress response and 51 genes associated with sexual reproduction (paired t-tests, p<0.05). Gene Ontology (GO) term enrichment analysis identified 41 GO terms responsible for spring versus summer differences, including 156 genes associated with translation (i.e., protein synthesis). The genome sequence of 318,767,936 bp covers >50% of the estimated 602 Mb (±173 Mb) genome size for S. thompsoni, with >50% (16,823) of sequences showing significant homology to known proteins and ~38% (12,151) of the total protein predictions associated with Gene Ontology functional information. A total of 109,958 SNP variants and 9,782 indel predictions were generated, serving as a resource for future phylogenomic and population genomic studies. Salpa thompsoni exhibits rapid rates of evolution (>1.5 times those observed for vertebrates), typical of other urochordates examined. An initial survey of small RNAs revealed known, conserved miRNAs; novel miRNA genes; unique piRNAs; and mature miRNA signatures for varying developmental stages. For a complete list of measurements, refer to the supplemental document 'Field_names.pdf'; a full dataset description is included in the supplemental file 'Dataset_description.pdf'.
The most current version of this dataset is available at: ...
|