Data from: 3D genomics across the tree of life identifies condensin II as a determinant of architecture type

Bibliographic Details
Main Author: Hoencamp, Claire
Format: Dataset
Language: unknown
Published: Harvard Dataverse 2021
Subjects:
Online Access: https://dx.doi.org/10.7910/dvn/urokag
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/UROKAG
Description
Summary: We analyzed conservation of the condensin II complex subunits in 24 species across the tree of life with a multistep BLAST approach. The data found here are the BLAST alignments for these searches. The first searches were conducted in October/November 2019 and were manually double-checked in February and March 2020. Searches for other organisms were conducted in June 2020. All alignments were posted in:

Our approach was based on the search strategy used in earlier work by King et al. (https://doi.org/10.1093/molbev/msz140). We started by collecting publicly available protein sequences of the condensin I and II complex subunits of four diverse species from UniProt: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. As a positive control, we searched for SMC2 and SMC4 and for the condensin I subunits, which are thought to be essential in all species.

In the first alignment step, we used tblastn to search with the translated protein sequences of the above species against the nucleotide collection (nr/nt) database of the target species. The Expect threshold was set at 0.05. We reported an alignment as a hit when it had an E-value of 1E-10 or less with multiple regions of alignment. If an alignment had less confidence, we did an extra validation step to confirm it. This step entailed downloading the translated nucleotide sequence of the putative alignment and using tblastn to search against the genome of a closely related organism with an annotated genome. If this search returned the protein we had used as bait, we considered the hit validated.

In the second alignment step we used the same approach, but we blasted against the wgs database of the target species. We again used 1E-10 as the E-value cut-off.

By the third step, only a few organisms still had missing subunits.
To make an extra effort to find these subunits, we used as bait the corresponding subunits of the nearest neighbour, which we had identified in step 1 or 2. As the identified subunits were all nucleotide sequences, we used tblastx to translate these query sequences to protein sequences and blast against a translated nucleotide database. In this step we searched both the nr/nt database and the wgs database. As we were able to identify all SMC2/4 subunits but still missed condensin II subunits, we are now fairly sure these organisms indeed lack these condensin II subunits. However, it is still possible that these organisms do have all condensin II subunits, but with very low sequence conservation.

We were also able to identify the condensin I subunits in almost all species, with two notable exceptions (see Table S4). The Arctic lamprey lacked condensin I subunits CAPG and CAPD2. Because we were able to identify all condensin II subunits in this organism, we still included this species in our analysis. The other exception is the tardigrade. In this species we identified SMC2 and SMC4, but could not identify any of the accessory subunits of condensin I or II. There are multiple possible explanations for this. On the one hand, there might be a biological explanation: for example, condensin’s accessory subunits in this organism may have evolved beyond recognition with our methods, or this species may indeed have lost both condensin I and II. On the other hand, the missing subunits may be explained by a technical issue, e.g. the quality of the databases. Therefore we cannot conclude with full certainty that condensin II is indeed missing in the tardigrade, and this will need to be investigated further.
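
The hit-calling rule described above can be illustrated with a minimal Python sketch. It is not the authors' code. The two thresholds (0.05 and 1E-10) come from the description; everything else is an assumption made for illustration: the `Alignment` record, the function names, the sequence identifiers, and the reading of "multiple regions of alignment" as two or more regions. The manual validation step is represented only as a label, since it was done by hand against a related annotated genome.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

EXPECT_THRESHOLD = 0.05    # Expect threshold stated in the description
HIT_EVALUE_CUTOFF = 1e-10  # E-value cut-off stated in the description
MIN_REGIONS = 2            # ASSUMPTION: "multiple regions" read as two or more


@dataclass
class Alignment:
    """Hypothetical record for one alignment (not a BLAST output format)."""
    subject_id: str
    evalue: float
    n_regions: int  # number of separate aligned regions


def classify(aln: Alignment) -> str:
    """Label an alignment according to the thresholds in the description."""
    if aln.evalue > EXPECT_THRESHOLD:
        return "not_reported"       # above the Expect threshold
    if aln.evalue <= HIT_EVALUE_CUTOFF and aln.n_regions >= MIN_REGIONS:
        return "hit"                # confident hit
    return "needs_validation"       # lower confidence: validate manually


def first_confident_step(
    steps: Iterable[Tuple[str, List[Alignment]]]
) -> Optional[Tuple[str, Alignment]]:
    """Walk the search steps in order; return the first confident hit, if any."""
    for name, alignments in steps:
        for aln in alignments:
            if classify(aln) == "hit":
                return name, aln
    return None


# Usage with made-up alignments (identifiers are hypothetical).
steps = [
    ("step 1: tblastn vs nr/nt", [Alignment("candidate_a", 1e-4, 1)]),
    ("step 2: tblastn vs wgs", [Alignment("candidate_b", 1e-30, 3)]),
]
result = first_confident_step(steps)
print(result[0] if result else "no confident hit")  # prints "step 2: tblastn vs wgs"
```

Keeping the classification separate from the step walk means the same rule applies to each database searched, matching the description's statement that the 1E-10 cut-off was reused in the second step.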