Data from: A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification


Bibliographic Details
Main Authors: Macqueen, Daniel J., Johnston, Ian A.
Format: Dataset
Language: English
Published: Dryad 2014
Subjects:
Online Access: https://dx.doi.org/10.5061/dryad.2m3v4
http://datadryad.org/stash/dataset/doi:10.5061/dryad.2m3v4
Description
Summary: Whole genome duplication (WGD) is often considered to be mechanistically associated with species diversification. Such ideas have been anecdotally attached to a WGD at the stem of the salmonid fish family, but remain untested. Here, we characterized an extensive set of gene paralogues retained from the salmonid WGD, in species covering the major lineages (subfamilies Salmoninae, Thymallinae and Coregoninae). By combining the data in calibrated relaxed molecular clock analyses, we provide the first well-constrained and direct estimate for the timing of the salmonid WGD. Our results suggest that the event occurred no later in time than 88 Ma and that 40–50 Myr passed subsequently until the subfamilies diverged. We also recovered a Thymallinae–Coregoninae sister relationship with maximal support. Comparative phylogenetic tests demonstrated that salmonid diversification patterns are closely allied in time with the continuous climatic cooling that followed the Eocene–Oligocene transition, with the highest diversification rates coinciding with recent ice ages. Further tests revealed considerably higher speciation rates in lineages that evolved anadromy (the physiological capacity to migrate between fresh and seawater) than in sister groups that retained the ancestral state of freshwater residency. Anadromy, which probably evolved in response to climatic cooling, is an established catalyst of genetic isolation, particularly during environmental perturbations (for example, glaciation cycles). We thus conclude that climate-linked ecophysiological factors, rather than WGD, were the primary drivers of salmonid diversification.

Data files related to CO1 diversification analyses
This zip file contains data related to the diversification analyses. This includes the cytochrome c oxidase 1 DNA alignment in FASTA format, the time-calibrated phylogenetic tree in Newick format and NEXUS format (with FigTree annotations including node posterior probabilities and 95% credibility intervals), and the state file used in BiSSE as CSV.
CO1 data.7z

Data files for salmonid nuclear gene paralogs
This zip file contains 18 nucleotide sequence alignment files in FASTA format that were combined in the BEAST analyses that led to the dating of the salmonid whole genome duplication (WGD) event and in a range of other phylogenetic analyses described in the paper (see Figure 2, and Figure S20 and Figure S27 in the electronic supplementary material). Each alignment is in frame, so it can be translated directly into protein sequence. The file names can be matched to the individual phylogenetic trees provided in the electronic supplementary material of the paper (Figures S1-S18). Note: the nomenclature for species is identical across alignments, as follows: Osmerus = Osmerus mordax; Esox = Esox lucius; Ss1 = Salmo salar WGD paralog 1; Ss2 = S. salar WGD paralog 2; Om1 = Oncorhynchus mykiss WGD paralog 1; Om2 = O. mykiss WGD paralog 2; Cl1 = Coregonus lavaretus WGD paralog 1; Cl2 = C. lavaretus WGD paralog 2; Tt1 = Thymallus thymallus WGD paralog 1; Tt2 = T. thymallus WGD paralog 2.
WGD paralog alignments.7z

Nucleotide sequence data used for mitogenome analyses
This zip file contains 26 nucleotide sequence alignment files in FASTA format, representing protein-coding genes spanning the mitogenome of 24 salmonid species and 3 related outgroup species. Each of 13 clearly named protein-coding genes is represented by two files, one for codon position 1 and one for codon position 2, labelled P1 and P2. These data were combined in a time-calibrated BEAST analysis dating major divergence times in salmonid evolution, and in a range of other phylogenetic analyses described in the paper (see Figures S21-S26 and Figure S28 in the electronic supplementary material).
Nuc sequences for mitogenome analyses.7z

Protein sequence data for mitogenome analyses
This zip file contains 13 protein sequence alignment files in FASTA format, representing 13 proteins coded in the mitogenomes of 24 salmonid species and 3 related outgroup species. These data were used in a range of other phylogenetic analyses described in the paper (see Figures S21-S26 and Figure S28 in the electronic supplementary material).
Prot sequenes for mitogenome analyses.7z
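
Usage sketch: the nuclear paralog alignments are described above as in-frame FASTA files with a fixed set of sequence labels, so they can be decoded and translated with a few lines of Python. The following is a minimal sketch and not part of the deposited dataset. It assumes that each FASTA header begins with one of the labels listed in the nomenclature note (the exact header layout inside the files is not stated in this record) and that the standard nuclear genetic code applies. The function names parse_fasta and translate, and the example sequences, are hypothetical.

```python
from itertools import product

# Sequence labels exactly as given in the dataset description.
LABELS = {
    "Osmerus": "Osmerus mordax",
    "Esox": "Esox lucius",
    "Ss1": "Salmo salar WGD paralog 1",
    "Ss2": "Salmo salar WGD paralog 2",
    "Om1": "Oncorhynchus mykiss WGD paralog 1",
    "Om2": "Oncorhynchus mykiss WGD paralog 2",
    "Cl1": "Coregonus lavaretus WGD paralog 1",
    "Cl2": "Coregonus lavaretus WGD paralog 2",
    "Tt1": "Thymallus thymallus WGD paralog 1",
    "Tt2": "Thymallus thymallus WGD paralog 2",
}

# Standard nuclear genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def translate(seq):
    """Translate an in-frame nucleotide alignment row.

    Gap codons ('---') become '-'; codons that are ambiguous or
    partially gapped become 'X'. Trailing bases that do not fill a
    complete codon are ignored.
    """
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        protein.append("-" if codon == "---" else CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Made-up example rows, not taken from the dataset.
example = ">Ss1\nATGGCC---TAA\n>Esox\nATGGCNAAA\n"
for name, seq in parse_fasta(example).items():
    # Assumption: the header starts with the label from the description.
    label = name.split()[0]
    print(LABELS.get(label, label), translate(seq))
```

The standard table is appropriate only for the nuclear paralog alignments. The mitogenome nucleotide files are split by codon position (P1 and P2), so they cannot be translated this way, and mitochondrial genes would in any case need the vertebrate mitochondrial code.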