Emergence and radiation of distemper viruses in terrestrial and marine mammals - Input files, bash and R codes for analysing PDV and CDV sequence data
Canine distemper virus (CDV) and phocine distemper virus (PDV) are major pathogens of terrestrial and marine mammals. Yet little is known about the timing and geographical origin of distemper viruses, or about the extent to which their evolution was influenced by environmental change and human activities. To address this,...
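Main Authors: | Stokholm, Iben; Puryear, Wendy; Sawatzki, Kaitlin; Wilhelm Knudsen, Steen; Terkelsen, Thilde; Becher, Paul; Siebert, Ursula; Tange Olsen, Morten |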
---|---|
Format: | Dataset |
Language: | English |
Published: | Dryad, 2021 |
Subjects: | |
Online Access: | https://dx.doi.org/10.5061/dryad.fxpnvx0sq http://datadryad.org/stash/dataset/doi:10.5061/dryad.fxpnvx0sq |
id |
ftdatacite:10.5061/dryad.fxpnvx0sq |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
English |
description |
Canine distemper virus (CDV) and phocine distemper virus (PDV) are major pathogens of terrestrial and marine mammals. Yet little is known about the timing and geographical origin of distemper viruses, or about the extent to which their evolution was influenced by environmental change and human activities. To address this, we i) performed the first comprehensive time-calibrated phylogenetic analysis of the two distemper viruses; ii) mapped distemper antibody and virus detection data from marine mammals collected between 1972 and 2018; and iii) compiled historical reports on distemper dating back to the 18th century. We find that CDV and PDV diverged in the early 17th century. Modern CDV strains last shared a common ancestor in the 19th century, with a marked radiation during the 1930s-50s. Modern PDV strains are of more recent origin, diverging in the 1970s-80s. Based on the compiled information on distemper distribution, the diverse host range of CDV and the basal phylogenetic placement of terrestrial morbilliviruses, we hypothesize that a terrestrial CDV-like ancestor gave rise to PDV in the North Atlantic. Moreover, given the estimated timing of distemper origin and radiation, we hypothesize a prominent role for environmental change, such as the Little Ice Age, and for human activities, such as globalisation and war, in distemper virus evolution.
The data sets were created by compiling recently published PDV hemagglutinin (H) gene sequences and CDV sequence data obtained from NCBI. The first data set (Alignment 1) consists of 446 near-complete H gene sequences (25 PDV + 421 CDV sequences; 1,668 bp; positions 7,199-8,866 in NC_001921), comprising the majority of distemper H gene sequences available in GenBank at the time of analysis (July 2020). The second data set (Alignment 2) contains the full-length sequences used in the Bayesian phylogenetic analyses.
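The lengths quoted for the alignments follow from the NC_001921 coordinates when both endpoints are counted, i.e. assuming 1-based, inclusive positions. A minimal Python check of that arithmetic for Alignment 1; the helper `region_length` is hypothetical and not part of the deposited code:

```python
def region_length(start: int, end: int) -> int:
    """Number of bases in the inclusive, 1-based interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Alignment 1: H gene region at positions 7,199-8,866 of NC_001921
aln1_bp = region_length(7199, 8866)
print(aln1_bp, aln1_bp % 3 == 0)  # 1668 True, i.e. a whole number of codons (556)
```

The same formula gives 8,890 - 7,079 + 1 = 1,812 bp for the coordinates of Alignment 2.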
Alignment 2 consists of 125 full-length H gene sequences (25 PDV and 100 CDV; 1,812 bp; positions 7,079-8,890 in NC_001921), representing major PDV and CDV clades in terrestrial and marine mammals detected between 1982 and 2018. Both data sets were imported and edited in Geneious version 9.1.8; the alignments were generated using MUSCLE, and sequences were edited to the same reading frame and length, excluding stop codons. Sequences from Stokholm et al. (2019) and Puryear et al. (in review) are not yet publicly available, but they have been submitted to GenBank and will be released under the accession numbers OK104948-91 and MW581015-26. The third data set (Alignment 3) was used for the final Bayesian phylogenetic analyses. It consists of 125 H gene sequences (25 PDV and 100 CDV) without the third codon positions (1,208 bp), which were removed because substitution saturation was detected.
The bash code included here was used for the following tasks:
- Edit the input sequence names in the FASTA file.
- Edit the FASTA files to exclude the third codon positions.
- Submit many SLURM jobs in parallel on a remote server, so that BEAST v2.6.3 runs on multiple XML files at once.
- Edit the XML files created in BEAUti so that stepping-stone (or path-sampling) analyses can be performed, giving a marginal likelihood value for each XML file.
- Collect the marginal likelihood values generated by each individual analysis.
- Start multiple SLURM jobs running TreeAnnotator in parallel to obtain consensus trees.
- Collect all consensus trees.
The R script "Rcode_for_making_Supplementary Figure_3_2021mar.R" was used to generate Supplementary Figure 3 from "marginal_lkhoods_tmp08.txt"; it can be run in R v4.0.2.
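The first bash task, renaming sequences, is not specified further in the record. As a hedged illustration only, the Python sketch below maps the first token of each FASTA header (assumed to be a GenBank accession) to a new label from a user-supplied table, then keeps only letters, digits, '.', '_' and '-' in the result. The function names, the mapping and the example accession and label are hypothetical.

```python
import re


def sanitize_label(label: str) -> str:
    """Collapse each run of characters other than letters, digits, '.', '_' and '-' into '_'."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_")


def rename_fasta_headers(fasta_text: str, new_names: dict) -> str:
    """Rename headers via {accession: label}; unmapped headers are only sanitized."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            tokens = header.split()
            accession = tokens[0] if tokens else ""
            out.append(">" + sanitize_label(new_names.get(accession, header)))
        else:
            out.append(line)  # sequence lines are passed through unchanged
    return "\n".join(out) + "\n"


# Hypothetical accession and label, for illustration only
example = ">XX000001.1 Phocine distemper virus H gene\nATGCCA\n>XX000002.1 other isolate\nATGCCG\n"
print(rename_fasta_headers(example, {"XX000001.1": "PDV example 2000"}), end="")
```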
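Removing third codon positions amounts to dropping every third alignment column, provided the alignment starts on the first position of a codon (the record states that sequences were edited to the same reading frame). The following Python sketch is not the authors' bash script; it reads a FASTA alignment, keeps codon positions 1 and 2, and writes FASTA again. All function names are hypothetical.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def drop_third_codon_positions(seq: str) -> str:
    """Keep columns 1, 2, 4, 5, 7, 8, ... (1-based); assumes the sequence is in frame."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def write_fasta(records: dict, width: int = 60) -> str:
    """Write {header: sequence} as FASTA, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records.items():
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


alignment = ">seqA\nATGGCT\nGCA\n>seqB\nATG---GCC\n"
filtered = {h: drop_third_codon_positions(s) for h, s in read_fasta(alignment).items()}
print(write_fasta(filtered), end="")
```

Applied to the 1,812 bp of Alignment 2, this keeps 2 x 604 = 1,208 columns, matching the length reported for Alignment 3.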
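The parallel BEAST submission can be sketched in Python as building one `sbatch` call per XML file, using `sbatch --wrap` so that no separate job script is needed. The resource values (CPUs, wall time), the job-name prefix and the use of BEAST's `-threads` option are assumptions, not the authors' settings; with `dry_run=True` the commands are only printed.

```python
import shlex
import subprocess
from pathlib import Path


def beast_sbatch_command(xml: Path, cpus: int = 4, walltime: str = "72:00:00") -> list:
    """Build an sbatch call that runs BEAST 2 on a single XML file (values are placeholders)."""
    beast_call = shlex.join(["beast", "-threads", str(cpus), str(xml)])
    return [
        "sbatch",
        f"--job-name=beast_{xml.stem}",
        f"--cpus-per-task={cpus}",
        f"--time={walltime}",
        f"--wrap={beast_call}",
    ]


def submit_all(xml_dir: Path, dry_run: bool = True) -> list:
    """Submit (or just print) one independent SLURM job per *.xml file in xml_dir."""
    commands = [beast_sbatch_command(xml) for xml in sorted(xml_dir.glob("*.xml"))]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Because each job is submitted independently, SLURM is free to schedule them concurrently, which is what lets many XML files run in parallel.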
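For the stepping-stone/path-sampling edit, one common BEAST 2 setup (via the MODEL_SELECTION package) wraps the original MCMC in a `PathSampler` run. The sketch below does that with Python's `xml.etree.ElementTree`: it renames the BEAUti-generated top-level `<run>` to `<mcmc>` and nests it inside a new `<run spec="beast.inference.PathSampler">`. The class name and the attribute names (`nrOfSteps`, `chainLength`, `alpha`, `rootdir`, `burnInPercentage`) reflect our understanding of that package for BEAST 2.6 and should be checked against its documentation; the default values are placeholders. Note that ElementTree drops XML comments and the XML declaration when rewriting the file.

```python
import xml.etree.ElementTree as ET


def wrap_in_path_sampler(xml_text: str, steps: int = 8, chain_length: int = 1_000_000,
                         alpha: float = 0.3, rootdir: str = "ps_steps",
                         burnin_percent: int = 50) -> str:
    """Nest the top-level MCMC <run> inside a PathSampler <run> (attribute names assumed)."""
    root = ET.fromstring(xml_text)
    mcmc = root.find("run")
    if mcmc is None:
        raise ValueError("no top-level <run> element found")
    position = list(root).index(mcmc)
    root.remove(mcmc)
    mcmc.tag = "mcmc"  # the original chain becomes the sampler's inner MCMC
    sampler = ET.Element("run", {
        "spec": "beast.inference.PathSampler",
        "nrOfSteps": str(steps),
        "chainLength": str(chain_length),
        "alpha": str(alpha),
        "rootdir": rootdir,
        "burnInPercentage": str(burnin_percent),
    })
    sampler.append(mcmc)
    root.insert(position, sampler)  # keep the sampler where the original run was
    return ET.tostring(root, encoding="unicode")
```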
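Collecting the marginal likelihoods is essentially a text search over each analysis' output. The Python sketch below assumes each log contains a line of the form `marginal L estimate = <number>`; that pattern is an assumption and should be adapted to the actual output format. Function and column names are hypothetical, and `NA` marks analyses where no value was found.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

# ASSUMPTION: the reported line looks like "marginal L estimate = -12345.67"
MARGINAL_L = re.compile(r"marginal L estimate\s*=\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_marginal_likelihood(log_text: str) -> Optional[float]:
    """Return the last reported estimate in a log, or None if there is none."""
    values = MARGINAL_L.findall(log_text)
    return float(values[-1]) if values else None


def collect(log_files: Iterable[Path]) -> Dict[str, Optional[float]]:
    """Map each log's file stem (e.g. the analysis name) to its estimate."""
    return {path.stem: parse_marginal_likelihood(path.read_text()) for path in log_files}


def to_tsv(results: Dict[str, Optional[float]]) -> str:
    """Tabulate the results as tab-separated text, sorted by analysis name."""
    rows = ["analysis\tmarginal_L"]
    for name, value in sorted(results.items()):
        rows.append(f"{name}\t{'NA' if value is None else value}")
    return "\n".join(rows) + "\n"
```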
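The TreeAnnotator step can be sketched as one `sbatch --wrap` job per posterior tree file. The 10% burn-in, the `_consensus.tree` output name and the `*.trees` glob are assumptions; check the TreeAnnotator documentation on the target system for the options it accepts.

```python
import shlex
from pathlib import Path


def treeannotator_sbatch_command(trees: Path, burnin_percent: int = 10) -> list:
    """Build an sbatch call that summarises one posterior tree file into a consensus tree."""
    consensus = trees.with_name(trees.stem + "_consensus.tree")
    call = shlex.join(["treeannotator", "-burnin", str(burnin_percent), str(trees), str(consensus)])
    return ["sbatch", f"--job-name=ta_{trees.stem}", f"--wrap={call}"]


def commands_for(run_dir: Path) -> list:
    """One command per *.trees file; submit each with subprocess.run(cmd, check=True)."""
    return [treeannotator_sbatch_command(t) for t in sorted(run_dir.glob("*.trees"))]
```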
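Collecting the consensus trees can be as simple as pulling the tree statement out of each NEXUS file and tabulating it by file name. The sketch below does this with a regular expression and assumes one `tree NAME = ...;` statement written on a single line per file. If the files use a `Translate` block, the numeric tip labels are only meaningful alongside that file's own table, which is why the sketch keeps the source file name with each tree. Names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

# Matches e.g. "tree TREE1 = [&R] (A:1,B:1);" and captures everything after "="
TREE_STATEMENT = re.compile(r"^\s*tree\s+\S+\s*=\s*(.+;)\s*$", re.IGNORECASE | re.MULTILINE)


def extract_tree(nexus_text: str) -> Optional[str]:
    """Return the first tree statement's Newick string, or None if there is none."""
    match = TREE_STATEMENT.search(nexus_text)
    return match.group(1) if match else None


def collect_trees(tree_files: Iterable[Path]) -> Dict[str, Optional[str]]:
    """Map each consensus-tree file name to its Newick string."""
    return {path.name: extract_tree(path.read_text()) for path in tree_files}
```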
The R script plots a diagram summarising the tree age estimates and 95% highest posterior density (HPD) intervals for the CDV, PDV and CDV/PDV sequences from BEAST analyses run with different setups. The tree ages are taken from the consensus trees generated by the BEAST analysis of each XML file. Please note that all of this code was set up to run on a specific remote server and will need to be adjusted to run elsewhere. |
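Reading a tree age from a consensus tree means taking the root node's height annotation and, for calendar dates, subtracting it from the date of the most recently sampled tip. The Python sketch below assumes TreeAnnotator-style root annotations named `height` and `height_95%_HPD={low,high}` (check the actual key names in the output files) and heights measured in years. The toy tree and the tip year 2018.0 are illustrative values, not results from the study.

```python
import re
from typing import Dict


def parse_annotation(text: str) -> Dict[str, str]:
    """Split 'k1=v1,k2={a,b}' into a dict, ignoring commas inside braces."""
    fields, depth, current = {}, 0, ""
    for ch in text + ",":
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            if "=" in current:
                key, value = current.split("=", 1)
                fields[key.strip()] = value.strip()
            current = ""
        else:
            current += ch
    return fields


def root_annotation(newick: str) -> Dict[str, str]:
    """Return the [&...] annotation attached to the root of a Newick string, if any."""
    body = newick.strip().rstrip(";").strip()
    body = re.sub(r":[-+0-9.eE]+$", "", body)  # drop a trailing root branch length
    start = body.rfind("[&")
    if not body.endswith("]") or start == -1:
        return {}
    return parse_annotation(body[start + 2:-1])


def root_age(newick: str, youngest_tip_year: float) -> Dict[str, float]:
    """Convert root height (years before the youngest tip) to calendar years."""
    ann = root_annotation(newick)
    height = float(ann["height"])
    low, high = (float(x) for x in ann["height_95%_HPD"].strip("{}").split(","))
    return {
        "tmrca": youngest_tip_year - height,
        "hpd_lower": youngest_tip_year - high,  # older bound of the interval
        "hpd_upper": youngest_tip_year - low,   # younger bound of the interval
    }


tree = "((A:1.0,B:1.0)[&height=1.0]:2.0,C:3.0)[&height=3.0,height_95%_HPD={2.5,4.0},posterior=1.0];"
print(root_age(tree, youngest_tip_year=2018.0))  # {'tmrca': 2015.0, 'hpd_lower': 2014.0, 'hpd_upper': 2015.5}
```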
format |
Dataset |
author |
Stokholm, Iben; Puryear, Wendy; Sawatzki, Kaitlin; Wilhelm Knudsen, Steen; Terkelsen, Thilde; Becher, Paul; Siebert, Ursula; Tange Olsen, Morten |
title |
Emergence and radiation of distemper viruses in terrestrial and marine mammals - Input files, bash and R codes for analysing PDV and CDV sequence data |
publisher |
Dryad |
publishDate |
2021 |
genre |
North Atlantic |
op_relation |
https://dx.doi.org/10.5281/zenodo.5539996 https://dx.doi.org/10.5281/zenodo.5539998 |
op_rights |
Creative Commons Zero v1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/legalcode cc0-1.0 |
op_doi |
https://doi.org/10.5061/dryad.fxpnvx0sq https://doi.org/10.5281/zenodo.5539996 https://doi.org/10.5281/zenodo.5539998 |