Emergence and radiation of distemper viruses in terrestrial and marine mammals - Input files, bash and R codes for analysing PDV and CDV sequence data

Canine distemper virus (CDV) and phocine distemper virus (PDV) are major pathogens of terrestrial and marine mammals. Yet little is known about the timing and geographical origin of distemper viruses and to what extent these were influenced by environmental change and human activities. To address this,...

Bibliographic Details
Main Authors: Stokholm, Iben, Puryear, Wendy, Sawatzki, Kaitlin, Wilhelm Knudsen, Steen, Terkelsen, Thilde, Becher, Paul, Siebert, Ursula, Tange Olsen, Morten
Format: Dataset
Language: English
Published: Dryad 2021
Subjects:
Online Access: https://dx.doi.org/10.5061/dryad.fxpnvx0sq
http://datadryad.org/stash/dataset/doi:10.5061/dryad.fxpnvx0sq
id ftdatacite:10.5061/dryad.fxpnvx0sq
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language English
description Canine distemper virus (CDV) and phocine distemper virus (PDV) are major pathogens of terrestrial and marine mammals. Yet little is known about the timing and geographical origin of distemper viruses and to what extent these were influenced by environmental change and human activities. To address this, we i) performed the first comprehensive time-calibrated phylogenetic analysis of the two distemper viruses; ii) mapped distemper antibody and virus detection data from marine mammals collected between 1972 and 2018; and iii) compiled historical reports on distemper dating back to the 18th century. We find that CDV and PDV diverged in the early 17th century. Modern CDV strains last shared a common ancestor in the 19th century, with a marked radiation during the 1930s-50s. Modern PDV strains are of more recent origin, diverging in the 1970s-80s. Based on the compiled information on distemper distribution, the diverse host range of CDV and the basal phylogenetic placement of terrestrial morbilliviruses, we hypothesize that a terrestrial CDV-like ancestor gave rise to PDV in the North Atlantic. Moreover, given the estimated timing of distemper origin and radiation, we hypothesize a prominent role in distemper virus evolution for environmental change, such as the Little Ice Age, and for human activities like globalisation and war.
The data sets were created by compiling recently published H gene PDV sequences and CDV sequence data obtained from NCBI. The first data set (Alignment 1) consists of 446 near-complete H gene sequences (25 PDV + 421 CDV sequences; 1,668 bp; position 7,199-8,866 in NC_001921), comprising the majority of distemper H gene sequences available in GenBank at the time of analysis (July 2020). The second data set (Alignment 2) contains the full sequences used in the Bayesian phylogenetic analyses. It consists of 125 full-length H gene sequences (25 PDV and 100 CDV; 1,812 bp; position 7,079-8,890 in NC_001921), representing major PDV and CDV clades in terrestrial and marine mammals detected between 1982 and 2018. Both data sets were imported and edited in Geneious version 9.1.8; the alignments were generated using MUSCLE, and sequences were edited to the same reading frame and length, excluding stop codons. Sequences obtained in Stokholm et al., 2019 and Puryear et al., in review, are not yet publicly available, but they have been submitted to GenBank and will be released under the accession numbers OK104948-91 and MW581015-26. The third data set (Alignment 3) was used for the final Bayesian phylogenetic analyses. It consists of 125 hemagglutinin gene sequences (25 PDV sequences and 100 CDV sequences) without the 3rd codon positions (1,208 bp). The 3rd codon positions were removed because substitution saturation was detected.
The bash codes included here were used for the following tasks:
- Edit the input sequence names in the fasta file.
- Edit the fasta files to exclude the third codon positions.
- Submit many Slurm jobs in parallel on a remote server, running BEAST v.2.6.3 on multiple xml files at once.
- Edit many xml files created in BEAUti so that stepping-stone (or path sampling) analysis can be performed to obtain a marginal likelihood value for each analysis of each xml file.
- Collect the marginal likelihood values generated from each individual analysis.
- Start multiple Slurm jobs running TreeAnnotator in parallel to obtain consensus trees.
- Collect all consensus trees.
The R code included here ("Rcode_for_making_Supplementary Figure_3_2021mar.R") was used to generate Supplementary Figure 3 based on "marginal_lkhoods_tmp08.txt". The code can be run in R v.4.0.2. It plots a summary of the tree age estimates and 95% HPD intervals for CDV, PDV and CDV/PDV sequences from BEAST analyses using different setups. The tree ages are obtained from the consensus trees generated through the BEAST analysis of each xml file. Please note that all these pieces of code have been set up to run on a specific remote server. They will need to be adjusted in order to run elsewhere.
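As an illustration of the third-codon-position removal described above (Alignment 2 at 1,812 bp reduced to Alignment 3 at 1,208 bp), the step can be sketched as follows. This is a minimal Python sketch, not the bash code deposited with the data set; the function names and the example record are hypothetical, and it assumes every sequence is already in frame, starting at codon position 1.

```python
def strip_third_codon_positions(seq: str) -> str:
    """Keep codon positions 1 and 2 and drop every third base.

    Assumes the sequence is in frame (index 0 is codon position 1).
    """
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def strip_fasta(lines):
    """Yield FASTA lines with the 3rd codon positions removed per record.

    Multi-line sequences are joined before stripping so the reading
    frame is preserved across line breaks.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header
                yield strip_third_codon_positions("".join(chunks))
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header
        yield strip_third_codon_positions("".join(chunks))


# Hypothetical two-line record, four codons: ATG CTC TCC TAC
example = [">hypothetical_seq_1", "ATGCTC", "TCCTAC"]
for out_line in strip_fasta(example):
    print(out_line)
```

Under these assumptions, a 1,812 bp in-frame sequence is reduced to 1,208 bp, which matches the alignment lengths reported above.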
format Dataset
author Stokholm, Iben
Puryear, Wendy
Sawatzki, Kaitlin
Wilhelm Knudsen, Steen
Terkelsen, Thilde
Becher, Paul
Siebert, Ursula
Tange Olsen, Morten
title Emergence and radiation of distemper viruses in terrestrial and marine mammals - Input files, bash and R codes for analysing PDV and CDV sequence data
publisher Dryad
publishDate 2021
url https://dx.doi.org/10.5061/dryad.fxpnvx0sq
http://datadryad.org/stash/dataset/doi:10.5061/dryad.fxpnvx0sq
genre North Atlantic
genre_facet North Atlantic
op_relation https://dx.doi.org/10.5281/zenodo.5539996
https://dx.doi.org/10.5281/zenodo.5539998
op_rights Creative Commons Zero v1.0 Universal
https://creativecommons.org/publicdomain/zero/1.0/legalcode
cc0-1.0
op_rightsnorm CC0
op_doi https://doi.org/10.5061/dryad.fxpnvx0sq
https://doi.org/10.5281/zenodo.5539996
https://doi.org/10.5281/zenodo.5539998
_version_ 1766137465010126848