Data from: Phylogenomics from whole genome sequences using aTRAM

Abstract: Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical...

Bibliographic Details
Main Authors: Allen, Julie M., Boyd, Bret, Nguyen, Nam-Phuong, Vachaspati, Pranjal, Warnow, Tandy, Huang, Daisie I., Grady, Patrick G. S., Bell, Kayce C., Cronk, Quentin C.B., Mugisha, Lawrence, Pittendrigh, Barry R., Soledad Leonardi, M., Reed, David L., Johnson, Kevin P.
Format: Dataset
Language: unknown
Published: 2021
Subjects:
Online Access: https://search.dataone.org/view/sha256:2fe44e9234c157c5e857571cb5f44746251402aeceb90ed1f6ee4bfa8cf2f891
id dataone:sha256:2fe44e9234c157c5e857571cb5f44746251402aeceb90ed1f6ee4bfa8cf2f891
record_format openpolar
spelling dataone:sha256:2fe44e9234c157c5e857571cb5f44746251402aeceb90ed1f6ee4bfa8cf2f891 2024-10-03T18:45:36+00:00 2021-05-19T00:00:00Z dataone:urn:node:BOREALIS 2024-10-03T18:17:47Z
institution Open Polar
collection Unknown
op_collection_id dataone:urn:node:BOREALIS
language unknown
topic Hoplopleura arboricola
Pthirus pubis
Holocene
Haematopinus eurysternus
Proechinopthirus fluctus
Bothriometopus macrocnemus
Pthirus gorillae
Pediculus schaeffi
Genome sequencing
Stimulopalpus japonicus
Pediculus humanus
Pedicinus badii
Osborniella crotophagae
Linognathus spicatus
gene assembly
present day
Other
aTRAM
Bureelia antiqua
Neohaematopinus pacificus
Degeeriella rufa
Antarctopthirus microchir
description Abstract: Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes (<1000 Mbp) it is feasible to sequence the entire genome at modest coverage (10−30×). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage.
Usage notes:
Concatenated alignment and tree (Dataset_1.zip): Alignment and phylogenetic tree of the concatenated 1,101 exon DNA alignment from 15 louse taxa. Genes were assembled from raw genomic DNA with aTRAM, and exons were extracted and stitched together. The third codon position was removed due to base composition bias, and the tree was built in RAxML.
Individual Gene Trees and Alignments (Dataset_2.zip): All 1,101 gene trees and alignments for the 15-taxon dataset. Each gene was aligned using PASTA and UPP for fragmentary sequences. Each gene tree was built using ASTRAL.
SupplementaryTable: DNA extraction and quality clean-up for each dataset. Illumina reads. Alignments of each gene and the tree analysis.
Supplementary Figure (SupplementalFigure1.pdf): Box plot of the standard deviations away from the mean for each codon position for each of the GTR rate parameters. The majority of the extreme outliers fell above 10 standard deviations from the mean and were removed from the analysis.
format Dataset
author Allen, Julie M.
Boyd, Bret
Nguyen, Nam-Phuong
Vachaspati, Pranjal
Warnow, Tandy
Huang, Daisie I.
Grady, Patrick G. S.
Bell, Kayce C.
Cronk, Quentin C.B.
Mugisha, Lawrence
Pittendrigh, Barry R.
Soledad Leonardi, M.
Reed, David L.
Johnson, Kevin P.
author_sort Allen, Julie M.
title Data from: Phylogenomics from whole genome sequences using aTRAM
title_sort data from: phylogenomics from whole genome sequences using atram
publishDate 2021
url https://search.dataone.org/view/sha256:2fe44e9234c157c5e857571cb5f44746251402aeceb90ed1f6ee4bfa8cf2f891
genre Antarc*
_version_ 1811924202985357312