Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal

Abstract Background Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the be...


Bibliographic Details
Main Authors: Vendrami, David, Forcada, Jaume, Hoffman, Joseph
Format: Article in Journal/Newspaper
Language: unknown
Published: Figshare 2019
Subjects:
Online Access: https://dx.doi.org/10.6084/m9.figshare.c.4374809
https://springernature.figshare.com/collections/Experimental_validation_of_in_silico_predicted_RAD_locus_frequencies_using_genomic_resources_and_short_read_data_from_a_model_marine_mammal/4374809
id ftdatacite:10.6084/m9.figshare.c.4374809
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Genetics
FOS Biological sciences
Molecular Biology
59999 Environmental Sciences not elsewhere classified
FOS Earth and related environmental sciences
Ecology
69999 Biological Sciences not elsewhere classified
80699 Information Systems not elsewhere classified
FOS Computer and information sciences
Cancer
Inorganic Chemistry
FOS Chemical sciences
110309 Infectious Diseases
FOS Health sciences
60506 Virology
description Abstract. Background: Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies.
Results: PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion.
Conclusions: Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism's genome.
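The comparison described in the abstract, a composition-based prediction set against a count from in silico digestion, can be illustrated with a minimal Python sketch. This is not PredRAD's implementation: the function names are hypothetical, only the simplest GC content model is shown, and the EcoRI recognition sequence (GAATTC) and the toy sequence are used purely for illustration rather than taken from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def expected_sites_gc(seq: str, site: str) -> float:
    """Expected number of recognition sites under a GC content model.

    Assumption of this sketch: bases are independent, with
    P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.
    """
    gc = gc_content(seq)
    p_base = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p_site = 1.0
    for base in site.upper():
        p_site *= p_base[base]
    # Number of positions at which a site of this length could start.
    windows = max(len(seq) - len(site) + 1, 0)
    return p_site * windows


def observed_sites(seq: str, site: str) -> int:
    """Count recognition sites by scanning the sequence (in silico digestion).

    Overlapping occurrences are counted, since the search resumes one
    base after each hit.
    """
    seq, site = seq.upper(), site.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


if __name__ == "__main__":
    toy = "GAATTCAGGCTTAGAATTCCGATCGGAATTCTTAGC"  # made-up sequence
    ecori = "GAATTC"  # EcoRI recognition sequence, illustrative only
    print("predicted:", expected_sites_gc(toy, ecori))
    print("observed: ", observed_sites(toy, ecori))
```

On a real assembly the same two numbers would be computed over all contigs and summed. The mono-, di- and trinucleotide models mentioned in the abstract condition on richer composition statistics and are not reproduced here.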
format Article in Journal/Newspaper
author Vendrami, David
Forcada, Jaume
Hoffman, Joseph
title Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal
publisher Figshare
publishDate 2019
url https://dx.doi.org/10.6084/m9.figshare.c.4374809
https://springernature.figshare.com/collections/Experimental_validation_of_in_silico_predicted_RAD_locus_frequencies_using_genomic_resources_and_short_read_data_from_a_model_marine_mammal/4374809
geographic Antarctic
The Antarctic
genre Antarc*
Antarctic
Antarctic Fur Seal
Arctocephalus gazella
op_relation https://dx.doi.org/10.1186/s12864-019-5440-8
op_rights CC BY 4.0
https://creativecommons.org/licenses/by/4.0
op_rightsnorm CC-BY
op_doi https://doi.org/10.6084/m9.figshare.c.4374809
https://doi.org/10.1186/s12864-019-5440-8
_version_ 1766239914775543808