Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal
Abstract Background Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the be...
Main Authors: | Vendrami, David, Forcada, Jaume, Hoffman, Joseph |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | Figshare 2019 |
Subjects: | |
Online Access: | https://dx.doi.org/10.6084/m9.figshare.c.4374809.v1 https://springernature.figshare.com/collections/Experimental_validation_of_in_silico_predicted_RAD_locus_frequencies_using_genomic_resources_and_short_read_data_from_a_model_marine_mammal/4374809/1 |
id |
ftdatacite:10.6084/m9.figshare.c.4374809.v1 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
topic |
Genetics; FOS Biological sciences; Molecular Biology; 59999 Environmental Sciences not elsewhere classified; FOS Earth and related environmental sciences; Ecology; 69999 Biological Sciences not elsewhere classified; 80699 Information Systems not elsewhere classified; FOS Computer and information sciences; Cancer; Inorganic Chemistry; FOS Chemical sciences; 110309 Infectious Diseases; FOS Health sciences; 60506 Virology |
description |
Abstract. Background: Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies.
Results: PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion.
Conclusions: Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism's genome. |
format |
Article in Journal/Newspaper |
author |
Vendrami, David; Forcada, Jaume; Hoffman, Joseph |
author_facet |
Vendrami, David Forcada, Jaume Hoffman, Joseph |
author_sort |
Vendrami, David |
title |
Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal |
publisher |
Figshare |
publishDate |
2019 |
url |
https://dx.doi.org/10.6084/m9.figshare.c.4374809.v1 https://springernature.figshare.com/collections/Experimental_validation_of_in_silico_predicted_RAD_locus_frequencies_using_genomic_resources_and_short_read_data_from_a_model_marine_mammal/4374809/1 |
geographic |
Antarctic The Antarctic |
geographic_facet |
Antarctic The Antarctic |
genre |
Antarc* Antarctic Antarctic Fur Seal Arctocephalus gazella |
genre_facet |
Antarc* Antarctic Antarctic Fur Seal Arctocephalus gazella |
op_relation |
https://dx.doi.org/10.1186/s12864-019-5440-8 https://dx.doi.org/10.6084/m9.figshare.c.4374809 |
op_rights |
CC BY 4.0 https://creativecommons.org/licenses/by/4.0 |
op_rightsnorm |
CC-BY |
op_doi |
https://doi.org/10.6084/m9.figshare.c.4374809.v1 https://doi.org/10.1186/s12864-019-5440-8 https://doi.org/10.6084/m9.figshare.c.4374809 |
_version_ |
1766239916804538368 |