Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal

Bibliographic Details
Published in: BMC Genomics
Main Authors: Vendrami, David L. J.; Forcada, Jaume; Hoffman, Joseph I.
Format: Article in Journal/Newspaper
Language: English
Published: Springer Nature 2019
Online Access: http://nora.nerc.ac.uk/id/eprint/522103/
https://nora.nerc.ac.uk/id/eprint/522103/1/Vendrami.pdf
https://doi.org/10.1186/s12864-019-5440-8
id ftnerc:oai:nora.nerc.ac.uk:522103
record_format openpolar
spelling ftnerc:oai:nora.nerc.ac.uk:522103 2023-05-15T13:41:42+02:00 Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal Vendrami, David L. J. Forcada, Jaume Hoffman, Joseph I. 2019-01 text http://nora.nerc.ac.uk/id/eprint/522103/ https://nora.nerc.ac.uk/id/eprint/522103/1/Vendrami.pdf https://doi.org/10.1186/s12864-019-5440-8 en eng Springer Nature https://nora.nerc.ac.uk/id/eprint/522103/1/Vendrami.pdf Vendrami, David L. J.; Forcada, Jaume orcid:0000-0002-2115-0150 Hoffman, Joseph I. 2019 Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal. BMC Genomics, 20 (1), 72. https://doi.org/10.1186/s12864-019-5440-8 <https://doi.org/10.1186/s12864-019-5440-8> cc_by_4 CC-BY Publication - Article PeerReviewed 2019 ftnerc https://doi.org/10.1186/s12864-019-5440-8 2023-02-04T19:47:42Z Article in Journal/Newspaper Antarc* Antarctic Antarctic Fur Seal Arctocephalus gazella Natural Environment Research Council: NERC Open Research Archive Antarctic The Antarctic BMC Genomics 20 1
institution Open Polar
collection Natural Environment Research Council: NERC Open Research Archive
op_collection_id ftnerc
language English
description Background: Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed; it uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could differ, as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies. Results: PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies, respectively, generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the ...
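The two simplest ideas in the abstract, a composition-based prediction of restriction site frequency and a direct in silico count, can be illustrated with a minimal Python sketch. This is not PredRAD's code: the function names are hypothetical, only the GC content and mononucleotide models are shown (both assume independent bases), and the EcoRI recognition site GAATTC is an illustrative choice that the record does not name.

```python
from collections import Counter


def base_frequencies(seq):
    """Observed A/C/G/T frequencies in a sequence (other symbols ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def site_probability_mono(site, freqs):
    """Mononucleotide model: product of per-base frequencies (assumes independence)."""
    p = 1.0
    for b in site.upper():
        p *= freqs[b]
    return p


def site_probability_gc(site, gc):
    """GC content model: G and C each have probability gc/2, A and T (1-gc)/2."""
    p = 1.0
    for b in site.upper():
        p *= gc / 2 if b in "GC" else (1 - gc) / 2
    return p


def expected_sites(site, seq_len, p):
    """Expected site count: probability times the number of possible start positions."""
    return p * (seq_len - len(site) + 1)


def count_sites(site, seq):
    """In silico 'digestion' reduced to counting occurrences of the site (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count
```

Comparing `expected_sites(...)` for a given model against `count_sites(...)` on the same assembly mirrors, in a very reduced form, the comparison of predicted and in silico digested site numbers described above. Dinucleotide and trinucleotide models would additionally condition each base on its neighbours and are omitted here.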
format Article in Journal/Newspaper
author Vendrami, David L. J.
Forcada, Jaume
Hoffman, Joseph I.
title Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal
publisher Springer Nature
publishDate 2019
url http://nora.nerc.ac.uk/id/eprint/522103/
https://nora.nerc.ac.uk/id/eprint/522103/1/Vendrami.pdf
https://doi.org/10.1186/s12864-019-5440-8
geographic Antarctic
The Antarctic
genre Antarc*
Antarctic
Antarctic Fur Seal
Arctocephalus gazella
op_relation https://nora.nerc.ac.uk/id/eprint/522103/1/Vendrami.pdf
Vendrami, David L. J.; Forcada, Jaume orcid:0000-0002-2115-0150
Hoffman, Joseph I. 2019 Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal. BMC Genomics, 20 (1), 72. https://doi.org/10.1186/s12864-019-5440-8 <https://doi.org/10.1186/s12864-019-5440-8>
op_rights cc_by_4
op_rightsnorm CC-BY
op_doi https://doi.org/10.1186/s12864-019-5440-8
container_title BMC Genomics
container_volume 20
container_issue 1
_version_ 1766154306202894336