Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal

Bibliographic Details
Published in: BMC Genomics
Main Authors: Vendrami, David L. J., Forcada, Jaume, Hoffman, Joseph I.
Format: Article in Journal/Newspaper
Language: unknown
Published: Zenodo 2019
Subjects:
Online Access: https://doi.org/10.1186/s12864-019-5440-8
id ftzenodo:oai:zenodo.org:2550922
record_format openpolar
institution Open Polar
collection Zenodo
op_collection_id ftzenodo
topic Restriction site associated DNA sequencing (RADseq)
Restriction enzyme
Reference genome
Transcriptome assembly
PredRAD
Antarctic fur seal
Pinniped
description Background: Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies.
Results: PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the ...
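The abstract describes the prediction models only at a high level. The sketch below illustrates the general idea under stated assumptions: the GC content and mononucleotide models treat bases as independent, the dinucleotide model is written as a first-order Markov chain estimated from a sequence resource, and in silico digestion is a plain string search for a palindromic recognition site. All function names, the toy sequence, the choice of SbfI (CCTGCAGG) and the 2.4 Gb genome size are hypothetical placeholders; this is not PredRAD's implementation, and the trinucleotide (second-order) model is omitted.

```python
"""Minimal sketch of composition-based restriction-site prediction versus
in silico digestion. Hypothetical helper names; this is NOT PredRAD's code."""
from collections import Counter

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / acgt


def base_freqs(seq: str) -> dict:
    """Mononucleotide frequencies over unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def site_prob_gc(site: str, gc: float) -> float:
    """GC model: bases independent, P(G) = P(C) = gc/2, P(A) = P(T) = (1 - gc)/2."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for b in site:
        prob *= p[b]
    return prob


def site_prob_mono(site: str, freqs: dict) -> float:
    """Mononucleotide model: bases independent with observed frequencies."""
    prob = 1.0
    for b in site:
        prob *= freqs[b]
    return prob


def site_prob_di(site: str, seq: str) -> float:
    """Dinucleotide model as a first-order Markov chain:
    P(site) = P(s1) * prod_i P(s_i | s_{i-1}), estimated from `seq`."""
    seq = seq.upper()
    mono = base_freqs(seq)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    prob = mono[site[0]]
    for prev, cur in zip(site, site[1:]):
        row = sum(di[prev + b] for b in BASES)  # dinucleotides starting with prev
        if row == 0:
            return 0.0
        prob *= di[prev + cur] / row
    return prob


def count_sites(seq: str, site: str) -> int:
    """In silico digestion: count (overlapping) occurrences of a palindromic
    recognition site on one strand; palindromes need no reverse-strand pass."""
    seq, n = seq.upper(), 0
    start = seq.find(site)
    while start != -1:
        n += 1
        start = seq.find(site, start + 1)
    return n


if __name__ == "__main__":
    site = "CCTGCAGG"  # SbfI recognition site, used here only as an example
    resource = "ATGCCTGCAGGTTACGATCCGGTAACTGCAGT" * 50  # toy stand-in for an assembly
    genome_size = 2.4e9  # hypothetical genome size (bp) used to scale predictions
    gc = gc_content(resource)
    for name, p in [
        ("GC", site_prob_gc(site, gc)),
        ("mono", site_prob_mono(site, base_freqs(resource))),
        ("di", site_prob_di(site, resource)),
    ]:
        print(f"{name:>4}: {p * genome_size:,.0f} predicted sites")
    print("in silico count in toy resource:", count_sites(resource, site))
```

Multiplying a per-position site probability by a genome size gives a genome-wide prediction, which can then be compared with a direct in silico count from a genome assembly; richer composition models (di- or trinucleotide) capture dependencies between neighbouring bases that the GC and mononucleotide models ignore.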
genre Antarc*
Antarctic
Antarctic Fur Seal
Arctocephalus gazella
op_source BMC Genomics, 20(1), 72, (2019-01-22)
op_relation https://zenodo.org/communities/fp7-bmc
https://zenodo.org/communities/eu
https://doi.org/10.1186/s12864-019-5440-8
oai:zenodo.org:2550922
op_rights info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode