Data from: A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them

Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population,...

Bibliographic Details
Main Authors: Humble, E., Martinez-Barrio, A., Forcada, J., Trathan, P. N., Thorne, M. A. S., Hoffmann, M., Wolf, J. B. W., Hoffman, J. I.
Format: Dataset
Language: English
Published: Dryad 2015
Subjects:
Online Access: https://doi.org/10.5061/dryad.8kn8c
https://doi.org/10.5061/dryad.8kn8c.1
https://doi.org/10.5061/dryad.8kn8c.2
id fttriple:oai:gotriple.eu:50|dedup_wf_001::7b88ae98bf04269958cafda5b6b1f70c
record_format openpolar
spelling fttriple:oai:gotriple.eu:50|dedup_wf_001::7b88ae98bf04269958cafda5b6b1f70c 2023-05-15T14:03:58+02:00 2015-12-17 en eng https://vocabularies.coar-repositories.org/resource_types/c_ddb1/ 2023-01-22T17:16:08Z
institution Open Polar
collection Unknown
op_collection_id fttriple
language English
topic Antarctic fur seal
high density SNP array
cross-validation
Arctocephalus gazella
SNP array
Draft genome
single nucleotide polymorphism (SNP)
Life sciences
medicine and health care
envir
info
description Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50: 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, re-analysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modeling. Files: submission.assembly.ArcGaz001_AP3.fasta (draft fur seal genome v1.0); Seal_assay_SNPs.csv (list of pre-validated fur seal SNPs plus variables used for modeling SNP validation success); crossvalidation.R (R script to perform the k-fold cross-validation).
format Dataset
author Humble, E.
Martinez-Barrio, A.
Forcada, J.
Trathan, P. N.
Thorne, M. A. S.
Hoffmann, M.
Wolf, J. B. W.
Hoffman, J. I.
author_sort Humble, E.
title Data from: A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them
publisher Dryad
publishDate 2015
url https://doi.org/10.5061/dryad.8kn8c
https://doi.org/10.5061/dryad.8kn8c.1
https://doi.org/10.5061/dryad.8kn8c.2
geographic Antarctic
Weddell
genre Antarc*
Antarctic
Antarctic Fur Seal
Arctocephalus gazella
Weddell Seal
walrus*
op_source 10.5061/dryad.8kn8c
10.5061/dryad.8kn8c.1
10.5061/dryad.8kn8c.2
oai:easy.dans.knaw.nl:easy-dataset:118702
oai:easy.dans.knaw.nl:easy-dataset:93184
oai:services.nod.dans.knaw.nl:Products/dans:oai:easy.dans.knaw.nl:easy-dataset:118702
oai:services.nod.dans.knaw.nl:Products/dans:oai:easy.dans.knaw.nl:easy-dataset:93184
10|openaire____::9e3be59865b2c1c335d32dae2fe7b254
10|re3data_____::94816e6421eeb072e7742ce6a9decc5f
re3data_____::r3d100000044
10|re3data_____::84e123776089ce3c7a33db98d9cd15a8
10|eurocrisdris::fe4903425d9040f680d8610d9079ea14
10|openaire____::081b82f96300b6a6e3d282bad31cb6e2
10|opendoar____::8b6dd7db9af49e67306feb59a8bdc52c
op_relation http://dx.doi.org/10.5061/dryad.8kn8c
https://dx.doi.org/10.5061/dryad.8kn8c
http://dx.doi.org/10.5061/dryad.8kn8c.1
https://dx.doi.org/10.5061/dryad.8kn8c.1
https://dx.doi.org/10.5061/dryad.8kn8c.2
http://dx.doi.org/10.5061/dryad.8kn8c.2
op_rights lic_creative-commons
op_doi https://doi.org/10.5061/dryad.8kn8c
https://doi.org/10.5061/dryad.8kn8c.1
https://doi.org/10.5061/dryad.8kn8c.2
_version_ 1766274892267782144
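
The description suggests that validation rates could be improved by keeping only SNPs whose flanking sequences align uniquely and completely to a reference genome. A minimal sketch of such a filter follows. It is an illustration only, not the authors' code: the class name, field names, and function names are hypothetical and do not reflect the columns of Seal_assay_SNPs.csv, and it assumes alignment summaries (number of mappings, alignment length) have already been computed by an aligner of your choice.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProbeAlignment:
    """Per-SNP alignment summary (all field names are hypothetical)."""
    snp_id: str            # identifier of the putative SNP
    probe_length: int      # length of the probe/flanking sequence in bp
    n_mappings: int        # number of places the probe aligns in the reference genome
    alignment_length: int  # length of the best alignment in bp


def aligns_uniquely_and_completely(a: ProbeAlignment) -> bool:
    """True if the probe maps exactly once and over its full length."""
    return a.n_mappings == 1 and a.alignment_length == a.probe_length


def filter_snps(alignments: Iterable[ProbeAlignment]) -> List[str]:
    """Return the IDs of SNPs that pass the uniqueness/completeness filter."""
    return [a.snp_id for a in alignments if aligns_uniquely_and_completely(a)]
```

A probe that maps to several genomic locations, or that aligns over only part of its length (for example because it spans an intron-exon boundary), would be dropped by this filter. The predictive modeling alternative mentioned in the description is not covered here.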