Data from: A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them

Bibliographic Details
Main Authors: Humble, E., Martinez-Barrio, A., Forcada, J., Trathan, P.N., Thorne, M.A.S., Hoffmann, M., Wolf, J.B.W., Hoffman, J.I.
Language: unknown
Published: 2016
Subjects: Life sciences; medicine and health care
Online Access:http://nbn-resolving.org/urn:nbn:nl:ui:13-ci-3b5i
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:93184
id ftdans:oai:easy.dans.knaw.nl:easy-dataset:93184
record_format openpolar
institution Open Polar
collection Data Archiving and Networked Services (DANS): EASY (KNAW - Koninklijke Nederlandse Akademie van Wetenschappen)
op_collection_id ftdans
language unknown
topic Life sciences
medicine and health care
description Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50: 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, re-analysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modeling.
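The validation rate defined in the abstract, and the simple filter it proposes (keep SNPs whose flanking sequence aligns uniquely and completely to a reference genome), can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the ProbeMapping fields (n_mappings, aligned_length, probe_length), the exact full-length rule for a "complete" alignment and the toy values are all hypothetical. In practice these numbers would come from aligning the probe sequences against a reference assembly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeMapping:
    """Hypothetical summary of one putative SNP: its genotyping outcome and
    how its flanking probe sequence aligned to a reference genome."""
    snp_id: str
    polymorphic: bool    # True if genotyping confirmed the SNP as polymorphic
    n_mappings: int      # number of genomic locations the probe aligned to
    aligned_length: int  # aligned length of the best hit (bp)
    probe_length: int    # full length of the probe sequence (bp)


def validation_rate(snps: List[ProbeMapping]) -> float:
    """Proportion of putative SNPs verified to be polymorphic."""
    if not snps:
        raise ValueError("no SNPs supplied")
    return sum(s.polymorphic for s in snps) / len(snps)


def aligns_uniquely_and_completely(snp: ProbeMapping) -> bool:
    """Assumed filter: exactly one genomic hit, aligned over the whole probe."""
    return snp.n_mappings == 1 and snp.aligned_length == snp.probe_length


def filter_snps(snps: List[ProbeMapping]) -> List[ProbeMapping]:
    """Keep only SNPs that pass the uniqueness/completeness filter."""
    return [s for s in snps if aligns_uniquely_and_completely(s)]


# Toy values for illustration only; these are not data from this study.
toy_snps = [
    ProbeMapping("snp1", True, 1, 100, 100),
    ProbeMapping("snp2", False, 3, 100, 100),  # multi-copy hit: filtered out
    ProbeMapping("snp3", False, 1, 60, 100),   # partial hit, as might occur near an intron-exon boundary
    ProbeMapping("snp4", True, 1, 100, 100),
]

print(validation_rate(toy_snps))               # 0.5
print(validation_rate(filter_snps(toy_snps)))  # 1.0
```

The toy values only show the mechanics of the filter. Requiring an exact full-length match is a deliberately strict choice; a minimum aligned fraction of the probe length would be a looser alternative.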
author Humble, E.
Martinez-Barrio, A.
Forcada, J.
Trathan, P.N.
Thorne, M.A.S.
Hoffmann, M.
Wolf, J.B.W.
Hoffman, J.I.
title Data from: A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them
publishDate 2016
url http://nbn-resolving.org/urn:nbn:nl:ui:13-ci-3b5i
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:93184
geographic Antarctic
Weddell
genre Antarc*
Antarctic
Antarctic Fur Seal
Arctocephalus gazella
Weddell Seal
walrus*
op_relation doi:10.5061/dryad.8kn8c.2/1.2
doi:10.5061/dryad.8kn8c.2/2.2
doi:10.5061/dryad.8kn8c.2/3.2
doi:10.1111/1755-0998.12502
PMID:26683564
http://nbn-resolving.org/urn:nbn:nl:ui:13-ci-3b5i
doi:10.5061/dryad.8kn8c.2
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:93184
op_rights OPEN_ACCESS: The data are archived in EASY; they are accessible elsewhere through the DOI
https://dans.knaw.nl/en/about/organisation-and-policy/legal-information/DANSLicence.pdf
op_doi https://doi.org/10.5061/dryad.8kn8c.2/1.2
https://doi.org/10.5061/dryad.8kn8c.2/2.2
https://doi.org/10.5061/dryad.8kn8c.2/3.2
https://doi.org/10.1111/1755-0998.12502
https://doi.org/10.5061/dryad.8kn8c.2