Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success
Abstract. Background: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of ‘putative’ SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology....
Main Authors: | Humble, Emily; Thorne, Michael; Forcada, Jaume; Hoffman, Joseph |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | figshare, 2016 |
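Subjects: | Genetics; FOS Biological sciences; 59999 Environmental Sciences not elsewhere classified; FOS Earth and related environmental sciences; 69999 Biological Sciences not elsewhere classified; 19999 Mathematical Sciences not elsewhere classified; FOS Mathematics |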
Online Access: | https://dx.doi.org/10.6084/m9.figshare.c.3620873.v1 https://springernature.figshare.com/collections/Transcriptomic_SNP_discovery_for_custom_genotyping_arrays_impacts_of_sequence_data_SNP_calling_method_and_genotyping_technology_on_the_probability_of_validation_success/3620873/1 |
id | ftdatacite:10.6084/m9.figshare.c.3620873.v1 |
---|---|
institution | Open Polar |
collection | DataCite Metadata Store (German National Library of Science and Technology) |
topic | Genetics; FOS Biological sciences; 59999 Environmental Sciences not elsewhere classified; FOS Earth and related environmental sciences; 69999 Biological Sciences not elsewhere classified; 19999 Mathematical Sciences not elsewhere classified; FOS Mathematics |
description | Abstract. Background: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of ‘putative’ SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Results: Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Conclusions: Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms. |
author | Humble, Emily; Thorne, Michael; Forcada, Jaume; Hoffman, Joseph |
geographic | Antarctic; The Antarctic |
genre | Antarc*; Antarctic; Antarctic Fur Seal; Arctocephalus gazella |
op_relation | https://dx.doi.org/10.1186/s13104-016-2209-x https://dx.doi.org/10.6084/m9.figshare.c.3620873 |
op_rights | CC BY 4.0 https://creativecommons.org/licenses/by/4.0 |
op_doi | https://doi.org/10.6084/m9.figshare.c.3620873.v1 https://doi.org/10.1186/s13104-016-2209-x https://doi.org/10.6084/m9.figshare.c.3620873 |
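The abstract's Results report two overlap statistics: the share of the 34,718 collated SNPs that more than one discovery method found, and the share called from both the 454 and the Illumina data. The Python sketch below shows one way to tally such figures from per-method call sets. It is illustrative only and does not reproduce the authors' pipeline. The method labels (`BWA_GATK`, `BOWTIE2_GATK`, `NEWBLER`, `SWAP454`), the function `concordance_summary`, and the toy calls are all hypothetical. The sketch also assumes that every call already sits on a common reference, so the same SNP carries the same `(contig, position)` key in every set. Matching SNPs between the original 454 assembly and the hybrid transcriptome would be a separate step that is not shown here.

```python
from collections import defaultdict

# Hypothetical labels: which sequence data each discovery method drew on,
# following the abstract (GATK calls after BWA or BOWTIE2 mapping of Illumina
# reads; NEWBLER and SWAP454 calls from the original 454 assembly).
DATA_SOURCE = {
    "BWA_GATK": "Illumina",
    "BOWTIE2_GATK": "Illumina",
    "NEWBLER": "454",
    "SWAP454": "454",
}


def concordance_summary(calls):
    """Collate per-method SNP calls and summarise their overlap.

    calls maps a method name to a set of (contig, position) keys, assumed to
    share one reference. Returns (total unique SNPs, fraction called by more
    than one method, fraction called from both the Illumina and 454 data).
    """
    methods_per_snp = defaultdict(set)
    for method, snps in calls.items():
        for snp in snps:
            methods_per_snp[snp].add(method)

    total = len(methods_per_snp)  # size of the de-duplicated global list
    if total == 0:
        return 0, 0.0, 0.0

    multi = sum(1 for methods in methods_per_snp.values() if len(methods) > 1)
    both = sum(
        1
        for methods in methods_per_snp.values()
        if {DATA_SOURCE[m] for m in methods} == {"Illumina", "454"}
    )
    return total, multi / total, both / total


# Toy call sets (hypothetical data, not from the study).
calls = {
    "BWA_GATK": {("c1", 10), ("c1", 55), ("c2", 7)},
    "BOWTIE2_GATK": {("c1", 10), ("c2", 7)},
    "NEWBLER": {("c1", 10), ("c3", 99)},
    "SWAP454": {("c3", 99), ("c4", 3)},
}
total, multi_frac, both_frac = concordance_summary(calls)
print(f"{total} SNPs; {multi_frac:.1%} by >1 method; {both_frac:.1%} from both data types")
# prints: 5 SNPs; 60.0% by >1 method; 20.0% from both data types
```

Keying each SNP on a shared reference coordinate reduces the tally to simple set bookkeeping. A real comparison would also need a rule for calls at the same position that report different alleles.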