SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates

Motivation: Single nucleotide polymorphism (SNP) detection exploiting redundancy in expressed sequence tag (EST) collections that arises from the presence of transcripts of the same gene from different individuals has been used to generate large collections of SNPs for many species. A second source...

Bibliographic Details
Published in: Bioinformatics
Main Authors: Hayes, Ben J., Nilsen, Kjetil, Berg, Paul R., Grindflek, Eli, Lien, Sigbjørn
Format: Text
Language: English
Published: Oxford University Press 2007
Subjects: SEQUENCE ANALYSIS
Online Access: http://bioinformatics.oxfordjournals.org/cgi/content/short/23/13/1692
https://doi.org/10.1093/bioinformatics/btm154
institution Open Polar
collection HighWire Press (Stanford University)
op_collection_id fthighwire
language English
topic SEQUENCE ANALYSIS
description Motivation: Single nucleotide polymorphism (SNP) detection exploiting redundancy in expressed sequence tag (EST) collections that arises from the presence of transcripts of the same gene from different individuals has been used to generate large collections of SNPs for many species. A second source of redundancy, namely that EST collections can contain multiple transcripts of the same gene from the same individual, can be exploited to distinguish true SNPs from sequencing error. In this article, we demonstrate with Atlantic salmon and pig EST collections that splitting the EST collection in two, detecting SNPs in both subsets, then accepting only cross-validated SNPs increases validation rates.
Results: In the pig data set, 676 cross-validated putative SNPs were detected in a collection of 160 689 ESTs. When validating a subset of these by genotyping on MassARRAY 85.1% of SNPs were polymorphic in successful assays. In the salmon data set, 856 cross-validated putative SNPs were detected in a collection of 243 674 ESTs. Validation by genotyping showed that 81.0% of the cross-validated putative SNPs were polymorphic in successful assays.
Availability: Cross-validated SNPs are available at dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), ss69371838-ss69372575 for the salmon SNPs and ss69372587-ss69373226 for the pig SNPs.
Contact: ben.hayes@dpi.vic.gov.au
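The cross-validation step described in the abstract (split the collection in two, call SNPs in each half, keep only the calls found in both) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The record does not say how the collection was split or how SNPs were called, so the random split, the `PutativeSNP` record, and the function names `split_collection` and `cross_validate` are all assumptions made for illustration. The sketch also assumes both halves are aligned to the same contigs, so that positions are comparable.

```python
import random
from typing import NamedTuple


class PutativeSNP(NamedTuple):
    """Hypothetical record for one SNP call: contig name, position, allele set."""
    contig: str
    position: int
    alleles: frozenset


def split_collection(est_ids, seed=0):
    """Split EST identifiers into two halves.

    A seeded random split is an assumption; the record does not state
    how the original collections were divided.
    """
    ids = sorted(est_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def cross_validate(snps_a, snps_b):
    """Keep only SNPs called at the same site with the same alleles in both subsets."""
    return set(snps_a) & set(snps_b)


# Toy calls from two subsets; SNP detection itself is outside this sketch.
subset_a = {
    PutativeSNP("contig1", 120, frozenset("AG")),
    PutativeSNP("contig2", 45, frozenset("CT")),
}
subset_b = {
    PutativeSNP("contig1", 120, frozenset("AG")),
    PutativeSNP("contig3", 300, frozenset("AC")),
}
validated = cross_validate(subset_a, subset_b)
```

Here only the `contig1` call at position 120 survives, because it is the only site reported with the same alleles in both halves. A call produced by a sequencing error in a single read is unlikely to recur independently in the other half, which is the intuition behind the improved validation rates reported above.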
format Text
author Hayes, Ben J.
Nilsen, Kjetil
Berg, Paul R.
Grindflek, Eli
Lien, Sigbjørn
title SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates
publisher Oxford University Press
publishDate 2007
url http://bioinformatics.oxfordjournals.org/cgi/content/short/23/13/1692
https://doi.org/10.1093/bioinformatics/btm154
genre Atlantic salmon
op_rights Copyright (C) 2007, Oxford University Press
container_title Bioinformatics
container_volume 23
container_issue 13
container_start_page 1692
op_container_end_page 1693