SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates

Motivation: Single nucleotide polymorphism (SNP) detection exploiting redundancy in expressed sequence tag (EST) collections that arises from the presence of transcripts of the same gene from different individuals has been used to generate large collections of SNPs for many species. A second source...


Bibliographic Details
Published in: Bioinformatics
Main Authors: Hayes, Ben J.; Nilsen, Kjetil; Berg, Paul R.; Grindflek, Eli; Lien, Sigbjorn
Format: Conference Object
Language: English
Published: 2007
Online Access: https://espace.library.uq.edu.au/view/UQ:398770
id ftunivqespace:oai:espace.library.uq.edu.au:UQ:398770
record_format openpolar
spelling ftunivqespace:oai:espace.library.uq.edu.au:UQ:398770 2023-05-15T15:32:29+02:00 SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates Hayes, Ben J. Nilsen, Kjetil Berg, Paul R. Grindflek, Eli Lien, Sigbjorn 2007-07-01 https://espace.library.uq.edu.au/view/UQ:398770 eng eng doi:10.1093/bioinformatics/btm154 issn:1367-4803 orcid:0000-0002-5606-3970 1303 Specialist Studies in Education 1308 Clinical Biochemistry 1312 Molecular Biology 1703 Computational Theory and Mathematics 1706 Computer Science Applications 2605 Computational Mathematics 2613 Statistics and Probability Conference Paper 2007 ftunivqespace https://doi.org/10.1093/bioinformatics/btm154 2020-08-18T02:46:27Z Motivation: Single nucleotide polymorphism (SNP) detection exploiting redundancy in expressed sequence tag (EST) collections that arises from the presence of transcripts of the same gene from different individuals has been used to generate large collections of SNPs for many species. A second source of redundancy, namely that EST collections can contain multiple transcripts of the same gene from the same individual, can be exploited to distinguish true SNPs from sequencing error. In this article, we demonstrate with Atlantic salmon and pig EST collections that splitting the EST collection in two, detecting SNPs in both subsets, then accepting only cross-validated SNPs increases validation rates. Results: In the pig data set, 676 cross-validated putative SNPs were detected in a collection of 160 689 ESTs. When validating a subset of these by genotyping on MassARRAY 85.1% of SNPs were polymorphic in successful assays. In the salmon data set, 856 cross-validated putative SNPs were detected in a collection of 243 674 ESTs. Validation by genotyping showed that 81.0% of the cross-validated putative SNPs were polymorphic in successful assays. Conference Object Atlantic salmon The University of Queensland: UQ eSpace Bioinformatics 23 13 1692 1693
institution Open Polar
collection The University of Queensland: UQ eSpace
op_collection_id ftunivqespace
language English
topic 1303 Specialist Studies in Education
1308 Clinical Biochemistry
1312 Molecular Biology
1703 Computational Theory and Mathematics
1706 Computer Science Applications
2605 Computational Mathematics
2613 Statistics and Probability
description Motivation: Single nucleotide polymorphism (SNP) detection exploiting redundancy in expressed sequence tag (EST) collections that arises from the presence of transcripts of the same gene from different individuals has been used to generate large collections of SNPs for many species. A second source of redundancy, namely that EST collections can contain multiple transcripts of the same gene from the same individual, can be exploited to distinguish true SNPs from sequencing error. In this article, we demonstrate with Atlantic salmon and pig EST collections that splitting the EST collection in two, detecting SNPs in both subsets, then accepting only cross-validated SNPs increases validation rates. Results: In the pig data set, 676 cross-validated putative SNPs were detected in a collection of 160 689 ESTs. When validating a subset of these by genotyping on MassARRAY, 85.1% of SNPs were polymorphic in successful assays. In the salmon data set, 856 cross-validated putative SNPs were detected in a collection of 243 674 ESTs. Validation by genotyping showed that 81.0% of the cross-validated putative SNPs were polymorphic in successful assays.
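The split, detect, and cross-validate idea in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the random splitting rule, the toy allele-count detector, its `min_allele_count` threshold, the data layout, and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def split_collection(est_ids, seed=0):
    """Split EST identifiers into two disjoint halves (assumed: random split)."""
    ids = sorted(est_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def detect_snps(ests, subset_ids, min_allele_count=2):
    """Toy detector: a site is a putative SNP when at least two different
    bases are each seen min_allele_count or more times in the subset.

    ests maps est_id -> (contig, {position: base}) for already-aligned reads.
    Returns {(contig, position): frozenset of supported alleles}.
    """
    counts = defaultdict(Counter)
    for est_id in subset_ids:
        contig, bases = ests[est_id]
        for pos, base in bases.items():
            counts[(contig, pos)][base] += 1
    snps = {}
    for site, counter in counts.items():
        alleles = frozenset(b for b, n in counter.items() if n >= min_allele_count)
        if len(alleles) >= 2:
            snps[site] = alleles
    return snps


def cross_validate(snps_a, snps_b):
    """Keep only sites called in both subsets with the same alleles."""
    return {site: alleles for site, alleles in snps_a.items()
            if snps_b.get(site) == alleles}


# Hypothetical aligned ESTs on one contig. Position 10 varies in both
# subsets; position 20 varies only in subset A, as a repeated error might.
ests = {
    "e1": ("c1", {10: "A", 20: "C"}),
    "e2": ("c1", {10: "A", 20: "C"}),
    "e3": ("c1", {10: "G", 20: "T"}),
    "e4": ("c1", {10: "G", 20: "T"}),
    "e5": ("c1", {10: "A", 20: "C"}),
    "e6": ("c1", {10: "A", 20: "C"}),
    "e7": ("c1", {10: "G", 20: "C"}),
    "e8": ("c1", {10: "G", 20: "C"}),
}
subset_a = {"e1", "e2", "e3", "e4"}
subset_b = {"e5", "e6", "e7", "e8"}

calls_a = detect_snps(ests, subset_a)
calls_b = detect_snps(ests, subset_b)
validated = cross_validate(calls_a, calls_b)
print(sorted(validated))  # only the site at position 10 survives
```

In this toy data the subsets are fixed by hand so the outcome is easy to follow; `split_collection` shows one possible way to produce the two halves when no grouping is given. The site at position 20 is called in subset A but not in subset B, so it is discarded.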
format Conference Object
author Hayes, Ben J.
Nilsen, Kjetil
Berg, Paul R.
Grindflek, Eli
Lien, Sigbjorn
author_sort Hayes, Ben J.
title SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates
publishDate 2007
url https://espace.library.uq.edu.au/view/UQ:398770
genre Atlantic salmon
op_relation doi:10.1093/bioinformatics/btm154
issn:1367-4803
orcid:0000-0002-5606-3970
op_doi https://doi.org/10.1093/bioinformatics/btm154
container_title Bioinformatics
container_volume 23
container_issue 13
container_start_page 1692
op_container_end_page 1693
_version_ 1766362988761055232