Data from: Development of SNP genotyping arrays in two shellfish species ...

Bibliographic Details
Main Authors: Lapègue, Sylvie, Harrang, Estelle, Heurtebise, Serge, Flahauw, Emilie, Donnadieu, Cécile, Gayral, Philippe, Ballenghien, Marion, Genestout, Lucie, Barbotte, Laetitia, Mahla, Rachid, Haffray, Pierrick, Klopp, Christophe
Format: Dataset
Language: English
Published: Dryad 2014
Subjects:
Online Access: https://dx.doi.org/10.5061/dryad.jr233
https://datadryad.org/stash/dataset/doi:10.5061/dryad.jr233
Description
Summary: Use of SNPs has been favored due to their abundance in plant and animal genomes, accompanied by the falling cost and rising throughput capacity for detection and genotyping. Here, we present in vitro (obtained from targeted sequencing) and in silico discovery of SNPs, and the design of medium-throughput genotyping arrays for two oyster species, the Pacific oyster, Crassostrea gigas, and European flat oyster, Ostrea edulis. Two sets of 384 SNP markers were designed for two Illumina GoldenGate arrays and genotyped on more than 1000 samples for each species. In each case, oyster samples were obtained from wild and selected populations and from three-generation families segregating for traits of interest in aquaculture. The rate of successfully genotyped polymorphic SNPs was about 60% for each species. Effects of SNP origin and quality on genotyping success (Illumina functionality score) were analyzed and compared with other model and non-model species. Furthermore, a simulation was made based on a subset of the ...
Alignments of C. gigas in silico sequences: For the in silico SNPs, we investigated in 2009 the 6th assembly of the Crassostrea gigas EST database (http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html). The database contained results of the assembly of 55,851 public ESTs from dbEST and 417 Genbank mRNA sequences. The assembly, performed with TGICL (http://compbio.dfci.harvard.edu/tgi/software/; parameters -l 60 -p 96 -s 100000 -O '-p 75 -s 500'), produced an alignment file from which 1370 SNPs were extracted. We looked for SNPs that complied with the initial criteria: a minimum depth of seven sequences, with a minimum allele count of three, and the absence of any other SNP in the 60 bp segment flanking the analyzed SNP to the left or right.
Because these conditions proved too stringent and yielded few SNPs, we relaxed the criteria to a minimum depth of five sequences with a minimum allele count of two, and allowed another SNP within 120 bp of the SNP of interest, as long as ...
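The selection criteria described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `CandidateSnp` record, its field names, the function names, and the example data are all hypothetical. "Minimum allele count" is assumed here to mean the number of sequences supporting the less frequent allele, and the neighbour check is assumed to consider every candidate SNP on the same contig. Because the condition attached to the relaxed 120 bp rule is truncated in this record, the relaxed filter below applies only the depth and allele-count thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSnp:
    """Hypothetical record for one SNP extracted from the alignment file."""
    contig: str
    position: int            # position on the contig, in bp
    depth: int               # number of sequences covering the site
    minor_allele_count: int  # sequences supporting the less frequent allele (assumed meaning)


def meets_counts(snp, min_depth, min_allele_count):
    """Check the depth and allele-count thresholds."""
    return snp.depth >= min_depth and snp.minor_allele_count >= min_allele_count


def has_neighbor_within(snp, all_snps, flank_bp):
    """True if another candidate SNP lies within flank_bp on the same contig."""
    return any(
        other is not snp
        and other.contig == snp.contig
        and abs(other.position - snp.position) <= flank_bp
        for other in all_snps
    )


def filter_strict(snps, min_depth=7, min_allele_count=3, flank_bp=60):
    """Initial criteria: depth >= 7, allele count >= 3, no other SNP within 60 bp."""
    return [
        s for s in snps
        if meets_counts(s, min_depth, min_allele_count)
        and not has_neighbor_within(s, snps, flank_bp)
    ]


def filter_relaxed_counts(snps, min_depth=5, min_allele_count=2):
    """Relaxed thresholds only: depth >= 5, allele count >= 2.

    The relaxed neighbouring-SNP rule is not modelled because its
    condition is elided in the source text.
    """
    return [s for s in snps if meets_counts(s, min_depth, min_allele_count)]


# Made-up example data, for illustration only.
snps = [
    CandidateSnp("contig_1", 100, 8, 3),
    CandidateSnp("contig_1", 400, 9, 4),
    CandidateSnp("contig_2", 50, 7, 3),   # has a neighbour 40 bp away
    CandidateSnp("contig_2", 90, 5, 2),   # too shallow for the initial criteria
    CandidateSnp("contig_3", 10, 6, 1),   # allele count too low in both cases
]

strict = filter_strict(snps)
relaxed = filter_relaxed_counts(snps)
```

With this made-up input, the initial criteria keep only the two SNPs on `contig_1`, while the relaxed thresholds also admit both SNPs on `contig_2`, which mirrors why relaxing the criteria increases the number of usable markers.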