Genomic variation in European Sea bass: from SNP discovery within ESTs to genome scan.

Bibliographic Details
Main Author: Souche, Erika
Other Authors: Volckaert, Filip; U0006527
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: 2009
Online Access: https://lirias.kuleuven.be/handle/1979/2645
https://lirias.kuleuven.be/bitstream/1979/2645/2//PhD_final_7.pdf
Description
Summary: European sea bass (Dicentrarchus labrax) is an economically important marine species in European aquaculture. Although sea bass population structure is well known, aquaculture does not benefit from selection programs, sea bass production being nearly completely based on wild-caught fishes reproducing in semi-controlled conditions. More knowledge of the sea bass genome would help the breeding progress in this species, the study of natural populations and their evolution, as well as the management of fisheries. The generation of large collections of Expressed Sequence Tags (ESTs) would provide genomic resources for discovering new genes and new markers, identifying intron-exon boundaries and studying gene expression profiles. In this thesis, efforts were concentrated on the discovery of Single Nucleotide Polymorphisms (SNPs) within ESTs. SNPs are the most abundant source of variation in most eukaryotic and prokaryotic genomes and can have applications both in aquaculture and in natural populations. Approximately 30,000 ESTs from 14 libraries of five sea bass individuals have been sequenced. This large EST collection was described and compared to a similar set of ESTs generated for gilthead sea bream (Sparus aurata), another economically important marine species. The processing of ESTs led to the generation of 17,716 sea bass and 18,198 sea bream unique sequences, of which less than a third were common to both species. Automatic annotation indicated that more protein coding sequences were generated for sea bass than for sea bream. This was further confirmed by the prediction of Open Reading Frames (ORFs) and by the GC content of sea bass and sea bream unique sequences. Gene Ontology (GO) annotation showed that the same categories were represented for both species. Six SNP discovery tools were used on sea bass ESTs and their performance was assessed by validating around 10% of the SNP candidates.
This analysis demonstrated that the selection of redundant candidate SNPs (mismatches detected at least twice in the ESTs) was a good means of improving SNP discovery performance. The selection of SNP candidates with a minimum allele frequency greater than or equal to 0.3 further enhanced SNP discovery performance, although it reduced the number of SNP candidates. Finally, the selection of SNP candidates detected by several tools and the exclusion of indels were also good means of reducing the number of false positive candidate SNPs. Transition SNP candidates appeared to be less reliable than transversion SNP candidates due to the presence of RNA editing sites in EST collections. High quality of the EST assembly and of the flanking regions of SNP candidates proved essential for efficient SNP discovery. These conclusions led to the development of a pipeline integrating the six tested SNP discovery tools. This efficient and easy-to-use pipeline allows the detection of SNPs in any EST dataset, the selection of SNP candidates according to redundancy and/or minimum allele frequency, and the comparison of SNP candidates according to SNP discovery tool. It has been used successfully on EST collections of the fishes Dicentrarchus labrax, Sparus aurata and Anguilla anguilla, and of the waterflea Daphnia magna. The use of the six SNP discovery tools identified 1,072 unique SNP candidates, of which a subset was validated. A total of 360 SNPs were discovered in introns and ESTs, proving that resequencing the contigs predicted to be polymorphic was an efficient way of discovering SNPs. The nucleotide diversity of sea bass was estimated at one SNP every 137 bp and was higher in introns than in ESTs. Mendelian inheritance was checked on 17 SNPs polymorphic in the Venezia Fbis family used to produce sea bass linkage maps. Four of them did not follow Mendelian inheritance, suggesting the presence of null alleles.
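The candidate-selection criteria described above (redundancy, minimum allele frequency, agreement between tools, exclusion of indels) can be illustrated with a minimal Python sketch. This is not the thesis pipeline: the `SnpCandidate` record, the function names and the tool labels are hypothetical, and "minimum allele frequency" is assumed here to mean the frequency of the second most common allele among the ESTs covering the site.

```python
from dataclasses import dataclass


@dataclass
class SnpCandidate:
    """Hypothetical record for one candidate SNP (not the thesis data model)."""
    contig: str
    position: int
    allele_counts: dict   # allele -> number of ESTs showing that allele
    tools: frozenset      # names of the discovery tools that reported the site
    is_indel: bool = False


def minor_allele_count(c: SnpCandidate) -> int:
    """Number of ESTs carrying the second most common allele (0 if monomorphic)."""
    counts = sorted(c.allele_counts.values(), reverse=True)
    return counts[1] if len(counts) > 1 else 0


def minor_allele_frequency(c: SnpCandidate) -> float:
    """Assumed reading of 'minimum allele frequency': minor count / coverage."""
    total = sum(c.allele_counts.values())
    return minor_allele_count(c) / total if total else 0.0


def keep(c: SnpCandidate, min_redundancy: int = 2, min_freq: float = 0.3,
         min_tools: int = 2, drop_indels: bool = True) -> bool:
    """Apply the four selection criteria named in the summary."""
    if drop_indels and c.is_indel:
        return False                      # exclude indels
    if len(c.tools) < min_tools:
        return False                      # require agreement between tools
    if minor_allele_count(c) < min_redundancy:
        return False                      # mismatch must be seen at least twice
    return minor_allele_frequency(c) >= min_freq


# Illustrative, made-up candidates.
two_tools = frozenset({"toolA", "toolB"})
candidates = [
    SnpCandidate("contig1", 120, {"A": 5, "G": 3}, two_tools),
    SnpCandidate("contig2", 45, {"C": 9, "T": 1}, two_tools),
    SnpCandidate("contig3", 300, {"A": 6, "G": 2}, two_tools),
    SnpCandidate("contig4", 10, {"A": 4, "-": 4}, two_tools, is_indel=True),
    SnpCandidate("contig5", 77, {"C": 4, "T": 4}, frozenset({"toolA"})),
]
selected = [c.contig for c in candidates if keep(c)]
```

Each threshold is a parameter, so the criteria can be applied separately or together. As the summary notes, stricter settings trade a smaller candidate set for fewer false positives.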
Finally, 22 wild sea bass populations were successfully genotyped at 49 SNPs. This set of SNPs sufficed to confirm the established sea bass population structure, namely the differentiation of Atlantic and Mediterranean samples. Adriatic samples were shown to be genetically distinct from Western and Eastern Mediterranean samples. Selection analyses pointed to a locus that could be under natural selection in the Atlantic Ocean. In conclusion, a bioinformatic approach to discover SNPs was proven to be very valuable. Meanwhile SNP genotyping technologies have evolved, allowing the validation of SNP candidates on the samples to be investigated.
Table of Contents:
Introduction
Chapter 1: Transcriptome characterisation of Expressed Sequence Tags of European sea bass and gilthead sea bream
Chapter 2A: Mining for Single Nucleotide Polymorphisms in Expressed Sequence Tags of European sea bass
Chapter 2B: In silico discovery of SNPs in EST data: evaluation of tools and strategies
Chapter 3: Integration of SNP discovery tools in a single pipeline for automated EST mining
Chapter 4: Discovery and validation of SNPs in ESTs of European sea bass
Chapter 5: Assessment of population structure and detection of selection by Single Nucleotide Polymorphisms (SNPs) in European sea bass
General Discussion
Scientific Summary
Wettenschappelijke Samenvatting
Popular Summary
Populaire Samenvatting
status: published