Data from: Exploring a Pool-seq only approach for gaining population genomic insights in non-model species

Bibliographic Details
Main Authors: Kurland, Sara; Wheat, Chris; Celorio-Mancera, Maria de la Paz; Kutschera, Verena; Hill, Jason; Andersson, Anastasia; Rubin, Carl Johan; Andersson, Leif; Ryman, Nils; Laikre, Linda
Format: Dataset
Language: unknown
Published: 2021
Subjects:
Online Access: https://zenodo.org/record/3998270
https://doi.org/10.5061/dryad.q1h4k0n
Description
Summary: Developing genomic insights is challenging in non-model species for which resources are often scarce and prohibitively costly. Here, we explore the potential of a recently established approach using Pool-seq data to generate a de novo genome assembly for mining exons, upon which Pool-seq data is used to estimate population divergence and diversity. We do this for two pairs of sympatric populations of brown trout (Salmo trutta): one naturally sympatric set of populations and another pair of populations introduced to a common environment. We validate our approach by comparing the results to those from markers previously used to describe the populations (allozymes and individual-based SNPs) and from mapping the Pool-seq data to a reference genome of the closely related Atlantic salmon (Salmo salar). We find that genomic differentiation (FST) between the two introduced populations exceeds that of the naturally sympatric populations (FST = 0.13 and 0.03 between the introduced and the naturally sympatric populations, respectively), in concordance with estimates from the previously used SNPs. The same level of population divergence is found for the two genome assemblies, but estimates of average genic diversity differ (π ≈ 0.002 and π ≈ 0.001 when mapping to S. trutta and S. salar, respectively), although the relationships between population values are largely consistent. This discrepancy might be attributed to biases when mapping to a haploid condensed assembly made of highly fragmented read data compared to using a high-quality reference assembly from a divergent species. We conclude that the Pool-seq only approach can be suitable for detecting and quantifying genome-wide population differentiation, and for comparing genomic diversity in populations of non-model species where reference genomes are lacking.

Pool-seq assembly only gene models for Salmo trutta: Pool-seq de novo assembly scaffolded using the protein sequences for Salmo trutta via the MESPA pipeline.
onlygenemodels.fa.gz
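
For readers who download the file listed above, the following is a minimal sketch of how a gzip-compressed FASTA file could be inspected with the Python standard library alone. It assumes only that the file is standard FASTA (header lines starting with ">" followed by sequence lines); the function names read_fasta and summarize are illustrative and are not part of the dataset or of the MESPA pipeline.

```python
import gzip
from typing import Dict, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def summarize(path: str) -> Dict[str, int]:
    """Return the record count, total sequence length, and longest sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {
        "records": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
    }
```

Calling summarize("onlygenemodels.fa.gz") from the directory containing the download would return the three counts as a dictionary; the relative path is an assumption about where the file was saved.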