Data from: Genotyping-by-sequencing for estimating relatedness in non-model organisms: avoiding the trap of precise bias

Bibliographic Details
Main Authors: Attard, Catherine R. M., Beheregaray, Luciano B., Moller, Luciana M.
Format: Dataset
Language: unknown
Published: 2017
Online Access: https://zenodo.org/record/4996084
https://doi.org/10.5061/dryad.t8ph5
Description
Summary: There has been remarkably little attention to using the high resolution provided by genotyping-by-sequencing (i.e. RADseq and similar methods) datasets for assessing relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, often found in this type of dataset that could lead to downward-biased, yet precise, estimates of relatedness. Here we assess the applicability of genotyping-by-sequencing datasets for relatedness inferences given their relatively high genotyping error rates. Individuals of known relatedness were simulated under genotyping error, allelic dropout, and missing data scenarios based on an empirical ddRAD dataset, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error, with the estimator of Ritland (1996) shown to be unaffected by allelic dropout and to be the most accurate when there is genotyping error. We also found that the choice of estimator should not rely solely on the strength of correlation between estimated and true relatedness as a strong correlation does not necessarily mean estimates are close to true relatedness. We also demonstrated how even a large SNP dataset with genotyping error (allelic dropout or otherwise) or missing data still performs better than a perfectly genotyped microsatellite dataset of tens of markers. The simulation-based approach used here can be easily implemented by others on their own genotyping-by-sequencing datasets to confirm the most appropriate and powerful estimator for their dataset.

SNP genotypes: Genotype data in COANCESTRY format for 8,294 SNPs. File: COANCESTRY_input.txt
Microsatellite genotype: Genotype data for one individual at 20 microsatellites. The microsatellite genotypes for the remaining individuals are available in a previous Dryad entry, doi:10.5061/dryad.8m0t6.
The format of the data in the current Dryad entry is the same as the previous entry, except in the ...
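
Illustrative note: the summary's caution that a strong correlation between estimated and true relatedness does not guarantee accurate estimates can be sketched with a small worked example. The code below is not taken from the dataset or from COANCESTRY. The true relatedness values, the shrinkage factor `SHRINK`, and the helper names `pearson`, `mean_bias` and `rmse` are all hypothetical, chosen only to mimic a downward-biased yet precise estimator.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_bias(true, est):
    """Mean of (estimate - truth); negative values mean downward bias."""
    return sum(e - t for t, e in zip(true, est)) / len(true)


def rmse(true, est):
    """Root-mean-square error of the estimates against the truth."""
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true, est)) / len(true))


# Hypothetical true relatedness for simulated pairs:
# unrelated (0), half-sibling (0.25) and full-sibling (0.5) pairs.
true_r = [0.0, 0.0, 0.25, 0.25, 0.5, 0.5]

# Assumption for illustration only: the estimator shrinks every value
# towards zero by a constant factor, with no added noise.
SHRINK = 0.6
est_r = [SHRINK * r for r in true_r]

corr = pearson(true_r, est_r)
bias = mean_bias(true_r, est_r)
error = rmse(true_r, est_r)

print(f"correlation = {corr:.3f}")
print(f"mean bias   = {bias:.3f}")
print(f"RMSE        = {error:.3f}")
```

In this toy case the estimates are perfectly correlated with the truth, yet every related pair is underestimated. This is why accuracy measures such as bias or RMSE are worth reporting alongside correlation when comparing estimators.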