Data from: Subsampling reveals that unbalanced sampling affects STRUCTURE results in a multi-species dataset

Studying the genetic population structure of species can reveal important insights into several key evolutionary, historical, demographic, and anthropogenic processes. One of the most important statistical tools for inferring genetic clusters is the program STRUCTURE. Recently, several papers have p...

Bibliographic Details
Main Author: Meirmans, Patrick
Format: Dataset
Language: unknown
Published: DRYAD 2018
Subjects:
geo
Online Access: https://doi.org/10.5061/DRYAD.NH4366S
id fttriple:oai:gotriple.eu:50|dedup_wf_001::0425846ed4e5bcc99347776dddb9d23c
record_format openpolar
spelling fttriple:oai:gotriple.eu:50|dedup_wf_001::0425846ed4e5bcc99347776dddb9d23c 2023-05-15T16:02:45+02:00 Data from: Subsampling reveals that unbalanced sampling affects STRUCTURE results in a multi-species dataset Meirmans, Patrick 2018-01-01 https://doi.org/10.5061/DRYAD.NH4366S undefined unknown DRYAD https://dx.doi.org/10.5061/DRYAD.NH4366S http://dx.doi.org/10.5061/dryad.nh4366s lic_creative-commons 10.5061/DRYAD.NH4366S oai:services.nod.dans.knaw.nl:Products/dans:oai:easy.dans.knaw.nl:easy-dataset:109075 oai:easy.dans.knaw.nl:easy-dataset:109075 10|openaire____::9e3be59865b2c1c335d32dae2fe7b254 re3data_____::r3d100000044 10|eurocrisdris::fe4903425d9040f680d8610d9079ea14 10|re3data_____::94816e6421eeb072e7742ce6a9decc5f 10|re3data_____::84e123776089ce3c7a33db98d9cd15a8 Bayesian clustering population structure genetic differentiation phylogeography High-alpine plants IntraBioDiv Alps Carpathians Europe Anthropocene Arabis alpina Carex sempervirens Dryas octopetala Geum montanum Geum reptans Hedysarum hedysaroides Hypochaeris uniflora Juncus trifidus Loiseleuria procumbens Luzula alpinopilosa Saxifraga stellaris Sesleria caerulea Life sciences medicine and health care geo envir Dataset https://vocabularies.coar-repositories.org/resource_types/c_ddb1/ 2018 fttriple https://doi.org/10.5061/DRYAD.NH4366S https://doi.org/10.5061/dryad.nh4366s 2023-01-22T17:16:28Z Studying the genetic population structure of species can reveal important insights into several key evolutionary, historical, demographic, and anthropogenic processes. One of the most important statistical tools for inferring genetic clusters is the program STRUCTURE. Recently, several papers have pointed out that STRUCTURE may show a bias when the sampling design is unbalanced, resulting in spurious joining of underrepresented populations and spurious separation of overrepresented populations. 
Suggestions to overcome this bias include subsampling and changing the ancestry model, but the performance of these two methods has not yet been tested on actual data. Here, I use a dataset of twelve high-alpine plant species to test whether unbalanced sampling affects the STRUCTURE inference of population differentiation between the European Alps and the Carpathians. For four of the twelve species, subsampling of the Alpine populations, to match the sample size between the Alps and the Carpathians, resulted in a drastically different clustering than the full dataset. On the other hand, STRUCTURE results with the alternative ancestry model were indistinguishable from the results with the default model. Based on these results, the subsampling strategy seems a more viable approach to overcome the bias than the alternative ancestry model. However, subsampling is only possible when there is an a priori expectation of what constitutes the main clusters. Though these results do not mean that the use of STRUCTURE should be discarded, they do indicate that users of the software should be cautious about the interpretation of the results when sampling is unbalanced. data: This folder contains, for every species, the data for the AFLP markers. These are formatted in plain text files with each AFLP marker represented by a single column. A zero (0) represents absence of a marker band and a one (1) represents presence of a marker band. Preceding the AFLP data are four columns with metadata: 1) individual: The name of the individual ... Dataset Dryas octopetala Unknown
institution Open Polar
collection Unknown
op_collection_id fttriple
language unknown
topic Bayesian clustering
population structure
genetic differentiation
phylogeography
High-alpine plants
IntraBioDiv
Alps
Carpathians
Europe
Anthropocene
Arabis alpina
Carex sempervirens
Dryas octopetala
Geum montanum
Geum reptans
Hedysarum hedysaroides
Hypochaeris uniflora
Juncus trifidus
Loiseleuria procumbens
Luzula alpinopilosa
Saxifraga stellaris
Sesleria caerulea
Life sciences
medicine and health care
geo
envir
description Studying the genetic population structure of species can reveal important insights into several key evolutionary, historical, demographic, and anthropogenic processes. One of the most important statistical tools for inferring genetic clusters is the program STRUCTURE. Recently, several papers have pointed out that STRUCTURE may show a bias when the sampling design is unbalanced, resulting in spurious joining of underrepresented populations and spurious separation of overrepresented populations. Suggestions to overcome this bias include subsampling and changing the ancestry model, but the performance of these two methods has not yet been tested on actual data. Here, I use a dataset of twelve high-alpine plant species to test whether unbalanced sampling affects the STRUCTURE inference of population differentiation between the European Alps and the Carpathians. For four of the twelve species, subsampling of the Alpine populations, to match the sample size between the Alps and the Carpathians, resulted in a drastically different clustering than the full dataset. On the other hand, STRUCTURE results with the alternative ancestry model were indistinguishable from the results with the default model. Based on these results, the subsampling strategy seems a more viable approach to overcome the bias than the alternative ancestry model. However, subsampling is only possible when there is an a priori expectation of what constitutes the main clusters. Though these results do not mean that the use of STRUCTURE should be discarded, they do indicate that users of the software should be cautious about the interpretation of the results when sampling is unbalanced. data: This folder contains, for every species, the data for the AFLP markers. These are formatted in plain text files with each AFLP marker represented by a single column. A zero (0) represents absence of a marker band and a one (1) represents presence of a marker band.
Preceding the AFLP data are four columns with metadata: 1) individual: The name of the individual ...
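The subsampling idea in the description above can be illustrated with a minimal Python sketch. It is not the author's procedure: it assumes each individual carries a region label ("Alps" or "Carpathians", hypothetical names here) and draws individuals at random, whereas the study may have subsampled whole populations. The function name `subsample_balanced` and the example labels are invented for illustration.

```python
import random
from collections import defaultdict


def subsample_balanced(individuals, seed=0):
    """Reduce every region to the size of the smallest region.

    individuals: list of (name, region) pairs (hypothetical layout).
    Returns a new list of (name, region) pairs with equal counts per region.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group individual names by their region label.
    by_region = defaultdict(list)
    for name, region in individuals:
        by_region[region].append(name)

    # The smallest region sets the target sample size.
    target = min(len(names) for names in by_region.values())

    balanced = []
    for region in sorted(by_region):
        # Sample without replacement from each region.
        for name in rng.sample(by_region[region], target):
            balanced.append((name, region))
    return balanced


# Hypothetical unbalanced design: 30 Alpine vs 8 Carpathian individuals.
individuals = [(f"alps_{i}", "Alps") for i in range(30)]
individuals += [(f"carp_{i}", "Carpathians") for i in range(8)]

balanced = subsample_balanced(individuals)
counts = defaultdict(int)
for _, region in balanced:
    counts[region] += 1
```

Repeating the draw with several seeds and comparing the resulting clusterings would show how sensitive the inference is to which individuals are retained.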
format Dataset
author Meirmans, Patrick
title Data from: Subsampling reveals that unbalanced sampling affects STRUCTURE results in a multi-species dataset
publisher DRYAD
publishDate 2018
url https://doi.org/10.5061/DRYAD.NH4366S
genre Dryas octopetala
op_source 10.5061/DRYAD.NH4366S
oai:services.nod.dans.knaw.nl:Products/dans:oai:easy.dans.knaw.nl:easy-dataset:109075
oai:easy.dans.knaw.nl:easy-dataset:109075
10|openaire____::9e3be59865b2c1c335d32dae2fe7b254
re3data_____::r3d100000044
10|eurocrisdris::fe4903425d9040f680d8610d9079ea14
10|re3data_____::94816e6421eeb072e7742ce6a9decc5f
10|re3data_____::84e123776089ce3c7a33db98d9cd15a8
op_relation https://dx.doi.org/10.5061/DRYAD.NH4366S
http://dx.doi.org/10.5061/dryad.nh4366s
op_rights lic_creative-commons
op_doi https://doi.org/10.5061/DRYAD.NH4366S
https://doi.org/10.5061/dryad.nh4366s
_version_ 1766398437101666304