Data from: Applications of random forest feature selection for fine-scale genetic population assignment

Bibliographic Details
Main Authors: Sylvester, Emma V.A., Bentzen, Paul, Bradbury, Ian R., Clément, Marie, Pearce, Jon, Horne, John, Beiko, Robert G.
Language: unknown
Published: 2017
Subjects: Life sciences, medicine and health care
Online Access: http://nbn-resolving.org/urn:nbn:nl:ui:13-4v-kwh8
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:98388
id ftdans:oai:easy.dans.knaw.nl:easy-dataset:98388
record_format openpolar
spelling ftdans:oai:easy.dans.knaw.nl:easy-dataset:98388 2023-07-02T03:31:42+02:00 Data from: Applications of random forest feature selection for fine-scale genetic population assignment Sylvester, Emma V.A. Bentzen, Paul Bradbury, Ian R. Clément, Marie Pearce, Jon Horne, John Beiko, Robert G. 2017-07-27T17:40:10.000+02:00 http://nbn-resolving.org/urn:nbn:nl:ui:13-4v-kwh8 https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:98388 unknown doi:10.5061/dryad.93h33/1 doi:10.5061/dryad.93h33/2 doi:10.1111/eva.12524 http://nbn-resolving.org/urn:nbn:nl:ui:13-4v-kwh8 doi:10.5061/dryad.93h33 https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:98388 OPEN_ACCESS: The data are archived in EASY and are accessible elsewhere through the DOI https://dans.knaw.nl/en/about/organisation-and-policy/legal-information/DANSLicence.pdf Life sciences medicine and health care 2017 ftdans https://doi.org/10.5061/dryad.93h33/1 https://doi.org/10.5061/dryad.93h33/2 https://doi.org/10.1111/eva.12524 https://doi.org/10.5061/dryad.93h33 2023-06-13T13:25:06Z Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest, and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP dataset for Atlantic salmon (Salmo salar) and a published SNP dataset for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we used each method to create panels of 50-700 markers and identified the minimum panel size required to obtain a self-assignment accuracy of at least 90%. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. 
Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each dataset, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic datasets to improve assignment for management and conservation of exploited populations. Other/Unknown Material Atlantic salmon Salmo salar Data Archiving and Networked Services (DANS): EASY (KNAW - Koninklijke Nederlandse Akademie van Wetenschappen)
institution Open Polar
collection Data Archiving and Networked Services (DANS): EASY (KNAW - Koninklijke Nederlandse Akademie van Wetenschappen)
op_collection_id ftdans
language unknown
topic Life sciences
medicine and health care
description Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest, and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP dataset for Atlantic salmon (Salmo salar) and a published SNP dataset for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we used each method to create panels of 50-700 markers and identified the minimum panel size required to obtain a self-assignment accuracy of at least 90%. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for the Atlantic salmon and Chinook salmon datasets, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic datasets to improve assignment for management and conservation of exploited populations.
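The marker-selection workflow summarized in the description (rank SNPs, take the top-ranked loci at increasing panel sizes, and keep the smallest panel that reaches at least 90% self-assignment accuracy) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn's plain random forest impurity importance (not the regularized or guided regularized variants), a simplified Nei-style per-locus FST = (H_T - H_S) / H_T, a basic Hardy-Weinberg likelihood assignment test, a hypothetical panel-size grid, and synthetic genotypes coded as 0/1/2 copies of the alternate allele. The helper names (`fst_per_locus`, `rf_rank`, `assign`, `min_panel_size`) are hypothetical. Loci are ranked on training individuals only, so held-out accuracy is not inflated by reusing the same individuals for selection and assessment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fst_per_locus(geno, pops):
    """Simplified Nei-style F_ST = (H_T - H_S) / H_T per locus (assumption, not Weir & Cockerham)."""
    labels = np.unique(pops)
    # Alternate-allele frequency per population (rows) and locus (columns); genotypes are 0/1/2.
    p = np.array([geno[pops == k].mean(axis=0) / 2.0 for k in labels])
    h_s = np.mean(2 * p * (1 - p), axis=0)  # mean expected within-population heterozygosity
    p_bar = p.mean(axis=0)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the averaged frequencies
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)


def fst_rank(geno, pops):
    """Locus indices sorted from highest to lowest F_ST."""
    return np.argsort(fst_per_locus(geno, pops))[::-1]


def rf_rank(geno, pops, seed=0):
    """Locus indices sorted by impurity-based random forest importance (plain RF only)."""
    rf = RandomForestClassifier(n_estimators=300, random_state=seed)
    rf.fit(geno, pops)
    return np.argsort(rf.feature_importances_)[::-1]


def assign(train_geno, train_pops, test_geno):
    """Assign each test individual to the population with the highest HWE log-likelihood."""
    labels = np.unique(train_pops)
    loci = np.arange(test_geno.shape[1])
    log_lik = np.zeros((test_geno.shape[0], labels.size))
    for j, k in enumerate(labels):
        g = train_geno[train_pops == k]
        # Allele frequency with one pseudo-count per allele so no genotype has probability 0.
        p = (g.sum(axis=0) + 1) / (2 * g.shape[0] + 2)
        # Hardy-Weinberg probabilities of carrying 0, 1 or 2 alternate alleles; shape (3, n_loci).
        probs = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        log_lik[:, j] = np.log(probs[test_geno, loci]).sum(axis=1)
    return labels[np.argmax(log_lik, axis=1)]


def min_panel_size(geno, pops, rank_fn, sizes=(50, 100, 200, 400, 700), target=0.90, seed=0):
    """Smallest top-ranked panel whose held-out self-assignment accuracy reaches `target`."""
    tr_g, te_g, tr_p, te_p = train_test_split(
        geno, pops, test_size=0.3, stratify=pops, random_state=seed)
    order = rank_fn(tr_g, tr_p)  # rank loci using training individuals only
    for size in sizes:
        panel = order[:size]
        acc = np.mean(assign(tr_g[:, panel], tr_p, te_g[:, panel]) == te_p)
        if acc >= target:
            return size, acc
    return None, None  # no panel size in the grid reached the target


# Synthetic example: 3 hypothetical populations, 200 SNPs, of which the first 20 are informative.
rng = np.random.default_rng(1)
n_loci, n_per_pop = 200, 100
freqs = np.full((3, n_loci), 0.5)
freqs[:, :20] = [[0.05], [0.5], [0.95]]
geno = np.vstack([rng.binomial(2, freqs[k], size=(n_per_pop, n_loci)) for k in range(3)])
pops = np.repeat(np.array(["A", "B", "C"]), n_per_pop)

print("FST panel:", min_panel_size(geno, pops, fst_rank, sizes=(5, 10, 20, 50)))
print("RF panel: ", min_panel_size(geno, pops, rf_rank, sizes=(5, 10, 20, 50)))
```

Because `min_panel_size` takes the ranking function as a parameter, comparing FST ranking against random forest ranking on the same train/test split only requires swapping `rank_fn`. The regularized and guided regularized random forest variants named in the description are not shown here and would need a separate implementation.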
author Sylvester, Emma V.A.
Bentzen, Paul
Bradbury, Ian R.
Clément, Marie
Pearce, Jon
Horne, John
Beiko, Robert G.
title Data from: Applications of random forest feature selection for fine-scale genetic population assignment
publishDate 2017
url http://nbn-resolving.org/urn:nbn:nl:ui:13-4v-kwh8
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:98388
genre Atlantic salmon
Salmo salar
op_relation doi:10.5061/dryad.93h33/1
doi:10.5061/dryad.93h33/2
doi:10.1111/eva.12524
http://nbn-resolving.org/urn:nbn:nl:ui:13-4v-kwh8
doi:10.5061/dryad.93h33
https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:98388
op_rights OPEN_ACCESS: The data are archived in EASY and are accessible elsewhere through the DOI
https://dans.knaw.nl/en/about/organisation-and-policy/legal-information/DANSLicence.pdf
op_doi https://doi.org/10.5061/dryad.93h33/1
https://doi.org/10.5061/dryad.93h33/2
https://doi.org/10.1111/eva.12524
https://doi.org/10.5061/dryad.93h33