Applications of random forest feature selection for fine‐scale genetic population assignment

Abstract Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regular...

Bibliographic Details
Published in: Evolutionary Applications
Main Authors: Sylvester, Emma V. A., Bentzen, Paul, Bradbury, Ian R., Clément, Marie, Pearce, Jon, Horne, John, Beiko, Robert G.
Other Authors: Natural Sciences and Engineering Research Council of Canada
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2017
Subjects:
Online Access: http://dx.doi.org/10.1111/eva.12524
https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Feva.12524
https://onlinelibrary.wiley.com/doi/pdf/10.1111/eva.12524
id crwiley:10.1111/eva.12524
record_format openpolar
institution Open Polar
collection Wiley Online Library
op_collection_id crwiley
language English
description Abstract Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90% using each method to create panels of 50–700 markers. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than FST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
author2 Natural Sciences and Engineering Research Council of Canada
format Article in Journal/Newspaper
author Sylvester, Emma V. A.
Bentzen, Paul
Bradbury, Ian R.
Clément, Marie
Pearce, Jon
Horne, John
Beiko, Robert G.
author_sort Sylvester, Emma V. A.
title Applications of random forest feature selection for fine‐scale genetic population assignment
publisher Wiley
publishDate 2017
url http://dx.doi.org/10.1111/eva.12524
https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Feva.12524
https://onlinelibrary.wiley.com/doi/pdf/10.1111/eva.12524
genre Atlantic salmon
Salmo salar
genre_facet Atlantic salmon
Salmo salar
op_source Evolutionary Applications
volume 11, issue 2, page 153-165
ISSN 1752-4571 1752-4571
op_rights http://creativecommons.org/licenses/by/4.0/
op_doi https://doi.org/10.1111/eva.12524
container_title Evolutionary Applications
container_volume 11
container_issue 2
container_start_page 153
op_container_end_page 165
_version_ 1810432407300472832