Applications of random forest feature selection for fine‐scale genetic population assignment
Abstract Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regular...
Published in: | Evolutionary Applications |
---|---|
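Main Authors: | Sylvester, Emma V. A.; Bentzen, Paul; Bradbury, Ian R.; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G. |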
Other Authors: | Natural Sciences and Engineering Research Council of Canada |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Wiley 2017 |
Subjects: | |
Online Access: | http://dx.doi.org/10.1111/eva.12524 https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Feva.12524 https://onlinelibrary.wiley.com/doi/pdf/10.1111/eva.12524 |
id |
crwiley:10.1111/eva.12524 |
---|---|
record_format |
openpolar |
spelling |
crwiley:10.1111/eva.12524 2024-09-15T17:56:12+00:00 Applications of random forest feature selection for fine‐scale genetic population assignment Sylvester, Emma V. A.; Bentzen, Paul; Bradbury, Ian R.; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G. Natural Sciences and Engineering Research Council of Canada 2017 http://dx.doi.org/10.1111/eva.12524 https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Feva.12524 https://onlinelibrary.wiley.com/doi/pdf/10.1111/eva.12524 en eng Wiley http://creativecommons.org/licenses/by/4.0/ Evolutionary Applications volume 11, issue 2, page 153-165 ISSN 1752-4571 1752-4571 journal-article 2017 crwiley https://doi.org/10.1111/eva.12524 2024-08-30T04:12:07Z Abstract Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNPs) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90% using each method to create panels of 50–700 markers. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than F_ST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F_ST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations. Article in Journal/Newspaper Atlantic salmon Salmo salar Wiley Online Library Evolutionary Applications 11 2 153 165 |
institution |
Open Polar |
collection |
Wiley Online Library |
op_collection_id |
crwiley |
language |
English |
description |
Abstract Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNPs) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90% using each method to create panels of 50–700 markers. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than F_ST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F_ST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations. |
author2 |
Natural Sciences and Engineering Research Council of Canada |
format |
Article in Journal/Newspaper |
author |
Sylvester, Emma V. A.; Bentzen, Paul; Bradbury, Ian R.; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G. |
spellingShingle |
Sylvester, Emma V. A.; Bentzen, Paul; Bradbury, Ian R.; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G. Applications of random forest feature selection for fine‐scale genetic population assignment |
author_facet |
Sylvester, Emma V. A.; Bentzen, Paul; Bradbury, Ian R.; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G. |
author_sort |
Sylvester, Emma V. A. |
title |
Applications of random forest feature selection for fine‐scale genetic population assignment |
title_short |
Applications of random forest feature selection for fine‐scale genetic population assignment |
title_full |
Applications of random forest feature selection for fine‐scale genetic population assignment |
title_fullStr |
Applications of random forest feature selection for fine‐scale genetic population assignment |
title_full_unstemmed |
Applications of random forest feature selection for fine‐scale genetic population assignment |
title_sort |
applications of random forest feature selection for fine‐scale genetic population assignment |
publisher |
Wiley |
publishDate |
2017 |
url |
http://dx.doi.org/10.1111/eva.12524 https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Feva.12524 https://onlinelibrary.wiley.com/doi/pdf/10.1111/eva.12524 |
genre |
Atlantic salmon Salmo salar |
genre_facet |
Atlantic salmon Salmo salar |
op_source |
Evolutionary Applications volume 11, issue 2, page 153-165 ISSN 1752-4571 1752-4571 |
op_rights |
http://creativecommons.org/licenses/by/4.0/ |
op_doi |
https://doi.org/10.1111/eva.12524 |
container_title |
Evolutionary Applications |
container_volume |
11 |
container_issue |
2 |
container_start_page |
153 |
op_container_end_page |
165 |
_version_ |
1810432407300472832 |
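
The abstract compares random forest-based marker selection against F_ST ranking. As a rough illustration of that baseline only, the sketch below ranks biallelic SNPs by a per-locus F_ST and keeps the top markers as a panel. It is not the authors' code: the estimator (a Nei-style (H_T - H_S) / H_T with equally weighted populations), the 0/1/2 genotype coding, the function names and the toy data are all assumptions made for illustration. A random forest approach would swap the F_ST score for a variable importance measure and keep the same "rank, then truncate" step.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: population name -> list of individuals,
# each individual a list of genotypes coded 0, 1 or 2 (copies of one allele).
Genotypes = Dict[str, List[List[int]]]


def allele_freq(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele among diploid individuals."""
    return sum(genotypes) / (2 * len(genotypes))


def fst_per_locus(pops: Genotypes, locus: int) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic locus.

    Populations are weighted equally (an assumption of this sketch).
    """
    freqs = [allele_freq([ind[locus] for ind in inds]) for inds in pops.values()]
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic locus carries no assignment information
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pop
    return (h_t - h_s) / h_t


def rank_loci_by_fst(pops: Genotypes, panel_size: int) -> List[int]:
    """Return indices of the panel_size loci with the highest F_ST."""
    n_loci = len(next(iter(pops.values()))[0])
    scores = [fst_per_locus(pops, j) for j in range(n_loci)]
    order = sorted(range(n_loci), key=lambda j: scores[j], reverse=True)
    return order[:panel_size]


# Toy data (invented): locus 0 is fixed for different alleles in the two
# populations, while loci 1 and 2 have identical frequencies in both.
toy: Genotypes = {
    "pop_A": [[2, 1, 0], [2, 1, 1], [2, 0, 0]],
    "pop_B": [[0, 1, 0], [0, 1, 1], [0, 0, 0]],
}
panel = rank_loci_by_fst(toy, panel_size=1)
```

In a workflow like the one the abstract describes, a panel chosen this way would then be scored by self-assignment accuracy at each candidate panel size; that evaluation step is not shown here.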