Ensemble Random Forests as a tool for modeling rare occurrences

Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives indiv...


Bibliographic Details
Published in: Endangered Species Research
Main Authors: Siders, ZA, Ducharme-Barth, ND, Carvalho, F, Kobayashi, D, Martin, S, Raynor, J, Jones, TT, Ahrens, RNM
Format: Article in Journal/Newspaper
Language: English
Published: Inter-Research 2020
Subjects:
Online Access: https://doi.org/10.3354/esr01060
https://doaj.org/article/ee167cb0e49844bb8305fbcaa6dddb5a
id ftdoajarticles:oai:doaj.org/article:ee167cb0e49844bb8305fbcaa6dddb5a
record_format openpolar
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic Zoology
QL1-991
Botany
QK1-989
topic_facet Zoology
QL1-991
Botany
QK1-989
description Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives individual stratified randomly sampled training/test sets, then down-samples the majority class for each decision tree. Results are averaged across Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forest with and without down-sampling, as well as with the synthetic minority over-sampling technique, for highly class imbalanced to balanced datasets. Spatial covariance greatly impacts ERFs’ perceived performance, as shown through simulation and case studies. In case studies from the Hawaii deep-set longline fishery, giant manta ray Mobula birostris syn. Manta birostris and scalloped hammerhead Sphyrna lewini presence had high spatial covariance and high model test performance, while false killer whale Pseudorca crassidens had low spatial covariance and low model test performance. Overall, we find ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) prediction uncertainty propagation; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. As ERFs can readily mitigate rare event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be a valuable tool in bycatch and species distribution modeling, as well as spatial conservation planning, especially for protected species where presence can be rare.
format Article in Journal/Newspaper
author Siders, ZA
Ducharme-Barth, ND
Carvalho, F
Kobayashi, D
Martin, S
Raynor, J
Jones, TT
Ahrens, RNM
author_facet Siders, ZA
Ducharme-Barth, ND
Carvalho, F
Kobayashi, D
Martin, S
Raynor, J
Jones, TT
Ahrens, RNM
author_sort Siders, ZA
title Ensemble Random Forests as a tool for modeling rare occurrences
title_sort ensemble random forests as a tool for modeling rare occurrences
publisher Inter-Research
publishDate 2020
url https://doi.org/10.3354/esr01060
https://doaj.org/article/ee167cb0e49844bb8305fbcaa6dddb5a
genre Killer Whale
genre_facet Killer Whale
op_source Endangered Species Research, Vol 43, Pp 183-197 (2020)
op_relation https://www.int-res.com/abstracts/esr/v43/p183-197/
https://doaj.org/toc/1863-5407
https://doaj.org/toc/1613-4796
1863-5407
1613-4796
doi:10.3354/esr01060
https://doaj.org/article/ee167cb0e49844bb8305fbcaa6dddb5a
op_doi https://doi.org/10.3354/esr01060
container_title Endangered Species Research
container_volume 43
container_start_page 183
op_container_end_page 197
_version_ 1766057543560331264
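
The procedure outlined in the description field of this record (each Random Forest gets its own stratified training/test split, each tree is fit on a sample in which the majority class is down-sampled, and predictions are averaged across forests) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it uses only the Python standard library, the trees are depth-one stumps standing in for full decision trees, and all function names and default values (`fit_erf`, `n_forests=10`, `n_trees=25`, `test_frac=0.1`) are assumptions introduced here.

```python
import random
from statistics import mean


def stratified_split(y, test_frac, rng):
    """Hold out test_frac of each class; return (train_idx, test_idx)."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        k = max(1, round(test_frac * len(idx)))
        test += idx[:k]
        train += idx[k:]
    return train, test


def balanced_bootstrap(train_idx, y, rng):
    """Bootstrap the presences and draw an equal number of absences."""
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    n = len(pos)
    return rng.choices(pos, k=n) + rng.choices(neg, k=n)


def fit_stump(X, y, idx, rng):
    """Depth-one tree on one random feature (stand-in for a full tree)."""
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({X[i][f] for i in idx}):
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        if not left or not right:
            continue
        pl, pr = mean(left), mean(right)
        # Weighted Gini impurity of the two child nodes (up to a constant).
        score = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
        if best is None or score < best[0]:
            best = (score, t, pl, pr)
    if best is None:  # feature is constant within this sample
        p = mean(y[i] for i in idx)
        return (f, 0.0, p, p)
    return (f, best[1], best[2], best[3])


def predict_stump(stump, row):
    f, t, pl, pr = stump
    return pl if row[f] <= t else pr


def fit_erf(X, y, n_forests=10, n_trees=25, test_frac=0.1, seed=0):
    """Fit n_forests forests, each on its own stratified training split."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_forests):
        train_idx, test_idx = stratified_split(y, test_frac, rng)
        forest = [
            fit_stump(X, y, balanced_bootstrap(train_idx, y, rng), rng)
            for _ in range(n_trees)
        ]
        # test_idx is kept so each forest can be scored on its own hold-out.
        ensemble.append((forest, test_idx))
    return ensemble


def predict_erf(ensemble, row):
    """Average tree predictions within each forest, then across forests."""
    return mean(
        mean(predict_stump(s, row) for s in forest) for forest, _ in ensemble
    )
```

A call such as `predict_erf(fit_erf(X, y), row)` returns a presence probability between 0 and 1 for one covariate row, where `X` is a list of numeric covariate rows and `y` a list of 0/1 labels. Keeping every presence in each tree's sample while down-sampling absences is what balances the classes in this sketch; the number of forests and trees would need tuning for real bycatch data.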