Influence of Hyperparameters on Random Forest Accuracy

In this paper we present our work on the Random Forest (RF) family of classification methods. Our goal is to go one step further in the understanding of RF mechanisms by studying the parametrization of the reference algorithm Forest-RI. In this algorithm, a randomization principle is used during the...

Bibliographic Details
Main Authors: Bernard, Simon; Heutte, Laurent; Adam, Sébastien
Other Authors: Equipe Apprentissage (DocApp - LITIS), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)
Format: Conference Object
Language: English
Published: HAL CCSD 2009
Subjects:
Online Access: https://hal.archives-ouvertes.fr/hal-00436358
https://hal.archives-ouvertes.fr/hal-00436358/document
https://hal.archives-ouvertes.fr/hal-00436358/file/mcs09.pdf
https://doi.org/10.1007/978-3-642-02326-2_18
id ftccsdartic:oai:HAL:hal-00436358v1
record_format openpolar
spelling ftccsdartic:oai:HAL:hal-00436358v1 2023-05-15T16:50:26+02:00 Reykjavik, Iceland 2009-06-10 en eng HAL CCSD Springer 2021-10-24T19:25:24Z
institution Open Polar
collection Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id ftccsdartic
language English
topic [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
spellingShingle [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Bernard, Simon
Heutte, Laurent
Adam, Sébastien
Influence of Hyperparameters on Random Forest Accuracy
topic_facet [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
description In this paper we present our work on the Random Forest (RF) family of classification methods. Our goal is to go one step further in understanding RF mechanisms by studying the parametrization of the reference algorithm, Forest-RI. In this algorithm, tree induction is randomized: at each node, K features are selected at random, and the best split is chosen among them. The strength of this randomization is thus governed by the hyperparameter K, which plays an important role in building accurate RF classifiers. We therefore focus our experimental study on this hyperparameter and its influence on classification accuracy. To that end, we evaluated the Forest-RI algorithm on several machine learning problems with different settings of K, in order to understand how it affects RF performance. We show that the default values of K traditionally used in the literature are near-optimal overall, except in some cases where they are all significantly sub-optimal. Additional experiments conducted on those datasets highlight the crucial role played by feature relevance in finding the optimal setting of K.
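The node-level randomization named in the description (draw K features at random, then keep the best split among them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gini impurity criterion, and the midpoint thresholds are all assumptions made for illustration, and the toy data is invented.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_among_k(X, y, k, rng):
    """Draw k distinct features at random, then return the best
    (feature_index, threshold, weighted_gini) found among those k only.

    Hypothetical helper: the split criterion (Gini) is an assumption.
    """
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), k)
    best = None
    for f in candidates:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best  # None if every sampled feature is constant


# Invented toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = ["a", "a", "b", "b"]

# K equal to the number of features: no randomization, the best feature is always seen.
split_all = best_split_among_k(X, y, k=2, rng=random.Random(0))

# K = 1: the node only sees one randomly drawn feature, relevant or not.
split_one = best_split_among_k(X, y, k=1, rng=random.Random(0))
```

With a small K, a node may be forced to split on an irrelevant feature. That is one plausible way to read the description's remark that feature relevance matters when setting K.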
author2 Equipe Apprentissage (DocApp - LITIS)
Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS)
Université Le Havre Normandie (ULH)
Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN)
Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie)
Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH)
Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)
format Conference Object
author Bernard, Simon
Heutte, Laurent
Adam, Sébastien
author_facet Bernard, Simon
Heutte, Laurent
Adam, Sébastien
author_sort Bernard, Simon
title Influence of Hyperparameters on Random Forest Accuracy
title_short Influence of Hyperparameters on Random Forest Accuracy
title_full Influence of Hyperparameters on Random Forest Accuracy
title_fullStr Influence of Hyperparameters on Random Forest Accuracy
title_full_unstemmed Influence of Hyperparameters on Random Forest Accuracy
title_sort influence of hyperparameters on random forest accuracy
publisher HAL CCSD
publishDate 2009
url https://hal.archives-ouvertes.fr/hal-00436358
https://hal.archives-ouvertes.fr/hal-00436358/document
https://hal.archives-ouvertes.fr/hal-00436358/file/mcs09.pdf
https://doi.org/10.1007/978-3-642-02326-2_18
op_coverage Reykjavik, Iceland
genre Iceland
genre_facet Iceland
op_source International Workshop on Multiple Classifier Systems (MCS)
https://hal.archives-ouvertes.fr/hal-00436358
International Workshop on Multiple Classifier Systems (MCS), Jun 2009, Reykjavik, Iceland. pp.171-180, ⟨10.1007/978-3-642-02326-2_18⟩
op_relation info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-642-02326-2_18
hal-00436358
https://hal.archives-ouvertes.fr/hal-00436358
https://hal.archives-ouvertes.fr/hal-00436358/document
https://hal.archives-ouvertes.fr/hal-00436358/file/mcs09.pdf
doi:10.1007/978-3-642-02326-2_18
op_rights info:eu-repo/semantics/OpenAccess
op_doi https://doi.org/10.1007/978-3-642-02326-2_18
container_start_page 171
op_container_end_page 180
_version_ 1766040578290614272
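
The description refers to "default values of K traditionally used in the literature" without listing them in this record. As a hedged illustration, the sketch below builds a small grid of K settings for a problem with M features. The specific choices (K = 1, floor(sqrt(M)), floor(log2(M)) + 1, and K = M) are assumptions based on common random forest practice, not values taken from this record, and `candidate_k_values` is a hypothetical helper name.

```python
import math


def candidate_k_values(n_features):
    """Return a few named settings of K for a problem with n_features features.

    The named defaults are assumptions (commonly cited in the RF literature),
    not values read from this record.
    """
    m = n_features
    if m < 1:
        raise ValueError("n_features must be at least 1")
    return {
        "one": 1,                          # maximal randomization
        "sqrt": max(1, math.isqrt(m)),     # floor(sqrt(M))
        "log2_plus_1": m.bit_length(),     # equals floor(log2(M)) + 1 for M >= 1
        "all": m,                          # no feature randomization at the node
    }


grid_16 = candidate_k_values(16)
grid_10 = candidate_k_values(10)
```

Setting K = M removes the feature randomization at each node entirely, so the grid spans the full range from the most randomized setting to none.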