Multi‐Layer Feature Selection Incorporating Weighted Score‐Based Expert Knowledge toward Modeling Materials with Targeted Properties

Abstract: Selecting proper descriptors or features is one of the central problems in exploring structure–activity relationships of materials using machine learning models. The current feature selection algorithms usually require tedious hyperparameter tuning and do not actively consider the prior kno...

Bibliographic Details
Published in: Advanced Theory and Simulations
Main Authors: Liu, Yue, Wu, Jun‐Ming, Avdeev, Maxim, Shi, Si‐Qi
Other Authors: National Basic Research Program of China
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2020
Subjects:
DML
Online Access: http://dx.doi.org/10.1002/adts.201900215
https://onlinelibrary.wiley.com/doi/pdf/10.1002/adts.201900215
https://onlinelibrary.wiley.com/doi/full-xml/10.1002/adts.201900215
id crwiley:10.1002/adts.201900215
record_format openpolar
institution Open Polar
collection Wiley Online Library
op_collection_id crwiley
language English
description Abstract: Selecting proper descriptors or features is one of the central problems in exploring structure–activity relationships of materials using machine learning models. The current feature selection algorithms usually require tedious hyperparameter tuning and do not actively consider the prior knowledge of domain experts about the features. Here, this work proposes a data‐driven multi‐layer feature selection method incorporating domain expert knowledge named DML‐FS_dek, which is automated, with users entering training data without manual tuning of the hyperparameters. The domain expert knowledge is quantified by means of weighted scoring and integrated into the selection process to eliminate the risk of crucial features being removed. The test studies on ten material properties datasets demonstrate the potential of the approach to automatically search for a reduced feature set with lower root mean square errors than those for the initial feature set. Essentially, the most relevant material features, the number of which is much smaller than that in the original feature set, are automatically selected to establish a closer and more accurate structure–activity relationship for the materials of interest. As a result, the method represents the targeted properties of materials with a smaller and more interpretable set of features while ensuring equal or better prediction accuracy.
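The idea in the abstract of letting weighted expert scores protect features from removal can be illustrated with a minimal sketch. This is not the DML‐FS_dek algorithm from the article: a single correlation filter stands in for the multi‐layer selection, and the feature names, scores, weights, and the thresholds `protect_at` and `min_relevance` are all hypothetical values chosen for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def weighted_expert_score(scores, weights):
    """Weighted mean of the per-expert scores given to one feature."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def select_features(columns, target, expert_scores, expert_weights,
                    protect_at=0.8, min_relevance=0.5):
    """Return the kept feature names in their original order.

    A feature is kept if its weighted expert score reaches `protect_at`
    (expert knowledge overrides the data-driven filter) or if its absolute
    correlation with the target reaches `min_relevance`.
    Both thresholds are assumptions made for this sketch.
    """
    kept = []
    for name, values in columns.items():
        expert = weighted_expert_score(expert_scores[name], expert_weights)
        relevance = abs(pearson(values, target))
        if expert >= protect_at or relevance >= min_relevance:
            kept.append(name)
    return kept

# Hypothetical toy data: three candidate features and one target property.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "ionic_radius": [1.1, 1.9, 3.2, 3.9, 5.1],       # strongly correlated
    "noise": [2.0, -1.0, 0.5, -0.5, 1.0],            # weakly correlated
    "electronegativity": [3.0, 1.0, 2.0, 3.0, 1.0],  # weakly correlated
}
# Two hypothetical experts score each feature in [0, 1]; expert 1 counts double.
expert_scores = {
    "ionic_radius": [0.5, 0.5],
    "noise": [0.1, 0.3],
    "electronegativity": [0.9, 1.0],
}
expert_weights = [2.0, 1.0]

kept = select_features(columns, target, expert_scores, expert_weights)
print(kept)
```

In this toy run `electronegativity` correlates only weakly with the target, yet it is retained because its weighted expert score exceeds the protection threshold, while `noise` is dropped.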
author2 National Basic Research Program of China
format Article in Journal/Newspaper
author Liu, Yue
Wu, Jun‐Ming
Avdeev, Maxim
Shi, Si‐Qi
title Multi‐Layer Feature Selection Incorporating Weighted Score‐Based Expert Knowledge toward Modeling Materials with Targeted Properties
publisher Wiley
publishDate 2020
url http://dx.doi.org/10.1002/adts.201900215
https://onlinelibrary.wiley.com/doi/pdf/10.1002/adts.201900215
https://onlinelibrary.wiley.com/doi/full-xml/10.1002/adts.201900215
genre DML
genre_facet DML
op_source Advanced Theory and Simulations
volume 3, issue 2
ISSN 2513-0390 2513-0390
op_rights http://creativecommons.org/licenses/by/4.0/
op_doi https://doi.org/10.1002/adts.201900215
container_title Advanced Theory and Simulations
container_volume 3
container_issue 2