A comparative analysis of feature selection methods for biomarker discovery in study of toxicant-treated atlantic cod (Gadus morhua) liver

Biomarker discovery is highly important in gene expression analysis in the context of toxicant exposure. Among gene selection methods, differential expression analysis is often applied because of its simplicity and interpretability, but it treats genes individually, disregarding the correlation...


Bibliographic Details
Main Authors: Xiaokang Zhang; Inge Jonassen
Format: Article in Journal/Newspaper
Language: unknown
Published: F1000Research 2017
Subjects:
Online Access: https://dx.doi.org/10.7490/f1000research.1114608.1
https://f1000research.com/posters/6-1359
id ftdatacite:10.7490/f1000research.1114608.1
record_format openpolar
Article in Journal/Newspaper atlantic cod Gadus morhua DataCite Metadata Store (German National Library of Science and Technology)
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
description Biomarker discovery is highly important in gene expression analysis in the context of toxicant exposure. Among gene selection methods, differential expression analysis is often applied because of its simplicity and interpretability, but it treats genes individually and disregards the correlation between them. Multivariate feature selection methods have therefore been proposed for biomarker discovery. We compared three methods that stem from different theories: Significance Analysis of Microarrays (SAM), which identifies differentially expressed genes; minimum Redundancy Maximum Relevance (mRMR), which is based on information theory; and Characteristic Direction (GeoDE), which takes a geometrical approach. The methods were compared on two criteria, stability and classification accuracy. Stability is measured as the overlap between the features selected in different sampling steps. Using the feature subsets selected by each of the three methods, we trained four classifiers (Random Forest, Support Vector Machine, ridge regression, and LASSO) and tested their prediction accuracy to see how much the subsets improve it. Tested on gene expression data from two toxicant exposure experiments on Atlantic cod liver, GeoDE was more stable and gave higher prediction accuracy in the low-dose condition.
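The stability criterion in the description (overlap of selected features across sampling steps) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the poster: the Jaccard index as the overlap measure, stratified subsampling without replacement, the 80% sampling fraction, and the simple mean-difference ranking used as a stand-in selector (it is not SAM, mRMR, or GeoDE). All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def jaccard(a, b):
    """Overlap of two feature subsets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def top_k_by_mean_difference(X, y, k):
    """Stand-in univariate selector (assumption): rank features by the
    absolute difference between class means, for labels coded 0 and 1."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[::-1][:k]


def stratified_subsample(y, frac, rng):
    """Draw a fraction of the sample indices from each class, without replacement."""
    parts = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        size = max(1, int(frac * len(members)))
        parts.append(rng.choice(members, size=size, replace=False))
    return np.concatenate(parts)


def selection_stability(X, y, select, k=10, n_rounds=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard overlap of the subsets chosen on repeated subsamples."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_rounds):
        idx = stratified_subsample(y, frac, rng)
        subsets.append(select(X[idx], y[idx], k))
    scores = [jaccard(a, b) for a, b in combinations(subsets, 2)]
    return float(np.mean(scores))
```

Any selector with the signature `select(X, y, k)` that returns feature indices can be passed in, so the same loop could score several methods on the same expression matrix. A value near 1 means nearly identical subsets are chosen on every subsample; a value near 0 means the subsets barely overlap.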
format Article in Journal/Newspaper
author Xiaokang Zhang
Jonassen, Inge
title A comparative analysis of feature selection methods for biomarker discovery in study of toxicant-treated atlantic cod (Gadus morhua) liver
publisher F1000Research
publishDate 2017
url https://dx.doi.org/10.7490/f1000research.1114608.1
https://f1000research.com/posters/6-1359
genre atlantic cod
Gadus morhua
op_doi https://doi.org/10.7490/f1000research.1114608.1
_version_ 1766357680843128832