A comparative analysis of feature selection methods for biomarker discovery in study of toxicant-treated atlantic cod (Gadus morhua) liver
Biomarker discovery is extraordinarily important in gene expression analysis in the context of toxicant exposure. Among gene selection methods, differential expression analysis is often applied because of its simplicity and interpretability. However, it treats genes individually, disregarding the correlation...
Main Authors: | Xiaokang Zhang, Inge Jonassen |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | F1000Research 2017 |
Subjects: | |
Online Access: | https://dx.doi.org/10.7490/f1000research.1114608.1 https://f1000research.com/posters/6-1359 |
id |
ftdatacite:10.7490/f1000research.1114608.1 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
description |
Biomarker discovery is extraordinarily important in gene expression analysis in the context of toxicant exposure. Among gene selection methods, differential expression analysis is often applied because of its simplicity and interpretability. However, it treats genes individually and disregards the correlation between them, so multivariate feature selection methods have been proposed for biomarker discovery. We compared three methods that stem from different theories: Significance Analysis of Microarrays (SAM), which identifies differentially expressed genes; minimum Redundancy Maximum Relevance (mRMR), which is based on information theory; and Characteristic Direction (GeoDE), which takes a geometrical approach. The methods were compared on stability and classification accuracy. Stability was measured as the overlap of the features selected in different sampling steps. Using the feature subsets selected by each of the three methods, we trained four classifiers (Random Forest, Support Vector Machine, ridge regression, and LASSO) and then tested their prediction accuracy to see how much the subsets improved it. On gene expression data from two toxicant exposure experiments on Atlantic cod liver, GeoDE was more stable and gave higher prediction accuracy in the low-dose condition. |
format |
Article in Journal/Newspaper |
author |
Xiaokang Zhang, Inge Jonassen |
author_facet |
Xiaokang Zhang, Inge Jonassen |
author_sort |
Xiaokang Zhang |
title |
A comparative analysis of feature selection methods for biomarker discovery in study of toxicant-treated atlantic cod (Gadus morhua) liver |
publisher |
F1000Research |
publishDate |
2017 |
url |
https://dx.doi.org/10.7490/f1000research.1114608.1 https://f1000research.com/posters/6-1359 |
genre |
atlantic cod Gadus morhua |
op_doi |
https://doi.org/10.7490/f1000research.1114608.1 |
_version_ |
1766357680843128832 |
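The description above says that stability is measured from the overlap of the features selected in different sampling steps, but the record does not state which overlap measure was used. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes mean pairwise Jaccard overlap as the score, random subsampling without replacement as the sampling step, and a simple mean-difference ranking as a stand-in selector. The function names, the gene names `g1` to `g4`, and the toy expression values are all hypothetical.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two feature sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(feature_sets):
    """Mean pairwise Jaccard overlap across the selected-feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def subsample_selections(samples, select, n_rounds=10, fraction=0.8, seed=0):
    """Run `select` on random subsamples and collect the selected feature sets."""
    rng = random.Random(seed)
    size = max(1, int(len(samples) * fraction))
    return [set(select(rng.sample(samples, size))) for _ in range(n_rounds)]


def top_k_by_mean_difference(samples, k=2):
    """Hypothetical stand-in selector: rank genes by |mean(treated) - mean(control)|."""
    genes = sorted(samples[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    def score(gene):
        treated = mean(x[gene] for x, y in samples if y == 1)
        control = mean(x[gene] for x, y in samples if y == 0)
        return abs(treated - control)

    return sorted(genes, key=score, reverse=True)[:k]


# Toy data (invented for illustration): (expression profile, label) pairs,
# label 0 = control, 1 = treated. Only g1 and g2 differ between the groups.
TOY_SAMPLES = [
    ({"g1": 0.0, "g2": 0.0, "g3": 0.1, "g4": 0.3}, 0),
    ({"g1": 0.0, "g2": 0.0, "g3": 0.4, "g4": 0.2}, 0),
    ({"g1": 0.0, "g2": 0.0, "g3": 0.2, "g4": 0.5}, 0),
    ({"g1": 0.0, "g2": 0.0, "g3": 0.3, "g4": 0.1}, 0),
    ({"g1": 10.0, "g2": 8.0, "g3": 0.2, "g4": 0.4}, 1),
    ({"g1": 10.0, "g2": 8.0, "g3": 0.5, "g4": 0.1}, 1),
    ({"g1": 10.0, "g2": 8.0, "g3": 0.1, "g4": 0.2}, 1),
    ({"g1": 10.0, "g2": 8.0, "g3": 0.3, "g4": 0.3}, 1),
]

if __name__ == "__main__":
    selections = subsample_selections(TOY_SAMPLES, top_k_by_mean_difference)
    print("stability:", stability(selections))
```

A score near 1 means the selector returns almost the same genes on every subsample, and a score near 0 means the selections rarely overlap. To compare methods such as SAM, mRMR, or GeoDE in this way, one would pass a wrapper around each method as `select` and keep the number of selected features equal across methods, since Jaccard overlap depends on subset size.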