Biomarker Discovery Using Statistical and Machine Learning Approaches on Gene Expression Data

My PhD is affiliated with the dCod 1.0 project (https://www.uib.no/en/dcod): decoding the systems toxicology of Atlantic cod (Gadus morhua), which aims to better understand how cod adapt and react to stressors in the environment. One of the research topics is to discover the biomarkers which di...


Bibliographic Details
Main Author: Zhang, Xiaokang
Other Authors: orcid:0000-0003-4684-317X
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: The University of Bergen 2020
Subjects:
Online Access: https://hdl.handle.net/1956/24159
id ftunivbergen:oai:bora.uib.no:1956/24159
record_format openpolar
institution Open Polar
collection University of Bergen: Bergen Open Research Archive (BORA-UiB)
op_collection_id ftunivbergen
language English
description My PhD is affiliated with the dCod 1.0 project (https://www.uib.no/en/dcod): decoding the systems toxicology of Atlantic cod (Gadus morhua), which aims to better understand how cod adapt and react to stressors in the environment. One of the research topics is to discover biomarkers that discriminate between fish in a normal biological state and fish that have been exposed to toxicants. A biomarker, or biological marker, is an indicator of a biological state in response to an intervention, which can be, for example, toxic exposure (in toxicology), disease (for example cancer), or drug response (in precision medicine). Biomarker discovery is an important research topic in toxicology, cancer research, and related fields. A good set of biomarkers can give insight into disease or toxicant response mechanisms and can help determine whether a person has a disease or a fish has been exposed to a toxicant. On the molecular level, a biomarker can be a genotype, for instance a single nucleotide variant linked with a particular disease or susceptibility; another kind of biomarker is the expression level of a gene or a set of genes. In this thesis we focus on the latter, aiming to identify informative genes that help distinguish samples from different groups based on their gene expression profiles. Several transcriptomics technologies can generate the necessary data; among them, DNA microarrays and RNA sequencing (RNA-Seq) have become the most useful methods for whole-transcriptome gene expression profiling. RNA-Seq in particular has become an attractive alternative to microarrays since its introduction. Before gene expression can be analyzed, RNA-Seq data must go through a series of processing steps, so a workflow that automates the process is highly desirable.
Even though many workflows have been proposed to facilitate this process, their application is usually limited, for example to model organisms, high-performance computers, or computer-fluent users. To fill these gaps, we ...
author2 orcid:0000-0003-4684-317X
format Doctoral or Postdoctoral Thesis
author Zhang, Xiaokang
title Biomarker Discovery Using Statistical and Machine Learning Approaches on Gene Expression Data
publisher The University of Bergen
publishDate 2020
url https://hdl.handle.net/1956/24159
genre atlantic cod
Gadus morhua
op_relation Paper I: Yadetie, F., Zhang, X., Hanna, E. M., Aranguren-Abadía, L., Eide, M., Blaser, N., Brun, M., Jonassen, I., Goksøyr, A., & Karlsen, O. A. (2018). RNA-Seq analysis of transcriptome responses in Atlantic cod (Gadus morhua) precision-cut liver slices exposed to benzo[a]pyrene and 17α-ethynylestradiol. Aquatic Toxicology, 201, 174-186. The article is available in the main thesis. The article is also available at: https://doi.org/10.1016/j.aquatox.2018.06.003
Paper II: Zhang, X., & Jonassen, I. (2020). RASflow: an RNA-Seq analysis workflow with Snakemake. BMC Bioinformatics, 21(1), 1-9. The article is available in the main thesis. The article is also available at: https://doi.org/10.1186/s12859-020-3433-x
Paper III: Zhang, X., & Jonassen, I. (2019). A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-Treated Atlantic Cod (Gadus morhua) Liver. In Symposium of the Norwegian AI Society, Communications in Computer and Information Science (pp. 114-123). Springer, Cham. An accepted version of the article is available at: http://hdl.handle.net/1956/21642
Paper IV: Zhang, X., & Jonassen, I. (2019). An Ensemble Feature Selection Framework Integrating Stability. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 2792-2798). IEEE. An accepted version of the article is available at: http://hdl.handle.net/1956/22457
container/64/c8/95/28/64c89528-9830-4e1c-8f1a-f3b8ff1e87f1
urn:isbn:9788230860526
urn:isbn:9788230857113
https://hdl.handle.net/1956/24159
op_rights Attribution-NonCommercial (CC BY-NC). This item's Creative Commons-license does not apply to the included articles in the thesis.
https://creativecommons.org/licenses/by-nc/4.0/
Copyright the Author.
_version_ 1766358202479280128
spelling ftunivbergen:oai:bora.uib.no:1956/24159 2023-05-15T15:27:47+02:00 2020-10-12T16:32:59.951Z application/pdf eng Doctoral thesis 2020 ftunivbergen 2023-03-14T17:40:35Z