Convergence, connectivity, and continuity: topological perspectives for mining novel biological information from ‘omics data

Bibliographic Details
Main Author: Chen, Mel
Format: Thesis
Language: English
Published: 2020
Online Access: http://theses.gla.ac.uk/78978/
https://theses.gla.ac.uk/78978/7/2020chenphd.pdf
https://eleanor.lib.gla.ac.uk/record=b3378033
Description
Summary: In this thesis, we explore possible applications of topological data analysis to 'omics data. More specifically, we apply Mapper, a topology-based data visualisation technique, to gene expression data from the Arctic charr (Salvelinus alpinus). The fish samples come from the wild, from lakes in Scotland and Russia. The Arctic charr is an interesting study species because it commonly occurs in two morphs: a bottom/bank-dwelling benthic morph and an open-water pelagic morph. In general, these morphs share features that are common across lakes, and so provide an opportunity to study a subspecies-level split that is replicated across different populations. This is an example of parallelism in evolution, and because the split is replicated, we can test whether common underlying changes lead to it, at the level of identical genes, sets of genes, or genes involved in the same pathways. We provide an overview of the Mapper algorithm and show its application to a breast cancer gene expression dataset, which was the inspiration for our PhD project. When applying Mapper to the Arctic charr, we also investigate the effect of sample size by subsampling the breast cancer data. In addition to applying Mapper, we take a more mathematical view of gene expression data to offer a new perspective on two analysis techniques commonly used in evolutionary biology, namely differential gene expression and gene co-expression analysis. Finally, we propose an experiment that could be carried out in the future, assuming the cost of sequencing continues to fall. This experiment incorporates ideas from optimal transport to try to reconstruct the developmental landscape of Arctic charr. We also discuss other avenues for future work and current difficulties with applying topological data analysis to gene expression data from wild samples.