Distance Measures in Bioinformatics

Many bioinformatics applications rely on the computation of similarities between objects. Distance and similarity measures applied to vectors of characteristics are essential to problems such as classification, clustering and information retrieval. This study explores the usefulness of distance and...

Bibliographic Details
Main Author: Xiong, Feiyu
Other Authors: Kam, Moshe; Hrebien, Leonid, 1949-
Format: Thesis
Language: English
Published: Drexel University 2015
Subjects:
DML
Online Access: http://hdl.handle.net/1860/idea:6403
id ftdrexeluniv:oai:idea.library.drexel.edu:idea_6403
record_format openpolar
institution Open Polar
collection Drexel University: iDEA - Drexel Libraries E-Repository And Archives
op_collection_id ftdrexeluniv
language English
topic Electrical engineering
Bioinformatics
Computer science
description Many bioinformatics applications rely on the computation of similarities between objects. Distance and similarity measures applied to vectors of characteristics are essential to problems such as classification, clustering and information retrieval. This study explores the usefulness of distance and similarity measures in several bioinformatics applications. These applications fall into two categories. (1) Estimation of the adverse reaction severity of unknown pharmaceutical treatments, based on the severity of known treatments, in order to provide guidance for testing of the unknown treatments in clinical trials. (2) Classification of cancer tissue types and estimation of cancer stages, based on high-dimensional microarray data, in order to support clinical decision making. To address the first category, we studied several clustering and classification approaches for binary severity estimation of Cytokine Release Syndrome (CRS). We developed a Severity Estimation using Distance Metric Learning (SE-DML) approach to obtain graded severity estimates. With binary estimation we were able to identify treatments that caused the most severe response and then built prediction models for CRS. Using the SE-DML approach, we evaluated four known data sets and showed that SE-DML outperformed other widely used methods on these data sets. For the second category, we presented Kernelized Information-Theoretic Metric Learning (KITML) algorithms that optimize distance metrics and effectively handle high-dimensional data. The metric learned by KITML is used to improve the performance of $k$-nearest neighbor classification for cancer tissue microarray data. We evaluated our approach on fourteen (14) cancer microarray data sets and compared our results with other state-of-the-art approaches. We achieved the best overall performance for the classification task. In addition, we tested the KITML algorithm in estimating the severity stages of cancer samples, with accurate results.
Ph.D., Electrical Engineering -- Drexel University, 2015
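The description above mentions using a learned distance metric inside $k$-nearest neighbor classification. The following is a minimal sketch of that general idea only, not the thesis's KITML or SE-DML algorithms: it assumes a positive semidefinite matrix M is already available and plugs it into a Mahalanobis-style distance. The function names, the toy data, and the two example matrices are all hypothetical and chosen for illustration.

```python
import numpy as np
from collections import Counter

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a positive semidefinite M."""
    d = x - y
    return float(d @ M @ d)

def knn_predict(X_train, y_train, x, M, k=3):
    """Label x by majority vote among its k nearest training points under M."""
    dists = [mahalanobis_sq(xi, x, M) for xi in X_train]
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes,
# feature 1 is large-scale noise unrelated to the label.
X_train = np.array([[0.0, 10.0], [0.1, -10.0], [1.0, 9.0], [1.1, -9.0]])
y_train = np.array([0, 0, 1, 1])
x_new = np.array([0.05, 9.5])

# Identity matrix: plain (squared) Euclidean distance.
M_euclid = np.eye(2)
# Stand-in for a learned metric (assumed, not produced by any learning step here):
# it up-weights the informative feature and down-weights the noisy one.
M_weighted = np.diag([100.0, 0.01])

pred_euclid = knn_predict(X_train, y_train, x_new, M_euclid, k=3)
pred_weighted = knn_predict(X_train, y_train, x_new, M_weighted, k=3)
```

On this toy example the Euclidean vote is dominated by the noisy feature and yields label 1, while the reweighted metric yields label 0, which matches the class of the points that are closest along the informative feature. A metric learning method would estimate M from labeled data rather than fixing it by hand as done here.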
author2 Kam, Moshe
Hrebien, Leonid, 1949-
format Thesis
author Xiong, Feiyu
title Distance Measures in Bioinformatics
publisher Drexel University
publishDate 2015
url http://hdl.handle.net/1860/idea:6403
genre DML
op_relation idea:6403
http://hdl.handle.net/1860/idea:6403