Predicting the Evolutionary and Medical Significance of Human Genetic Variations with Machine Learning

The advent of inexpensive and high-throughput genome sequencing technologies has facilitated the acquisition of patient exome and genome sequences at a vast scale. One of the primary challenges of such data is its functional interpretation, and specifically, the ability to distinguish functionally i...

Bibliographic Details
Main Author: Saha Mandal, Arnab
Other Authors: De Koning, A. P. Jason, Bernier, François P., Wasmuth, James D., Rodrigue, Nicolas
Format: Doctoral or Postdoctoral Thesis
Language: English
Published: Cumming School of Medicine 2019
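Subjects: bioinformatics; genomics; machine learning; genomic variants; classification; pathogenic; benign; rare disease; genetics; Artificial Intelligence; Computer Science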
Online Access: http://hdl.handle.net/1880/110303
https://doi.org/10.11575/PRISM/36479
id ftunivcalgary:oai:prism.ucalgary.ca:1880/110303
record_format openpolar
spelling ftunivcalgary:oai:prism.ucalgary.ca:1880/110303 2023-08-27T04:12:18+02:00 2019-04-30 application/pdf eng University of Calgary doctoral thesis 2023-08-06T06:23:34Z
institution Open Polar
collection PRISM - University of Calgary Digital Repository
op_collection_id ftunivcalgary
language English
topic bioinformatics
genomics
machine learning
genomic variants
classification
pathogenic
benign
rare disease
genetics
Artificial Intelligence
Computer Science
description The advent of inexpensive and high-throughput genome sequencing technologies has facilitated the acquisition of patient exome and genome sequences at a vast scale. One of the primary challenges of such data is its functional interpretation, and specifically, the ability to distinguish functionally important, deleterious, and pathogenic variants from neutral or benign variants (“variant impact prediction” or VIP). Over the last two decades, many approaches have been proposed for VIP, which utilize data from patterns of evolutionary conservation, population genomics, protein structures and other sources to inform machine learning classification algorithms. However, existing approaches are fraught with limitations, especially when they are trained on databases of putatively pathogenic variants that may have been identified with reference to existing prediction methods (a type of ‘circularity’). This dissertation identifies shortcomings of existing variant impact prediction methods and discusses how they can be better understood (Chapter 1). Approaches to overcome these shortcomings are presented (Chapter 2), and a new method, TAIGA (Transformation and Integration of Genomic Annotations), is developed. The utility of this method and its accompanying refinements are evaluated (Chapter 3) and later scrutinized (Chapter 4). As part of this work, I have produced TAIGA scores for all protein coding positions of the human genome, and I show these have substantially superior performance in distinguishing known pathogenic variations from neutral variations in a number of high-quality datasets. Variant prediction scores from TAIGA are later integrated with clinical information from human phenotypes (Chapter 5) and this extension demonstrated the highest sensitivity and smallest candidate gene search space over a large set of rare genetic disorders. 
It is my hope that TAIGA will aid clinicians and researchers alike in the new era of personalized genomic medicine in which we find ourselves.
author2 De Koning, A. P. Jason
Bernier, François P.
Wasmuth, James D.
Rodrigue, Nicolas
format Doctoral or Postdoctoral Thesis
author Saha Mandal, Arnab
title Predicting the Evolutionary and Medical Significance of Human Genetic Variations with Machine Learning
publisher Cumming School of Medicine
publishDate 2019
url http://hdl.handle.net/1880/110303
https://doi.org/10.11575/PRISM/36479
genre taiga
op_relation Saha Mandal, A. (2019). Predicting the Evolutionary and Medical Significance of Human Genetic Variations with Machine Learning (Unpublished doctoral thesis). University of Calgary, Calgary, AB.
http://dx.doi.org/10.11575/PRISM/36479
http://hdl.handle.net/1880/110303
op_rights University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
op_doi https://doi.org/10.11575/PRISM/36479
_version_ 1775356296414887936