Gene prediction by combining outputs from ExonHunter and SGP2

Bibliographic Details
Main Author: Kuai, Yujing.
Other Authors: Memorial University of Newfoundland. Computational Science Programme
Format: Thesis
Language: English
Published: 2009
Online Access: http://collections.mun.ca/cdm/ref/collection/theses4/id/59778
Description
Summary: Thesis (M.Sc.)--Memorial University of Newfoundland, 2009. Computational Science Programme. Includes bibliographical references (leaves 101-115).

Gene prediction has recently become a critical research area in computational biology. This thesis presents our research on predicting genes in human DNA sequences. We present two algorithms that predict human genes by combining the outputs of two chosen gene finders: one uses combination methods and the other applies cross-species comparative sequence analysis. Based on these algorithms, a user-friendly gene finder can be developed to predict human genes accurately and thus help uncover the genetic causes of incurable human diseases.

Combination methods and cross-species comparative sequence analysis are two approaches that have become increasingly helpful. This thesis first summarizes and classifies the main algorithms used in each approach. Specifically, we study two gene finders that use comparative sequence analysis and three gene finders that apply combination methods. Their architectures and experiments are reviewed separately, and overall comparisons are made. According to our survey, many current gene finders predict genes with considerable accuracy, but either the methods they apply have limitations, or the gene finders are difficult for biologists and medical researchers to use. To address these two disadvantages, we develop two algorithms that combine the outputs of gene finders using combination methods and cross-species comparative sequence analysis. Using the genomes of Mus musculus and Canis familiaris for comparison, we test the algorithms first on the HMR195 dataset and then on the sequence between the markers D3S1259 and D3S3659 on human chromosome 3p25.

The results show that, to some extent, our algorithms improve on the performance of the gene finder that uses either comparative sequence analysis or combination methods alone, with each algorithm showing its own advantages in predicting different kinds of genetic information. Additionally, our work points to the promising prospect of developing a gene finder with a friendlier interface.