Statistical machine learning for data mining and collaborative multimedia retrieval.

Bibliographic Details
Other Authors: Hoi, Chu Hong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Format: Thesis
Language: English, Chinese
Published: 2006
Subjects:
DML
Online Access: http://library.cuhk.edu.hk/record=b6074301
https://repository.lib.cuhk.edu.hk/en/item/cuhk-343930
Description
Summary: Another issue studied in the framework is Distance Metric Learning (DML). Learning distance metrics is critical to many machine learning tasks, especially when contextual information is available. To learn effective metrics from pairwise contextual constraints, two novel methods, Discriminative Component Analysis (DCA) and Kernel DCA, are proposed to learn linear and nonlinear distance metrics, respectively. Empirical results on data clustering validate the advantages of these algorithms. Based on this unified learning framework, a novel scheme is proposed for learning Unified Kernel Machines (UKM). The UKM scheme combines supervised kernel machine learning, unsupervised kernel design, semi-supervised kernel learning, and active learning. A key component of the UKM scheme is learning kernels from both labeled and unlabeled data. To this end, a new Spectral Kernel Learning (SKL) algorithm is proposed, which is related to a quadratic program. Empirical results show that the UKM technique is promising for classification tasks. In addition to these methodologies, the thesis addresses practical issues in applying machine learning techniques to real-world applications. For example, in a time-dependent web data mining application, marginalized kernel techniques are used to design an effective domain-specific kernel. Finally, the thesis investigates statistical machine learning techniques for multimedia retrieval and addresses practical issues such as robustness to noise and scalability. To bridge the semantic gap in multimedia retrieval, a Collaborative Multimedia Retrieval (CMR) scheme is proposed that exploits historical log data of users' relevance feedback to improve retrieval. Two types of learning tasks in the CMR scheme are identified, and two algorithms are proposed to solve them, respectively.
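The summary names DCA but does not spell out the computation. As a rough, non-authoritative illustration, the sketch below follows the general DCA idea of finding a linear transform A that maximizes between-group scatter relative to within-group scatter, which yields the Mahalanobis metric M = A A^T. It assumes the pairwise constraints have already been grouped into "chunklets" (sets of points believed to be similar) plus a list of chunklet pairs believed to be dissimilar. The function names `dca` and `dca_distance`, the `reg` ridge term, and the toy data are hypothetical and not taken from the thesis. Kernel DCA is not covered.

```python
import numpy as np


def dca(chunklets, dissimilar_pairs, n_components=None, reg=1e-6):
    """Hypothetical DCA-style sketch (not the thesis implementation).

    chunklets: list of (n_j, d) arrays; points inside one chunklet are
        assumed similar (positive constraints).
    dissimilar_pairs: list of (j, i) chunklet index pairs assumed
        dissimilar (negative constraints).
    Returns a (d, k) matrix A; the learned metric is M = A @ A.T.
    """
    means = [c.mean(axis=0) for c in chunklets]
    d = means[0].shape[0]

    # Within-chunklet scatter: average covariance of each chunklet around its mean.
    c_w = np.zeros((d, d))
    for c, m in zip(chunklets, means):
        diff = c - m
        c_w += diff.T @ diff / len(c)
    c_w /= len(chunklets)

    # Between-chunklet scatter, computed only over the dissimilar chunklet pairs.
    c_b = np.zeros((d, d))
    for j, i in dissimilar_pairs:
        diff = (means[j] - means[i])[:, None]
        c_b += diff @ diff.T
    c_b /= len(dissimilar_pairs)

    # Maximize |A^T C_b A| / |A^T C_w A|: whiten C_w (plus a small ridge),
    # then take the leading eigenvectors of the whitened C_b. This solves
    # the generalized eigenproblem C_b a = lambda (C_w + reg I) a.
    evals, evecs = np.linalg.eigh(c_w + reg * np.eye(d))
    whiten = evecs / np.sqrt(evals)  # whiten.T @ (C_w + reg I) @ whiten = I
    e2, v2 = np.linalg.eigh(whiten.T @ c_b @ whiten)
    order = np.argsort(e2)[::-1]  # largest between/within ratio first
    k = d if n_components is None else n_components
    return whiten @ v2[:, order[:k]]


def dca_distance(a, x, y):
    # Mahalanobis distance under M = A A^T, i.e. Euclidean distance after projecting with A^T.
    return float(np.linalg.norm(a.T @ (x - y)))


# Hypothetical toy data: groups separated along axis 0, with large noise along axis 1.
rng = np.random.default_rng(0)


def make_chunklet(center, n=20):
    return np.asarray(center, dtype=float) + rng.normal(size=(n, 2)) * np.array([0.1, 3.0])


chunklets = [make_chunklet([0, 0]), make_chunklet([0, 0]),
             make_chunklet([2, 0]), make_chunklet([2, 0])]
dissimilar = [(0, 2), (0, 3), (1, 2), (1, 3)]
A = dca(chunklets, dissimilar, n_components=1)
print(dca_distance(A, np.array([0.0, 0.0]), np.array([2.0, 0.0])))
print(dca_distance(A, np.array([0.0, 0.0]), np.array([0.0, 3.0])))
```

On this toy data the learned direction concentrates on the first coordinate, where the groups differ. As a result, two points separated along that axis end up farther apart than two points separated only along the noisy second axis, which reverses their Euclidean ordering. Solving through whitening plus `numpy.linalg.eigh` keeps the sketch to a single NumPy dependency. `scipy.linalg.eigh(c_b, c_w)` is an alternative way to solve the generalized eigenproblem directly.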
Statistical machine learning techniques have been widely ...