Distance metric learning for multi-modal image retrieval and annotation

Bibliographic Details
Main Author: Wu, Pengcheng
Other Authors: Hoi Chu Hong, School of Computer Engineering, Centre for Computational Intelligence
Format: Thesis
Language: English
Published: 2014
Subjects:
DML
Online Access: http://hdl.handle.net/10356/60499
Description
Summary: With the rapid growth of digital cameras and photo sharing websites, content-based image retrieval (CBIR) and search-based image annotation have become important techniques for many real-world multimedia applications. They remain open challenges today, despite having been studied extensively for a few decades in several communities, including multimedia, signal processing, and computer vision. One key challenge of CBIR is to find an effective similarity search scheme that accurately retrieves a short list of the most similar images from a massive collection of images. Conventional CBIR approaches usually adopt rigid measures to evaluate the similarity of images, such as the classical Euclidean distance or cosine similarity, which are often limited despite being widely used in many applications. In this thesis, we investigate Distance Metric Learning (DML) techniques to improve visual similarity search in multimedia information retrieval tasks. In particular, we propose three kinds of novel machine learning algorithms to tackle the challenges of content-based image retrieval and search-based image annotation. Firstly, we present a novel Unified Distance Metric Learning (UDML) scheme for mining social images towards automated image annotation. To effectively discover knowledge from social images, which are often associated with multimedia contents (including visual images and textual tags), UDML not only exploits both the visual and textual contents of social images, but also unifies inductive and transductive metric learning techniques in a systematic learning framework. The UDML task is formulated as a convex optimization problem, i.e., a Semi-Definite Program (SDP), which is in general difficult to solve. To overcome this challenging optimization task, we develop an efficient stochastic gradient descent algorithm and prove its convergence.
By applying the UDML technique to the search-based image annotation task on a large real-world testbed in our ...
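
The general idea of learning a distance metric with projected stochastic gradient descent, as mentioned in the summary, can be illustrated with a minimal sketch. This is not the UDML algorithm from the thesis: it assumes a simplified setting in which the metric is restricted to a diagonal matrix with non-negative weights (so the positive semi-definite constraint reduces to clamping weights at zero), and it uses a simple pairwise loss chosen for illustration. The function names, the loss, and all parameter values are hypothetical.

```python
import random


def mahalanobis_sq(w, x, y):
    """Squared distance under a diagonal metric M = diag(w), with w >= 0."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))


def learn_diagonal_metric(pairs, dim, lr=0.05, margin=1.0, epochs=50, seed=0):
    """Illustrative projected SGD over pairwise constraints (an assumption,
    not the thesis method).

    pairs: list of (x, y, similar) where similar is True or False.
    Similar pairs are pulled together; dissimilar pairs closer than
    `margin` are pushed apart.
    """
    rng = random.Random(seed)
    w = [1.0] * dim  # start from the plain Euclidean metric
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, y, similar in pairs:
            sq = [(xi - yi) ** 2 for xi, yi in zip(x, y)]
            if similar:
                grad = sq  # gradient of the distance itself
            elif mahalanobis_sq(w, x, y) < margin:
                grad = [-s for s in sq]  # gradient of (margin - distance)
            else:
                continue  # dissimilar pair already far enough apart
            # Gradient step, then projection onto w >= 0,
            # which keeps diag(w) positive semi-definite.
            w = [max(0.0, wi - lr * gi) for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy data: dimension 0 separates classes, dimension 1 is noise.
    toy_pairs = [
        ([0.0, 0.0], [0.0, 1.0], True),
        ([1.0, 0.0], [1.0, 2.0], True),
        ([0.0, 0.0], [0.5, 0.0], False),
    ]
    print(learn_diagonal_metric(toy_pairs, dim=2))
```

On the toy data, the learned weights favor the informative dimension over the noisy one, which is the behavior a fixed Euclidean distance cannot provide. A full-matrix metric would instead require projecting onto the positive semi-definite cone, for example by clipping negative eigenvalues.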