Large Scale Distributed Distance Metric Learning

In large scale machine learning and data mining problems with high feature dimensionality, the Euclidean distance between data points can be uninformative, and Distance Metric Learning (DML) is often desired to learn a proper similarity measure (using side information such as example data pairs bein...


Bibliographic Details
Main Authors: Xie, Pengtao, Xing, Eric
Format: Text
Language: unknown
Published: 2014
Subjects:
DML
Online Access: http://arxiv.org/abs/1412.5949
id ftarxivpreprints:oai:arXiv.org:1412.5949
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:1412.5949 2023-09-05T13:19:04+02:00 Large Scale Distributed Distance Metric Learning Xie, Pengtao Xing, Eric 2014-12-18 http://arxiv.org/abs/1412.5949 unknown http://arxiv.org/abs/1412.5949 Computer Science - Machine Learning text 2014 ftarxivpreprints 2023-08-16T13:31:19Z Text DML ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Machine Learning
spellingShingle Computer Science - Machine Learning
Xie, Pengtao
Xing, Eric
Large Scale Distributed Distance Metric Learning
topic_facet Computer Science - Machine Learning
description In large scale machine learning and data mining problems with high feature dimensionality, the Euclidean distance between data points can be uninformative, and Distance Metric Learning (DML) is often desired to learn a proper similarity measure (using side information such as example data pairs being similar or dissimilar). However, high dimensionality and large volume of pairwise constraints in modern big data can lead to prohibitive computational cost for both the original DML formulation in Xing et al. (2002) and later extensions. In this paper, we present a distributed algorithm for DML, and a large-scale implementation on a parameter server architecture. Our approach builds on a parallelizable reformulation of Xing et al. (2002), and an asynchronous stochastic gradient descent optimization procedure. To our knowledge, this is the first distributed solution to DML, and we show that, on a system with 256 CPU cores, our program is able to complete a DML task on a dataset with 1 million data points, 22-thousand features, and 200 million labeled data pairs, in 15 hours; and the learned metric shows great effectiveness in properly measuring distances.
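The description above mentions learning a distance metric from similar and dissimilar pairs with stochastic gradient descent. The following is a minimal single-process sketch of that general idea, not the authors' implementation. It assumes the metric is factored as M = L^T L and uses a hinge penalty with a margin on dissimilar pairs; both choices, along with the names `mahalanobis_sq`, `sgd_step`, and `fit`, are illustrative assumptions that this record does not confirm. The asynchronous parameter-server execution described in the abstract is not reproduced here.

```python
import numpy as np


def mahalanobis_sq(L, x, y):
    """Squared distance ||L (x - y)||^2, i.e. (x - y)^T (L^T L) (x - y)."""
    d = L @ (x - y)
    return float(d @ d)


def sgd_step(L, x, y, similar, lr=0.01, margin=1.0):
    """One stochastic gradient step on a single labeled pair.

    Assumed loss (illustrative only):
      similar pair:    ||L (x - y)||^2
      dissimilar pair: max(0, margin - ||L (x - y)||^2)
    """
    diff = x - y
    proj = L @ diff
    dist_sq = proj @ proj
    # Gradient of ||L diff||^2 with respect to L.
    grad = 2.0 * np.outer(proj, diff)
    if similar:
        # Pull similar points together.
        return L - lr * grad
    if dist_sq < margin:
        # Push dissimilar points apart until they clear the margin.
        return L + lr * grad
    # Dissimilar pair already beyond the margin: zero gradient.
    return L


def fit(X, pairs, k=2, epochs=50, lr=0.01, seed=0):
    """Serial SGD over (i, j, similar) pairs; returns a k x d matrix L."""
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(k, X.shape[1]))
    for _ in range(epochs):
        for idx in rng.permutation(len(pairs)):
            i, j, similar = pairs[idx]
            L = sgd_step(L, X[i], X[j], similar, lr=lr)
    return L
```

Each update touches only one pair, which is the property that makes this kind of objective a candidate for asynchronous, distributed updates; how the paper partitions pairs and synchronizes parameters across workers is not stated in this record.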
format Text
author Xie, Pengtao
Xing, Eric
author_facet Xie, Pengtao
Xing, Eric
author_sort Xie, Pengtao
title Large Scale Distributed Distance Metric Learning
title_sort large scale distributed distance metric learning
publishDate 2014
url http://arxiv.org/abs/1412.5949
genre DML
genre_facet DML
op_relation http://arxiv.org/abs/1412.5949
_version_ 1776199883599904768