Large Scale Distributed Distance Metric Learning
In large scale machine learning and data mining problems with high feature dimensionality, the Euclidean distance between data points can be uninformative, and Distance Metric Learning (DML) is often desired to learn a proper similarity measure (using side information such as example data pairs bein...
Main Authors: | Xie, Pengtao; Xing, Eric |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2014 |
Subjects: | Computer Science - Machine Learning |
Online Access: | http://arxiv.org/abs/1412.5949 |
id |
ftarxivpreprints:oai:arXiv.org:1412.5949 |
---|---|
record_format |
openpolar |
spelling |
ftarxivpreprints:oai:arXiv.org:1412.5949 2023-09-05T13:19:04+02:00 Large Scale Distributed Distance Metric Learning Xie, Pengtao Xing, Eric 2014-12-18 http://arxiv.org/abs/1412.5949 unknown http://arxiv.org/abs/1412.5949 Computer Science - Machine Learning text 2014 ftarxivpreprints 2023-08-16T13:31:19Z Text DML ArXiv.org (Cornell University Library) |
institution |
Open Polar |
collection |
ArXiv.org (Cornell University Library) |
op_collection_id |
ftarxivpreprints |
language |
unknown |
topic |
Computer Science - Machine Learning |
spellingShingle |
Computer Science - Machine Learning Xie, Pengtao Xing, Eric Large Scale Distributed Distance Metric Learning |
topic_facet |
Computer Science - Machine Learning |
description |
In large scale machine learning and data mining problems with high feature dimensionality, the Euclidean distance between data points can be uninformative, and Distance Metric Learning (DML) is often desired to learn a proper similarity measure (using side information such as example data pairs being similar or dissimilar). However, high dimensionality and large volume of pairwise constraints in modern big data can lead to prohibitive computational cost for both the original DML formulation in Xing et al. (2002) and later extensions. In this paper, we present a distributed algorithm for DML, and a large-scale implementation on a parameter server architecture. Our approach builds on a parallelizable reformulation of Xing et al. (2002), and an asynchronous stochastic gradient descent optimization procedure. To our knowledge, this is the first distributed solution to DML, and we show that, on a system with 256 CPU cores, our program is able to complete a DML task on a dataset with 1 million data points, 22-thousand features, and 200 million labeled data pairs, in 15 hours; and the learned metric shows great effectiveness in properly measuring distances. |
format |
Text |
author |
Xie, Pengtao Xing, Eric |
author_facet |
Xie, Pengtao Xing, Eric |
author_sort |
Xie, Pengtao |
title |
Large Scale Distributed Distance Metric Learning |
title_short |
Large Scale Distributed Distance Metric Learning |
title_full |
Large Scale Distributed Distance Metric Learning |
title_fullStr |
Large Scale Distributed Distance Metric Learning |
title_full_unstemmed |
Large Scale Distributed Distance Metric Learning |
title_sort |
large scale distributed distance metric learning |
publishDate |
2014 |
url |
http://arxiv.org/abs/1412.5949 |
genre |
DML |
genre_facet |
DML |
op_relation |
http://arxiv.org/abs/1412.5949 |
_version_ |
1776199883599904768 |
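
The description above mentions learning a distance metric from similar and dissimilar data pairs with stochastic gradient descent. The following is a minimal single-process sketch of that general idea, not the paper's distributed implementation. It assumes a metric of the form M = L^T L, a squared-distance penalty on similar pairs, and a hinge penalty with margin 1 on dissimilar pairs; this record does not state the exact objective, so that choice is an assumption. The names `sq_dist`, `sgd_step`, `make_pairs`, and `train`, and the toy data, are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(L, x, y):
    """Squared distance ||L(x - y)||^2 under the metric M = L^T L."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(l * d for l, d in zip(row, diff)) for row in L]
    return sum(p * p for p in proj)


def sgd_step(L, x, y, similar, lr=0.05, margin=1.0):
    """Update L in place using one labeled pair (assumed objective, see text)."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(l * d for l, d in zip(row, diff)) for row in L]
    dist = sum(p * p for p in proj)
    if similar:
        sign = 1.0    # similar pair: descend on the distance itself
    elif dist < margin:
        sign = -1.0   # dissimilar pair inside the margin: push apart
    else:
        return        # dissimilar pair already far enough: no update
    # The gradient of ||L d||^2 with respect to L is 2 (L d) d^T.
    for i, row in enumerate(L):
        for j in range(len(row)):
            row[j] -= lr * sign * 2.0 * proj[i] * diff[j]


def make_pairs(rng, n_pairs):
    """Toy data: feature 0 carries the class, feature 1 is pure noise."""
    pairs = []
    for _ in range(n_pairs):
        similar = rng.random() < 0.5
        class_a = rng.choice([0.0, 1.0])
        class_b = class_a if similar else 1.0 - class_a
        x = [class_a, rng.uniform(-1.0, 1.0)]
        y = [class_b, rng.uniform(-1.0, 1.0)]
        pairs.append((x, y, similar))
    return pairs


def train(pairs, dim=2, lr=0.05):
    """Run one pass of SGD over the pairs, starting from the identity."""
    L = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for x, y, similar in pairs:
        sgd_step(L, x, y, similar, lr=lr)
    return L


if __name__ == "__main__":
    learned = train(make_pairs(random.Random(0), 400))
    print("learned L:", learned)
```

On this toy data the similar pairs differ only in the noise feature, so training shrinks the weight that L places on that feature while the class feature keeps its weight. In a parameter server setting, each worker would presumably run updates like `sgd_step` on its own shard of pairs and exchange changes to L through the server; that part is not shown here.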