Metric learning for multivariate time series analysis using DTW: application to remote sensing and software engineering

Bibliographic Details
Main Author: Salaou, Abdoul-Djawadou
Other Authors: Damian, Daniela; Gançarski, Pierre
Format: Thesis
Language: English
Published: 2020
Subjects:
DTW
DML
Online Access: http://hdl.handle.net/1828/12575
Description
Summary: In the context of the growing availability of data, Time Series (TS) are essential for extracting and understanding the evolution of underlying natural, artificial, social or economic phenomena. The related literature has extensively shown that Dynamic Time Warping (DTW), in conjunction with some local/base distance D (e.g. the Euclidean distance), is an effective similarity measure when univariate TS are considered. However, possible statistical coupling among different dimensions makes the generalization of this metric to the multivariate case anything but obvious. In practice, multivariate TS are described by heterogeneous features, which usually exhibit different patterns (correlated, noisy, missing or irrelevant features). Therefore, to obtain a "fair" comparison of the data, DTW needs a D which "understands" the space of the data. As the complexity of the data increases, defining such a satisfactory base distance/similarity D becomes very difficult, and it seems unrealistic to define D manually or on the sole basis of expert opinion. This has ignited our interest in a new distance definition capable of capturing such inter-dimension dependencies by leveraging Distance Metric Learning (DML). DML aims to learn a distance metric that better discriminates the data by accentuating the distance relations among objects that are considered (strongly) similar or, conversely, (strongly) dissimilar. This information about (dis)similarity is often provided as must-link and cannot-link constraints between objects. However, in the case of voluminous and complex data, providing such constraints remains an open problem. Therefore, we propose a method, based on canopy clustering, to automatically extract the constraints from the dataset.
Graduate
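To make the role of the base distance D in the summary concrete, here is a minimal sketch of DTW over multivariate series with a pluggable local distance. It is an illustration only, not the thesis's method: the function names `local_distance` and `dtw` are hypothetical, and the Mahalanobis-style form of D (a matrix M weighting pairs of dimensions) is an assumption, chosen because it is a common way to parameterize a learned metric. With M set to the identity, D reduces to the squared Euclidean distance.

```python
def local_distance(x, y, M):
    """Assumed base distance D: the quadratic form (x - y)^T M (x - y).

    M is a square matrix given as a list of lists. In a metric learning
    setting it would be learned from the data; here it is supplied by hand.
    """
    diff = [xi - yi for xi, yi in zip(x, y)]
    dim = len(diff)
    return sum(diff[r] * M[r][c] * diff[c]
               for r in range(dim) for c in range(dim))


def dtw(a, b, M=None):
    """DTW cost between two multivariate series.

    a and b are lists of equal-dimension feature vectors (one per time step).
    If M is None, the identity matrix is used (squared Euclidean distance).
    """
    dim = len(a[0])
    if M is None:
        M = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]

    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] holds the best cumulative cost aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_distance(a[i - 1], b[j - 1], M)
            # Standard step pattern: insertion, deletion, or match.
            acc[i][j] = cost + min(acc[i - 1][j],
                                   acc[i][j - 1],
                                   acc[i - 1][j - 1])
    return acc[n][m]


# Two series that agree on the first feature and differ on the second.
series_a = [[0, 0], [1, 0], [2, 0]]
series_b = [[0, 5], [1, 5], [2, 5]]

euclidean_cost = dtw(series_a, series_b)
# A hand-picked M that ignores the second (here treated as irrelevant) feature.
weighted_cost = dtw(series_a, series_b, M=[[1.0, 0.0], [0.0, 0.0]])
```

In this toy case the identity matrix lets the second feature dominate the comparison, while the hand-picked M removes its influence entirely. A learned M would play the same role, with its entries estimated from similarity constraints rather than chosen manually.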