Investigating Distance Metric Learning in Semi-supervised Fuzzy c-means Clustering
Main Authors: | , , , |
---|---|
Format: | Text |
Language: | English |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.646.7599 http://ima.ac.uk/wp-content/uploads/2014/08/Lai2014b.pdf |
Summary: | Abstract: The idea behind distance metric learning (DML) is to accentuate the distance relations found in the training data, preserving whether the data patterns are similar or dissimilar. In this paper, we investigate the use of DML (GDML, LMNN, MCML and NCA) in semi-supervised Fuzzy c-means (ssFCM) clustering and apply them to a real biomedical dataset and to UCI datasets. We used a cross-validation setting with varying amounts of labelled data to test our methodology. Out of eight datasets, a statistically significant improvement was found on five datasets using ssFCM with DML. This shows that DML can improve ssFCM clustering for some datasets. Further analysis was carried out using 2D PCA projections and the sum of squared distances before and after the DML transformation of the original data. Interestingly, DML was found to worsen ssFCM clustering on the NTBC dataset with hierarchical clusters. |