SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
Recognizing an activity from a single reference sample using metric learning is a promising research direction. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach that reduces the action recognition problem to a nearest-neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using a triplet loss, we learn a feature embedding: the resulting encoder maps inputs into an embedding space in which small distances encode similar actions and large distances encode different actions. Our approach is formulated at the signal level and therefore remains flexible across a variety of modalities. It outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the one-shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline by 3.7%; with 40% of the training data it performs comparably to the second-best approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton, and fused data, and on the Simitate dataset for motion capture data. Our inter-joint and inter-sensor experiments also suggest good capabilities on previously unseen setups.
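The core idea of the abstract — learn an embedding with a triplet loss, then classify a query by its nearest reference in embedding space — can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the margin value, and the toy 2-D "embeddings" are illustrative assumptions; the actual method embeds signal images with a deep residual CNN.

```python
import math

def sq_dist(u, v):
    # squared Euclidean distance between two embedding vectors
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # hinge formulation: pull the positive closer than the negative by at
    # least `margin` (margin value here is an illustrative assumption)
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def one_shot_classify(query, references):
    # one-shot recognition as 1-nearest-neighbor search in embedding space;
    # `references` is a list of (embedding, label) pairs, one per action class
    _, label = min((math.sqrt(sq_dist(emb, query)), lbl) for emb, lbl in references)
    return label

# Toy usage: the query embedding lies closest to the "wave" reference.
refs = [([0.0, 0.0], "wave"), ([1.0, 1.0], "kick")]
print(one_shot_classify([0.1, 0.0], refs))  # -> wave
```

Once the triplet loss has shaped the embedding space during training, recognizing a new action class requires only a single reference embedding — no retraining — which is what makes the nearest-neighbor formulation attractive for one-shot recognition.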
Main Authors: Memmesheimer, Raphael; Theisen, Nick; Paulus, Dietrich
Format: Article in Journal/Newspaper
Language: English
Published: arXiv, 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences
Online Access: https://dx.doi.org/10.48550/arxiv.2004.11085 ; https://arxiv.org/abs/2004.11085
id: ftdatacite:10.48550/arxiv.2004.11085
record_format: openpolar
institution: Open Polar
collection: DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id: ftdatacite
title: SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
authors: Memmesheimer, Raphael; Theisen, Nick; Paulus, Dietrich
topics: Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences
publisher: arXiv
publishDate: 2020
url: https://dx.doi.org/10.48550/arxiv.2004.11085 ; https://arxiv.org/abs/2004.11085
rights: Creative Commons Attribution 4.0 International (CC-BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode
doi: https://doi.org/10.48550/arxiv.2004.11085
description: Recognizing an activity from a single reference sample using metric learning is a promising research direction. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach that reduces the action recognition problem to a nearest-neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using a triplet loss, we learn a feature embedding: the resulting encoder maps inputs into an embedding space in which small distances encode similar actions and large distances encode different actions. Our approach is formulated at the signal level and therefore remains flexible across a variety of modalities. It outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the one-shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline by 3.7%; with 40% of the training data it performs comparably to the second-best approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton, and fused data, and on the Simitate dataset for motion capture data. Our inter-joint and inter-sensor experiments also suggest good capabilities on previously unseen setups.
note: 8 pages, 6 figures, 7 tables