SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Recognizing an activity with a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which closer distances encode similar actions while higher distances encode different actions. Our approach is based on a signal-level formulation and remains flexible across a variety of modalities. It further outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the One-Shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline approach by 3.7%. With 40% of the training data, our approach performs comparably well to the second-best approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton, and fused data, and on the Simitate dataset for motion-capture data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.
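The core idea of the abstract — learn an embedding with a triplet loss so that one-shot recognition reduces to a nearest-neighbor search — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the toy embeddings stand in for the ResNet features the paper extracts from signal-to-image encodings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Triplet loss: pull the anchor toward the positive (same action),
    # push it away from the negative (different action), up to a margin.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def one_shot_classify(query_emb, reference_embs, reference_labels):
    # One-shot recognition as nearest-neighbor search in embedding space:
    # the query takes the label of the single closest reference sample.
    dists = np.linalg.norm(reference_embs - query_emb, axis=1)
    return reference_labels[int(np.argmin(dists))]

# Toy 2-D embeddings; in SL-DML these would come from a deep residual CNN
# applied to image-encoded signals (skeleton, inertial, motion capture).
refs = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["wave", "kick"]
query = np.array([0.9, 0.1])
print(one_shot_classify(query, refs, labels))  # -> wave
```

Because classification is just a distance comparison, adding a new action class requires only one reference embedding, with no retraining of the encoder.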


Bibliographic Details
Main Authors: Memmesheimer, Raphael, Theisen, Nick, Paulus, Dietrich
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); DML
Online Access: https://dx.doi.org/10.48550/arxiv.2004.11085
https://arxiv.org/abs/2004.11085
Institution: Open Polar
Collection: DataCite Metadata Store (German National Library of Science and Technology)
Rights: Creative Commons Attribution 4.0 International (CC-BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode
DOI: https://doi.org/10.48550/arxiv.2004.11085
Note: 8 pages, 6 figures, 7 tables