1264 - SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

ICPR Browser Link: https://ailb-web.ing.unimore.it/icpr/paper/875/

Abstract: Recognizing an activity with a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach that reduces the action recognition problem to a nearest-neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using a triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which smaller distances encode similar actions and larger distances encode different actions. Our approach is based on a signal-level formulation and remains flexible across a variety of modalities. It outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the one-shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline by 3.7%. With 40% of the training data, it performs comparably to the second-best approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton, and fused data, and on the Simitate dataset for motion-capture data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.


Bibliographic Details
Main Authors: 25th International Conference on Pattern Recognition 2021, Memmesheimer, Raphael
Format: Article in Journal/Newspaper
Language: unknown
Published: Underline Science Inc. 2020
Subjects: DML
Online Access: https://dx.doi.org/10.48448/k0ba-nv96
https://underline.io/lecture/11411-1264---sl-dml-signal-level-deep-metric-learning-for-multimodal-one-shot-action-recognition
id ftdatacite:10.48448/k0ba-nv96
record_format openpolar
type Conference talk, Audiovisual
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
language unknown
topic Computer vision
Pattern recognition
Machine Learning
description ICPR Browser Link: https://ailb-web.ing.unimore.it/icpr/paper/875/ Abstract: Recognizing an activity with a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach that reduces the action recognition problem to a nearest-neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using a triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which smaller distances encode similar actions and larger distances encode different actions. Our approach is based on a signal-level formulation and remains flexible across a variety of modalities. It outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the one-shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline by 3.7%. With 40% of the training data, it performs comparably to the second-best approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton, and fused data, and on the Simitate dataset for motion-capture data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.
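The abstract boils down to two mechanisms: learning an embedding with a triplet loss, and doing one-shot recognition as a nearest-neighbor search in that embedding space. A minimal sketch of those mechanics, assuming a toy linear `embed` in place of the paper's deep residual CNN over signal images — all function names, data, and dimensions here are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Toy embedding: linear map + L2 normalization. (SL-DML uses a deep
    residual CNN over signal images; this stand-in only illustrates the
    metric-learning mechanics, not the paper's actual network.)"""
    z = np.atleast_2d(x) @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: same-action pairs are pulled together and
    different-action pairs pushed at least `margin` further apart."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

def one_shot_classify(query, references, labels, W):
    """One-shot recognition as nearest-neighbor search in embedding
    space: predict the label of the single closest reference embedding."""
    dists = np.linalg.norm(embed(references, W) - embed(query, W), axis=1)
    return labels[int(np.argmin(dists))]

# Toy setup: 8-dim "signal" feature vectors, two actions, one reference each.
W = rng.normal(size=(8, 4))
refs = rng.normal(size=(2, 8))
labels = np.array(["wave", "kick"])

print(one_shot_classify(refs[0].copy(), refs, labels, W))  # prints "wave"
print(triplet_loss(embed(refs[:1], W), embed(refs[:1], W), embed(refs[1:], W)))
```

Because classification only compares distances to reference embeddings, adding a new action class needs just one labeled sample and no retraining — which is what makes the margin-based triplet formulation attractive for one-shot protocols.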
format Article in Journal/Newspaper
author 25th International Conference on Pattern Recognition 2021
Memmesheimer, Raphael
title 1264 - SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
publisher Underline Science Inc.
publishDate 2020
url https://dx.doi.org/10.48448/k0ba-nv96
https://underline.io/lecture/11411-1264---sl-dml-signal-level-deep-metric-learning-for-multimodal-one-shot-action-recognition
genre DML
op_doi https://doi.org/10.48448/k0ba-nv96