MPOSE2021: a Dataset for Short-Time Pose-Based Human Action Recognition

This repository contains the MPOSE2021 dataset for short-time, pose-based Human Action Recognition (HAR). MPOSE2021 is specifically designed for short-time HAR and was developed as an evolution of the MPOSE dataset [1-3]. It consists of human pose data detected by OpenPose [4] and PoseNet [11] on popular HAR datasets, i.e. Weizmann [5], i3DPost [6], IXMAS [7], KTH [8], UTKinect-Action3D (RGB only) [9], and UTD-MHAD (RGB only) [10], alongside original video datasets, i.e. ISLD and ISLD-Additional-Sequences [1]. Since these datasets have heterogeneous action labels, the labels of each dataset are remapped to a common, homogeneous list of actions.

Sequences are obtained by cutting the so-called precursor videos (the videos from the datasets above) with non-overlapping sliding windows, yielding samples of 20 to 30 frames. Frames in which OpenPose/PoseNet cannot detect any subject are automatically discarded, so each resulting sample contains one subject at a time, performing a fraction of a single action. Overall, MPOSE2021 contains 15429 samples, divided into 20 actions and performed by 100 subjects. More information can be found in the MPOSE2021 repository, which also provides a user-friendly Python package for importing and using the dataset; it can be installed with

    pip install mpose

Data Structure

The repository contains three datasets for each pose extractor (namely 1, 2 and 3), which consist of the same data divided into different train/test splits. Each dataset contains X and y NumPy arrays for both training and testing. X has shape (B, T, K, C), where:

- B is the number of samples;
- T (= 30) is the duration of each sequence in frames (zero-padded for shorter sequences);
- K (= 17 for PoseNet, 25 for OpenPose) is the number of pose keypoints;
- C (= 3) is the number of channels, namely the 2D keypoint coordinates (x, y) in the original video reference frame and the keypoint confidence (p <= 1).

The .txt files specifying the metadata associated with the split samples are also included. The two sketches below illustrate, respectively, how the data can be loaded and how this layout can be interpreted.
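As a quick-start illustration, the following Python sketch loads one split with the mpose package. The constructor arguments and the get_data() method follow the interface documented in the package README; treat the exact signature as an assumption and check the repository for the authoritative API.

    # Minimal loading sketch (assumed mpose interface; verify against the repo).
    from mpose import MPOSE

    # Choose the pose extractor ('openpose' or 'posenet') and the split (1, 2 or 3).
    dataset = MPOSE(pose_extractor='openpose', split=1)

    # NumPy arrays: X_* has shape (B, T, K, C), y_* holds the action labels.
    X_train, y_train, X_test, y_test = dataset.get_data()
    print(X_train.shape)  # expected (B, 30, 25, 3) for OpenPose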

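Independently of the loader, the (T, K, C) layout can be illustrated with plain NumPy. This sketch uses hypothetical random data (not the actual dataset) to show the zero-padding of a shorter sequence to T = 30 and the split between coordinate and confidence channels:

    import numpy as np

    # Hypothetical 20-frame sequence with 25 OpenPose keypoints and 3 channels.
    seq = np.random.rand(20, 25, 3).astype(np.float32)

    # Zero-pad to the fixed length T = 30 used by MPOSE2021.
    T = 30
    padded = np.zeros((T,) + seq.shape[1:], dtype=seq.dtype)
    padded[:seq.shape[0]] = seq

    # Channels: (x, y) coordinates in the original video frame, then confidence p <= 1.
    xy = padded[..., :2]   # shape (30, 25, 2)
    conf = padded[..., 2]  # shape (30, 25)

    # Padded frames are recognizable because every keypoint confidence is zero there.
    valid_frames = conf.sum(axis=-1) > 0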
References

MPOSE2021 is part of a paper published in Pattern Recognition (Elsevier) and is intended for scientific research purposes. If you use MPOSE2021 in your research work, please also cite [1-11].

    @article{mazzia2021action,
      title={Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition},
      author={Mazzia, Vittorio and Angarano, Simone and Salvetti, Francesco and Angelini, Federico and Chiaberge, Marcello},
      journal={Pattern Recognition},
      pages={108487},
      year={2021},
      publisher={Elsevier}
    }

[1] Angelini, F., Fu, Z., Long, Y., Shao, L., & Naqvi, S. M. (2019). 2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling. IEEE Transactions on Multimedia, 22(6), 1433-1446.
[2] Angelini, F., Yan, J., & Naqvi, S. M. (2019, May). Privacy-Preserving Online Human Behaviour Anomaly Detection Based on Body Movements and Objects Positions. In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8444-8448). IEEE.
[3] Angelini, F., & Naqvi, S. M. (2019, July). Joint RGB-Pose Based Human Action Recognition for Anomaly Detection Applications. In 2019 22nd International Conference on Information Fusion (FUSION) (pp. 1-7). IEEE.
[4] Cao, Z., Hidalgo, G., Simon, T., Wei, S. E., & Sheikh, Y. (2019). OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172-186.
[5] Gorelick, L., Blank, M., Shechtman, E., Irani, M., & Basri, R. (2007). Actions as Space-Time Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2247-2253.
[6] Starck, J., & Hilton, A. (2007). Surface Capture for Performance-Based Animation. IEEE Computer Graphics and Applications, 27(3), 21-31.
[7] Weinland, D., Özuysal, M., & Fua, P. (2010, September). Making Action Recognition Robust to Occlusions and Viewpoint Changes. In European Conference on Computer Vision (pp. 635-648). Springer, Berlin, Heidelberg.
[8] Schuldt, C., Laptev, I., & Caputo, B. (2004, August). Recognizing Human Actions: A Local SVM Approach. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004) (Vol. 3, pp. 32-36). IEEE.
[9] Xia, L., Chen, C. C., & Aggarwal, J. K. (2012, June). View Invariant Human Action Recognition Using Histograms of 3D Joints. In 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 20-27). IEEE.
[10] Chen, C., Jafari, R., & Kehtarnavaz, N. (2015, September). UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor. In 2015 IEEE International Conference on Image Processing (ICIP) (pp. 168-172). IEEE.
[11] Papandreou, G., Zhu, T., Chen, L. C., Gidaris, S., Tompson, J., & Murphy, K. (2018). PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 269-286).

Bibliographic Details
Main Authors: Mazzia, Vittorio, Angarano, Simone, Salvetti, Francesco, Angelini, Federico, Chiaberge, Marcello
Format: Dataset
Language: English
Published: Zenodo 2021
Subjects: Deep Learning, Action Recognition, Graph, Classification, 2D Pose, OpenPose, PoseNet
Online Access: https://dx.doi.org/10.5281/zenodo.5801993
https://zenodo.org/record/5801993
id ftdatacite:10.5281/zenodo.5801993
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language English
topic Deep Learning, Action Recognition, Graph, Classification, 2D Pose, OpenPose, PoseNet
format Dataset
author Mazzia, Vittorio
Angarano, Simone
Salvetti, Francesco
Angelini, Federico
Chiaberge, Marcello
title MPOSE2021: a Dataset for Short-Time Pose-Based Human Action Recognition
publisher Zenodo
publishDate 2021
url https://dx.doi.org/10.5281/zenodo.5801993
https://zenodo.org/record/5801993
op_relation https://dx.doi.org/10.5281/zenodo.5506688
op_rights Open Access
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cc-by-4.0
info:eu-repo/semantics/openAccess
op_rightsnorm CC-BY
op_doi https://doi.org/10.5281/zenodo.5801993
https://doi.org/10.5281/zenodo.5506688