MPOSE2021: a Dataset for Short-time Pose-based Human Action Recognition

Bibliographic Details
Main Authors: Mazzia, Vittorio, Angarano, Simone, Salvetti, Francesco, Angelini, Federico, Chiaberge, Marcello
Format: Dataset
Language: English
Published: Zenodo 2021
Online Access: https://dx.doi.org/10.5281/zenodo.5507363
https://zenodo.org/record/5507363
Description
Summary: MPOSE2021 is a dataset for short-time, pose-based Human Action Recognition (HAR), as presented in [12]. It was developed as an evolution of the MPOSE dataset [1-3] and consists of human pose data detected by OpenPose [4] and PoseNet [11] on popular HAR datasets, i.e. Weizmann [5], i3DPost [6], IXMAS [7], KTH [8], UTKinect-Action3D (RGB only) [9] and UTD-MHAD (RGB only) [10], alongside original video datasets, i.e. ISLD and ISLD-Additional-Sequences [1]. Since these source datasets have heterogeneous action labels, the labels of each dataset are remapped to a common, homogeneous list of actions. To properly use MPOSE2021 and all the functionalities developed by the authors, we recommend using the official repository MPOSE2021_Dataset.

Dataset Description

The repository contains 3 datasets (namely 1, 2 and 3), which consist of the same data divided into different train/test splits. Each dataset contains X and y NumPy arrays for both training and testing. X has the following shape:

(number_of_samples, time_window, number_of_keypoints, x_y_p)

where:

time_window = 30
number_of_keypoints = 17 (PoseNet) or 13 (OpenPose)
x_y_p contains the 2D keypoint coordinates (x, y) in the original video reference frame and the keypoint confidence (p <= 1)
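As a minimal loading sketch in Python, assuming the arrays of one split have been saved as plain .npy files named X_train.npy, y_train.npy, X_test.npy and y_test.npy (these file names are an assumption for illustration; the official MPOSE2021_Dataset repository provides its own loading utilities and may organize files differently):

import numpy as np

# Assumed file names for one of the three train/test splits (hypothetical;
# see the official MPOSE2021_Dataset repository for the supported loaders).
X_train = np.load("X_train.npy")  # shape: (num_samples, 30, K, 3)
y_train = np.load("y_train.npy")  # shape: (num_samples,), remapped action labels
X_test = np.load("X_test.npy")
y_test = np.load("y_test.npy")

num_samples, time_window, num_keypoints, channels = X_train.shape
assert time_window == 30          # short-time clips of 30 frames
assert num_keypoints in (17, 13)  # 17 for PoseNet, 13 for OpenPose
assert channels == 3              # (x, y, p), with confidence p <= 1

# Separate keypoint coordinates from detection confidences.
xy = X_train[..., :2]  # (x, y) in the original video reference frame
p = X_train[..., 2]    # per-keypoint confidence
print(X_train.shape, y_train.shape, xy.shape, p.shape)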
"UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor". Proceedings of IEEE International Conference on Image Processing, Canada, 2015. [11] G. Papandreou, T. Zhu, L.C. Chen, S. Gidaris, J. Tompson, K. Murphy. "PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model". Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 269-286 [12] V. Mazzia, S. Angarano, F. Salvetti, F. Angelini, M. Chiaberge. "Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition". arXiv preprint (https://arxiv.org/abs/2107.00606), 2021. : {"references": ["F. Angelini, Z. Fu, Y. Long, L. Shao and S. M. Naqvi, \"2D Pose-based Real-time Human Action Recognition with Occlusion-handling,\" in IEEE Transactions on Multimedia. URL:\u00a0http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8853267&isnumber=4456689", "F. Angelini, J. Yan and S. M. Naqvi, \"Privacy-preserving Online Human Behaviour Anomaly Detection Based on Body Movements and Objects Positions,\" ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 8444-8448. URL:\u00a0http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8683026&isnumber=8682151", "F. Angelini and S. M. Naqvi, \"Joint RGB-Pose Based Human Action Recognition for Anomaly Detection Applications,\" 2019 22th International Conference on Information Fusion (FUSION), Ottawa, ON, Canada, 2019, pp. 1-7. URL:\u00a0http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9011277&isnumber=9011156", "Cao, Zhe, et al. \"OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields.\" IEEE transactions on pattern analysis and machine intelligence 43.1 (2019): 172-186", "Gorelick, Lena, et al. \"Actions as space-time shapes.\" IEEE transactions on pattern analysis and machine intelligence 29.12 (2007): 2247-2253", "Starck, Jonathan, and Adrian Hilton. \"Surface capture for performance-based animation.\" IEEE computer graphics and applications 27.3 (2007): 21-31", "Weinland, Daniel, Mustafa \u00d6zuysal, and Pascal Fua. \"Making action recognition robust to occlusions and viewpoint changes.\" European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2010", "Schuldt, Christian, Ivan Laptev, and Barbara Caputo. \"Recognizing human actions: a local SVM approach.\" Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004. Vol. 3. IEEE, 2004", "L. Xia, C.C. Chen and JK Aggarwal. \"View invariant human action recognition using histograms of 3D joints\", 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 20-27, 2012", "C. Chen, R. Jafari, and N. Kehtarnavaz. \"UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor\". Proceedings of IEEE International Conference on Image Processing, Canada, 2015", "G. Papandreou, T. Zhu, L.C.\u00a0Chen, S.\u00a0Gidaris, J.\u00a0Tompson, K. Murphy.\u00a0\"PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model\". Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 269-286", "V. Mazzia, S. Angarano, F. Salvetti, F. Angelini, M. Chiaberge. \"Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition\". arXiv preprint (https://arxiv.org/abs/2107.00606), 2021"]}