ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation

Bibliographic Details
Main Authors: Fan, Z., Taheri, O., Tzionas, D., Kocabas, M., Kaufmann, M., Black, M.J., Hilliges, O.
Format: Article in Journal/Newspaper
Language: English
Published: IEEE Computer Society 2023
Online Access: https://dare.uva.nl/personal/pure/en/publications/arctic-a-dataset-for-dexterous-bimanual-handobject-manipulation(8fc92adf-12ca-479e-af6a-99468eb60b0b).html
https://doi.org/10.48550/arXiv.2204.13662
https://hdl.handle.net/11245.1/8fc92adf-12ca-479e-af6a-99468eb60b0b
https://pure.uva.nl/ws/files/141827700/2204.13662.pdf
https://arctic.is.tue.mpg.de
https://www.proceedings.com/70184.html
https://openaccess.thecvf.com/content/CVPR2023/html/Fan_ARCTIC_A_Dataset_for_Dexterous_Bimanual_Hand-Object_Manipulation_CVPR_2023_paper.html
Description
Summary: Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines. In part this is because there exist no datasets with ground-truth 3D annotations for the study of physically consistent and synchronised motion of hands and articulated objects. To this end, we introduce ARCTIC, a dataset of two hands that dexterously manipulate objects, containing 2.1M video frames paired with accurate 3D hand and object meshes and detailed, dynamic contact information. It contains bimanual articulation of objects such as scissors or laptops, where hand poses and object states evolve jointly in time. We propose two novel articulated hand-object interaction tasks: (1) Consistent motion reconstruction: Given a monocular video, the goal is to reconstruct two hands and articulated objects in 3D, so that their motions are spatio-temporally consistent. (2) Interaction field estimation: Dense relative hand-object distances must be estimated from images. We introduce a baseline for each task, ArcticNet and InterField respectively, and evaluate them qualitatively and quantitatively on ARCTIC.
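
As a rough illustration of what "dense relative hand-object distances" could mean in practice, the sketch below computes, for every vertex of one mesh, the Euclidean distance to the nearest vertex of another mesh. This is an assumption made for illustration only: the summary does not spell out the exact definition, and the function name `interaction_field` and the toy vertex arrays are hypothetical, not part of the ARCTIC code base. In the task itself such distances would have to be predicted from images, whereas here they are computed directly from known 3D vertices.

```python
import numpy as np


def interaction_field(src_vertices, dst_vertices):
    """For every source vertex, return the distance to the nearest destination vertex.

    Hypothetical helper: one plausible reading of a "dense relative distance" field.
    """
    src = np.asarray(src_vertices, dtype=float)  # shape (N, 3)
    dst = np.asarray(dst_vertices, dtype=float)  # shape (M, 3)
    # Pairwise differences via broadcasting: shape (N, M, 3)
    diff = src[:, None, :] - dst[None, :, :]
    # Pairwise Euclidean distances: shape (N, M)
    dists = np.linalg.norm(diff, axis=-1)
    # Nearest-neighbour distance for each source vertex: shape (N,)
    return dists.min(axis=1)


# Toy stand-ins for hand and object mesh vertices (not real ARCTIC data).
hand = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
obj = np.array([[0.0, 0.0, 2.0], [1.0, 3.0, 0.0], [4.0, 0.0, 0.0]])

hand_to_object = interaction_field(hand, obj)  # one distance per hand vertex
object_to_hand = interaction_field(obj, hand)  # one distance per object vertex
```

The brute-force pairwise computation uses O(N·M) memory, which is fine for a toy example; for full-resolution meshes a spatial index such as a k-d tree would likely be a better fit.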