ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
Main Authors: | , , , , , , |
---|---|
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | IEEE Computer Society, 2023 |
Subjects: | |
Online Access: | https://dare.uva.nl/personal/pure/en/publications/arctic-a-dataset-for-dexterous-bimanual-handobject-manipulation(8fc92adf-12ca-479e-af6a-99468eb60b0b).html https://doi.org/10.48550/arXiv.2204.13662 https://hdl.handle.net/11245.1/8fc92adf-12ca-479e-af6a-99468eb60b0b https://pure.uva.nl/ws/files/141827700/2204.13662.pdf https://arctic.is.tue.mpg.de https://www.proceedings.com/70184.html https://openaccess.thecvf.com/content/CVPR2023/html/Fan_ARCTIC_A_Dataset_for_Dexterous_Bimanual_Hand-Object_Manipulation_CVPR_2023_paper.html |
Summary: | Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines. In part this is because there exist no datasets with ground-truth 3D annotations for the study of physically consistent and synchronised motion of hands and articulated objects. To this end, we introduce ARCTIC -- a dataset of two hands that dexterously manipulate objects, containing 2.1M video frames paired with accurate 3D hand and object meshes and detailed, dynamic contact information. It contains bi-manual articulation of objects such as scissors or laptops, where hand poses and object states evolve jointly in time. We propose two novel articulated hand-object interaction tasks: (1) Consistent motion reconstruction: Given a monocular video, the goal is to reconstruct two hands and articulated objects in 3D, so that their motions are spatio-temporally consistent. (2) Interaction field estimation: Dense relative hand-object distances must be estimated from images. We introduce two baselines, ArcticNet and InterField, one for each task, and evaluate them qualitatively and quantitatively on ARCTIC. |
---|