Charades Dataset

Charades Background

Charades is a dataset composed of 9,848 videos of daily indoor activities collected through Amazon Mechanical Turk. 267 different users were each presented with a sentence that includes objects and actions from a fixed vocabulary, and they recorded a video acting out the sentence (as in a game of Charades). The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos. This work was presented at ECCV 2016.

Each video has been exhaustively annotated using consensus from 4 workers on the training set and from 8 workers on the test set. Please refer to the updated accompanying publication for details, and contact vision.amt@allenai.org with questions about the dataset.

Classification Performance

AlexNet 11.2% mAP
C3D 10.9% mAP
Two-Stream 14.2% mAP
IDT 17.2% mAP
Combined 18.6% mAP
Asynchronous Temporal Fields 22.4% mAP [*]

(Uses the official Charades_v1_classify.m evaluation code. More details may be found in the README and the papers below.)
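The official classification evaluation is the MATLAB script Charades_v1_classify.m. As a rough illustration only (not a port of that script), here is a minimal NumPy sketch of video-level multi-label mAP: per class, rank all videos by predicted score, compute precision at each positive, and average over classes. Function names and the toy data are my own.

```python
import numpy as np

def average_precision(labels, scores):
    """AP for one class: labels is a 0/1 vector over videos,
    scores the predicted confidences for that class."""
    if labels.sum() == 0:
        return 0.0
    order = np.argsort(-scores)              # rank videos by descending score
    labels = labels[order]
    cum_pos = np.cumsum(labels)              # true positives seen at each rank
    precision = cum_pos / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())

def mean_average_precision(label_matrix, score_matrix):
    """mAP over classes: rows are videos, columns are classes."""
    num_classes = label_matrix.shape[1]
    aps = [average_precision(label_matrix[:, c], score_matrix[:, c])
           for c in range(num_classes)]
    return float(np.mean(aps))

# Toy example: 4 videos, 2 classes (Charades itself has 157 classes).
y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
s = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.3]])
print(mean_average_precision(y, s))  # perfectly ranked toy data -> 1.0
```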


Localization Performance

Random 2.42% mAP
VGG-16 7.89% mAP
Two-Stream 8.94% mAP
LSTM 9.60% mAP
LSTM w/ post-processing 10.4% mAP
Two-Stream w/ post-processing 10.9% mAP
Asynchronous Temporal Fields 12.8% mAP [*]

(Uses the official Charades_v1_localize.m evaluation code. More details may be found in the README and the papers below.)
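The official localization evaluation is the MATLAB script Charades_v1_localize.m, which scores per-frame action predictions against the temporal annotations. As a sketch of one ingredient of that setup (assuming evaluation at 25 equally spaced timestamps per video, and with a function name of my own), this shows how the temporal intervals turn into frame-level labels:

```python
import numpy as np

def sample_frame_labels(intervals, duration, num_classes, num_frames=25):
    """Build a (num_frames, num_classes) 0/1 label matrix by sampling
    equally spaced timestamps and marking each class whose annotated
    [start, end] interval (in seconds) covers the timestamp.
    intervals: list of (class_id, start_sec, end_sec) tuples."""
    times = np.linspace(0.0, duration, num_frames)
    labels = np.zeros((num_frames, num_classes), dtype=int)
    for cls, start, end in intervals:
        labels[(times >= start) & (times <= end), cls] = 1
    return labels

# Toy video: 12 s long, class 0 active from 2-5 s, class 1 from 6-12 s.
lab = sample_frame_labels([(0, 2.0, 5.0), (1, 6.0, 12.0)],
                          duration=12.0, num_classes=2)
print(lab.sum(axis=0))  # frames labeled positive per class -> [ 7 13]
```

Per-frame AP over these sampled timestamps, averaged across classes, then gives a frame-level localization mAP analogous to the classification metric above.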

Papers

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. ECCV 2016

Much Ado About Time: Exhaustive Annotation of Temporal Data

Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. HCOMP 2016

Asynchronous Temporal Fields for Action Recognition

Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, and Abhinav Gupta. CVPR 2017

What Actions are Needed for Understanding Human Actions in Videos?

Gunnar A. Sigurdsson, Olga Russakovsky, and Abhinav Gupta. ICCV 2017
