A new dataset with both first- and third-person videos, Charades-Ego, is now available.
Charades is a dataset composed of 9,848 videos of daily indoor activities collected through Amazon Mechanical Turk. 267 different users were presented with a sentence that includes objects and actions from a fixed vocabulary, and they recorded a video acting out the sentence (as in a game of Charades). The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos. This work was presented at ECCV 2016.
Each video has been exhaustively annotated using consensus from 4 workers on the training set and from 8 workers on the test set. Please refer to the updated accompanying publication for details, and contact vision.amt@allenai.org with questions about the dataset.
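The temporal annotations are distributed as CSV files. A minimal sketch of loading them is below; the file name `Charades_v1_train.csv`, the column names `id` and `actions`, and the semicolon-separated "class start end" triplet format are assumptions based on the v1 annotation files, so adjust them if your download differs:

```python
import csv

def load_charades_annotations(csv_path):
    """Parse Charades-style temporal annotations.

    Assumes the v1 CSV layout: each row has an 'id' column and an
    'actions' column holding semicolon-separated 'cXXX start end'
    triplets (class id, start time in seconds, end time in seconds).
    """
    annotations = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            actions = []
            # Empty 'actions' fields and trailing semicolons yield empty
            # strings, which filter(None, ...) drops.
            for triplet in filter(None, row["actions"].split(";")):
                cls, start, end = triplet.split()
                actions.append((cls, float(start), float(end)))
            annotations[row["id"]] = actions
    return annotations

# Example: count temporal annotations across the training set.
# train = load_charades_annotations("Charades_v1_train.csv")
# print(sum(len(a) for a in train.values()))
```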
Classification benchmarks (mAP over the 157 action classes):

| Method | mAP |
| --- | --- |
| AlexNet | 11.2% |
| C3D | 10.9% |
| Two-Stream | 14.2% |
| IDT | 17.2% |
| Combined | 18.6% |
| Asynchronous Temporal Fields | 22.4% [*] |
Temporal localization benchmarks (mAP):

| Method | mAP |
| --- | --- |
| Random | 2.42% |
| VGG-16 | 7.89% |
| Two-Stream | 8.94% |
| LSTM | 9.60% |
| LSTM w/ post-processing | 10.4% |
| Two-Stream w/ post-processing | 10.9% |
| Asynchronous Temporal Fields | 12.8% [*] |
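All baselines above are scored with mean average precision (mAP): average precision is computed per action class from the ranked predictions, then averaged over classes. A minimal sketch of that metric follows, using scikit-learn's `average_precision_score` as an illustrative choice rather than the official evaluation script:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """Mean AP over classes for multi-label predictions.

    y_true:  (n_videos, n_classes) binary ground-truth matrix.
    y_score: (n_videos, n_classes) prediction scores.
    Classes with no positive example are skipped, since AP is
    undefined for them.
    """
    aps = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].any():
            aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(aps))

# Toy example: 4 videos, 3 classes, random scores.
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_score = np.random.rand(4, 3)
print(mean_average_precision(y_true, y_score))
```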