More than 14,000 questions
The OKVQA dataset is composed of questions that require outside knowledge to be answered.
25,184 densely annotated videos
The Flintstones dataset is composed of brief, densely annotated clips that describe the actions, characters, objects, and setting of a scene.
8,290 images, 154,420 pairs, 76,642 annotations
Three part-annotated datasets of images and image pairs for a part-labeling task.
Images from 11 indoor scenes from AI2-Thor, ~60 objects per scene
A dataset of synthetic occluded objects, featuring photo-realistic images and natural configurations of objects in scenes.
380 video clips (24,500 frames) with corresponding joint information
A dataset of egocentric dog videos paired with joint movement data.
7,860 videos, 68,536 temporal annotations, 157 action classes
A dataset of daily indoor activities filmed from third- and first-person perspectives, with temporal annotations for various action classes.
75,000 questions, each paired with a unique scene configuration
IQUAD V1 pairs unique scene configurations in the AI2-THOR environment with questions corresponding to those environments.
More than 5,000 images of 10,000 liquid containers in context
COQE contains images of liquid containers labeled with volume, amount of content, bounding box annotations, and 3D CAD models.
1,000 textbook lessons, 26k questions, 6k images
The Textbook Question Answering (TQA) dataset is drawn from middle school science curricula.
60k figures extracted from 20k papers
Figures from 20k papers annotated by type (scatterplot, flowchart, etc.). Over 600 figures were given further detailed annotations (e.g., axes, legends, plot data, ...
AI2D is a dataset of illustrative diagrams for research on diagram understanding and associated question answering.
126k images, 1.5 million annotations
imSitu is a dataset supporting situation recognition, the problem of producing a concise summary of the situation an image depicts.
Augmented NYUv2 dataset including task-based annotations.
This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
Interactive dataset of 10,335 scenes
Scenes from the SUN RGB-D dataset rendered synthetically with the Blender physics engine, allowing for force interaction.
This is the dataset associated with the Newtonian Image Understanding research project.