About TQA

TQA Overview

The TQA dataset encourages work on the Multi-Modal Machine Comprehension (M3C) task. The M3C task builds on the popular Visual Question Answering (VQA) and Machine Comprehension (MC) paradigms by framing question answering as a machine comprehension task in which the context needed to answer questions is provided and is composed of both text and images. The dataset constructed to showcase this task has been built from a middle school science curriculum that pairs a given question with the limited span of knowledge needed to answer it.

TQA Paper

Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Hannaneh Hajishirzi, and Ali Farhadi. CVPR 2017.



TQA was constructed in large part from material freely available as part of ck12's (http://www.ck12.org) open-source science curriculum. It is distributed under a Creative Commons Attribution – Non-Commercial 3.0 license.

Instructional Material

Lessons in TQA each have a limited amount of multimodal material that contains the information needed to answer that lesson's text and diagram questions. Textual material is broken down by subtopic within the lesson. Figures and instructional diagrams are linked from the topic in which they appear.


TQA is split into training, validation, and test sets at the lesson level. The training set consists of 666 lessons and 15,154 questions; the validation set [...]. Care has been taken to group these lessons before splitting the data, so as to minimize the concept overlap between splits.
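Because the split is made at the lesson level, no lesson should appear in more than one split. The following is a minimal sketch of that sanity check, assuming each split is available as a collection of lesson identifiers. The function name and the identifiers are hypothetical, and the check covers shared lesson IDs only, not concept overlap between different lessons.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def lesson_overlap(splits: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return lesson IDs shared by any pair of splits (empty dict if disjoint)."""
    as_sets = {name: set(ids) for name, ids in splits.items()}
    overlaps = {}
    # Compare every unordered pair of splits exactly once.
    for first, second in combinations(sorted(as_sets), 2):
        shared = as_sets[first] & as_sets[second]
        if shared:
            overlaps[(first, second)] = shared
    return overlaps


# Hypothetical lesson identifiers, for illustration only.
example_splits = {
    "train": ["L_0001", "L_0002"],
    "val": ["L_0003"],
    "test": ["L_0004"],
}
print(lesson_overlap(example_splits))
```

An empty result means the splits share no lessons.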

Questions are separated by type in the dataset (nested under diagramQuestions and nonDiagramQuestions). They are also distinguished by global ID prefixes that indicate whether a question has an associated diagram (DQ_ for diagram questions, NDQ_ for non-diagram questions).
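The prefix convention makes it possible to tell the two question types apart from the ID alone. The sketch below is one way to do that in Python. It assumes that diagramQuestions and nonDiagramQuestions are mappings from global ID to question record, which is not stated here, and the function names and example IDs are hypothetical.

```python
from typing import Dict, List

DIAGRAM_PREFIX = "DQ_"
NON_DIAGRAM_PREFIX = "NDQ_"


def question_kind(global_id: str) -> str:
    """Classify a question as 'diagram' or 'non_diagram' from its ID prefix."""
    if global_id.startswith(NON_DIAGRAM_PREFIX):
        return "non_diagram"
    if global_id.startswith(DIAGRAM_PREFIX):
        return "diagram"
    raise ValueError(f"unrecognized question id: {global_id!r}")


def collect_question_ids(questions: dict) -> Dict[str, List[str]]:
    """Group the global IDs found under the two question containers."""
    grouped: Dict[str, List[str]] = {"diagram": [], "non_diagram": []}
    for container in ("diagramQuestions", "nonDiagramQuestions"):
        # Assumption: each container maps global IDs to question records.
        for global_id in questions.get(container, {}):
            grouped[question_kind(global_id)].append(global_id)
    return grouped


# Hypothetical question IDs, for illustration only.
example_questions = {
    "diagramQuestions": {"DQ_000001": {}},
    "nonDiagramQuestions": {"NDQ_000001": {}, "NDQ_000002": {}},
}
print(collect_question_ids(example_questions))
```

Grouping by prefix rather than by container also serves as a consistency check: a DQ_ ID stored under nonDiagramQuestions would show up in the "diagram" group.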


Images are divided into four types:

  • question_images (diagrams referred to in diagram questions)
  • abc_question_images (question diagrams generated with letter labels)
  • teaching_images (diagrams which have detailed descriptions)
  • textbook_images (images from the instructional material)

Images are referenced in the dataset as relative paths to one of these directories.
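
Since each image reference is a relative path into one of the four directories, the image type can be recovered from the path itself. The following is a minimal sketch, assuming the directory name appears as one of the path components ahead of the file name. The function name and example paths are hypothetical.

```python
from pathlib import PurePosixPath

IMAGE_TYPES = {
    "question_images",
    "abc_question_images",
    "teaching_images",
    "textbook_images",
}


def image_type(relative_path: str) -> str:
    """Return the image directory that a relative image path points into."""
    # Skip the last component, which is the file name itself.
    for part in PurePosixPath(relative_path).parts[:-1]:
        if part in IMAGE_TYPES:
            return part
    raise ValueError(f"no known image directory in path: {relative_path!r}")


# Hypothetical paths, for illustration only.
for path in ("question_images/example_1.png", "teaching_images/example_2.png"):
    print(path, "is of type", image_type(path))
```

Matching on whole path components avoids mistaking abc_question_images for question_images, which a plain substring test would do.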