Following the remarkable progress in ML and AI over the past several years, there is a growing sentiment that future progress will require moving beyond our reliance on static datasets to train AI systems. The domain of Embodied AI is centered around the belief that intelligent agents learn best through exploration and interaction with their environment (be it real or simulated). PRIOR feels that this promising area of research can benefit greatly from interdisciplinary collaboration at this stage of development. To foster this collaboration, our lecture series aims to bring together researchers from a variety of fields that touch on Embodied AI: computer vision, robotics, NLP, ML, neuroscience, and cognitive psychology, among others.
These lectures will be interactive and accessible to all, and we encourage the audience to ask questions and participate in discussions. Recordings will be available after the conclusion of individual lectures in the series (subject to the speaker's consent).
Please subscribe to our mailing list to receive invitations to this lecture series containing links to attend live events.
We encourage lively participation in our live discussions, provided it remains considerate of the speaker and other participants. Researchers interested in giving a lecture, or anyone with suggestions for future topics or speakers, should get in touch with us here.
The embodiment hypothesis is the idea that “intelligence emerges in the interaction of an agent with an environment and as a result of sensorimotor activity”. Imagine walking up to a home robot and asking “Hey robot – can you go check if my laptop is on my desk? And if so, bring it to me”. Or asking an egocentric AI assistant (operating on your smart glasses): “Hey – where did I last see my keys?”. In order to be successful, such an embodied agent would need a range of skills – visual perception (to recognize & map scenes and objects), language understanding (to translate questions and instructions into actions), and action (to move and find things in a changing environment). I will first give an overview of work happening at Georgia Tech and FAIR building up to this grand goal of embodied AI. Next, I will dive into a recent project where we asked if machines – specifically, navigation agents – build cognitive maps. Concretely, we train ‘blind’ AI agents – with sensing limited to only egomotion – to perform PointGoal navigation (‘go to delta-x, delta-y relative to start’) via reinforcement learning. We find that blind AI agents are surprisingly effective navigators in unseen environments (~95% success). Further still, we find that (1) these blind AI agents utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (2) this memory enables them to take shortcuts, i.e. efficiently travel through previously unexplored parts of the environment; (3) there is emergence of maps in this memory, i.e. a detailed occupancy grid of the environment can be decoded from the agent memory; and (4) the emergent maps are selective and task dependent – the agent forgets unnecessary excursions and only remembers the end points of such detours.
Overall, our experiments and analysis show that blind AI agents take shortcuts and build cognitive maps purely from learning to navigate, suggesting that cognitive maps may be a natural solution to the problem of navigation and shedding light on the internal workings of AI navigation agents.
Most computer vision algorithms are built with the goal of understanding the physical world. Yet, as reflected in standard vision benchmarks and datasets, these algorithms continue to assume the role of a passive observer -- only watching static images or videos, without the ability to interact with the environment. This assumption becomes a fundamental limitation for applications in robotics, where systems are intrinsically built to actively engage with the physical world. In this talk, I will present some recent work from my group that demonstrates how we can enable robots to leverage their ability to interact with the environment in order to better understand what they see: from discovering objects' identity and 3D geometry to discovering physical properties of novel objects through different dynamic interactions. We will demonstrate how the learned knowledge can be used to facilitate downstream manipulation tasks. Finally, I will discuss a few open research directions in the area of active scene understanding.
Embodied Cognition posits that the body of an agent is not only a vessel to contain the mind, but meaningfully influences the agent's brain and contributes to its intelligent behavior through morphological computation. In this talk, I'll introduce a system for studying the role of complex brains and bodies in soft robotics, demonstrate how this system may exhibit morphological computation, and describe a particular challenge that occurs when attempting to employ machine learning to optimize embodied machines and their behavior. I'll argue that simply considering and accounting for the co-dependencies suggested by embodied cognition can help us to overcome this challenge, and suggest that this approach may be helpful to the optimization of structure and function in machine learning domains outside of soft robotics.
Current robots are either expensive or make significant compromises on sensory richness, computational power, and communication capabilities. We propose to leverage smartphones to equip robots with extensive sensor suites, powerful computational abilities, state-of-the-art communication channels, and access to a thriving software ecosystem. We design a small electric vehicle that costs $50 and serves as a robot body for standard Android smartphones. We develop a software stack that allows smartphones to use this body for mobile operation and demonstrate that the system is sufficiently powerful to support advanced robotics workloads such as person following and real-time autonomous navigation in unstructured environments. Controlled experiments demonstrate that the presented approach is robust across different smartphones and robot bodies.
Despite recent progress in the capabilities of autonomous robots, especially learned robot skills, there remain significant challenges in building robust, scalable, and general-purpose systems for service robots. This talk will present our recent work to answer the following question: how can symbolic planning and reinforcement learning be combined to create general-purpose service robots that reason about high-level actions and adapt to the real world? The problem will be approached from two directions. First, I will introduce planning algorithms that adapt to the environment by learning and exchanging knowledge with other agents. These methods allow robots to plan in open-world scenarios, to plan around other robots while avoiding conflicts and realizing synergies, and to learn action costs throughout executions in the real world. Second, I will present reinforcement learning (RL) methods that leverage reasoning and planning, in order to address the challenges of maximizing the long-term average reward in continuing service robot tasks.
Current evidence for the ability of some animals to plan (imagining some future set of possibilities and picking the one assessed to have the highest value) is restricted to birds and mammals. Nonetheless, all animals have had just as long to evolve what seems to be a useful capacity. In this talk, I review some work we have done to get at the question of why planning may be useless to many animals, but useful to a select few. We use a variety of algorithms for this work, from reinforcement learning-based methods to POMDPs, and now are testing predictions using live mammals in complex reprogrammable habitats with a robot predator.