Following the remarkable progress in ML and AI over the past several years, there is a growing sentiment that future progress will require moving beyond our reliance on static datasets to train AI systems. The domain of Embodied AI is centered around the belief that intelligent agents learn best through exploration and interaction with their environment (be it real or simulated). PRIOR feels that this promising area of research can benefit greatly from interdisciplinary collaboration at this stage of development. To foster this collaboration, our lecture series aims to bring together researchers from a variety of fields that touch on Embodied AI, including computer vision, robotics, NLP, ML, neuroscience, and cognitive psychology.
These lectures will be interactive and accessible to all, and we encourage the audience to ask questions and participate in discussions. A recording of each lecture will be made available after it concludes, subject to the speaker's consent.
Please subscribe to our mailing list to receive invitations to this lecture series containing links to attend live events.
We encourage participation in live discussions that remains considerate of the speaker and other participants. Researchers interested in giving a lecture, or anyone with suggestions for future topics or speakers, should get in touch with us here.
Embodied Cognition posits that the body of an agent is not only a vessel to contain the mind, but meaningfully influences the agent's brain and contributes to its intelligent behavior through morphological computation. In this talk, I'll introduce a system for studying the role of complex brains and bodies in soft robotics, demonstrate how this system may exhibit morphological computation, and describe a particular challenge that occurs when attempting to employ machine learning to optimize embodied machines and their behavior. I'll argue that simply considering and accounting for the co-dependencies suggested by embodied cognition can help us to overcome this challenge, and suggest that this approach may be helpful to the optimization of structure and function in machine learning domains outside of soft robotics.
Current robots are either expensive or make significant compromises on sensory richness, computational power, and communication capabilities. We propose to leverage smartphones to equip robots with extensive sensor suites, powerful computational abilities, state-of-the-art communication channels, and access to a thriving software ecosystem. We design a small electric vehicle that costs $50 and serves as a robot body for standard Android smartphones. We develop a software stack that allows smartphones to use this body for mobile operation and demonstrate that the system is sufficiently powerful to support advanced robotics workloads such as person following and real-time autonomous navigation in unstructured environments. Controlled experiments demonstrate that the presented approach is robust across different smartphones and robot bodies.
Despite recent progress in the capabilities of autonomous robots, especially learned robot skills, there remain significant challenges in building robust, scalable, and general-purpose systems for service robots. This talk will present our recent work to answer the following question: how can symbolic planning and reinforcement learning be combined to create general-purpose service robots that reason about high-level actions and adapt to the real world? The problem will be approached from two directions. First, I will introduce planning algorithms that adapt to the environment by learning and exchanging knowledge with other agents. These methods allow robots to plan in open-world scenarios, to plan around other robots while avoiding conflicts and realizing synergies, and to learn action costs throughout executions in the real world. Second, I will present reinforcement learning (RL) methods that leverage reasoning and planning, in order to address the challenges of maximizing the long-term average reward in continuing service robot tasks.
Spatial perception, the robot’s ability to sense and understand the surrounding environment, is a key enabler for autonomous systems operating in complex environments, including self-driving cars and unmanned aerial vehicles. Recent advances in perception algorithms and systems have enabled robots to detect objects and create large-scale maps of an unknown environment, which are crucial capabilities for navigation, manipulation, and human-robot interaction. Despite these advances, researchers and practitioners are well aware of the brittleness of existing perception systems, and a large gap still separates robot and human perception. This talk discusses two efforts targeted at bridging this gap. The first effort targets high-level understanding. While humans are able to quickly grasp the geometric, semantic, and physical aspects of a scene, high-level scene understanding remains a challenge for robotics. I present our work on real-time metric-semantic understanding and 3D Dynamic Scene Graphs. I introduce the first generation of Spatial Perception Engines, which extend the traditional notions of mapping and SLAM and allow a robot to build a “mental model” of the environment, including spatial concepts (e.g., humans, objects, rooms, buildings) and their relations at multiple levels of abstraction. The second effort focuses on robustness. I present recent advances in the design of certifiable perception algorithms that are robust to extreme amounts of noise and outliers and afford performance guarantees. I present fast certifiable algorithms for object pose estimation: our algorithms are “hard to break” (e.g., are robust to 99% outliers) and succeed in localizing objects where an average human would fail. Moreover, they come with a “contract” that guarantees their input-output performance.
Certifiable algorithms and real-time high-level understanding are key enablers for the next generation of autonomous systems that are trustworthy, understand and execute high-level human instructions, and operate in large dynamic environments over an extended period of time.
Recent advancements in embodied intelligence have shown exciting results in adapting to diverse and complex external environments. However, much work remains incognizant of the agents' internal hardware (i.e., the embodiment), which often plays a critical role in determining the system's overall functionality and performance. In this talk, we revisit the role of “embodiment” in embodied intelligence, specifically in the context of robotic manipulation. The key idea behind “self-adaptive manipulation” is to treat a robot's hardware as an integral part of its behavior: the learned manipulation policies should be conditioned on their hardware and also inform how hardware should be improved. I will use two of our recent works to illustrate both aspects: AdaGrasp for learning a unified policy for using different and novel gripper hardware, and Fit2Form for generating a new gripper hardware design that optimizes for the target task.
The embodiment hypothesis is the idea that “intelligence emerges in the interaction of an agent with an environment and as a result of sensorimotor activity”. Imagine walking up to a home robot and asking “Hey robot – can you go check if my laptop is on my desk? And if so, bring it to me”. Or asking an egocentric AI assistant (operating on your smart glasses): “Hey – where did I last see my keys?”. In order to be successful, such an embodied agent would need a range of skills – visual perception (to recognize & map scenes and objects), language understanding (to translate questions and instructions into actions), and action (to move and find things in a changing environment). I will first give an overview of work happening at Georgia Tech and FAIR building up to this grand goal of embodied AI. Next, I will dive into a recent project where we asked if machines – specifically, navigation agents – build cognitive maps. We train ‘blind’ AI agents – with sensing limited to only egomotion – to perform PointGoal navigation (‘go to delta-x, delta-y relative to start’) via reinforcement learning. We find that blind AI agents are surprisingly effective navigators in unseen environments (~95% success). Further still, we find that (1) these blind AI agents utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (2) this memory enables them to take shortcuts, i.e. efficiently travel through previously unexplored parts of the environment; (3) there is emergence of maps in this memory, i.e. a detailed occupancy grid of the environment can be decoded from the agent memory; and (4) the emergent maps are selective and task dependent – the agent forgets unnecessary excursions and only remembers the end points of such detours.
Overall, our experiments and analysis show that blind AI agents take shortcuts and build cognitive maps purely from learning to navigate, suggesting that cognitive maps may be a natural solution to the problem of navigation and shedding light on the internal workings of AI navigation agents.
Current evidence for the ability of some animals to plan (imagining some future set of possibilities and picking the one assessed to have the highest value) is restricted to birds and mammals. Nonetheless, all animals have had just as long to evolve what seems to be a useful capacity. In this talk, I review some work we have done to get at the question of why planning may be useless to many animals, but useful to a select few. We use a variety of algorithms for this work, from reinforcement learning-based methods to POMDPs, and now are testing predictions using live mammals in complex reprogrammable habitats with a robot predator.