Advances in Embodied Intelligence and Object-Centric Reasoning

The field of embodied intelligence is moving toward more scalable approaches to object-centric reasoning, which help agents understand and interact with complex environments. Recent work focuses on improving agents' ability to perceive, track, and reason about individual object instances over time, particularly in tasks that require sequenced interactions with visually similar objects. This has produced new frameworks and models designed for non-Markovian settings and partial observability. Slot-centric and object-centric representations in particular have shown promise for improving the efficiency and accuracy of decision-making in embodied agents.

Noteworthy papers include:

Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective introduces a non-Markovian task suite and a slot-centric vision-language-action (VLA) framework for temporal scalability.

PIGEON: VLM-Driven Object Navigation via Points of Interest Selection proposes an object navigation method in which a large vision-language model (VLM) selects Points of Interest.

Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation presents a dual-process thinking framework that integrates large language models with expertise specific to vision-and-language navigation (VLN).

Object-Centric World Models for Causality-Aware Reinforcement Learning proposes a unified framework that uses object-centric Transformers as the world model, together with causality-aware policy and value networks.
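To make the point about sequenced interactions with visually similar objects concrete, here is a minimal sketch of a per-object memory. It is not the method of any paper listed here; the names `ObjectSlot`, `SlotMemory`, and `match_radius` are hypothetical, and the sketch assumes objects are detected as 2D positions and move only slightly between frames. The idea it illustrates is that the current frame alone cannot tell an agent which of several identical objects it has already handled (the non-Markovian part), whereas a slot that persists per object instance can.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectSlot:
    """Hypothetical per-instance memory: identity, last position, history flag."""
    slot_id: int
    position: tuple
    interacted: bool = False


class SlotMemory:
    """Toy slot memory that tracks look-alike objects by position only."""

    def __init__(self, match_radius=0.5):
        # Assumed threshold: a detection within this distance of a slot's
        # last known position is treated as the same object instance.
        self.match_radius = match_radius
        self.slots = []

    def observe(self, positions):
        """Match each detection to an unclaimed nearby slot, or open a new one."""
        claimed = set()
        ids = []
        for pos in positions:
            candidates = [
                s for s in self.slots
                if s.slot_id not in claimed
                and math.dist(s.position, pos) <= self.match_radius
            ]
            if candidates:
                slot = min(candidates, key=lambda s: math.dist(s.position, pos))
                slot.position = pos
            else:
                slot = ObjectSlot(len(self.slots), pos)
                self.slots.append(slot)
            claimed.add(slot.slot_id)
            ids.append(slot.slot_id)
        return ids

    def mark_interacted(self, slot_id):
        """Record that the agent has already acted on this instance."""
        self.slots[slot_id].interacted = True

    def next_target(self):
        """Return the first instance not yet handled, or None if all are done."""
        for slot in self.slots:
            if not slot.interacted:
                return slot.slot_id
        return None


if __name__ == "__main__":
    memory = SlotMemory()
    memory.observe([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])  # three identical cups
    memory.mark_interacted(0)                              # agent handles one
    memory.observe([(0.1, 0.0), (1.05, 0.0), (2.0, 0.0)])  # cups shift slightly
    print(memory.next_target())
```

The greedy nearest-position matching is a deliberate simplification. A learned slot-centric model would bind slots from visual features rather than from hand-coded distances, but the role of the persistent `interacted` flag is the same: it carries history that the current observation does not contain.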

Sources

Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective

PIGEON: VLM-Driven Object Navigation via Points of Interest Selection

Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation

Object-Centric World Models for Causality-Aware Reinforcement Learning
