Advances in Embodied World Models for Robotics

Embodied world models for robotics are advancing rapidly, with current work aiming at more accurate and more persistent models that simulate how future actions change the world. Such models let an embodied agent anticipate the consequences of its actions and plan accordingly. Recent work has explored occupancy world models, video diffusion models, and action-conditional world models to generate future visual observations and predict how a scene evolves. These models have shown promising results in downstream embodied applications such as planning and policy learning. Notable papers in this area include:

  • RoboOccWorld, which uses a combined spatio-temporal receptive field and a guided autoregressive transformer to forecast scene evolution in indoor environments.
  • FlowDreamer, an RGB-D world model for robot manipulation that adopts 3D scene flow as an explicit motion representation and reports better performance than baseline models.
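
To make the idea of planning with an action-conditional world model concrete, here is a minimal Python sketch. It is not taken from any of the papers listed here: `toy_world_model` is a hypothetical hand-written stand-in for a learned model (a 2D point that moves by the commanded offset), and `plan_random_shooting` is a generic random-shooting planner chosen only for illustration. The names, horizon, and candidate count are all assumptions.

```python
import random


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model.

    Predicts the next 2D position from the current position and an action
    (a commanded offset). A real model would predict images, occupancy, etc.
    """
    x, y = state
    dx, dy = action
    return (x + dx, y + dy)


def rollout(model, state, actions):
    """Imagine a future by applying the model once per action, without acting."""
    for action in actions:
        state = model(state, action)
    return state


def plan_random_shooting(model, state, goal, horizon=5, n_candidates=200, seed=0):
    """Sample action sequences, imagine each outcome, keep the best one.

    The cost is the squared distance between the imagined final state and the goal.
    """
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(n_candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        end = rollout(model, state, actions)
        cost = (end[0] - goal[0]) ** 2 + (end[1] - goal[1]) ** 2
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


if __name__ == "__main__":
    actions, cost = plan_random_shooting(toy_world_model, (0.0, 0.0), (2.0, 1.0))
    print("first planned action:", actions[0])
    print("imagined final cost:", cost)
```

The planner never touches the real environment while searching: every candidate is evaluated purely by imagined rollouts, which is the role a world model plays in the planning applications mentioned above. In practice an agent would typically execute only the first planned action and then replan.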

Sources

Occupancy World Model for Robots

Learning 3D Persistent Embodied World Models

EnerVerse-AC: Envisioning Embodied Environments with Action Condition

FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation
