Advances in Robot Learning and Interaction

The field of robot learning and interaction is rapidly evolving, with a focus on developing more sophisticated and generalizable models for embodied agents. Researchers are exploring ways to learn from human videos, to represent motion with object-centric 3D motion fields, and to use language models to improve robot control policies and action understanding. Recent methods also address heterogeneous skeleton-based action representation learning, zero-shot temporal interaction localization, and handle-based mesh deformation guided by vision-language models.

Some noteworthy papers in this area include:

  • Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction, which proposes a novel bimanual foundation policy by fine-tuning text-to-video models to predict robot trajectories.
  • InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing, which presents a framework for zero-shot 3D human-object interaction generation that requires no training on specific datasets.
  • Rodrigues Network for Learning Robot Actions, which introduces a novel neural architecture specialized for processing actions by injecting kinematics-aware inductive bias into neural computation.

Sources

Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

Rodrigues Network for Learning Robot Actions

ORV: 4D Occupancy-centric Robot Video Generation

CamCloneMaster: Enabling Reference-based Camera Control for Video Generation

Heterogeneous Skeleton-Based Action Representation Learning

Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision

Zero-Shot Temporal Interaction Localization for Egocentric Videos

Object-centric 3D Motion Field for Robot Learning from Human Videos

Handle-based Mesh Deformation Guided By Vision Language Model

DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration
