Advances in Robotic Manipulation and Learning

The field of robotic manipulation is moving toward more generalizable and versatile solutions, with a focus on learning from diverse data sources and adapting to new environments. Recent work integrates large pre-trained models, such as language models and object detectors, to enhance robotic perception and manipulation. Diffusion-based models and vision-language-action frameworks have also shown promise in enabling robots to learn complex tasks from limited demonstrations, while hierarchical operational models and latent policy steering methods have improved the efficiency and effectiveness of robot learning. Notable papers include GraspGen, a diffusion-based framework for 6-DOF grasping; VITA, a vision-to-action flow matching policy that eliminates the need for separate conditioning mechanisms; MP1, a mean-flow-based method for one-step policy learning; and EquiContact, a hierarchical SE(3) vision-to-force equivariant policy for spatially generalizable contact-rich tasks.

Sources

CuriosAI Submission to the EgoExo4D Proficiency Estimation Challenge 2025

Learning human-to-robot handovers through 3D scene reconstruction

AdvGrasp: Adversarial Attacks on Robotic Grasping from a Physical Perspective

Probabilistic Human Intent Prediction for Mobile Manipulation: An Evaluation with Human-Inspired Constraints

MP1: Mean Flow Tames Policy Learning in 1-step for Robotic Manipulation

Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection

Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning

EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks

Enhancing Autonomous Manipulator Control with Human-in-loop for Uncertain Assembly Environments

Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers

Diffusion-Based Imaginative Coordination for Bimanual Manipulation

Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE+UPOM

A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation

Generalist Bimanual Manipulation via Foundation Video Diffusion Models

GraspGen: A Diffusion-based Framework for 6-DOF Grasping with On-Generator Training

VITA: Vision-to-Action Flow Matching Policy

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains

Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
