Robotics research is moving toward generalist robots that can perform a wide range of tasks across varied environments. Recent work trains robots to imitate human actions, conditioned on sensor observations and textual instructions, and to learn from large-scale human videos. These approaches have improved robots' ability to generalize to novel objects, environments, and instructions involving abstract concepts. Notable papers in this area include:
- Reinforcement Learning for Flow-Matching Policies, which introduces a new approach to training flow-matching policies via reinforcement learning.
- GR-3 Technical Report, which presents a large-scale vision-language-action (VLA) model with strong generalization to novel objects and environments.
- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos, which proposes a novel training paradigm that combines large-scale VLA pretraining from human videos with physical space alignment and post-training adaptation for robotic tasks.