Computer vision and robotics are moving toward more generalizable and scalable solutions for object pose estimation and robotic control. Researchers are shifting their focus from instance-level methods to category-level and open-set methods, which can handle objects with previously unseen textures, shapes, and sizes. This shift calls for more robust algorithms and datasets that can bridge the gap between category-level and open-set object pose estimation. Notable papers in this area include:
- PRISM-DP, which enables compact diffusion policy learning directly from spatial poses of task-relevant objects, eliminating the need for manual mesh processing or creation.
- PRISM, which proposes an integrated real-to-sim-to-real pipeline for scene-aware robotic control with few demonstrations.
- A zero-shot part assembly method that leverages pretrained diffusion models to guide the manipulation of parts so that they form realistic shapes.