Advancements in Dexterous Manipulation and Multimodal Perception

The field of robotics is seeing significant advances in dexterous manipulation and multimodal perception, enabling robots to interact with their environment in a more human-like manner. Recent work focuses on integrating tactile and visual sensing to improve fine-grained control and adaptability in unstructured settings. Researchers are exploring architectures and techniques such as cross-modal representation learning, contrastive visual-tactile alignment, and mixture-of-experts frameworks to improve the accuracy and robustness of robotic manipulation. Noteworthy papers include ViTacFormer, which achieves state-of-the-art performance on dexterous manipulation tasks, and Reimagination with Test-time Observation Interventions, which makes world-model predictions robust to novel visual distractors. In addition, novel robotic hands such as the MOTIF hand, and perception frameworks such as Robotic Perception with a Large Tactile-Vision-Language Model, are expanding robots' ability to interact with their environment safely and efficiently. These advances have the potential to transform applications including collaborative robotics, laboratory automation, and object handover.
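
To make the contrastive-representation idea concrete, the sketch below shows the general pattern behind contrastive visual-tactile alignment (as explored in work like ConViTac): a symmetric InfoNCE-style loss that pulls paired visual and tactile embeddings together while pushing mismatched pairs apart. This is a minimal NumPy illustration of the generic technique, not the actual loss or architecture from any of the papers above; all function names and dimensions are assumptions for the example.

```python
import numpy as np

def info_nce_loss(vis, tac, temperature=0.1):
    """Symmetric InfoNCE loss aligning paired visual and tactile embeddings.

    vis, tac: (N, D) arrays of L2-normalised embeddings; row i of each
    comes from the same observation, so the positives lie on the diagonal.
    (Illustrative sketch; not the loss from any specific paper.)
    """
    # Cosine-similarity logits between every visual/tactile pair.
    logits = vis @ tac.T / temperature          # (N, N)
    labels = np.arange(len(vis))                # positives on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the vision->touch and touch->vision directions.
    return 0.5 * (cross_entropy(logits, labels) +
                  cross_entropy(logits.T, labels))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
vis = l2_normalize(rng.normal(size=(8, 32)))
# Tactile embeddings that nearly match their visual partners...
tac_aligned = l2_normalize(vis + 0.01 * rng.normal(size=vis.shape))
# ...versus tactile embeddings with no relation to the visual ones.
tac_random = l2_normalize(rng.normal(size=(8, 32)))

loss_aligned = info_nce_loss(vis, tac_aligned)
loss_random = info_nce_loss(vis, tac_random)
```

Well-aligned cross-modal embeddings yield a much lower loss than random pairings, which is what drives the two encoders toward a shared representation during training.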

Sources

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation

Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control

Learning Dexterous Object Handover

Multimodal Anomaly Detection with a Mixture-of-Experts

The MOTIF Hand: A Robotic Hand for Multimodal Observations with Thermal, Inertial, and Force Sensors

Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference

UniTac-NV: A Unified Tactile Representation For Non-Vision-Based Tactile Sensors

Multimodal Behaviour Trees for Robotic Laboratory Task Automation

ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations
