Advances in Embodied Intelligence and Human-Robot Collaboration

The field of embodied intelligence and human-robot collaboration is advancing rapidly, with a focus on robots that can understand and respond to natural language instructions, adapt to changing environments, and learn from experience. Recent research explores vision-language models, scene graphs, and proactive replanning to improve robot autonomy and resilience, and has demonstrated the effectiveness of these approaches in applications such as object retrieval, navigation, and manipulation. Noteworthy papers include OVSegDT, which introduces a lightweight transformer policy for open-vocabulary object goal navigation and achieves state-of-the-art results on the HM3D-OVON dataset; Embodied-R1, which pioneers pointing as a unified intermediate representation for embodied reasoning and achieves robust zero-shot generalization across 11 embodied spatial and pointing benchmarks; and DEXTER-LLM, which integrates large language models with model-based assignment methods for dynamic task planning in unknown environments and demonstrates strong performance in experimental evaluations.
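
To make the scene-graph and proactive-replanning themes above concrete, here is a minimal illustrative sketch. It is not drawn from any of the cited papers; the names (SceneGraph, needs_replan, the cup/table relations) are hypothetical. The idea it shows is simply that a dynamic scene graph stores observed spatial relations, and a plan is flagged for replanning as soon as one of the relations it depends on stops holding.

```python
# Minimal sketch (assumed, not from any cited paper): a dynamic scene graph
# whose edges encode spatial relations, plus a check that triggers replanning
# when a relation the current plan relies on disappears.
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # edges maps (subject, relation) -> object, e.g. ("cup", "on") -> "table"
    edges: dict = field(default_factory=dict)

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        """Record or update an observed spatial relation."""
        self.edges[(subject, relation)] = obj

    def remove_relation(self, subject: str, relation: str) -> None:
        """Drop a relation that is no longer observed."""
        self.edges.pop((subject, relation), None)

    def holds(self, subject: str, relation: str, obj: str) -> bool:
        """Check whether a relation currently holds in the graph."""
        return self.edges.get((subject, relation)) == obj


def needs_replan(graph: SceneGraph, preconditions: list) -> bool:
    """Proactively flag a plan whose preconditions no longer hold."""
    return any(not graph.holds(s, r, o) for s, r, o in preconditions)


if __name__ == "__main__":
    graph = SceneGraph()
    graph.add_relation("cup", "on", "table")

    # A pick-and-place plan that assumes the cup is still on the table.
    plan_preconditions = [("cup", "on", "table")]
    print(needs_replan(graph, plan_preconditions))  # False: plan still valid

    # A new observation invalidates the assumption; replanning is triggered.
    graph.remove_relation("cup", "on")
    print(needs_replan(graph, plan_preconditions))  # True: replan before acting
```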

Sources

Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance

Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent

OVSegDT: Segmenting Transformer for Open-Vocabulary Object Goal Navigation

Using Natural Language for Human-Robot Collaboration in the Real World

ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models

Toward General Physical Intelligence for Resilient Agile Manufacturing Automation

Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search

Mechanical Automation with Vision: A Design for Rubik's Cube Solver

RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy

Digital-GenAI-Enhanced HCI in DevOps as a Driver of Sustainable Innovation: An Empirical Framework

A Surveillance Based Interactive Robot

CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models

ROVER: Robust Loop Closure Verification with Trajectory Prior in Repetitive Environments

CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

DEXTER-LLM: Dynamic and Explainable Coordination of Multi-Robot Systems in Unknown Environments via Large Language Models

PB-IAD: Utilizing multimodal foundation models for semantic industrial anomaly detection in dynamic manufacturing environments

Towards AI-based Sustainable and XR-based human-centric manufacturing: Implementation of ISO 23247 for digital twins of production systems

Survey of Vision-Language-Action Models for Embodied Manipulation

Lang2Lift: A Framework for Language-Guided Pallet Detection and Pose Estimation Integrated in Autonomous Outdoor Forklift Operation

LLM-Driven Self-Refinement for Embodied Drone Task Planning

NiceWebRL: a Python library for human subject experiments with reinforcement learning environments
