Advances in Human-Robot Interaction and Motion Generation

The field of human-robot interaction and motion generation is evolving rapidly toward more context-aware systems. Recent work emphasizes integrating multiple modalities, such as vision, language, and spatial information, so that robots can better understand and respond to human behavior.
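To give a concrete, if simplified, picture of what multimodal integration looks like in practice, the sketch below fuses per-modality feature vectors by concatenation and feeds them to a linear scorer for interaction intent. The embedding sizes, the fusion scheme, and the toy scorer are all assumptions made for illustration; none of the systems cited below is reduced to this.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed per-modality embedding sizes (illustrative only).
VISION_D, LANG_D, SPATIAL_D = 128, 64, 16

def fuse(vision: np.ndarray, language: np.ndarray, spatial: np.ndarray) -> np.ndarray:
    """Late fusion by concatenation, the simplest multimodal strategy."""
    return np.concatenate([vision, language, spatial])

# Toy linear intent scorer standing in for a trained classifier.
W = rng.standard_normal(VISION_D + LANG_D + SPATIAL_D) * 0.01

def intent_score(vision, language, spatial) -> float:
    """Probability-like score that the human intends to interact."""
    z = fuse(vision, language, spatial) @ W
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid

score = intent_score(rng.standard_normal(VISION_D),
                     rng.standard_normal(LANG_D),
                     rng.standard_normal(SPATIAL_D))
print(f"interaction intent: {score:.2f}")
```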

Notable advances include novel ontologies and knowledge graphs for representing tasks, environments, and robot capabilities, alongside new approaches to motion generation such as diffusion-based models and Laban movement analysis. Together, these developments promise to improve the effectiveness and naturalness of human-robot interaction, enabling robots to provide more personalized and supportive assistance across settings.
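To make the ontology idea concrete, the sketch below encodes a toy task/capability model as subject-predicate-object triples and checks whether a robot's capabilities cover a requested task. This is a minimal illustration in plain Python, not the schema from the cited ontology papers; all entity and relation names are invented for the example.

```python
from dataclasses import dataclass, field

# A tiny triple store standing in for a full ontology / knowledge graph.
# All relation and entity names here are illustrative, not from the papers.
Triple = tuple[str, str, str]

@dataclass
class KnowledgeGraph:
    triples: set[Triple] = field(default_factory=set)

    def add(self, subj: str, pred: str, obj: str) -> None:
        self.triples.add((subj, pred, obj))

    def objects(self, subj: str, pred: str) -> set[str]:
        return {o for s, p, o in self.triples if s == subj and p == pred}

kg = KnowledgeGraph()
# Task decomposition: serving a drink requires navigation and grasping.
kg.add("serve_drink", "requires_action", "navigate")
kg.add("serve_drink", "requires_action", "grasp")
# Robot capabilities.
kg.add("service_robot", "has_capability", "navigate")
kg.add("service_robot", "has_capability", "grasp")

def can_execute(kg: KnowledgeGraph, robot: str, task: str) -> bool:
    """A task is executable if every required action is a robot capability."""
    required = kg.objects(task, "requires_action")
    available = kg.objects(robot, "has_capability")
    return required <= available

print(can_execute(kg, "service_robot", "serve_drink"))  # True
```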

Several papers in this area stand out. MINT-RVAE proposes an RGB-only pipeline that predicts human interaction intent with high accuracy from pose and emotion cues. SIG-Chat presents a full-stack solution for spatial intent-guided conversational gesture generation, addressing how, when, and where a robot should gesture. MoReact introduces a diffusion-based method for generating realistic reactive motion from textual descriptions of interaction scenarios. LUMA proposes a text-to-motion diffusion model with dual-path anchoring that improves semantic alignment and achieves state-of-the-art performance.
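As a rough illustration of what "diffusion-based" means for motion generation in papers like MoReact and LUMA, the sketch below runs a DDPM-style reverse (denoising) loop over a motion tensor of shape (frames, joint dimensions). The noise-prediction network is stubbed out with a placeholder; in a real system it would be a trained, text-conditioned model. The schedule, step count, and shapes are assumptions for the example, not the papers' actual configurations.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                     # diffusion steps (toy value)
FRAMES, DIMS = 60, 66      # e.g. 60 frames x 22 joints x 3 rotation dims (assumed)

# Linear beta schedule, as in vanilla DDPM.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t: np.ndarray, t: int, text: str) -> np.ndarray:
    """Placeholder for a trained text-conditioned denoiser.

    A real model (a transformer in most motion-diffusion work) would map
    (noisy motion, timestep, text embedding) -> predicted noise. Here we
    return zeros so the loop is runnable end to end.
    """
    return np.zeros_like(x_t)

def sample_motion(text: str) -> np.ndarray:
    """DDPM ancestral sampling: start from Gaussian noise, denoise step by step."""
    x = rng.standard_normal((FRAMES, DIMS))
    for t in reversed(range(T)):
        eps = predict_noise(x, t, text)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x  # a (FRAMES, DIMS) motion sequence conditioned on `text`

motion = sample_motion("a person waves, then steps back")
print(motion.shape)  # (60, 66)
```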

Sources

An Ontology for Unified Modeling of Tasks, Actions, Environments, and Capabilities in Personal Service Robotics

Ontological foundations for contrastive explanatory narration of robot plans

MINT-RVAE: Multi-Cues Intention Prediction of Human-Robot Interaction using Human Pose and Emotion Information from RGB-only Camera Data

SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where

MoReact: Generating Reactive Motion from Textual Descriptions

Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow

LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation

Real-time Recognition of Human Interactions from a Single RGB-D Camera for Socially-Aware Robot Navigation

Social 3D Scene Graphs: Modeling Human Actions and Relations for Interactive Service Robots

LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model

Motion In-Betweening for Densely Interacting Characters

MultiModal Action Conditioned Video Generation
