Advancements in Multimodal Interaction and Tactile Perception

The field of human-computer interaction and robotics is moving toward a more integrated, multimodal approach that combines tactile, visual, and auditory cues to enhance emotional expression and communication. Recent work has focused on large language models that can process and adapt to sensory data, enabling more effective human-robot interaction and operational skill acquisition. Notably, kinesthetic feedback and haptic signals are becoming increasingly important in areas such as virtual reality, accessibility, and rehabilitation.

Some noteworthy papers in this area include: Hand by Hand: LLM Driving EMS Assistant for Operational Skill Learning, which explores collaborative human-LLM action for operational skill acquisition; HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning, which demonstrates that large language models can interpret haptic vibration signals; and OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing, which proposes a novel architecture incorporating tactile sensing and achieves substantial improvements over state-of-the-art VLA baselines.
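
To make the shared idea behind these systems concrete, below is a minimal sketch of one common pattern for conditioning a language model on touch: encode a raw tactile signal, project tactile and vision embeddings into the language model's token-embedding space, and prepend them as "prefix tokens". This is an illustration only, not the architecture of HapticLLaMA or OmniVTLA; all module names, dimensions, and the fusion strategy are assumptions.

```python
# Hypothetical sketch of tactile/vision prefix fusion for a language model.
# Not taken from any of the cited papers; shapes and modules are illustrative.

import torch
import torch.nn as nn


class TactileEncoder(nn.Module):
    """Encodes a raw vibration/pressure signal into a fixed-size embedding."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, signal: torch.Tensor) -> torch.Tensor:
        # signal: (batch, 1, signal_len) -> (batch, embed_dim)
        return self.net(signal)


class MultimodalPrefixFusion(nn.Module):
    """Maps tactile and vision embeddings to prefix tokens in the LM's embedding space."""

    def __init__(self, embed_dim: int = 512, lm_dim: int = 768, n_prefix: int = 4):
        super().__init__()
        self.tactile_proj = nn.Linear(embed_dim, lm_dim * n_prefix)
        self.vision_proj = nn.Linear(embed_dim, lm_dim * n_prefix)
        self.n_prefix, self.lm_dim = n_prefix, lm_dim

    def forward(self, tactile_emb: torch.Tensor, vision_emb: torch.Tensor) -> torch.Tensor:
        # Each projection yields n_prefix pseudo-tokens per modality.
        t = self.tactile_proj(tactile_emb).view(-1, self.n_prefix, self.lm_dim)
        v = self.vision_proj(vision_emb).view(-1, self.n_prefix, self.lm_dim)
        # The concatenated prefix would be prepended to the text token embeddings downstream.
        return torch.cat([t, v], dim=1)  # (batch, 2 * n_prefix, lm_dim)


if __name__ == "__main__":
    tactile = torch.randn(2, 1, 256)   # toy vibration signals
    vision = torch.randn(2, 512)       # precomputed image embeddings (assumed)
    prefix = MultimodalPrefixFusion()(TactileEncoder()(tactile), vision)
    print(prefix.shape)                # torch.Size([2, 8, 768])
```

In practice, systems of this kind vary in how they align the tactile embedding space with language (for example, via contrastive pretraining or caption supervision); the prefix-token fusion shown here is only one of several plausible designs.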

Sources

Hand by Hand: LLM Driving EMS Assistant for Operational Skill Learning

HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning

Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features

Touch Speaks, Sound Feels: A Multimodal Approach to Affective and Social Touch from Robots to Humans

OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing

Embodied Tactile Perception of Soft Objects Properties