The field of human-computer interaction and robotics is moving toward a more integrated, multimodal approach that combines tactile, visual, and auditory cues to enrich emotional expression and communication. Recent work has focused on large language models that can process and adapt to sensory data, enabling more effective human-robot interaction and operational skill acquisition. Notably, kinesthetic feedback and haptic signals are becoming increasingly important in areas such as virtual reality, accessibility, and rehabilitation.
Some noteworthy papers in this area include: Hand by Hand: LLM Driving EMS Assistant for Operational Skill Learning, which explores collaborative human-LLM action for operational skill acquisition; HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning, which shows that large language models can interpret haptic vibration signals and generate captions for them; and OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing, which proposes an architecture with semantically aligned tactile sensing and achieves substantial improvements over state-of-the-art vision-language-action (VLA) baselines.
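To make the haptic-captioning idea concrete, below is a minimal, hypothetical sketch of how a raw vibration waveform might be discretized into token ids that a sensory language model could embed alongside text tokens. The function names (frame_signal, quantize_to_tokens) and the RMS-energy quantization scheme are illustrative assumptions, not the actual method used in HapticLLaMA.

```python
# Hypothetical sketch: turning a 1-D vibration signal into discrete token ids.
# This is NOT the HapticLLaMA pipeline; it only illustrates the general idea of
# mapping continuous haptic data into a token vocabulary a language model can consume.
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split a 1-D vibration waveform into overlapping frames."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def quantize_to_tokens(frames: np.ndarray, vocab_size: int = 128) -> list:
    """Map each frame's RMS energy to a discrete token id via uniform quantization."""
    rms = np.sqrt((frames ** 2).mean(axis=1))
    span = rms.max() - rms.min()
    norm = (rms - rms.min()) / (span + 1e-8)          # scale energies to [0, 1]
    return (norm * (vocab_size - 1)).astype(int).tolist()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic vibration signal with increasing amplitude, standing in for sensor data.
    vibration = rng.normal(size=4096) * np.linspace(0.1, 1.0, 4096)
    tokens = quantize_to_tokens(frame_signal(vibration))
    # In a multimodal sensory language model, these ids would be embedded and
    # interleaved with text tokens before caption generation.
    print(tokens[:10])
```

In practice, published models typically learn such discretization (or use continuous sensor encoders) rather than hand-crafted energy bins; the sketch only conveys the tokenization step conceptually.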