The field of human-centric AI is advancing rapidly, with a focus on developing models that understand and predict human behavior, intentions, and emotions. Recent research has produced frameworks and datasets that enable more accurate forecasting of human navigation, hand movements, and gaze. Notably, multimodal sensing and the fusion of visual, auditory, and other sensory cues have improved the performance of a range of AI systems. In parallel, progress on power-efficient autonomous mobile robots and socially aware embodied navigation models brings these capabilities closer to real-world deployment.
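To make the multimodal fusion idea concrete, here is a minimal sketch of a late-fusion forecaster in PyTorch. The modality encodings, feature dimensions, and prediction horizon are illustrative assumptions, not the architecture of any paper summarized here: each modality's features are projected into a shared space, concatenated, and decoded into a short future trajectory.

```python
# Minimal late-fusion sketch (assumption, not any cited paper's architecture):
# per-modality features are projected, concatenated, and decoded into a trajectory.
import torch
import torch.nn as nn

class MultimodalFusionForecaster(nn.Module):
    def __init__(self, visual_dim=512, audio_dim=128, imu_dim=64, hidden_dim=256, horizon=12):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.imu_proj = nn.Linear(imu_dim, hidden_dim)
        # Fuse by concatenation, then predict (x, y) positions for each future step.
        self.fusion = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, horizon * 2),
        )
        self.horizon = horizon

    def forward(self, visual_feat, audio_feat, imu_feat):
        fused = torch.cat(
            [self.visual_proj(visual_feat),
             self.audio_proj(audio_feat),
             self.imu_proj(imu_feat)],
            dim=-1,
        )
        out = self.fusion(fused)              # (batch, horizon * 2)
        return out.view(-1, self.horizon, 2)  # (batch, horizon, 2) future positions

# Usage with random tensors standing in for real encoder outputs.
model = MultimodalFusionForecaster()
traj = model(torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 64))
print(traj.shape)  # torch.Size([4, 12, 2])
```

More sophisticated designs replace the concatenation with attention-based fusion, but the same interface (one feature stream per modality in, a forecast out) applies.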
Some noteworthy papers in this area include EgoCogNav, a multimodal egocentric navigation framework that predicts perceived path uncertainty and forecasts trajectories and head motion; SFHand, a streaming framework for language-guided 3D hand forecasting that achieves state-of-the-art results, outperforming prior work by a significant margin; SocialNav, a foundational model for socially-aware navigation that delivers strong gains in both navigation performance and social compliance; and GazeProphetV2, a multimodal approach that combines temporal gaze patterns, head movement data, and visual scene information to predict gaze behavior in virtual reality environments.
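As a rough illustration of how such gaze prediction can combine its three input streams, the sketch below encodes gaze and head-motion histories with GRUs, fuses them with a scene feature vector, and regresses the next gaze point. All module choices, input shapes, and dimensions are assumptions for illustration; this is not GazeProphetV2's published architecture.

```python
# Illustrative sketch only (not GazeProphetV2's published design): predict the next
# gaze point from gaze history, head-rotation history, and a scene feature vector.
import torch
import torch.nn as nn

class GazePredictor(nn.Module):
    def __init__(self, scene_dim=256, hidden_dim=128):
        super().__init__()
        # Temporal encoders for past gaze points (x, y) and head rotations (yaw, pitch, roll).
        self.gaze_rnn = nn.GRU(input_size=2, hidden_size=hidden_dim, batch_first=True)
        self.head_rnn = nn.GRU(input_size=3, hidden_size=hidden_dim, batch_first=True)
        self.scene_proj = nn.Linear(scene_dim, hidden_dim)
        # Fuse the three streams and regress the next normalized gaze coordinate.
        self.head = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, gaze_seq, head_seq, scene_feat):
        _, g = self.gaze_rnn(gaze_seq)   # final hidden state: (1, batch, hidden)
        _, h = self.head_rnn(head_seq)
        fused = torch.cat([g.squeeze(0), h.squeeze(0), self.scene_proj(scene_feat)], dim=-1)
        return torch.sigmoid(self.head(fused))  # next gaze point in [0, 1]^2 screen coords

# Usage with dummy tensors: 30-frame histories and a precomputed scene feature.
model = GazePredictor()
next_gaze = model(torch.randn(8, 30, 2), torch.randn(8, 30, 3), torch.randn(8, 256))
print(next_gaze.shape)  # torch.Size([8, 2])
```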