Human-Object Interaction Detection Advances

Research on human-object interaction (HOI) detection is advancing rapidly, with a focus on improving recognition accuracy and modeling complex interactions. Recent work emphasizes incorporating scene information, multi-task learning, and graph neural networks to improve HOI detection. New datasets and frameworks, such as those for online HOI generation and perception, are also driving progress. Noteworthy papers include:

  • Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach, which proposes a method to improve human action recognition performance by using information about fixed objects in the environment within a multi-task learning approach.
  • Explicit Multimodal Graph Modeling for Human-Object Interaction Detection, which proposes a multimodal graph network framework that explicitly models the HOI task in a four-stage graph structure.

Sources

Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach

FPI-Det: a face-phone Interaction Dataset for phone-use detection and understanding

Pragmatic Frames Evoked by Gestures: A FrameNet Brasil Approach to Multimodality in Turn Organization

OnlineHOI: Towards Online Human-Object Interaction Generation and Perception

Beyond Gaze Overlap: Analyzing Joint Visual Attention Dynamics Using Egocentric Data

Explicit Multimodal Graph Modeling for Human-Object Interaction Detection

Modeling the Multivariate Relationship with Contextualized Representations for Effective Human-Object Interaction Detection
