Zero-Shot Action Recognition Advances

Recent work on zero-shot action recognition targets more effective and robust recognition of unseen actions. It concentrates on three problems: improving the alignment between visual and semantic representations, capturing fine-grained action patterns, and mitigating the adverse impact of distribution discrepancies. Frequency-enhanced semantic features and structured language priors are reported to help separate action clusters that are visually and semantically similar. Prototype-guided feature alignment and few-shot inspired generative approaches are likewise reported to improve recognition accuracy.
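
To make the idea of visual-semantic alignment concrete, the following is a minimal sketch of the generic zero-shot decision rule: a visual feature is assigned to the unseen class whose semantic embedding it is most similar to. It is not taken from any of the papers listed here. The function names, the class labels, and the three-dimensional toy vectors are all hypothetical, and a real system would obtain both sides from learned encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(visual_feature, class_embeddings):
    """Return the label whose semantic embedding best matches the visual feature."""
    return max(
        class_embeddings,
        key=lambda label: cosine(visual_feature, class_embeddings[label]),
    )


# Hypothetical semantic embeddings for classes never seen during training.
unseen_classes = {
    "wave": [0.9, 0.1, 0.0],
    "clap": [0.1, 0.9, 0.1],
    "kick": [0.0, 0.2, 0.9],
}

# Hypothetical visual feature, assumed to live in the same embedding space.
query = [0.2, 0.8, 0.2]
predicted = zero_shot_classify(query, unseen_classes)
print(predicted)
```

Under this rule, accuracy depends entirely on how well the two embedding spaces line up, which is why so much of the work summarized here focuses on the alignment step.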

Noteworthy papers include:

  • Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition, which proposes a frequency-based enhancement module to enrich skeletal semantics learning.
  • ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment, which achieves state-of-the-art results on the ActionAtlas benchmark using a sequence alignment approach.
  • Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment, which introduces a prototype-guided feature alignment paradigm to improve skeleton-text alignment.
  • Few-Shot Inspired Generative Zero-Shot Learning, which reduces reliance on large-scale feature synthesis using a few-shot inspired generative framework.
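
As an illustration of what a sequence alignment approach to classification can look like, the sketch below scores a sequence of frame embeddings against an ordered list of sub-action embeddings for each candidate class, using classic dynamic time warping (DTW), and picks the class with the lowest alignment cost. This is a generic technique chosen for illustration. The summary above does not say which alignment algorithm ActAlign uses, and the class names, step sequences, and two-dimensional vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_cost(frames, steps):
    """Minimum cumulative cost of a monotonic alignment of frames to steps."""
    n, m = len(frames), len(steps)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(frames[i - 1], steps[j - 1])
            # Extend the cheapest of: stay on this step, skip ahead, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_by_alignment(frames, class_steps):
    """Return the class whose ordered sub-action sequence aligns most cheaply."""
    return min(class_steps, key=lambda label: dtw_cost(frames, class_steps[label]))


# Hypothetical frame embeddings: two frames of one sub-action, then one of another.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Hypothetical classes that share the same sub-actions but in opposite order.
class_steps = {
    "pour_then_stir": [[1.0, 0.0], [0.0, 1.0]],
    "stir_then_pour": [[0.0, 1.0], [1.0, 0.0]],
}

best = classify_by_alignment(frames, class_steps)
print(best)
```

The two toy classes contain identical sub-actions and differ only in order, so an order-insensitive similarity could not separate them while an alignment cost can.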
