Emotion-Driven Talking Head Generation and Beyond

The field of talking head generation is moving toward more nuanced and emotionally expressive models. Recent developments focus on disentangling identity from emotion while preserving their natural correlation, yielding more realistic emotional expressions. This is achieved through novel frameworks that incorporate audio-visual emotional cues, learnable emotion banks, and emotion discrimination objectives. Another key area of research is uncertainty learning, which improves the quality and robustness of talking face video generation. Furthermore, there is growing interest in de-identified emotion recognition and reasoning, which enables emotion understanding without compromising identity privacy. Noteworthy papers in this area include:

  • Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation, which proposes a novel framework for disentangling identity from emotion while modeling their correlation (an illustrative emotion-bank sketch follows this list).
  • Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning, which introduces a Joint Uncertainty Learning Network for high-quality talking face video generation (an uncertainty-weighted loss sketch also follows this list).
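
To make the emotion-bank and emotion-discrimination ideas concrete, here is a minimal, hypothetical PyTorch sketch. It is not the architecture from the cited paper: a bank of trainable emotion embeddings is soft-attended by a fused audio-visual feature, and a confusion-style loss pushes the identity branch to stay emotion-agnostic. The module names, dimensions, and the confusion-loss formulation are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionBank(nn.Module):
    """Learnable emotion bank (sketch): one trainable embedding per
    emotion category, soft-attended by a fused audio-visual cue."""

    def __init__(self, num_emotions: int = 8, dim: int = 256):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(num_emotions, dim) * 0.02)

    def forward(self, av_feat: torch.Tensor) -> torch.Tensor:
        # av_feat: (B, dim) fused audio-visual emotional cue.
        attn = F.softmax(av_feat @ self.bank.t(), dim=-1)  # (B, K)
        return attn @ self.bank                            # (B, dim)

def disentanglement_losses(emo_feat, id_feat, emo_labels, emo_classifier):
    """Emotion-discrimination objective (sketch): the emotion feature
    must predict the emotion label, while the identity feature is pushed
    toward a uniform emotion prediction so it carries no emotion cues."""
    l_emo = F.cross_entropy(emo_classifier(emo_feat), emo_labels)
    log_p = F.log_softmax(emo_classifier(id_feat), dim=-1)
    l_confusion = -log_p.mean()  # cross-entropy against a uniform target
    return l_emo, l_confusion
```

In practice these two terms would be combined with the usual reconstruction and lip-sync losses; the confusion term here is a simple stand-in for the adversarial emotion-discrimination stage described in the literature.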
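
For the uncertainty-learning direction, a generic approximation is the learned log-variance multi-task weighting of Kendall et al. The sketch below balances two assumed task losses (e.g. a lip-region term and a full-frame term); it is a common stand-in, not the paper's Joint Uncertainty Learning Network.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Balance multiple task losses with learned log-variances (sketch):
    total = sum_i exp(-s_i) * L_i + s_i, where s_i = log sigma_i^2."""

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        total = torch.zeros((), device=self.log_vars.device)
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total
```

Note that `self.log_vars` must be registered with the optimizer alongside the generator's parameters, e.g. `criterion = UncertaintyWeightedLoss(); total = criterion([l_lip, l_frame])`, so the task weights are learned jointly with the model.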

Sources

  • Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation
  • Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning
  • DEEMO: De-identity Multimodal Emotion Recognition and Reasoning
  • Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion
  • VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
  • Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
  • KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
