The field of talking head generation is moving toward more nuanced and emotionally expressive models. Recent work focuses on disentangling identity from emotion, which allows for more realistic, correlation-aware emotional expressions. This is achieved through frameworks that incorporate audio-visual emotional cues, learnable emotion banks, and emotion discrimination objectives. Another key direction is uncertainty learning, which improves the performance and robustness of talking face video generation models. There is also growing interest in de-identified emotion recognition and reasoning, which enables emotion understanding without compromising identity privacy. Noteworthy papers in this area include:
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation, which proposes a framework for disentangling identity from emotion.
- Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning, which introduces a Joint Uncertainty Learning Network for high-quality talking face video generation.