The field of personalized image and animation generation is advancing rapidly, with a focus on improving the quality, diversity, and controllability of generated content. Recent work centers on preserving identity, maintaining temporal consistency, and enabling fine-grained control over facial attributes and expressions. Notably, researchers have proposed training strategies that prevent shortcut learning during adapter training, ensuring that models learn disentangled representations of target attributes rather than exploiting spurious correlations. There have also been marked gains in audio-driven talking face generation, speech-driven facial animation, and music accompaniment generation, with recent models reporting improvements in visual quality, identity preservation, and synchronization accuracy. Beyond generation itself, the integration of AI into artistic workflows has produced immersive sound installations and human-AI co-creative sound artworks, underscoring AI's potential to expand the boundaries of artistic expression. Taken together, these developments point toward more controllable generation models that produce high-quality, personalized content with precise control over attributes and expressions.
Noteworthy papers include PSTF-AttControl, which enables precise control over facial attributes without per-subject fine-tuning; LSF-Animation, a label-free framework for speech-driven facial animation via implicit feature representation; and MAGIC-Talk, which achieves temporally stable talking face generation with customizable identity control.