Advances in Personalized Image and Animation Generation

Personalized image and animation generation is advancing quickly, with recent work focused on improving the quality, diversity, and controllability of generated content. The central challenges are preserving identity, maintaining temporal consistency, and enabling fine-grained control over facial attributes and expressions. One notable direction prevents shortcuts in adapter training by explicitly providing the shortcut signal during training, so that adapters learn disentangled representations of the target attribute instead of leaking identity or other confounds. Audio-driven talking face generation, speech-driven facial animation, and music accompaniment generation have also improved markedly, with recent models reporting gains in visual quality, identity preservation, and synchronization accuracy. Beyond generation benchmarks, the integration of AI into artistic workflows has produced immersive sound installations and human-AI co-creative sound artworks, pointing to AI's potential to expand the boundaries of artistic expression. Overall, the field is moving toward more controllable models that produce high-quality, personalized content with precise control over attributes and expressions.
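To make the shortcut-prevention idea concrete, here is a minimal, hypothetical PyTorch sketch of the general pattern: the identity signal (the would-be shortcut) is supplied directly to a frozen backbone, so the trainable adapter gains nothing from encoding identity and is pushed toward a disentangled representation of the target attribute. All module names, shapes, and the training objective below are invented for illustration and are not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Stand-in for a frozen generative backbone that consumes
    image features, an identity embedding, and an attribute code."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim * 3, dim)

    def forward(self, feat, identity, attribute):
        return self.net(torch.cat([feat, identity, attribute], dim=-1))

class AttributeAdapter(nn.Module):
    """Trainable adapter intended to encode only the target attribute."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, cond):
        return self.net(cond)

dim = 64
backbone = Backbone(dim)
for p in backbone.parameters():   # backbone stays frozen; only the adapter trains
    p.requires_grad_(False)
adapter = AttributeAdapter(dim)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

feat = torch.randn(8, dim)        # features of the input image (toy data)
identity = torch.randn(8, dim)    # the "shortcut" signal, provided explicitly
attr_cond = torch.randn(8, dim)   # raw conditioning for the target attribute
target = torch.randn(8, dim)      # training target (e.g., a denoising target)

attribute = adapter(attr_cond)
# Identity is already supplied to the backbone, so the adapter has no
# incentive to smuggle identity information through its output.
out = backbone(feat, identity, attribute)
loss = nn.functional.mse_loss(out, target)
loss.backward()
opt.step()
```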

Noteworthy papers include PSTF-AttControl, which enables precise control over facial attributes without per-subject fine-tuning; LSF-Animation, a label-free framework for speech-driven facial animation built on implicit feature representations; and MAGIC-Talk, which achieves temporally stable talking face generation with customizable identity control.

Sources

Preventing Shortcuts in Adapter Training via Providing the Shortcuts

Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation

LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation

Streaming Generation for Music Accompaniment

MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control

Awakening Facial Emotional Expressions in Human-Robot

FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time

Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation

PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes

Studies for: A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model

Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation
