Image generation research is moving toward greater control and consistency when generating images of specific identities. This is evident in new frameworks and models that prioritize character consistency, semantic alignment, and temporal coherence, and in approaches that tackle the challenge of keeping an identity stable across varying prompts, scenes, and editing tasks. Notable developments include composable adapters, multimodal reasoning, and contrastive identity losses, all aimed at improving the fidelity and controllability of generated images.
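To make the last of these ideas concrete, here is a minimal sketch of what a contrastive identity loss can look like. It is not taken from any of the papers below: the InfoNCE-style formulation, the `contrastive_identity_loss` name, and the assumption that identity embeddings come from some pretrained face encoder are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def contrastive_identity_loss(gen_embeds: torch.Tensor,
                              ref_embeds: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive identity loss (illustrative sketch).

    gen_embeds: (B, D) identity embeddings of generated images.
    ref_embeds: (B, D) identity embeddings of reference images;
                row i of both tensors is assumed to be the same identity.
    """
    gen = F.normalize(gen_embeds, dim=-1)
    ref = F.normalize(ref_embeds, dim=-1)
    # Cosine similarity between every generated/reference pair in the batch.
    logits = gen @ ref.t() / temperature  # (B, B)
    # Matching identities sit on the diagonal: row i should predict column i,
    # pulling true pairs together and pushing other identities apart.
    targets = torch.arange(gen.size(0), device=gen.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random stand-ins for encoder outputs
# (8 identities, 512-dim embeddings from a hypothetical face encoder).
gen = torch.randn(8, 512)
ref = torch.randn(8, 512)
print(contrastive_identity_loss(gen, ref).item())
```

In practice a term like this would typically be added to the main diffusion training objective with a small weight, so that identity alignment does not override overall image quality.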
Some noteworthy papers in this area include:
- CharCom: a modular framework for character-consistent story illustration.
- ReMix: a unified framework for character-consistent generation and editing.
- DreamMakeup: a training-free diffusion model for face makeup customization.
- ContextGen: a diffusion transformer framework for multi-instance generation.
- InfiniHuman: infinite 3D human creation with precise control.
- UniCalli: a unified diffusion framework for column-level generation and recognition of Chinese calligraphy.
- WithAnyone: a training paradigm that mitigates copy-paste artifacts in identity-consistent image generation.