The field of text-to-image generation is moving toward safer, more controllable models. Recent work addresses social concerns and mitigates the risk of generating harmful content. Researchers are designing methods that adaptively guide the generation process so that the resulting images are not only high quality but also aligned with human values. Another line of work analyzes and mitigates biases in diffusion models, a prerequisite for deploying these models in real-world applications. Noteworthy papers include:
- SP-Guard, which introduces a selective prompt-adaptive guidance method for safer image generation.
- VALOR, a modular framework for safer and more helpful text-to-image generation that integrates layered prompt analysis with human-aligned value reasoning.
- SCALEX, a framework for scalable and automated exploration of diffusion model latent spaces that enables zero-shot interpretation without retraining or labelling.
- Coffee, a method for controllable diffusion fine-tuning that prevents models from learning undesired concepts present in the fine-tuning data.
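Adaptive safety guidance of the kind these papers discuss can be viewed as extending classifier-free guidance with an extra term that steers the denoiser away from an unsafe-concept direction. The sketch below is a generic, hypothetical illustration of that idea (the function name, scale parameters, and formulation are assumptions for exposition, not the exact method of SP-Guard or any cited paper):

```python
import numpy as np

def safety_guided_noise(eps_uncond, eps_cond, eps_unsafe,
                        guidance_scale=7.5, safety_scale=3.0):
    """Combine diffusion noise estimates (hypothetical sketch).

    eps_uncond : unconditional noise prediction
    eps_cond   : prediction conditioned on the user prompt
    eps_unsafe : prediction conditioned on an unsafe-concept prompt

    Standard classifier-free guidance pulls the sample toward the
    prompt; the safety term pushes it away from the unsafe concept.
    A prompt-adaptive scheme could set safety_scale per prompt
    (e.g. 0 for clearly benign prompts).
    """
    guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    guided -= safety_scale * (eps_unsafe - eps_uncond)
    return guided

# Toy usage with random stand-ins for real noise predictions.
rng = np.random.default_rng(0)
shape = (4, 4)
eps_u, eps_c, eps_s = (rng.normal(size=shape) for _ in range(3))
out = safety_guided_noise(eps_u, eps_c, eps_s)
print(out.shape)
```

With `safety_scale=0` this reduces to plain classifier-free guidance, which is what a selective scheme would apply to prompts judged safe.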