Advances in Controllable Image Generation

The field of image generation is advancing rapidly, with a focus on developing more controllable and diverse models. Recent research has explored several approaches to improving the fidelity of generated images and their alignment with text prompts, including salient concept-aware image embedding models and region-controllable data augmentation frameworks. Another line of work pursues more interpretable and interactive models, such as those that let users steer the generation process through parametric activation functions (see the sketch below) or personalized image filters.

Noteworthy papers include ReCon, which introduces an augmentation framework that enhances the capacity of structure-controllable generative models for object detection, and LayerComposer, which presents an interactive framework for personalized, multi-subject text-to-image generation. In addition, Class-N-Diff proposes a classification-induced diffusion model for fair skin cancer diagnosis, and CBDiff introduces a conditional Bernoulli diffusion model for image forgery localization.
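
To make the idea of user-controllable parametric activations concrete, here is a minimal, hypothetical PyTorch sketch: an activation whose shape is governed by two user-exposed knobs that can be adjusted between sampling passes. The module and parameter names (`ParametricActivation`, `slope`, `curvature`) are illustrative assumptions, not the interface of the cited paper.

```python
import torch
import torch.nn as nn

class ParametricActivation(nn.Module):
    """Activation with user-exposed shape parameters (illustrative;
    not the cited paper's actual formulation)."""

    def __init__(self, slope: float = 1.0, curvature: float = 0.0):
        super().__init__()
        # Buffers rather than trained weights, so a user can overwrite
        # them at inference time without touching the generator's weights.
        self.register_buffer("slope", torch.tensor(slope))
        self.register_buffer("curvature", torch.tensor(curvature))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Leaky-ReLU-like response: the positive and negative branches
        # get independent, user-tunable gains.
        return torch.where(x >= 0, self.slope * x, self.curvature * x)

# Usage: drop the activation into a generator block, then vary the knobs
# between samples to explore how the generated image changes.
act = ParametricActivation(slope=1.0, curvature=0.1)
x = torch.randn(4, 8)
y_before = act(x)
act.slope.fill_(0.5)  # the user adjusts a control between passes
y_after = act(x)
```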

Sources

Salient Concept-Aware Generative Data Augmentation

ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection

Controlling the image generation process with parametric activation functions

Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation

Class-N-Diff: Classification-Induced Diffusion Model Can Make Fair Skin Cancer Diagnosis

Personalized Image Filter: Mastering Your Photographic Style

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection

In-situ Autoguidance: Eliciting Self-Correction in Diffusion Models

D2D: Detector-to-Differentiable Critic for Improved Numeracy in Text-to-Image Generation

The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models

CBDiff: Conditional Bernoulli Diffusion Models for Image Forgery Localization

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas
