Advances in Text-to-Image Synthesis and Editing

The field of text-to-image synthesis and editing is advancing rapidly, with a focus on improving the control and fidelity of generated images. Recent work has introduced new methods for negative prompt guidance, style-specific content creation, and anomaly generation, yielding significant gains in image quality and adherence to textual prompts. Notably, the integration of large language models and diffusion transformers has improved the understanding and execution of complex instructions, while advances in autoregressive modeling and flow matching have enabled more precise and efficient image editing. Overall, the field is moving toward more sophisticated and controllable synthesis and editing capabilities.

Noteworthy papers include VSF, which introduces a simple and efficient method for negative prompt guidance, and DeCoT, which leverages large language models to enhance text-to-image generation. SAGA and CurveFlow likewise make significant contributions to the fidelity and control of generated images.

Sources

VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models by Value Sign Flip

SPG: Style-Prompting Guidance for Style-Specific Content Creation

Training-Free Anomaly Generation via Dual-Attention Enhancement in Diffusion Model

Demystifying Foreground-Background Memorization in Diffusion Models

DeCoT: Decomposing Complex Instructions for Enhanced Text-to-Image Generation with Large Language Models

Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score

Leveraging Diffusion Models for Stylization using Multiple Style Images

7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models

Text2Weight: Bridging Natural Language and Neural Network Weight Spaces

DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer

SAGA: Learning Signal-Aligned Distributions for Improved Text-to-Image Generation

CTA-Flux: Integrating Chinese Cultural Semantics into High-Quality English Text-to-Image Communities

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

SATURN: Autoregressive Image Generation Guided by Scene Graphs

TransLight: Image-Guided Customized Lighting Control with Generative Decoupling

Hybrelighter: Combining Deep Anisotropic Diffusion and Scene Reconstruction for On-device Real-time Relighting in Mixed Reality

CurveFlow: Curvature-Guided Flow Matching for Image Generation

Visual Autoregressive Modeling for Instruction-Guided Image Editing
