Advances in Text-to-Image Synthesis and Editing

The field of text-to-image synthesis and editing has seen rapid progress in recent weeks. Researchers are improving the quality and realism of generated images and developing techniques for editing and manipulating existing ones. Diffusion models remain a notable trend, producing high-quality images from text prompts, while another line of work focuses on preserving the semantic meaning and consistency of the input text, even when generating multiple images or editing an existing one. New evaluation metrics and benchmarks have also been a key part of this research, enabling more accurate assessment of model performance and fairer comparison between approaches. Several papers further apply these techniques to real-world scenarios such as image customization, virtual try-on, and content creation.

Noteworthy papers include TaleForge, which introduces a personalized story-generation system; Preserve Anything, which proposes a method for controlled image synthesis with object preservation; VisualPrompter, which uses an automatic self-reflection module to identify missing concepts in generated images; and Why Settle for One?, which proposes a framework for generating coherent image sets with diverse consistency requirements.
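As a concrete illustration of the diffusion-based generation these papers build on, below is a minimal sketch using the Hugging Face diffusers library; the checkpoint, prompt, and sampler settings are illustrative assumptions, not details drawn from any of the papers above.

```python
# Minimal text-to-image sketch with a pretrained diffusion pipeline.
# Assumes `diffusers` and `torch` are installed and a CUDA GPU is available;
# the checkpoint name is an illustrative choice, not from the papers above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any diffusers-compatible checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
image = pipe(
    prompt,
    num_inference_steps=30,  # more denoising steps: slower, often higher quality
    guidance_scale=7.5,      # classifier-free guidance strength (prompt adherence)
).images[0]
image.save("lighthouse.png")
```

In practice, the number of inference steps and the guidance scale are the usual first knobs for trading off latency, image quality, and prompt adherence.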
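On the evaluation side, prompt-image consistency is commonly approximated with a CLIP score: the cosine similarity between the text and image embeddings of a CLIP model. The sketch below uses the transformers CLIP implementation; the checkpoint name and helper function are illustrative, not the protocol of any specific benchmark cited here.

```python
# Sketch of a CLIP-based prompt-image consistency score: cosine similarity
# between the prompt embedding and the image embedding. The checkpoint name
# and this helper are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(prompt: str, image: Image.Image) -> float:
    """Cosine similarity in [-1, 1]; higher means better prompt adherence."""
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    # Normalize embeddings so the dot product equals cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    return float((text_emb * img_emb).sum())

score = clip_score("a watercolor painting of a lighthouse at dawn",
                   Image.open("lighthouse.png"))
print(f"CLIP score: {score:.3f}")
```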
Sources
Mitigating Semantic Collapse in Generative Personalization with a Surprisingly Simple Test-Time Embedding Adjustment
Why Settle for Mid: A Probabilistic Viewpoint to Spatial Relationship Alignment in Text-to-image Models
Subjective Camera: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion
OptiPrune: Boosting Prompt-Image Consistency with Attention-Guided Noise and Dynamic Token Selection
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation