Text-Guided Image Editing

Text-guided image editing is advancing rapidly, with a focus on methods that edit images precisely while preserving their original content. Recent work has produced frameworks and techniques for fast, training-free, and mask-free editing. These methods build on diffusion models, transformers, and attention mechanisms to achieve high-quality results, including large-scale shape transformations, precise color control, and suppression of unwanted content.

Some noteworthy papers in this area include:

InstantEdit proposes a fast, few-step text-guided image editing method built on a piecewise rectified-flow framework.

CannyEdit introduces a training-free framework that combines selective Canny control with dual-prompt guidance to balance text adherence, context fidelity, and editing seamlessness.

Exploring Multimodal Diffusion Transformers systematically analyzes the attention mechanism of multimodal diffusion transformers and builds a robust prompt-based image editing method on that analysis.

Follow-Your-Shape proposes a training-free, mask-free framework for precise and controllable editing of object shapes via trajectory-guided region control.

Training-Free Text-Guided Color Editing leverages the attention mechanisms of modern multi-modal diffusion transformers to achieve accurate and consistent color edits.

Dual Recursive Feedback proposes a training-free system that recursively refines generation and appearance latents so that control conditions are faithfully reflected in controllable text-to-image diffusion.

Translation of Text Embedding suppresses strongly entangled content by translating the prompt embedding along a delta vector in the text embedding space of diffusion models; a toy sketch of this idea appears after this list.

NanoControl proposes a lightweight framework for precise and efficient control in diffusion transformers.

TweezeEdit proposes a tuning- and inversion-free framework that regularizes the editing path for consistent and efficient image editing.

CountCluster guides an object token's cross-attention map to form as many clusters as the object count given in the prompt, improving quantity control without training; the second sketch after this list illustrates the clustering step.
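
The delta-vector idea in Translation of Text Embedding lends itself to a compact illustration. Below is a minimal sketch, assuming a CLIP-style text encoder from Hugging Face transformers. The delta used here (the shift the embedding undergoes when the unwanted concept is appended to the prompt) is an illustrative simplification rather than the paper's construction, and the names suppress and strength are hypothetical.

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def embed(prompt: str) -> torch.Tensor:
    tokens = tokenizer(prompt, padding="max_length", truncation=True,
                       return_tensors="pt")
    # Per-token conditioning, shape (1, 77, 768) for this encoder.
    return encoder(**tokens).last_hidden_state

@torch.no_grad()
def suppress(prompt: str, unwanted: str, strength: float = 1.0) -> torch.Tensor:
    """Shift the prompt embedding away from an entangled concept."""
    base = embed(prompt)
    # Crude delta vector: the direction the embedding moves when the
    # unwanted concept is appended to the prompt.
    delta = embed(f"{prompt}, {unwanted}") - base
    # Translate the conditioning against that direction before it reaches
    # the diffusion model's cross-attention layers.
    return base - strength * delta

cond = suppress("a photo of a glass of water", "reflections")
```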

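The clustering step behind CountCluster can likewise be sketched in a few lines. This toy rendition assumes we already hold one object token's cross-attention map at some denoising step; the mass-balancing loss is a stand-in for the paper's objective, and count_cluster_loss and top_frac are hypothetical names. In practice such a loss would be differentiated with respect to the latent to steer sampling toward the requested object count.

```python
import torch
from sklearn.cluster import KMeans

def count_cluster_loss(attn_map: torch.Tensor, count: int,
                       top_frac: float = 0.2) -> torch.Tensor:
    """attn_map: (H, W) cross-attention map for one object token."""
    _, w = attn_map.shape
    flat = attn_map.flatten()
    k = max(count, int(top_frac * flat.numel()))
    top_idx = flat.topk(k).indices
    # Cluster the spatial coordinates of the strongest responses into
    # `count` groups, one per desired object instance.
    coords = torch.stack((top_idx // w, top_idx % w), dim=1).float()
    labels = KMeans(n_clusters=count, n_init=10).fit_predict(coords.cpu().numpy())
    labels = torch.from_numpy(labels).to(attn_map.device)
    total = flat[top_idx].sum()
    loss = attn_map.new_zeros(())
    for c in range(count):
        mass = flat[top_idx[labels == c]].sum()
        # Penalize uneven attention mass across clusters so the map splits
        # into `count` comparably strong object regions.
        loss = loss + (mass - total / count) ** 2
    return loss

# Example: a 32x32 attention map that should contain three objects.
loss = count_cluster_loss(torch.rand(32, 32, requires_grad=True), count=3)
```
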
Sources

InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow

CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing

Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion

Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models

NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer

TweezeEdit: Consistent and Efficient Image Editing with Path Regularization

CountCluster: Training-Free Object Quantity Guidance with Cross-Attention Map Clustering for Text-to-Image Generation
