Image editing research is moving toward more controllable and efficient methods, with a particular focus on integrating text and drag interactions. Recent work introduces unified diffusion-based frameworks that combine the strengths of text-driven and drag-driven editing, enabling both precise spatial control and fine-grained texture manipulation. A second direction targets efficiency, with faster editing pipelines that approach real-time operation while preserving fidelity. There is also growing interest in geometry-guided methods, which improve the consistency and precision of edits in geometry-intensive scenarios.

Noteworthy papers include TDEdit, which proposes a unified diffusion-based framework for joint drag-text image editing; FlashEdit, which introduces a framework for high-fidelity, real-time image editing; LaTo, a landmark-tokenized diffusion transformer for fine-grained, identity-preserving face editing; and GeoDrag and DragFlow, which present geometry-guided and drag-based editing approaches, respectively.
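To make the joint drag-text setting concrete, the sketch below shows one minimal way such an edit request could be represented: a text prompt carrying semantic intent plus paired handle/target points carrying spatial intent. This is a hypothetical illustration under assumed names (DragTextEdit, handle_points, target_points, run_edit); it does not reflect the actual APIs of TDEdit, FlashEdit, or any other cited paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical illustration only: these names do not correspond to the APIs
# of TDEdit, FlashEdit, LaTo, GeoDrag, or DragFlow.

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass
class DragTextEdit:
    """A joint drag-text edit request: the prompt gives semantic intent,
    the drag point pairs give precise spatial control."""
    prompt: str                                                  # e.g. "turn the car red"
    handle_points: List[Point] = field(default_factory=list)     # points to move
    target_points: List[Point] = field(default_factory=list)     # destinations
    mask: Optional[List[List[bool]]] = None                      # optional editable region

    def validate(self) -> None:
        # Every handle point needs a corresponding target point.
        if len(self.handle_points) != len(self.target_points):
            raise ValueError("handle_points and target_points must be paired")


def run_edit(image, edit: DragTextEdit) -> dict:
    """Placeholder for a unified editor that would condition a diffusion
    model on both the text prompt and the drag displacements."""
    edit.validate()
    displacements = [
        (tx - hx, ty - hy)
        for (hx, hy), (tx, ty) in zip(edit.handle_points, edit.target_points)
    ]
    # A real system would feed `edit.prompt` and `displacements` into the
    # denoising loop; here we just return the conditioning for inspection.
    return {"prompt": edit.prompt, "displacements": displacements}


if __name__ == "__main__":
    edit = DragTextEdit(
        prompt="turn the car red",
        handle_points=[(120, 80)],
        target_points=[(150, 80)],
    )
    print(run_edit(image=None, edit=edit))
```

The point of the sketch is only to show why unifying the two modalities is attractive: text alone cannot pin down where an object should move, while drags alone cannot express appearance changes, so a single request carrying both signals gives the editor complementary constraints.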