Advances in Controllable Image and Scene Editing

The field of image and scene editing is advancing rapidly, with a focus on controllable and efficient methods for editing and generating high-quality images and scenes. Recent work explores diffusion models, latent-space editing, and multimodal vision-language models toward this goal.

Notable papers include FLUX.1 Kontext, a generative flow matching model that unifies image generation and editing in latent space, and FOCUS, a unified vision-language model that integrates segmentation-aware perception with controllable object-centric generation. Inverse-and-Edit and EditP23 introduce new editing methods based on cycle consistency models and on propagating image prompts to multi-view representations, respectively. SceneCrafter and Generative Blocks World demonstrate controllable editing for driving scenes and for 3D scene manipulation, while PrITTI generates controllable, editable 3D semantic scenes with a latent diffusion-based framework. MADrive introduces a memory-augmented reconstruction framework for driving scene modeling, enabling photorealistic synthesis of significantly altered or novel driving scenarios.

Together, these advances promise more efficient, controllable, and high-quality image and scene editing, with applications across computer vision, robotics, and autonomous vehicles.
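A recurring idea across several of these papers (FLUX.1 Kontext, Inverse-and-Edit, PrITTI) is editing in a learned latent space rather than in pixel space: encode, shift the latent along an edit direction, decode. As a minimal, hypothetical sketch of that idea, using a toy orthonormal "autoencoder" in NumPy (none of the names below come from the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained encoder/decoder: a random orthonormal
# basis Q, so decode(encode(x)) == x exactly. Real systems use learned
# VAEs or diffusion inversion; this is purely illustrative.
d = 16                       # flattened "image" size
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
encode = lambda x: Q.T @ x   # image -> latent
decode = lambda z: Q @ z     # latent -> image

image = rng.normal(size=d)
edit_direction = np.zeros(d)
edit_direction[0] = 1.0      # pretend latent axis 0 controls some attribute

# Core idea: edit in latent space, then decode back to image space.
z = encode(image)
z_edited = z + 2.5 * edit_direction
edited_image = decode(z_edited)

# Because encode/decode are exact inverses here, the image-space change
# is exactly the decoded edit direction.
delta = edited_image - image
assert np.allclose(delta, decode(2.5 * edit_direction))
```

The same structure underlies cycle-consistency editing (Inverse-and-Edit): if inversion (`encode`) and generation (`decode`) round-trip faithfully, edits applied in the latent stay localized and predictable in the output.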

Sources

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation

Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models

PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Scenes

SceneCrafter: Controllable Multi-View Driving Scene Editing

Towards Efficient Exemplar Based Image Editing with Multimodal VLMs

EditP23: 3D Editing via Propagation of Image Prompts to Multi-View

Generative Blocks World: Moving Things Around in Pictures

Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling

Controllable 3D Placement of Objects with Scene-Aware Diffusion Models

MADrive: Memory-Augmented Driving Scene Modeling
