Emerging Trends in Multi-View Consistent Editing

The field of 3D editing and style transfer is shifting toward multi-view-consistent and controllable editing. Researchers are decoupling style from content to enable fast, view-consistent stylization, and are increasingly building on diffusion models and foundation models to obtain training-free, scalable editing pipelines. Noteworthy papers in this area include:

  • Jigsaw3D, which disentangles style from content via patch shuffling and masking, achieving high style fidelity and multi-view consistency at substantially lower latency.
  • SceneTextStylizer, a training-free diffusion-based framework that enables prompt-guided style transformation for scene text regions while preserving text readability and stylistic consistency.
  • EditCast3D, which proposes a pipeline that employs video generation foundation models to propagate edits from a single first frame across the entire dataset prior to reconstruction.
  • Coupled Diffusion Sampling, which presents an inference-time diffusion sampling method that performs multi-view consistent image editing with pre-trained 2D image editing models (a minimal sketch of the idea follows this list).
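
To make the coupled-sampling idea concrete, the sketch below shows one plausible, heavily simplified form of it. It is not the paper's algorithm: the edit_model wrapper with encode, denoise_step, and decode is hypothetical, and the coupling here is a plain cross-view latent average, whereas a real method would couple views in a geometry-aware manner.

```python
import torch

def coupled_multiview_edit(views, edit_model, num_steps=50, couple_weight=0.2):
    """Illustrative sketch of coupled diffusion sampling (not the published method).

    `edit_model` is an assumed wrapper around a pre-trained 2D image editor
    exposing encode / denoise_step / decode; real editing pipelines differ.
    """
    # Encode each view into the editor's latent space and start from noise.
    latents = [edit_model.encode(v) for v in views]
    latents = [l + torch.randn_like(l) for l in latents]

    for t in reversed(range(num_steps)):
        # Independent 2D denoising step per view, using the pre-trained editor.
        latents = [edit_model.denoise_step(l, t) for l in latents]

        # Coupling step: pull every latent toward the cross-view mean so the
        # per-view edits do not drift apart. This crude average stands in for
        # the geometry-aware coupling a real multi-view method would use.
        mean_latent = torch.stack(latents).mean(dim=0)
        latents = [(1 - couple_weight) * l + couple_weight * mean_latent
                   for l in latents]

    return [edit_model.decode(l) for l in latents]
```

Averaging latents is the simplest possible coupling and only behaves sensibly when the views are closely aligned; the appeal of the general recipe is that it needs no training and reuses the 2D editor as-is.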

Sources

Jigsaw3D: Disentangled 3D Style Transfer via Patch Shuffling and Masking

SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model

EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection

Coupled Diffusion Sampling for Training-Free Multi-View Image Editing
