The field of 3D editing and style transfer is shifting toward multi-view-consistent, controllable editing. Researchers are exploring ways to decouple style from content, enabling fast and view-consistent stylization, and the integration of diffusion models and foundation models is enabling training-free, scalable editing pipelines. Noteworthy papers in this area include:
- Jigsaw3D, which achieves high style fidelity and multi-view consistency with substantially lower latency.
- SceneTextStylizer, which enables prompt-guided style transformation specifically for text regions, while preserving both text readability and stylistic consistency.
- EditCast3D, which proposes a pipeline that employs video generation foundation models to propagate edits from a single first frame across the entire dataset prior to reconstruction (see the first sketch after this list).
- Coupled Diffusion Sampling, which presents an inference-time diffusion sampling method for multi-view consistent image editing using pre-trained 2D image editing models (see the second sketch after this list).
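
The EditCast3D description suggests a three-stage flow: edit one frame, let a video foundation model carry that edit across the remaining frames, then reconstruct the 3D scene from the edited views. The sketch below is a minimal illustration of that flow only; `edit_image`, `propagate_with_video_model`, and `reconstruct_3d` are hypothetical stand-ins for a 2D editor, a video generation model, and a reconstruction backend, and the actual pipeline's interfaces may differ.

```python
# Hypothetical sketch of a first-frame-edit-then-propagate pipeline.
from typing import List
import numpy as np


def edit_first_frame_pipeline(
    frames: List[np.ndarray],
    edit_prompt: str,
    edit_image,                  # hypothetical: (image, prompt) -> edited image
    propagate_with_video_model,  # hypothetical: (edited first frame, remaining frames) -> edited frames
    reconstruct_3d,              # hypothetical: list of images -> 3D scene representation
):
    """Edit only the first frame, propagate the edit with a video model,
    then reconstruct the 3D scene from the edited frames."""
    # 1. Apply the prompt-driven edit to the first frame only.
    edited_first = edit_image(frames[0], edit_prompt)

    # 2. Let the video foundation model carry the edit across the rest of the
    #    sequence, conditioned on the edited first frame.
    edited_frames = propagate_with_video_model(edited_first, frames[1:])

    # 3. Reconstruct the 3D representation from the edited multi-view frames.
    return reconstruct_3d([edited_first] + list(edited_frames))
```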
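
For Coupled Diffusion Sampling, the description implies running one diffusion trajectory per view with a pre-trained 2D editing model while coupling the trajectories at inference time. The sketch below shows one simple way such coupling could look, a soft pull of each view's latent toward the cross-view mean at every denoising step; the paper's actual coupling term may differ, and `denoise_step` is a hypothetical wrapper around one reverse step of a pre-trained 2D editing model.

```python
# Hypothetical sketch of inference-time coupling across per-view trajectories.
import torch


def coupled_sampling(
    latents: torch.Tensor,   # (num_views, C, H, W) initial noisy latents, one per view
    denoise_step,            # hypothetical: (latent, t, prompt) -> less-noisy latent
    prompt: str,
    timesteps,               # iterable of diffusion timesteps, high -> low
    coupling_strength: float = 0.2,
) -> torch.Tensor:
    """Run one editing trajectory per view, nudging latents toward their
    cross-view mean at every step to encourage multi-view consistency."""
    for t in timesteps:
        # Independent 2D editing step for each view.
        latents = torch.stack([denoise_step(z, t, prompt) for z in latents])

        # Coupling: blend each view's latent with the mean over views so the
        # trajectories stay close and the edits agree across views.
        mean_latent = latents.mean(dim=0, keepdim=True)
        latents = (1.0 - coupling_strength) * latents + coupling_strength * mean_latent
    return latents
```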