Advances in 3D Editing and Scene Generation

The field of 3D editing and scene generation is advancing rapidly, with a focus on improving consistency, scalability, and controllability. Researchers are exploring new approaches to the challenges of cross-view consistency, structural fidelity, and fine-grained controllability in 3D editing. One notable direction is the use of conditional transformers and generative models to enable precise, consistent edits without requiring auxiliary 3D masks. Another is the development of scene-aware retrieval frameworks for coherent metaverse scene generation, which retrieve 3D assets from large-scale repositories while maintaining spatial and stylistic consistency; a minimal sketch of this retrieval idea follows below. A third is temporal reasoning and physical consistency in image editing and world simulation, with frameworks that leverage large pretrained video generative models to capture the implicit physics of motion and interaction. Together, these advances are paving the way for more realistic and immersive 3D experiences.

Noteworthy papers include:

Towards Scalable and Consistent 3D Editing, which introduces a 3D-structure-preserving conditional transformer for precise and consistent edits.

ChronoEdit, which reframes image editing as a video generation problem to ensure physical consistency.

Kaleido, which presents a family of generative models for photorealistic, unified object- and scene-level neural rendering.
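To make the scene-aware retrieval idea concrete, here is a minimal sketch of how such a framework might rank candidate assets: score each asset by a weighted blend of semantic relevance to the query and stylistic consistency with the scene assembled so far. This is an illustrative toy, not MetaFind's actual method; the function names, embedding dimensions, and blending weight alpha are all assumptions.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_assets(query_emb, scene_style_emb, asset_embs, asset_style_embs,
                    alpha=0.7, top_k=5):
    """Rank candidate 3D assets by blending semantic relevance to the
    query with stylistic consistency against the current scene.

    query_emb        : embedding of the user's text query (hypothetical encoder)
    scene_style_emb  : aggregate style embedding of assets already placed
    asset_embs       : semantic embeddings of candidate assets
    asset_style_embs : style embeddings of candidate assets
    alpha            : weight on relevance vs. style coherence (assumed value)
    """
    scores = []
    for sem, sty in zip(asset_embs, asset_style_embs):
        relevance = cosine(query_emb, sem)        # does the asset match the query?
        coherence = cosine(scene_style_emb, sty)  # does it fit the scene's look?
        scores.append(alpha * relevance + (1 - alpha) * coherence)
    order = np.argsort(scores)[::-1]              # highest combined score first
    return order[:top_k], [scores[i] for i in order[:top_k]]

# Toy usage with random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
ids, scores = retrieve_assets(rng.normal(size=64), rng.normal(size=64),
                              rng.normal(size=(100, 64)),
                              rng.normal(size=(100, 64)))
print(ids, scores)
```

In a real system the style term would plausibly also incorporate spatial constraints (scale, placement, room layout), but the basic pattern of re-ranking repository hits against the existing scene is what distinguishes scene-aware retrieval from plain text-to-asset search.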

Sources

Towards Scalable and Consistent 3D Editing

ROGR: Relightable 3D Objects using Generative Relighting

Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft

Enhancing Foveated Rendering with Weighted Reservoir Sampling

Feedback Matters: Augmenting Autonomous Dissection with Visual and Topological Feedback

MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

Scaling Sequence-to-Sequence Generative Neural Rendering

C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing

Did you just see that? Arbitrary view synthesis for egocentric replay of operating room workflows from ambient sensors

MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis
