The field of 3D editing and scene generation is advancing rapidly, with researchers targeting the long-standing challenges of cross-view consistency, structural fidelity, and fine-grained controllability. One notable direction uses conditional transformers and generative models to enable precise, consistent edits without requiring auxiliary 3D masks. Another develops scene-aware retrieval frameworks for coherent metaverse scene generation, retrieving 3D assets from large-scale repositories while enforcing spatial and stylistic consistency (see the retrieval sketch below). A third line of work pursues temporal reasoning and physical consistency in image editing and world simulation, leveraging large pretrained video generative models to capture the implicit physics of motion and interaction (see the editing sketch below). Together, these advances pave the way for more realistic and immersive 3D experiences.

Noteworthy papers include:

- Towards Scalable and Consistent 3D Editing, which introduces a 3D-structure-preserving conditional transformer for precise and consistent edits.
- ChronoEdit, which reframes image editing as a video generation problem to ensure physical consistency.
- Kaleido, which presents a family of generative models for photorealistic, unified object- and scene-level neural rendering.
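To make the scene-aware retrieval idea concrete, here is a minimal, hypothetical sketch: assets and the query are represented as embedding vectors, and candidates are ranked by semantic match to the query plus stylistic agreement with assets already placed in the scene. The names (`retrieve_assets`, `style_weight`, 512-d embeddings) are illustrative assumptions, not the interface of any of the cited systems.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_assets(query_emb, placed_embs, repository, k=5, style_weight=0.5):
    """Rank repository assets by a blend of semantic match to the query
    and stylistic consistency with assets already placed in the scene."""
    scored = []
    for asset_id, emb in repository.items():
        semantic = cosine(query_emb, emb)
        # Style term: mean similarity to already-placed assets
        # (0.0 for an empty scene, so the first pick is purely semantic).
        style = (np.mean([cosine(emb, p) for p in placed_embs])
                 if placed_embs else 0.0)
        scored.append((semantic + style_weight * style, asset_id))
    return [asset_id for _, asset_id in sorted(scored, reverse=True)[:k]]

# Toy usage with random embeddings standing in for a real asset index.
rng = np.random.default_rng(0)
repo = {f"asset_{i}": rng.normal(size=512) for i in range(100)}
query = rng.normal(size=512)
placed = [rng.normal(size=512)]
print(retrieve_assets(query, placed, repo, k=3))
```

The `style_weight` knob trades off fidelity to the query against coherence with the existing scene; real systems would likely learn this balance rather than fix it.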
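ChronoEdit's reframing can likewise be sketched at the interface level: the input image becomes the first frame of a short generated video, the model imagines a physically plausible transition, and the final frame is returned as the edit. The `VideoGenerator` protocol and its `generate` signature below are hypothetical stand-ins for whatever pretrained video model is used; the sketch shows the control flow, not the paper's actual architecture.

```python
from typing import Protocol, Sequence

class VideoGenerator(Protocol):
    # Hypothetical interface: any image-conditioned video model that
    # returns a sequence of frames would fit this shape.
    def generate(self, first_frame, prompt: str, num_frames: int) -> Sequence: ...

def edit_image(model: VideoGenerator, image, instruction: str, num_frames: int = 9):
    """Reframe an edit as a short video: the input image is frame 0,
    the edit instruction is the generation prompt, and the final frame
    is returned as the edited image. The intermediate frames serve as
    implicit temporal-reasoning steps and are discarded."""
    frames = model.generate(first_frame=image,
                            prompt=instruction,
                            num_frames=num_frames)
    return frames[-1]
```

The appeal of this framing is that the video model's learned dynamics, rather than an explicit physics module, supply the motion and interaction constraints that keep the edit physically consistent.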