Advancements in Video and 3D Generation for Autonomous Driving and Image Editing

The field of video and 3D generation is advancing rapidly, with a focus on improving visual quality, spatial accuracy, and controllability, particularly for autonomous driving and image editing. Researchers are exploring new approaches to fine-tuning video generation models, balancing visual fidelity against dynamic accuracy, and building more efficient and flexible image-editing frameworks. Notable papers in this area include PosBridge, a framework for inserting custom objects into target scenes via multi-view positional embedding transplant, and ObjFiller-3D, a method for consistent multi-view 3D inpainting via video diffusion models. Also noteworthy are ROSE, a framework for removing objects and their side effects in videos, and VoxHammer, a training-free approach to precise and coherent 3D editing in native 3D space. Together, these advances stand to benefit a range of applications, including autonomous driving simulation, video editing, and 3D modeling.
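As a rough illustration of the positional-embedding-transplant idea named in PosBridge's title, the sketch below copies a reference object's positional embeddings onto the token slots where the object is inserted in the target scene, so downstream attention treats those tokens as carrying the reference layout. The function name, tensor shapes, and indexing scheme here are assumptions for illustration, not the paper's actual interface.

```python
import torch

def transplant_positional_embeddings(
    target_pos: torch.Tensor,   # (N, D) positional embeddings of the target scene's tokens
    ref_pos: torch.Tensor,      # (M, D) positional embeddings of the reference-object tokens
    insert_idx: torch.Tensor,   # (M,) indices of target tokens receiving the object
) -> torch.Tensor:
    """Hypothetical sketch: overwrite the positional embeddings at the
    insertion region with the reference object's embeddings, leaving the
    rest of the scene's embeddings untouched."""
    out = target_pos.clone()
    out[insert_idx] = ref_pos  # transplant the reference layout into the target slots
    return out

# Toy usage with random embeddings (shapes are illustrative only).
target_pos = torch.randn(1024, 64)    # 1024 scene tokens, 64-dim embeddings
ref_pos = torch.randn(16, 64)         # 16 reference-object tokens
insert_idx = torch.arange(100, 116)   # target token slots where the object goes
patched = transplant_positional_embeddings(target_pos, ref_pos, insert_idx)
assert patched.shape == target_pos.shape
```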

Sources

Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation

PosBridge: Multi-View Positional Embedding Transplant for Identity-Aware Image Editing

ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models

ROSE: Remove Objects with Side Effects in Videos

Harnessing Meta-Learning for Controllable Full-Frame Video Stabilization

LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation

DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving Scenes
