Advances in 3D World Modeling and Video Generation

3D world modeling and video generation are advancing rapidly toward more realistic and interactive models. Recent work uses geometry-enhanced frameworks, unified video models, and text-driven 3D scene generation to improve the quality and consistency of generated videos and 3D scenes. FantasyWorld introduces a geometry-enhanced framework that jointly models video latents and an implicit 3D field, and PanoWorld-X proposes a framework for high-fidelity, controllable panoramic video generation. Together with Drag4D, these methods produce 3D world models and videos with improved spatial consistency and temporal coherence. GaussEdit and 4DGS-Craft make 3D scene editing more efficient and controllable. NeoWorld combines neural simulation with progressive 3D unfolding to generate interactive 3D virtual worlds from a single input image. Research has also examined whether world models can benefit vision-language models, with promising results in spatial reasoning and multi-frame reasoning. These advances could support applications ranging from AR/VR content creation and robotic navigation to video editing and generation.

Sources

FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction

DiTraj: Training-Free Trajectory Control for Video Diffusion Transformer

Drag4D: Align Your Motion with Text-Driven 3D Scene Generation

UniVid: The Open-Source Unified Video Model

Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers

REALIGN: Regularized Procedure Alignment with Matching Video Embeddings via Partial Gromov-Wasserstein Optimal Transport

NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding

PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion

GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts

Stitch: Training-Free Position Control in Multimodal Diffusion Transformers

Can World Models Benefit VLMs for World Dynamics?

EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory

4DGS-Craft: Consistent and Interactive 4D Gaussian Splatting Editing

TempoControl: Temporal Attention Guidance for Text-to-Video Models
