Physics-Informed Video Generation

Video generation research is increasingly building physical laws into models to make generated videos more realistic and coherent. Recent work develops frameworks that enforce physical constraints, such as constant-acceleration dynamics from Newtonian mechanics and mass conservation, so that generated motion is more physically plausible. A second direction improves evaluation, with benchmarks that test whether models can reason about physical phenomena and produce videos consistent with scientific laws. Noteworthy papers include Post-Training Newton's Laws with Verifiable Rewards, which proposes a physics-grounded post-training framework for video generation, and PhyVLLM, which incorporates physical motion modeling into video language models. These advances could improve the quality and realism of generated videos and enable more accurate modeling of real-world phenomena.
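To make the idea of a verifiable constant-acceleration check concrete, here is a minimal sketch in Python. It is not the method of any paper listed here. It assumes an object's position has already been tracked in every frame and is expressed in metres, and the function names and the `tolerance` parameter are hypothetical. Under constant acceleration, the discrete second difference of position is the same at every frame, so the spread of the per-frame acceleration estimates measures how far a track departs from that law.

```python
from statistics import mean, pstdev


def second_differences(positions):
    """Discrete second differences y[i+1] - 2*y[i] + y[i-1]."""
    return [
        positions[i + 1] - 2 * positions[i] + positions[i - 1]
        for i in range(1, len(positions) - 1)
    ]


def constant_acceleration_check(positions, dt):
    """Estimate acceleration and its spread from a tracked trajectory.

    positions: one coordinate of the object per frame (assumed metres).
    dt: time between consecutive frames in seconds.
    Returns (mean acceleration, population std of per-frame estimates).
    """
    if len(positions) < 3:
        raise ValueError("need at least three frames")
    accels = [d / (dt * dt) for d in second_differences(positions)]
    return mean(accels), pstdev(accels)


def plausibility_reward(positions, dt, tolerance=0.5):
    """Hypothetical reward in (0, 1]; equals 1 when acceleration is constant.

    tolerance (m/s^2) is an illustrative scale, not a value from any paper.
    """
    _, spread = constant_acceleration_check(positions, dt)
    return 1.0 / (1.0 + spread / tolerance)


# Synthetic tracks sampled at 30 frames per second.
dt = 1 / 30
free_fall = [0.5 * 9.81 * (i * dt) ** 2 for i in range(30)]  # constant g
cubic = [(i * dt) ** 3 for i in range(30)]  # acceleration grows over time

g_est, g_spread = constant_acceleration_check(free_fall, dt)
reward_good = plausibility_reward(free_fall, dt)
reward_bad = plausibility_reward(cubic, dt)
```

On the synthetic free-fall track the estimated acceleration recovers the 9.81 m/s^2 used to generate it, and the cubic track scores lower because its acceleration changes from frame to frame. A real pipeline would also need object tracking and a pixel-to-metre scale, both of which are outside this sketch.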

Sources

What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Seeing the Wind from a Falling Leaf

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

PAI-Bench: A Comprehensive Benchmark For Physical AI

Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now

FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges

RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence

Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis

PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation