Research on 3D scene representation and video generation increasingly centers on diffusion-based methods, with the goal of improving the quality and realism of generated videos and 3D scenes, particularly under sparse input views or limited training data. Current directions include guidance score distillation, adaptive begin-of-video tokens, and point-conditioned diffusion models for strengthening video diffusion models, alongside growing interest in structure-aware denoising and semantic 3D motion transfer for improving the realism and consistency of generated videos. Notable papers in this area include:
- RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting, which proposes a framework for extracting multi-view consistency priors from pretrained video diffusion models (see the sketch after this list).
- CloseUpShot, which performs close-up novel view synthesis from sparse inputs via point-conditioned video diffusion.
- Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising, which proposes a zero-shot framework that enhances synthetic video realism while preserving the multi-level structure of the input synthetic video in the enhanced output.
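To make the score-distillation idea concrete, the sketch below shows the standard score-distillation-style update used to supervise a rendered image (e.g. a Gaussian Splatting render) with a frozen diffusion prior: noise the render, ask the prior to predict that noise, and push the render along the residual. This is a minimal illustration under assumed interfaces, not the papers' actual implementations; in particular, the `prior` object and its `eps_pred(noisy, t, cond)` method, the conditioning tensor, and the weighting are hypothetical placeholders, and RealisticDreamer's guidance score formulation differs in its specifics.

```python
import torch

def score_distillation_loss(render: torch.Tensor,
                            prior,                       # hypothetical frozen diffusion prior
                            alphas_cumprod: torch.Tensor,  # (T,) noise schedule
                            cond: torch.Tensor,
                            weight: float = 1.0) -> torch.Tensor:
    """One score-distillation step on a batch of rendered frames (B, C, H, W).

    Returns a surrogate loss whose gradient w.r.t. `render` is
    weight * (eps_hat - noise), the standard score-distillation direction
    (the diffusion model's Jacobian is skipped, as in DreamFusion-style SDS).
    """
    B = render.shape[0]
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (B,), device=render.device)   # random diffusion timestep
    noise = torch.randn_like(render)
    a = alphas_cumprod[t].view(B, 1, 1, 1)
    noisy = a.sqrt() * render + (1.0 - a).sqrt() * noise  # forward diffusion of the render
    with torch.no_grad():
        eps_hat = prior.eps_pred(noisy, t, cond)          # frozen prior's noise prediction
    grad = weight * (eps_hat - noise)
    # Stop-gradient trick: d/d(render) of (grad.detach() * render).sum() == grad.
    return (grad.detach() * render).sum()
```

In a few-shot splatting loop, this loss would be backpropagated through the differentiable renderer into the Gaussian parameters, so the diffusion prior supplies multi-view supervision for viewpoints the sparse input images never observed.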