Novel View Synthesis Advancements

The field of novel view synthesis is moving toward more robust and flexible methods, with particular attention to generating coherent, consistent views when input views are sparse (in some cases a single image) or the target viewpoint differs substantially from the input. Recent work explores diffusion models, autoregressive formulations, and token disentanglement to improve the quality and diversity of generated views, while the incorporation of synthetic data and reference features has shown promise for boosting performance. Several papers also tackle specific limitations of existing methods, such as handling loop-closure scenarios and supporting arbitrary input-output view configurations.

Noteworthy papers include:

Look Beyond proposes a two-stage scene view generation method, combining panorama and video diffusion, that maintains long-term view and scene consistency.

CausNVS introduces an autoregressive multi-view diffusion model that supports flexible novel view synthesis and achieves strong visual quality.

UniView enhances novel view synthesis from a single image by unifying reference features and using a multimodal large language model to select reference images.

Scaling Transformer-Based Novel View Synthesis Models incorporates synthetic data and token disentanglement to improve the scalability and performance of transformer-based models.
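To make the autoregressive formulation mentioned above concrete, the following is a minimal, hypothetical sketch of generating novel views one at a time, each conditioned on previously generated views. It is not the method from CausNVS or any other cited paper; the denoiser is a placeholder stub, and all function and parameter names are illustrative assumptions.

```python
import numpy as np

def denoise_step(noisy_view, context_views, camera_pose, t):
    # Placeholder denoiser: a real system would run a trained multi-view
    # diffusion network conditioned on camera_pose and timestep t.
    # Here we simply pull the noisy latent toward the mean of the
    # already-generated views to keep the example self-contained.
    context = np.mean(context_views, axis=0) if context_views else 0.0
    return noisy_view - 0.1 * (noisy_view - context)

def generate_views_autoregressively(input_view, target_poses, num_steps=50):
    """Generate each novel view conditioned on all previously generated views."""
    generated = [input_view]
    for pose in target_poses:
        view = np.random.randn(*input_view.shape)  # start from Gaussian noise
        for t in reversed(range(num_steps)):
            view = denoise_step(view, generated, pose, t)
        generated.append(view)  # newly generated view joins the conditioning set
    return generated[1:]

# Toy usage: a 16x16 single-channel "input view" and three hypothetical poses.
views = generate_views_autoregressively(np.zeros((16, 16)), target_poses=[0, 1, 2])
print(len(views), views[0].shape)
```

The key property illustrated is the causal structure: each new view can attend to everything generated so far, which is what allows flexible input-output view configurations at inference time.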

Sources

Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion

Fake & Square: Training Self-Supervised Vision Transformers with Synthetic Data and Synthetic Hard Negatives

UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features

CausNVS: Autoregressive Multi-view Diffusion for Flexible 3D Novel View Synthesis

Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data
