Diffusion-Based Models for Image and 3D Generation

Research on image and 3D generation is increasingly combining diffusion-based models with complementary techniques to improve the quality and flexibility of generated content. One key direction is conditional image and 3D generation with diffusion models, which has shown promising results in prompt fidelity and structural accuracy. A second direction is one-step generation: producing high-quality images and 3D models in a single step rather than over many sampling iterations, which improves overall efficiency.

Notable papers in this area include LTM3D, which proposes a latent token space modeling framework for conditional 3D shape generation, and VPD-SR, which introduces a visual perception diffusion distillation framework for one-step image super-resolution. FlexPainter is a texture generation pipeline that supports flexible multi-modal conditional guidance and achieves highly consistent texture generation. PBR-SR uses an off-the-shelf super-resolution model to produce high-resolution, high-quality physically based rendering (PBR) textures from low-resolution input in a zero-shot manner. "Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders" proposes a diffusion-based super-resolution (SR) framework that integrates text-aware attention and joint segmentation decoders to recover both natural details and the structural fidelity of text regions in degraded real-world images. SeedVR2 proposes a one-step diffusion-based video restoration (VR) model that is trained adversarially against real data.
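The efficiency argument for one-step generation can be made concrete by counting network evaluations. The following is a minimal sketch, not the method of any paper listed above: it assumes a toy one-dimensional data distribution concentrated at a single value (`TARGET`), a hypothetical `ToyDenoiser` that stands in for a trained noise-prediction network, and a deterministic DDIM-style update with a cosine schedule. All names and constants are illustrative assumptions.

```python
import math
import random

TARGET = 2.0     # hypothetical clean sample; the toy data distribution is a single point
T_START = 0.999  # start just short of t=1 so alpha_bar never reaches exactly 0


def alpha_bar(t):
    """Cosine signal level: 1 at t=0 (clean data), close to 0 near t=1 (noise)."""
    return math.cos(0.5 * math.pi * t) ** 2


class ToyDenoiser:
    """Stand-in for a trained noise-prediction network; counts its own calls."""

    def __init__(self):
        self.calls = 0

    def predict_noise(self, x_t, t):
        self.calls += 1
        a = alpha_bar(t)
        # Exact noise estimate when every data sample equals TARGET.
        return (x_t - math.sqrt(a) * TARGET) / math.sqrt(1.0 - a)


def ddim_sample(model, x_start, num_steps):
    """Deterministic DDIM-style sampling: one model call per step."""
    x = x_start
    for i in range(num_steps):
        t = T_START * (1.0 - i / num_steps)
        t_next = T_START * (1.0 - (i + 1) / num_steps)
        a, a_next = alpha_bar(t), alpha_bar(t_next)
        eps = model.predict_noise(x, t)
        # Estimate the clean sample, then re-noise it to the next (lower) level.
        x0_hat = (x - math.sqrt(1.0 - a) * eps) / math.sqrt(a)
        x = math.sqrt(a_next) * x0_hat + math.sqrt(1.0 - a_next) * eps
    return x


noise = random.Random(0).gauss(0.0, 1.0)
multi, single = ToyDenoiser(), ToyDenoiser()
x_multi = ddim_sample(multi, noise, num_steps=50)
x_single = ddim_sample(single, noise, num_steps=1)
print("network calls:", multi.calls, "vs", single.calls)
```

In this toy setting a single step recovers the target only because the denoiser is exact for a one-point distribution. Real one-step models generally have to be trained for that regime, for example through distillation (as in VPD-SR) or adversarial training (as in SeedVR2); the sketch illustrates only the difference in the number of network evaluations.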
