Diffusion Models for Image and Language Generation

Research on diffusion models is advancing rapidly, with much of the recent work aimed at making image and language generation more efficient and more effective. New approaches include multiplicative denoising score matching and proximal diffusion neural samplers, and these models have shown promising results in generating high-quality images and text. Noteworthy papers include 'Hyperparameters are all you need', which proposes a training-free algorithm for generating high-quality images, and 'Dale meets Langevin', which introduces a biologically inspired generative model that uses multiplicative updates. In addition, 'Proximal Diffusion Neural Sampler' and 'Principled and Tractable RL for Reasoning with Diffusion Language Models' demonstrate the effectiveness of diffusion models in sampling and reinforcement learning tasks, respectively. Overall, the field is moving toward techniques that improve efficiency, effectiveness, and applicability.

Sources

Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model

Dale meets Langevin: A Multiplicative Denoising Diffusion Model

Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models

Oracle-based Uniform Sampling from Convex Bodies

What Drives Compositional Generalization in Visual Generative Models?

Proximal Diffusion Neural Sampler

Principled and Tractable RL for Reasoning with Diffusion Language Models

Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization

An Inertial Langevin Algorithm

OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers
