Advancements in Diffusion Models for Image and Speech Enhancement

Recent work on diffusion models concentrates on improving efficiency, quality, and applicability to a wider range of tasks. On the efficiency side, new frameworks such as Shortcut Flow Matching for Speech Enhancement enable high-quality synthesis in just a few steps using deterministic ordinary differential equation (ODE) solvers. On the guidance side, work such as Discrete Guidance Matching and Stage-wise Dynamics of Classifier-Free Guidance aims to improve the sampling efficiency and quality of diffusion models. Noteworthy papers in this area include 'Shortcut Flow Matching for Speech Enhancement', which achieves a real-time factor of 0.013 on a consumer GPU while delivering perceptual quality comparable to a strong diffusion baseline, and 'HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models', which consistently improves image quality across diverse models and architectures.
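
To make "a few steps with a deterministic ODE solver" concrete, here is a minimal sketch, not the method of any paper listed here. It integrates a velocity field from t=0 to t=1 with fixed-step Euler updates. The names `euler_sample`, `toy_velocity`, `TARGET`, and `real_time_factor` are hypothetical, and `toy_velocity` is a hand-written stand-in for a trained network: it is the velocity of a straight-line path toward a known target, which is why even one Euler step lands on the target. The real-time factor is computed under the commonly used definition of processing time divided by audio duration, which the summary above does not spell out.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    dt = 1.0 / num_steps
    x = list(x0)
    for k in range(num_steps):
        t = k * dt  # time at the start of the step
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical "clean" signal; a real system would not know this in advance.
TARGET: Vector = [0.5, -1.0, 2.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Stand-in for a learned model: velocity of a straight path toward TARGET."""
    return [(c - xi) / (1.0 - t) for c, xi in zip(TARGET, x)]


noisy: Vector = [3.0, 0.0, -2.0]
few_step = euler_sample(toy_velocity, noisy, num_steps=4)
one_step = euler_sample(toy_velocity, noisy, num_steps=1)

# Under the assumed definition, an RTF of 0.013 corresponds to processing
# 10 seconds of audio in 0.13 seconds.
rtf = real_time_factor(0.13, 10.0)
```

With a learned velocity field the paths are generally not perfectly straight, so the step count trades accuracy against compute. This toy field hides that trade-off by construction.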

Sources

Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training

Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models

HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models

LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Diffusion Models are Kelly Gamblers

SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems

RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement

GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models

TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion

ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On

EVODiff: Entropy-aware Variance Optimized Diffusion Inference

Diffusion Alignment as Variational Expectation-Maximization

Learn to Guide Your Diffusion Model

Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models

FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring

Test-Time Anchoring for Discrete Diffusion Posterior Sampling