Accelerating Language Generation with Diffusion Models

The field of natural language processing is shifting toward diffusion-based language models, a promising alternative to traditional autoregressive models. Recent work has focused on improving the efficiency and accuracy of diffusion models, with particular emphasis on parallel decoding strategies: unlike autoregressive models, which emit one token per forward pass, diffusion models can refine and commit many tokens in parallel at each denoising step. These advances have produced notable speedups in language generation, making diffusion models competitive for real-world applications. Noteworthy papers include Self-Speculative Biased Decoding, which reports up to a 1.7x speedup over conventional autoregressive re-translation, and Learning to Parallel, which reports up to a 22.58x speedup with no performance drop. Papers such as Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct and Free Draft-and-Verification further demonstrate that parallel decoding can be lossless, accelerating generation without sacrificing quality. Overall, the field is moving toward more efficient and accurate language generation, with diffusion models at the forefront of this development.
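Several of the listed papers share a common core idea: because a masked diffusion LM scores every position in a single forward pass, it can draft many tokens at once and commit only those that pass a verification rule. The sketch below illustrates one such confidence-thresholded step; the `model` interface, `mask_id`, and the threshold-based acceptance rule are illustrative assumptions, not the exact criterion of any cited paper.

```python
import torch

def parallel_decode_step(model, tokens, mask_id, max_parallel=8, threshold=0.9):
    """One confidence-thresholded parallel decoding step for a masked
    diffusion LM. All names here (`model`, `mask_id`, the threshold rule)
    are illustrative assumptions, not the method of any cited paper.
    """
    # Draft: a single forward pass scores every position at once.
    logits = model(tokens)                        # (seq_len, vocab_size)
    conf, draft = torch.softmax(logits, dim=-1).max(dim=-1)

    # Verify: among still-masked positions, keep only drafts whose
    # confidence clears the threshold, capped at the parallel budget.
    masked = tokens == mask_id
    candidates = (masked & (conf >= threshold)).nonzero(as_tuple=True)[0]
    if len(candidates) > max_parallel:
        keep = conf[candidates].argsort(descending=True)[:max_parallel]
        candidates = candidates[keep]

    # Commit the accepted tokens; everything else stays masked for the
    # next denoising step.
    tokens = tokens.clone()
    tokens[candidates] = draft[candidates]
    return tokens, int(masked.sum()) - len(candidates)
```

Repeating this step until no masked positions remain commits several tokens per model call, which is where the reported multi-x speedups over one-token-per-pass autoregressive decoding come from.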

Sources

Self-Speculative Biased Decoding for Faster Live Translation

Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies

On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/\epsilon)$ to Nearly $\epsilon$-Free

SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation

Sequential Diffusion Language Models

SparseD: Sparse Attention for Diffusion Language Models

Double Descent as a Lens for Sample Efficiency in Autoregressive vs. Discrete Diffusion Models

Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct

Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding

Fast-dLLM v2: Efficient Block-Diffusion LLM

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

dParallel: Learnable Parallel Decoding for dLLMs

Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

Authentic Discrete Diffusion Model

Coarse scrambling for Sobol' and Niederreiter sequences
