The field of natural language processing is witnessing a significant shift toward diffusion language models, which generate tokens in parallel and attend bidirectionally over the sequence rather than predicting one token at a time. The shift is driven by the need for more controllable and efficient generation. Recent studies highlight the limitations of traditional autoregressive models, such as strictly left-to-right decoding and token-by-token latency, and the potential of diffusion models to overcome them. Research has focused on improving the performance and scalability of diffusion models, including new training and inference strategies, and there is growing interest in hybrid approaches that combine the strengths of autoregressive and diffusion modeling. Overall, the field is converging on these more flexible and efficient formulations of language modeling.

Noteworthy papers include:
- Beyond Next-Token Prediction: a comprehensive performance study of diffusion and autoregressive language models.
- Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts: a method for improving the performance of diffusion models at inference time.
- SDAR: A Synergistic Diffusion-AutoRegression Paradigm: a paradigm that combines the strengths of autoregressive and diffusion models.
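
To make the contrast with next-token prediction concrete, below is a minimal toy sketch of confidence-based iterative unmasking, the decoding pattern commonly used by masked diffusion language models. The `toy_denoiser` function, the vocabulary, and all parameters here are illustrative placeholders standing in for a real bidirectional transformer; this is not the implementation from any of the papers above.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "model", "fills", "tokens", "in", "parallel"]

def toy_denoiser(seq):
    """Stand-in for a bidirectional transformer: for every masked
    position, propose a token and a confidence score. A real diffusion
    LM conditions each prediction on the full left AND right context
    in a single forward pass; here the scores are random placeholders."""
    proposals = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            proposals[i] = (random.choice(VOCAB), random.random())
    return proposals

def diffusion_decode(length=8, tokens_per_step=2):
    """Iterative parallel decoding: each step commits the
    `tokens_per_step` most confident positions at once, so generation
    takes roughly length / tokens_per_step model calls instead of
    `length` sequential calls as in autoregressive decoding."""
    seq = [MASK] * length
    while MASK in seq:
        proposals = toy_denoiser(seq)
        # Commit the highest-confidence predictions in parallel.
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _) in best[:tokens_per_step]:
            seq[pos] = tok
        print(" ".join(seq))  # show the sequence being refined step by step
    return seq

if __name__ == "__main__":
    diffusion_decode()
```

The `tokens_per_step` knob illustrates the efficiency/quality trade-off these papers study: unmasking more positions per step cuts the number of model calls but gives each prediction less committed context, which is one motivation for the semi-autoregressive and hybrid diffusion-autoregression designs noted above.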