The field of language modeling is moving toward more efficient and effective methods for constrained decoding, which enable language models to produce samples that satisfy specified constraints while preserving sample quality and diversity. Recent work addresses limitations of existing constrained-decoding approaches, such as distortion of the underlying model distribution, and proposes frameworks and algorithms for principled, efficient exploration of the constrained space. Notably, advances in Markov chain Monte Carlo (MCMC) methods and signal temporal logic have improved the ability of language models to generate samples that satisfy hard constraints and to provide reliable uncertainty estimates. In addition, sparse autoencoder (SAE) tuning and temporally restricted inference algorithms have been introduced to improve the accuracy and efficiency of language models across a range of tasks.

Noteworthy papers include the following. Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective proposes a constrained sampling framework based on MCMC that satisfies three core desiderata: constraint satisfaction, monotonic convergence, and efficiency. Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic introduces a structured framework that models stepwise confidence as a temporal signal and evaluates it with signal temporal logic, yielding more reliable uncertainty estimates. TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal presents a general, model-agnostic inference-time algorithm that guarantees compliance with temporal constraints without requiring any retraining. Resa: Transparent Reasoning Models via SAEs proposes an efficient sparse autoencoder tuning procedure that elicits strong reasoning in language models.
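To make the MCMC framing concrete, the sketch below shows a generic independence Metropolis-Hastings loop over constraint-satisfying sequences; it is an illustrative sketch under stated assumptions, not the algorithm from the MCMC paper above. The helpers `logp_model` and `sample_constrained` are hypothetical stand-ins for an unconstrained model scorer and a constrained proposal (for example, a token-masked decoder that can only emit valid sequences).

```python
import math
import random

def mh_constrained_sampler(logp_model, sample_constrained, n_steps=100):
    """Independence Metropolis-Hastings over constraint-satisfying sequences.

    Assumed (hypothetical) interfaces:
      logp_model(seq)      -> unconstrained model log-probability of seq
      sample_constrained() -> (seq, logq): a sequence drawn from a constrained
                              proposal together with its proposal log-probability

    Every chain state satisfies the constraint by construction; the accept/reject
    step corrects the proposal's distortion so the chain targets the model
    distribution restricted to the constrained set.
    """
    seq, logq = sample_constrained()
    logp = logp_model(seq)
    for _ in range(n_steps):
        cand, cand_logq = sample_constrained()
        cand_logp = logp_model(cand)
        # Independence-MH log acceptance ratio: log[p(x') q(x) / (p(x) q(x'))]
        log_alpha = (cand_logp - logp) + (logq - cand_logq)
        if random.random() < math.exp(min(0.0, log_alpha)):
            seq, logp, logq = cand, cand_logp, cand_logq
        yield seq
```

A caller would supply its own scorer and constrained proposal, e.g. `samples = list(mh_constrained_sampler(my_logp, my_proposal, n_steps=200))`, and keep the later draws as approximate samples from the constrained model distribution.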