Efficient Inference and Modeling in Language and Time Series Applications

The field of language and time series modeling is undergoing a significant shift toward efficient inference and innovative modeling techniques. Researchers are exploring methods to accelerate large language model inference, such as speculative decoding and parallel draft model adaptation, which have shown promising results in reducing computational costs while maintaining output quality. In parallel, the integration of multimodal information and conditioned diffusion models is improving the accuracy of time series forecasting. Asynchronous diffusion models and any-subset autoregressive models are also gaining traction, offering improved performance and flexibility in modeling complex data distributions.

Noteworthy papers in this area include PARD, which introduces a low-cost parallel draft model adaptation method that accelerates LLM inference by up to 4.08x, and MCD-TSF, which proposes a multimodal conditioned diffusion model for time series forecasting and achieves state-of-the-art performance on benchmark datasets.
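To make the speculative decoding idea concrete, the sketch below shows a minimal, greedy draft-then-verify loop in pure Python. The `target_logits` and `draft_logits` functions are hypothetical stand-ins for a large target model and a small draft model (they are not taken from PARD or any of the papers listed below); the loop structure, where the draft model proposes a block of tokens and the target model keeps only the verified prefix, is the core mechanism.

```python
import numpy as np

VOCAB = 32  # toy vocabulary size

# Hypothetical stand-ins for a large "target" model and a cheap "draft" model.
# Each maps a context (tuple of token ids) to a next-token logit vector.
def target_logits(context):
    rng = np.random.default_rng(hash(context) % (2**32))
    return rng.normal(size=VOCAB)

def draft_logits(context):
    # The draft model sees the same context but is noisier / cheaper.
    rng = np.random.default_rng((hash(context) + 1) % (2**32))
    return 0.7 * target_logits(context) + 0.3 * rng.normal(size=VOCAB)

def greedy(logits):
    return int(np.argmax(logits))

def speculative_decode(prompt, max_new_tokens=16, k=4):
    """Greedy draft-then-verify loop: the draft model proposes k tokens,
    the target model keeps the longest prefix it agrees with."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = greedy(draft_logits(tuple(ctx)))
            proposal.append(t)
            ctx.append(t)
        # 2. Target model verifies the proposals; in a real system this is
        #    a single batched forward pass over all k positions.
        ctx = list(tokens)
        for t in proposal:
            if greedy(target_logits(tuple(ctx))) != t:
                break
            ctx.append(t)
        tokens = ctx
        # 3. On rejection (or full acceptance), the target model supplies one
        #    guaranteed-correct token so progress is always made.
        tokens.append(greedy(target_logits(tuple(tokens))))
    return tokens

print(speculative_decode(prompt=(1, 2, 3)))
```

The speedup intuition is that verifying k drafted tokens costs roughly one target-model forward pass, so whenever several draft tokens are accepted, multiple output tokens are produced for the price of one expensive call.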

Sources

PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation

Multimodal Conditioned Diffusive Time Series Forecasting

AutoJudge: Judge Decoding Without Manual Annotation

LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

ADiff4TPP: Asynchronous Diffusion Models for Temporal Point Processes

Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding
