Advancements in Speech-Language Models

The field of speech-language models is moving toward contextual paralinguistic understanding and empathetic reasoning, with the goal of building more natural and effective conversational systems. Recent work explores novel training methods, including implicit and explicit approaches to incorporating paralinguistic information, as well as planning-inspired text guidance for more meaningful dialogue generation. There is also growing interest in unified models that handle speech understanding and speech generation within a single framework.

Noteworthy papers include: Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models, which proposes two approaches to incorporating contextual paralinguistic information into model training; DualSpeechLM, which presents a dual-token modeling framework that concurrently models understanding-driven speech tokens as input and acoustic tokens as output; and OSUM-EChat, which introduces a three-stage understanding-driven spoken dialogue training strategy and a linguistic-paralinguistic dual thinking mechanism to enhance empathetic interaction.
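To make the dual-token idea concrete, below is a minimal sketch (not the papers' actual implementation) of a backbone that takes understanding-driven speech tokens as conditioning input and autoregressively predicts acoustic tokens as output. The `DualTokenLM` class, the vocabulary sizes, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of dual speech token modeling, assuming a shared causal
# transformer backbone: understanding tokens condition the sequence and an
# acoustic head scores the next acoustic token. All names and sizes are
# hypothetical, not taken from DualSpeechLM's released code.
import torch
import torch.nn as nn

class DualTokenLM(nn.Module):
    def __init__(self, n_understanding_tokens=1024, n_acoustic_tokens=2048,
                 d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        # Separate embedding tables for the two token streams.
        self.understanding_emb = nn.Embedding(n_understanding_tokens, d_model)
        self.acoustic_emb = nn.Embedding(n_acoustic_tokens, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Output head predicts the next acoustic token.
        self.acoustic_head = nn.Linear(d_model, n_acoustic_tokens)

    def forward(self, understanding_ids, acoustic_ids):
        # Concatenate understanding tokens (input side) with previously
        # generated acoustic tokens (output side) into one causal sequence.
        x = torch.cat([self.understanding_emb(understanding_ids),
                       self.acoustic_emb(acoustic_ids)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=mask)
        # Score next acoustic tokens only over the acoustic segment.
        return self.acoustic_head(h[:, understanding_ids.size(1):])

# Toy usage: batch of 2, 20 understanding tokens conditioning 10 acoustic tokens.
logits = DualTokenLM()(torch.randint(0, 1024, (2, 20)),
                       torch.randint(0, 2048, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 2048])
```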

Sources

Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models

Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance

SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation

Dual Information Speech Language Models for Emotional Conversations

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue

Shaping Event Backstories to Estimate Potential Emotion Contexts
