Advancements in Speech and Language Models

The field of speech and language processing is moving toward more integrated and effective models. Recent research has focused on improving the performance of Large Language Models (LLMs) on speech-based tasks such as Automatic Pronunciation Assessment (APA) and Mispronunciation Detection and Diagnosis (MDD). Techniques such as Low-Rank Adaptation (LoRA) and Reinforced Behavior Alignment (RBA) have shown promising results in enhancing the language generation proficiency of speech-enabled LLMs (SpeechLMs). There is also growing interest in parameter-efficient adapters and feature distillation methods that reduce the computational burden and improve the performance of Spoken Language Understanding (SLU) systems. Noteworthy papers include English Pronunciation Evaluation without Complex Joint Training, which demonstrates the effectiveness of LoRA fine-tuning for handling APA and MDD simultaneously, and Enhancing Speech Large Language Models through Reinforced Behavior Alignment, which introduces the RBA framework for improving the instruction-following capabilities of SpeechLMs.
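
As a rough illustration of the LoRA approach mentioned above, the sketch below attaches low-rank adapters to the attention projections of a speech encoder-decoder model. It assumes the Hugging Face transformers and peft libraries and uses a Whisper checkpoint purely as a stand-in backbone; the model choice, target modules, and hyperparameters are illustrative assumptions, not the configurations used in the cited papers.

```python
# Minimal LoRA fine-tuning sketch (assumption: Hugging Face `transformers` + `peft`).
# The Whisper checkpoint and LoRA hyperparameters below are illustrative only.
from transformers import AutoModelForSpeechSeq2Seq
from peft import LoraConfig, get_peft_model

base = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-small")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension of the adapter matrices
    lora_alpha=16,                        # scaling applied to the adapter update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

# `model` can now be fine-tuned on pronunciation-scoring data with a standard
# seq2seq training loop; the frozen backbone keeps compute and memory costs low.
```

Because only the adapter matrices are updated, this style of fine-tuning avoids the cost of full joint training while still specializing the model for tasks such as APA and MDD.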

Sources

English Pronunciation Evaluation without Complex Joint Training: LoRA Fine-tuned Speech Multimodal LLM

Mitigating Data Imbalance in Automated Speaking Assessment

Comparison of End-to-end Speech Assessment Models for the NOCASA 2025 Challenge

Enhancing Speech Large Language Models through Reinforced Behavior Alignment

SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings

Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition

AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding
