Advancements in Speech and Language Models

The field of speech and language processing is moving toward more integrated and effective models. Recent research has focused on improving the performance of Large Language Models (LLMs) on speech-based tasks such as Automatic Pronunciation Assessment (APA) and Mispronunciation Detection and Diagnosis (MDD). Techniques such as Low-Rank Adaptation (LoRA) and Reinforced Behavior Alignment (RBA) have shown promising results in enhancing the language generation proficiency of speech-enabled LLMs (SpeechLMs). There is also growing interest in parameter-efficient adapters and feature distillation methods that reduce the computational burden and improve the performance of Spoken Language Understanding (SLU) systems. Noteworthy papers include English Pronunciation Evaluation without Complex Joint Training, which demonstrates the effectiveness of LoRA fine-tuning for handling APA and MDD simultaneously, and Enhancing Speech Large Language Models through Reinforced Behavior Alignment, which introduces the RBA framework for improving the instruction-following capabilities of SpeechLMs.
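
As a rough illustration of the LoRA approach mentioned above, the sketch below attaches low-rank adapters to the attention projections of a speech encoder-decoder model. It assumes the Hugging Face transformers and peft libraries and uses a Whisper checkpoint purely as a stand-in backbone; the model choice, target modules, and hyperparameters are illustrative assumptions, not the configurations used in the cited papers.

```python
# Minimal LoRA fine-tuning sketch (assumption: Hugging Face `transformers` + `peft`).
# The Whisper checkpoint and LoRA hyperparameters below are illustrative only.
from transformers import AutoModelForSpeechSeq2Seq
from peft import LoraConfig, get_peft_model

base = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-small")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension of the adapter matrices
    lora_alpha=16,                        # scaling applied to the adapter update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

# `model` can now be fine-tuned on pronunciation-scoring data with a standard
# seq2seq training loop; the frozen backbone keeps compute and memory costs low.
```

Because only the adapter matrices are updated, this style of fine-tuning avoids the cost of full joint training while still specializing the model for tasks such as APA and MDD.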

Sources

English Pronunciation Evaluation without Complex Joint Training: LoRA Fine-tuned Speech Multimodal LLM

Mitigating Data Imbalance in Automated Speaking Assessment

Comparison of End-to-end Speech Assessment Models for the NOCASA 2025 Challenge

Enhancing Speech Large Language Models through Reinforced Behavior Alignment

SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings

Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition

AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding
