Advances in Speech Recognition and Synthesis for Low-Resource Languages

The field of speech recognition and synthesis is moving toward more inclusive and scalable models for low-resource languages. Researchers are exploring approaches such as continual learning, weakly supervised pretraining, and optimal transport regularization to improve speech-text alignment and narrow the modality gap between speech and text representations. These advances could enable more accurate and natural speech recognition and synthesis for diverse languages, including Indian languages, Arabic dialects, Urdu, and Kurdish dialects.

Noteworthy papers in this area include:

A Study on Regularization-Based Continual Learning Methods for Indic ASR, which demonstrates that continual learning mitigates catastrophic forgetting as an ASR model is extended to new Indian languages.

Optimal Transport Regularization for Speech Text Alignment in Spoken Language Models, which introduces a novel method to improve speech-text alignment and generalization across datasets.

Munsit at NADI 2025 Shared Task 2, which presents a scalable training pipeline for multidialectal Arabic ASR built on weakly supervised pretraining and continual supervised fine-tuning.

QAMRO, which proposes a quality-aware adaptive margin ranking optimization framework for human-aligned assessment of audio generation systems.

UtterTune, which enables controllable, LoRA-based pronunciation editing in multilingual text-to-speech systems.

Assessing the Feasibility of Lightweight Whisper Models for Low-Resource Urdu Transcription, which evaluates lightweight Whisper variants for Urdu speech recognition.

Analysis of Domain Shift across ASR Architectures via TTS-Enabled Separation of Target Domain and Acoustic Conditions, which uses TTS to separate target-domain content from acoustic conditions and offers insights into how different ASR architectures generalize.

Which one Performs Better? Wav2Vec or Whisper? Applying both in Badini Kurdish Speech to Text (BKSTT), which compares Wav2Vec and Whisper models for Badini Kurdish speech-to-text.
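Regularization-based continual learning, as studied in the Indic ASR paper, generally works by penalizing movement of parameters that were important for previously learned languages. The digest does not specify which methods the paper evaluates; the sketch below uses Elastic Weight Consolidation (EWC) as one representative example, and all function names are illustrative rather than taken from the paper:

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    `fisher` holds per-parameter importance estimates (e.g. diagonal
    Fisher information) from the previously learned languages; moving
    a high-importance weight is penalized heavily, which reduces
    forgetting when training on a new language.
    """
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

def total_loss(task_loss, params, old_params, fisher, lam=1.0):
    # Loss on the new language plus the consolidation term.
    return task_loss + ewc_penalty(params, old_params, fisher, lam)
```

For example, with importances `fisher = [10.0, 0.1]`, moving the first (important) weight by 0.5 incurs a penalty of 1.25, while the same move on the second (unimportant) weight costs only 0.0125, so optimization is steered toward changing weights the old languages do not rely on.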
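The optimal-transport idea can be illustrated with entropy-regularized OT (the Sinkhorn algorithm): treat speech-frame embeddings and text-token embeddings as two point clouds, compute a soft alignment between them, and use the resulting transport cost as a training penalty that pulls the two modalities together. This is a generic sketch of the technique, not the paper's actual method; `sinkhorn` and `ot_alignment_loss` are hypothetical names:

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT between uniform marginals.

    Returns a soft alignment matrix P whose rows sum to 1/n
    (mass per speech frame) and columns to 1/m (mass per token).
    Subtracting cost.min() only rescales K and stabilizes exp().
    """
    n, m = cost.shape
    K = np.exp(-(cost - cost.min()) / eps)
    a = np.ones(n) / n  # uniform mass over speech frames
    b = np.ones(m) / m  # uniform mass over text tokens
    v = np.ones(m) / m
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_alignment_loss(speech_emb, text_emb, eps=0.1):
    # Squared-distance cost between every frame/token embedding pair,
    # then the transport cost <P, C> as the alignment penalty.
    C = ((speech_emb[:, None, :] - text_emb[None, :, :]) ** 2).sum(-1)
    P = sinkhorn(C, eps)
    return (P * C).sum()
```

When the speech and text embeddings already coincide, the transport cost is near zero; embeddings that drift apart (the modality gap) incur a large penalty, which is what makes the quantity usable as a regularizer.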

Sources

A Study on Regularization-Based Continual Learning Methods for Indic ASR

Text to Speech System for Meitei Mayek Script

Optimal Transport Regularization for Speech Text Alignment in Spoken Language Models

Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning

QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems

UtterTune: LoRA-Based Target-Language Pronunciation Edit and Control in Multilingual Text-to-Speech

Assessing the Feasibility of Lightweight Whisper Models for Low-Resource Urdu Transcription

Analysis of Domain Shift across ASR Architectures via TTS-Enabled Separation of Target Domain and Acoustic Conditions

Which one Performs Better? Wav2Vec or Whisper? Applying both in Badini Kurdish Speech to Text (BKSTT)
