Advances in Multilingual Speech Processing

The field of multilingual speech processing is moving toward more efficient and accurate models, with a focus on adaptation to low-resource languages and domains. Recent research has explored large language models, self-supervised learning, and data augmentation to improve performance on tasks such as language identification (LID), automatic speech recognition (ASR), and speech translation. Notable advances include few-shot learning methods, which enable models to learn from limited labeled data, and grapheme-coherent phonemic and prosodic annotation, which improves speech recognition accuracy. Two papers stand out:

  • Improving Multilingual Speech Models on ML-SUPERB 2.0, which achieved a 14% relative improvement in LID accuracy and a 30% relative reduction in ASR character error rate (CER).
  • Fewer Hallucinations, More Verification, which proposed a three-stage LLM-based framework for ASR error correction and achieved relative CER/WER reductions of 21%, 11%, 9%, and 11.4% across its evaluation sets.
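Several of the papers below rely on pseudo-labeling with data filtering to adapt ASR models to new domains. As a minimal sketch of that idea, the snippet applies two illustrative filtering stages (a confidence threshold and a transcript-length sanity check) to a pool of pseudo-labeled utterances; the stage choices, field names, and thresholds are assumptions for illustration, not any paper's exact method.

```python
def filter_pseudo_labels(utterances, min_confidence=0.9, max_chars_per_sec=25.0):
    """Keep pseudo-labeled utterances that pass every filtering stage.

    Each utterance is a dict with (illustrative schema):
      'text'       -- hypothesis produced by the seed ASR model
      'confidence' -- average token confidence in [0, 1]
      'duration'   -- audio length in seconds
    """
    kept = []
    for utt in utterances:
        # Stage 1: drop low-confidence hypotheses.
        if utt["confidence"] < min_confidence:
            continue
        # Stage 2: drop transcripts implausibly long for the audio,
        # a cheap proxy for hallucinated repetitions.
        if len(utt["text"]) > max_chars_per_sec * utt["duration"]:
            continue
        kept.append(utt)
    return kept


pool = [
    {"text": "turn on the lights", "confidence": 0.95, "duration": 1.5},
    {"text": "uh", "confidence": 0.40, "duration": 0.5},       # fails stage 1
    {"text": "a" * 200, "confidence": 0.92, "duration": 1.0},  # fails stage 2
]
print([u["text"] for u in filter_pseudo_labels(pool)])  # ['turn on the lights']
```

In practice such filters are tuned per domain and often combined with iterative retraining, as in the incremental-retraining paper listed below.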

Sources

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction

MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR

Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering

Acoustically Precise Hesitation Tagging Is Essential for End-to-End Verbatim Transcription Systems

Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care

Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning

LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models

LLM-based phoneme-to-grapheme for phoneme-based speech recognition

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation

Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering
