Advances in Automatic Speech Recognition and Speaker Diarization

Recent work in automatic speech recognition (ASR) and speaker diarization follows two main directions. The first is improving ASR for languages that pose particular challenges, such as Arabic, alongside building multilingual speech recognition systems. The second is applying ASR and speaker diarization in real-world settings, including customer relationship management (CRM) and clinical practice. Noteworthy papers include open-source models for Arabic ASR, a comprehensive benchmark suite for speaker diarization, and an efficient end-to-end approach to holistic automatic speaking assessment. Together, these advances could improve the accuracy and efficiency of ASR and speaker diarization systems and support their wider adoption across industries and applications.

Sources

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

Weak Supervision Techniques towards Enhanced ASR Models in Industry-level CRM Systems

Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge

Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task

One Whisper to Grade Them All

The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
