Advancements in Speech Recognition

The field of speech recognition is moving toward more nuanced and accurate evaluation metrics that go beyond the traditional word error rate (WER). This shift is driven by the need to better understand and address errors in rare terms, named entities, and domain-specific vocabulary. Researchers are also probing the internal mechanisms of end-to-end speech recognition pipelines, particularly with respect to fairness and efficacy across languages. There is also a growing focus on making speech recognition models more robust in noisy environments and for low-resource languages. New datasets, and pipelines for constructing them, are another key area of research. Noteworthy papers include:

  • A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems, which proposes a novel alignment algorithm for more accurate error analysis.
  • EuroSpeech: A Multilingual Speech Corpus, which introduces a scalable pipeline for constructing speech datasets from parliamentary recordings.
  • EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning, which presents a real-time, collaborative ASR adaptation system for more equitable communication.
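
For context on the baseline these works aim to move beyond, the following is a minimal sketch of the conventional word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the alignment algorithm proposed in any of the papers above. The function name `word_error_rate` and the example sentences are assumptions made for illustration, and the sketch tokenizes on whitespace with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level Levenshtein distance / reference length.

    Illustrative sketch only; tokenizes on whitespace, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not taken from any dataset).
reference = "the patient was given metoprolol today"
wrong_rare_term = "the patient was given metropolis today"
wrong_function_word = "a patient was given metoprolol today"

print(word_error_rate(reference, wrong_rare_term))
print(word_error_rate(reference, wrong_function_word))
```

Both hypotheses contain one substitution in a six-word reference, so they receive the same score, even though only the first one corrupts the domain-specific term. This is the kind of blind spot that motivates finer-grained error analysis.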

Sources

A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems

Beyond WER: Probing Whisper's Sub-token Decoder Across Diverse Language Resource Levels

ASR Under Noise: Exploring Robustness for Sundanese and Javanese

EuroSpeech: A Multilingual Speech Corpus

Automatic Speech Recognition (ASR) for African Low-Resource Languages: A Systematic Literature Review

EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning
