The field of speech recognition is moving toward more robust and trustworthy models, with a focus on improving performance in noisy environments and handling diverse user groups. Recent research has highlighted the importance of internal consistency in models, with techniques such as multi-granularity consistency frameworks showing promising results. There is also growing interest in models that can handle out-of-vocabulary words, named-entity correction, and cross-lingual phoneme recognition. Noteworthy papers include MGSC, which introduces a model-agnostic framework for enforcing internal self-consistency, and Whisper based Cross-Lingual Phoneme Recognition, which proposes a bilingual speech recognition approach for Vietnamese and English. Other notable papers include Attention2Probability, which proposes attention-driven terminology probability estimation, and ReSURE, which introduces an adaptive learning method that regularizes unreliable supervision in multi-turn dialogue fine-tuning.