Advances in Automatic Speech Recognition

Recent work in automatic speech recognition (ASR) focuses on better recognition of rare words and out-of-domain vocabulary. Researchers are exploring contextual biasing, keyword-aware cost functions, and pronunciation-aware modeling to improve ASR performance. The integration of large language models (LLMs) and reinforcement learning is also being investigated with the aim of reaching state-of-the-art results. There is also growing interest in more efficient and accurate post-editing methods for ASR outputs.

Noteworthy papers in this area include:

Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function, which proposes a novel loss function to improve rare word recognition.

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition, which introduces a two-stage learning paradigm to address pronunciation modeling and homophone discrimination.

Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing, which presents a compact edit representation for accurate and efficient ASR post-editing.

