Advances in Automatic Speech Recognition

Recent work in automatic speech recognition (ASR) focuses on recognizing rare words and out-of-domain vocabulary more reliably. Approaches under study include contextual biasing, keyword-aware cost functions, and pronunciation-aware modeling. Researchers are also investigating the integration of large language models (LLMs) and reinforcement learning into ASR systems, with the aim of achieving state-of-the-art results, and there is growing interest in more efficient and accurate post-editing of ASR outputs.

Noteworthy papers in this area include:

Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function, which proposes a novel loss function to improve rare word recognition.

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition, which introduces a two-stage learning paradigm to address pronunciation modeling and homophone discrimination.

Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing, which presents a compact edit representation for accurate and efficient ASR post-editing.

Sources

Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function

Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition

Prominence-aware automatic speech recognition for conversational speech

WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

FunAudio-ASR Technical Report

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition

Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing

From Hype to Insight: Rethinking Large Language Model Integration in Visual Speech Recognition
