Advances in Speech and Language Processing

The field of speech and language processing is moving toward more accurate and efficient models, with a focus on capturing context-dependent relationships and improving performance on rare and unseen data. Researchers are exploring new methods for estimating internal language models, disentangling modality-specific mechanisms in vision-language models, and leveraging phrase dictionaries to improve speech translation and automatic speech recognition. Notably, knowledge distillation and contextual biasing are showing promising results. Noteworthy papers in this area include the following (a sketch of the shared scoring idea follows the list):

Label-Context-Dependent Internal Language Model Estimation for CTC proposes novel context-dependent internal language model (ILM) estimation methods and achieves a 13% relative improvement in word error rate.

Same Task, Different Circuits investigates the accuracy gap between modalities in vision-language models and proposes a training-free approach to narrow that gap.

PHRASED introduces a phrase dictionary biasing method for speech translation and achieves an 85% relative improvement in phrase recall.

OWSM-Biasing integrates contextual biasing with Open Whisper-Style Speech Models and improves the biasing word error rate by 11.6 points.
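The ILM-estimation and biasing papers above share a common decoding-time pattern: the score of each beam-search hypothesis is corrected by subtracting an estimate of the model's internal language model before an external language model (or biasing bonus) is added, so the model's own label prior is not counted twice. The sketch below is a minimal, hypothetical illustration of that score combination; the function name fused_score, the weights, and the toy log-probabilities are assumptions for exposition, not any paper's actual implementation.

    def fused_score(log_p_am: float, log_p_ilm: float, log_p_ext: float,
                    ilm_weight: float = 0.3, ext_weight: float = 0.5) -> float:
        """Shallow fusion with internal-LM subtraction (illustrative only):
        score = log P_AM(y|x) - lambda * log P_ILM(y) + mu * log P_ext(y).
        Subtracting the estimated internal LM keeps the acoustic model's
        own label prior from being counted twice once the external LM
        contribution is added."""
        return log_p_am - ilm_weight * log_p_ilm + ext_weight * log_p_ext

    # Toy comparison of two hypotheses with made-up log-probabilities.
    # Hypothesis B is acoustically weaker but far more likely under the
    # external LM, so fusion with ILM subtraction re-ranks it on top.
    hyp_a = fused_score(log_p_am=-2.0, log_p_ilm=-1.0, log_p_ext=-6.0)
    hyp_b = fused_score(log_p_am=-2.3, log_p_ilm=-4.0, log_p_ext=-1.5)
    print(f"A: {hyp_a:.2f}  B: {hyp_b:.2f}")  # B scores higher after fusion

In the contextual-biasing papers, the external term plays the role of a per-phrase bonus drawn from a phrase dictionary or dynamic vocabulary rather than a full external language model, but the hypothesis-rescoring structure is analogous.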

Sources

Label-Context-Dependent Internal Language Model Estimation for CTC

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

PHRASED: Phrase Dictionary Biasing for Speech Translation

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary

Modeling Probabilistic Reduction using Information Theory and Naive Discriminative Learning
