Advances in Speech Representation and Recognition

Speech processing research is moving toward more efficient speech representations and more robust recognition. Recent work focuses on three areas: compressing semantic speech representations, improving automatic speech recognition (ASR) systems, and making contextual ASR models more robust.

Entropy-based approaches have been proposed to aggregate speech tokens dynamically, reducing redundancy and improving efficiency in downstream tasks. Phoneme-aware encoding and contrastive entity disambiguation have been introduced to improve ASR performance, particularly on domain-specific named entities and homophones. Researchers have also used TV subtitles as context-rich prompts for weakly supervised ASR training, reporting significant improvements in transcription accuracy. Finally, purified semantic correlation joint modeling has been proposed to make contextual ASR less sensitive to varying volumes of biasing information.

Two papers are particularly noteworthy. The paper on entropy-based coarse and compressed semantic speech representation learning shows that its approach reduces redundancy and improves efficiency. The paper on purified semantic correlation joint modeling reports average relative F1 score improvements of up to 21.34% on AISHELL-1 and 28.46% on KeSpeech.
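To make the idea of entropy-based token aggregation more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method from any of the papers listed below: it assumes each speech token comes with a predictive probability distribution and an embedding, it treats a token whose entropy exceeds a threshold as the start of a new segment, and it merges each segment by mean pooling. The function names, the threshold value, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def segment_by_entropy(entropies: Sequence[float], threshold: float) -> List[List[int]]:
    """Group token indices into contiguous segments.

    Assumption for this sketch: a token whose entropy exceeds the threshold
    opens a new segment; low-entropy tokens join the current segment.
    """
    groups: List[List[int]] = []
    for i, h in enumerate(entropies):
        if not groups or h > threshold:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups


def mean_pool(embeddings: Sequence[Sequence[float]], groups: List[List[int]]) -> List[List[float]]:
    """Average the embeddings inside each segment, giving one vector per segment."""
    pooled = []
    for g in groups:
        dim = len(embeddings[g[0]])
        pooled.append([sum(embeddings[i][d] for i in g) / len(g) for d in range(dim)])
    return pooled


# Toy input (hypothetical): four tokens, each with a distribution and a 2-d embedding.
dists = [
    [0.25, 0.25, 0.25, 0.25],  # uncertain prediction, high entropy
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uncertain prediction, high entropy
]
embs = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [0.0, 8.0]]

entropies = [shannon_entropy(p) for p in dists]
groups = segment_by_entropy(entropies, threshold=1.0)
compressed = mean_pool(embs, groups)

print(groups)      # segments of token indices
print(compressed)  # one pooled embedding per segment
```

In this toy run the four tokens collapse into two segments, so the sequence passed to a downstream task is half as long. A real system would presumably obtain the distributions from a trained model and tune the threshold to trade compression against fidelity.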

Sources

Entropy-based Coarse and Compressed Semantic Speech Representation Learning

PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation

Contextualized Token Discrimination for Speech Search Query Correction

Refining Transcripts With TV Subtitles by Prompt-Based Weakly Supervised Training of ASR

An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training

New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR

Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling
