Advances in Speech Representation and Recognition

Speech processing research is moving toward more efficient and effective representation and recognition of speech. Recent work concentrates on three areas: compressing semantic speech representations, improving automatic speech recognition (ASR) systems, and making contextual ASR models more robust.

Entropy-based approaches have been proposed to aggregate speech tokens dynamically, reducing redundancy and improving efficiency in downstream tasks. Phoneme-aware encoding and contrastive entity disambiguation have been introduced to improve ASR performance, particularly on domain-specific named entities and homophones. Researchers have also used TV subtitles as context-rich prompts for weakly supervised ASR training and report significant improvements in transcription accuracy. Finally, purified semantic correlation joint modeling has been proposed to reduce the effect of varying amounts of biasing information in contextual ASR.

Two papers are particularly noteworthy. The paper on entropy-based coarse and compressed semantic speech representation learning shows that its approach reduces redundancy and improves efficiency. The paper on purified semantic correlation joint modeling reports average relative F1 score improvements of up to 21.34% on AISHELL-1 and 28.46% on KeSpeech.
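The idea of entropy-based token aggregation can be made concrete with a small sketch. The summary above does not describe the actual algorithm, so everything here is an assumption made for illustration: that each token comes with a predictive distribution from some model, that a new group starts whenever the entropy of that distribution exceeds a fixed threshold, and that the tokens in a group are mean-pooled. The names `entropy`, `aggregate_tokens`, and `threshold` are hypothetical and not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def aggregate_tokens(embeddings, token_probs, threshold):
    """Merge consecutive token embeddings into coarser units.

    Assumption (not from the source): a token whose predictive entropy
    exceeds `threshold` is treated as the start of a new group, and each
    group is summarized by the mean of its embeddings.
    """
    if len(embeddings) != len(token_probs):
        raise ValueError("need one distribution per token embedding")

    groups = []
    for vec, probs in zip(embeddings, token_probs):
        if not groups or entropy(probs) > threshold:
            groups.append([vec])      # high uncertainty: open a new group
        else:
            groups[-1].append(vec)    # low uncertainty: extend current group

    # Mean-pool each group dimension by dimension.
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]


# Toy input: four 2-dimensional token embeddings with made-up distributions.
embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
token_probs = [[0.5, 0.5], [0.99, 0.01], [0.5, 0.5], [0.99, 0.01]]

compressed = aggregate_tokens(embeddings, token_probs, threshold=0.3)
print(len(embeddings), "tokens ->", len(compressed), "groups")
```

In this toy input the first and third tokens are uncertain (entropy of about 0.69 nats) and open new groups, while the second and fourth are predictable and are merged into the preceding group, so four tokens compress to two. A real system would likely learn or tune the boundary criterion rather than use a fixed threshold.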

