Speech recognition and processing is advancing rapidly, driven by innovations in deep learning and large language models (LLMs). One key trend is the development of more efficient and effective architectures, including dynamic thinking mechanisms and end-to-end approaches. These models learn complex patterns in speech data and reach state-of-the-art performance on tasks such as speech recognition, speech synthesis, and speech translation. A second line of research applies speech recognition and processing to real-world problems such as hearing assessment, speech rehabilitation, and emotion recognition. Noteworthy papers include SALMONN-omni, which introduces a standalone speech LLM for full-duplex conversation, and UniTTS, which proposes an end-to-end TTS system that does not decouple acoustic and semantic information. Reverse-Speech-Finder introduces a neural network backtracking architecture for generating Alzheimer's disease speech samples and improving diagnosis performance.
Advances in Speech Recognition and Processing
Sources
Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection
Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR
Advancing Hearing Assessment: An ASR-Based Frequency-Specific Speech Test for Diagnosing Presbycusis