Dysarthric Speech Recognition and Personalized Speech Synthesis

The field of speech recognition and synthesis is moving toward more personalized and accessible solutions, particularly for individuals with dysarthric speech. Recent work focuses on improving recognition accuracy for dysarthric speakers and on building more effective, natural-sounding text-to-speech (TTS) systems. Notable techniques include synthetic speech generation, knowledge anchoring, and curriculum learning, all of which strengthen recognition and synthesis models and stand to significantly improve communication for people with speech impairments. Noteworthy papers: "Improved Dysarthric Speech to Text Conversion via TTS Personalization" presents a method for generating synthetic dysarthric speech to fine-tune ASR models; "Bridging ASR and LLMs for Dysarthric Speech Recognition" benchmarks self-supervised ASR models and introduces LLM-based decoding to improve recognition of low-intelligibility speech; and "Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning" proposes a knowledge anchoring framework that generates synthetic speech with fewer articulation errors.
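
To make the first approach concrete, the sketch below fine-tunes an off-the-shelf CTC-based ASR model (wav2vec 2.0 via Hugging Face Transformers) on synthetic dysarthric utterances assumed to come from a personalized TTS model. This is a minimal illustration, not the paper's actual pipeline: the checkpoint name, the SyntheticDysarthricDataset class, and the hyperparameters are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): fine-tuning a CTC ASR model on
# synthetic dysarthric speech produced by a personalized TTS system.
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

class SyntheticDysarthricDataset(Dataset):
    """Pairs of (synthetic dysarthric waveform, transcript).

    The waveforms are assumed to come from a TTS model personalized to a
    target dysarthric speaker; here we simply wrap precomputed examples."""
    def __init__(self, examples):  # examples: list of (waveform_tensor, text)
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.freeze_feature_encoder()  # common when adapting on small datasets

def collate(batch):
    waveforms, texts = zip(*batch)
    inputs = processor([w.numpy() for w in waveforms],
                       sampling_rate=16_000, return_tensors="pt", padding=True)
    labels = processor(text=list(texts), return_tensors="pt", padding=True)
    # Mask label padding so it is ignored by the CTC loss.
    inputs["labels"] = labels.input_ids.masked_fill(
        labels.attention_mask.eq(0), -100)
    return inputs

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune(examples, epochs=3):
    loader = DataLoader(SyntheticDysarthricDataset(examples), batch_size=4,
                        shuffle=True, collate_fn=collate)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss  # CTC loss from the model head
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Freezing the feature encoder and using a small learning rate are common choices when adapting on the small, speaker-specific corpora typical of dysarthric ASR; in practice, real recordings of the target speaker would be held out for evaluation.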

Sources

Improved Dysarthric Speech to Text Conversion via TTS Personalization

The 2D+ Dynamic Articulatory Model DYNARTmo: Tongue-Palate Contact Area Estimation

Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches

Iterative refinement, not training objective, makes HuBERT behave differently from wav2vec 2.0

Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning
