Advances in Speech Analysis and Generation

Research in speech analysis and generation is moving beyond explicit semantics toward models that also account for implicit cues, emotion, and context. Researchers are exploring frameworks and models that capture aspects of human speech such as pause dynamics, semantic coherence, and paralinguistic features.

Noteworthy papers include EchoVoices, which presents a digital human pipeline for preserving generational voices and memories, and GOAT-SLM, which introduces a spoken language model with paralinguistic and speaker characteristic awareness. Work of this kind could enable more natural and effective human-machine communication.

Sources

Reading Between the Lines: Combining Pause Dynamics and Semantic Coherence for Automated Assessment of Thought Disorder

Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer

EchoVoices: Preserving Generational Voices and Memories for Seniors and Children

A2TTS: TTS for Low Resource Indian Languages

BoSS: Beyond-Semantic Speech

Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability

TELEVAL: A Dynamic Benchmark Designed for Spoken Language Models in Chinese Interactive Scenarios

GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness

Talking to...uh...um...Machines: The Impact of Disfluent Speech Agents on Partner Models and Perspective Taking

System Report for CCL25-Eval Task 10: SRAG-MAV for Fine-Grained Chinese Hate Speech Recognition