Advances in Speech Processing and Privacy

Speech processing is advancing rapidly, with growing attention to the accuracy and robustness of speech recognition systems. Recent studies show that speaker attribution can be surprisingly resilient to word-level transcription errors, and that automatic speech recognition systems can even capture speaker-specific features that reveal speaker identity. Another line of research develops methods for detecting stress and other emotional states from speech, with potential applications in fields such as air traffic control. Privacy is also a growing concern, including the risk that sensitive personal attributes can be inferred from audio data.

Noteworthy papers in this area include "The Impact of Automatic Speech Transcription on Speaker Attribution", which investigates how automatic transcription affects speaker attribution performance, and "The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents", which introduces a framework that uses multimodal large language model agents to infer sensitive personal attributes from audio data.

Sources

The Impact of Automatic Speech Transcription on Speaker Attribution

On Barriers to Archival Audio Processing

Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers

Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning

SC-TSE: Speaker Consistency-Aware Target Speaker Extraction

The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents

Supporting SENĆOTEN Language Documentation Efforts with Automatic Speech Recognition

Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison

AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation
