Advances in Audio-Visual Speech Processing and Deepfake Detection

Recent work in audio-visual speech processing concentrates on three tasks: speech recognition, voice conversion, and deepfake detection. The papers below address challenges such as timbre leakage, speaker privacy, and visual disturbances. Recurring techniques include dual attention mechanisms, flow matching, and landmark-guided visual feature extractors. There is also a growing emphasis on robust deepfake detection, including methods based on audio-visual speech representation learning and ensemble-based approaches.

Noteworthy papers in this area include:

DAFMSVC, which proposes an approach to singing voice conversion using dual attention mechanisms and flow matching.

SEF-MK, which introduces a speaker-embedding-free framework for voice anonymization through multi-k-means quantization.

AD-AVSR, which presents an audio-visual speech recognition framework based on bidirectional modality enhancement.

SpeechForensics, which leverages audio-visual speech representation learning for face forgery detection.

Fake Speech Wild, which proposes a dataset and benchmark for detecting deepfake speech on social media platforms.

Sources

DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching

SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization

Landmark Guided Visual Feature Extractor for Visual Speech Recognition with Limited Resource

AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition

Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine Grained-Localization

SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection

Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform

Ensembling Synchronisation-based and Face-Voice Association Paradigms for Robust Active Speaker Detection in Egocentric Recordings
