The field of audio processing is moving toward more robust and accurate detection of audio deepfakes and anomalies. Researchers are exploring new methods to improve spoofed-audio detection, including contrastive learning and joint learning frameworks. There is also a growing focus on developing datasets and models that can handle diverse languages and acoustic conditions, particularly in regions such as South-East Asia. Notable papers in this area include:
- CompSpoof, which proposes a dataset and joint learning framework for component-level audio anti-spoofing countermeasures.
- SEA-Spoof, which presents a large-scale dataset for multilingual audio deepfake detection in South-East Asian languages.
- LOTUSDIS, which introduces a Thai far-field meeting corpus for robust conversational ASR.

These developments highlight the need for more research in audio deepfake detection and robust ASR, particularly in underserved regions and languages.
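To make the contrastive-learning idea mentioned above concrete, the sketch below implements a generic supervised contrastive loss over audio embeddings with NumPy. This is an illustrative formulation only: the function name, the toy data, and the specific loss (mean negative log-probability of same-class positives under a temperature-scaled softmax) are assumptions for demonstration, not the actual objectives used in CompSpoof or SEA-Spoof.

```python
import numpy as np

def contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings.

    Pulls together embeddings that share a label (e.g. bona fide vs.
    spoofed audio) and pushes apart mismatched pairs. Illustrative
    sketch only; not the exact objective from any cited paper.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # pairwise cosine similarities
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity with itself, then take a
    # row-wise log-softmax over the remaining candidates.
    sim_masked = np.where(mask_self, -np.inf, sim)
    log_prob = sim_masked - np.log(
        np.exp(sim_masked).sum(axis=1, keepdims=True)
    )
    losses = []
    for i in range(n):
        pos = (labels == labels[i]) & ~mask_self[i]  # same-class positives
        if pos.any():
            losses.append(-log_prob[i, pos].mean())
    return float(np.mean(losses))

# Toy check: embeddings drawn from two well-separated clusters should
# incur a lower loss under the true labels than under mismatched ones.
rng = np.random.default_rng(0)
spoof = rng.normal(loc=1.0, size=(8, 16))
real = rng.normal(loc=-1.0, size=(8, 16))
X = np.vstack([spoof, real])
y = np.array([1] * 8 + [0] * 8)
y_mixed = np.array([1, 0] * 8)           # deliberately scrambled labels
loss_sep = contrastive_loss(X, y)
loss_mixed = contrastive_loss(X, y_mixed)
```

In a trained anti-spoofing model, minimizing such a loss encourages bona fide and spoofed utterances to occupy separable regions of the embedding space, which a lightweight classifier can then partition.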