The field of acoustic signal processing and spatial audio is advancing rapidly, with active work on sound source localization, audio-visual sound source localization, and the generation of immersive spatial audio. Researchers are applying deep learning techniques such as physics-informed neural networks and latent diffusion models to improve the accuracy and efficiency of these methods. There is also growing interest in self-supervised and semi-supervised approaches that reduce the need for large labeled datasets. Together, these advances stand to substantially improve sound localization systems and immersive audio applications. Noteworthy papers include:
- Latent Acoustic Mapping for Direction of Arrival Estimation, which introduces a self-supervised framework for acoustic mapping that combines the interpretability of traditional methods with the adaptability of learned models.
- SonicMotion, which proposes an end-to-end model for generating dynamic spatial audio soundscapes with latent diffusion models.
- VP-SelDoA, which introduces a novel task of cross-instance audio-visual localization and proposes a semantic-level modality fusion approach to tackle this challenge.
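As background for the direction-of-arrival (DoA) work above, the classical baseline that learned methods are typically compared against is GCC-PHAT: cross-correlating two microphone channels with phase-transform whitening to estimate the inter-channel time delay, from which the arrival angle follows geometrically. The sketch below is illustrative only and is not taken from any of the listed papers; the function name and toy signal are assumptions for the example.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay (seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    # Cross-power spectrum with PHAT weighting: keep only phase information,
    # which sharpens the correlation peak in reverberant conditions.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-center the circular correlation so index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Two-microphone toy example: the second channel lags the first by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
delay = 5
y = np.concatenate((np.zeros(delay), x[:-delay]))
tau = gcc_phat(y, x, fs)  # tau should be close to delay / fs
```

Given the estimated delay `tau`, the arrival angle for a two-microphone array with spacing `d` follows from `theta = arcsin(c * tau / d)` with speed of sound `c ≈ 343 m/s`. Deep DoA estimators aim to outperform this baseline under noise and reverberation, where the correlation peak degrades.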