Advances in Acoustic Signal Processing and Spatial Audio

The field of acoustic signal processing and spatial audio is advancing rapidly, with a focus on innovative methods for sound source localization, audio-visual sound source localization, and the generation of immersive spatial audio. Researchers are applying deep learning techniques, such as physics-informed neural networks and latent diffusion models, to improve the accuracy and efficiency of these tasks. There is also growing interest in self-supervised and semi-supervised approaches that reduce the need for large labeled datasets. Together, these advances stand to significantly improve the performance of sound localization systems and immersive audio applications. Noteworthy papers include:

  • Latent Acoustic Mapping for Direction of Arrival Estimation, which introduces a self-supervised framework for acoustic mapping that combines the interpretability of traditional methods with the adaptability of deep learning.
  • SonicMotion, which proposes an end-to-end model for generating dynamic spatial audio soundscapes with latent diffusion models.
  • VP-SelDoA, which introduces a novel task of cross-instance audio-visual localization and proposes a semantic-level modality fusion approach to tackle this challenge.
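Deep-learning approaches to direction-of-arrival (DoA) estimation like those above are commonly compared against classical signal-processing baselines. As an illustration only (this is not the method of any listed paper), the sketch below implements GCC-PHAT time-delay estimation for a hypothetical two-microphone array; the sample rate, microphone spacing, and injected delay are assumptions chosen for the example:

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (seconds) of `sig` relative to `ref`
    using the generalized cross-correlation with PHAT weighting."""
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    # Reorder so index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Toy example: a noise burst reaching the second mic four samples late.
fs = 16000                           # assumed sample rate (Hz)
rng = np.random.default_rng(0)
src = rng.standard_normal(4096)
delay = 4                            # injected delay in samples
mic1 = src
mic2 = np.concatenate((np.zeros(delay), src[:-delay]))

tau = gcc_phat(mic2, mic1, fs)       # estimated inter-mic delay (s)
d = 0.1                              # assumed mic spacing (m)
c = 343.0                            # speed of sound (m/s)
angle = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
```

The far-field geometry `sin(theta) = tau * c / d` converts the estimated delay into a bearing; learned methods such as latent acoustic maps aim to outperform this kind of baseline under reverberation and noise, where the PHAT peak degrades.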

Sources

Feature Geometry for Stereo Sidescan and Forward-looking Sonar

Physics-Informed Direction-Aware Neural Acoustic Fields

Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach

SonicMotion: Dynamic Spatial Audio Soundscapes with Latent Diffusion Models

VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching
