The field of audio analysis is moving towards more interpretable and efficient methods, with a focus on multimodal features and semantic-aware approaches. Researchers are leveraging techniques from signal processing, deep learning, and natural language processing to improve the transparency and usability of audio tagging systems. Noteworthy papers in this area include:
- A study on semantic-aware interpretable multimodal music auto-tagging, which builds predictions from groups of musically meaningful multimodal features, achieving competitive tagging performance while making the decision-making process easier to inspect.
- A paper on spectrotemporal modulation, which proposes an approach centered on spectrotemporal modulation features that mimic the neurophysiological representation in the human auditory cortex, showing promising results for both audio classification and interpretability.