Advances in Audio Representation Learning and Analysis

The field of audio representation learning and analysis is advancing rapidly, with recent work focused on more efficient and effective methods for processing and understanding audio data. Recent research explores semantic compression, self-supervised learning, and domain adaptation to improve the performance of audio models. One notable trend is the use of generative models to factorize audio signals into high-level semantic representations, enabling more efficient compression and analysis. Another area of focus is adapting audio models to new domains and tasks, such as speech recognition and sentiment analysis.

Noteworthy papers in this area include A Novel Semantic Compression Approach for Ultra-low Bandwidth Voice Communication, which proposes a semantic communications approach that achieves lower bitrates without sacrificing perceptual quality, and SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation, which introduces a continual pre-training framework for adapting audio models to new domains.
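To make the compression idea concrete, here is a minimal sketch of token-based semantic compression, assuming a HuBERT-style pipeline in which an encoder's continuous features are vector-quantized against a learned codebook and only the discrete token stream is sent over the channel. The codebook size, frame rate, and feature dimension are illustrative assumptions, not values from the cited paper.

```python
# Minimal sketch: vector-quantize speech features to semantic tokens
# and estimate the resulting bitrate. All constants are assumptions.
import numpy as np

FRAME_RATE_HZ = 50      # assumed feature frames per second
CODEBOOK_SIZE = 512     # assumed number of semantic units
FEATURE_DIM = 256       # assumed encoder feature dimension

rng = np.random.default_rng(0)
codebook = rng.normal(size=(CODEBOOK_SIZE, FEATURE_DIM))

def quantize(features: np.ndarray) -> np.ndarray:
    """Map each feature frame to its nearest codebook entry."""
    # Pairwise squared distances between frames and codebook vectors.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # one integer token per frame

# Stand-in for encoder output on one second of speech.
features = rng.normal(size=(FRAME_RATE_HZ, FEATURE_DIM))
tokens = quantize(features)

# Bitrate of the token stream alone: log2(|codebook|) bits per frame.
bits_per_second = FRAME_RATE_HZ * np.log2(CODEBOOK_SIZE)
print(f"{len(tokens)} tokens/s -> {bits_per_second:.0f} bps")  # 450 bps here

# At the receiver, a generative vocoder (not shown) would resynthesize
# a waveform from `tokens`, trading exact waveform fidelity for
# perceptual and semantic fidelity at ultra-low bitrates.
```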
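Likewise, the continual pre-training idea behind SONAR can be sketched as a self-distillation loop in the style of data2vec: a student encoder trains on masked views of new-domain audio while matching a teacher that is an exponential moving average of the student. The toy encoder, masking rate, and EMA schedule below are assumptions for illustration, not the paper's actual method.

```python
# Minimal sketch of self-distilled continual pre-training; the
# architecture, loss, and schedule are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder standing in for a pretrained audio model.
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
teacher = copy.deepcopy(student)   # starts as a copy of the source-domain model
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
EMA_DECAY = 0.999                  # assumed momentum for the teacher update

for step in range(100):
    batch = torch.randn(16, 128)   # stand-in for new-domain audio features

    # Teacher sees the full input; its output is the regression target.
    with torch.no_grad():
        target = teacher(batch)

    # Student predicts the teacher's representation from a masked view,
    # which anchors adaptation to previously learned structure.
    mask = (torch.rand_like(batch) > 0.2).float()
    loss = F.mse_loss(student(batch * mask), target)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Teacher tracks the student via an exponential moving average,
    # letting the model adapt without abruptly discarding old knowledge.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(EMA_DECAY).add_(ps, alpha=1 - EMA_DECAY)
```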

Sources

A Novel Semantic Compression Approach for Ultra-low Bandwidth Voice Communication

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations

SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation

The Curious Case of Visual Grounding: Different Effects for Speech- and Text-based Language Encoders

A Dimensional Approach to Canine Bark Analysis for Assistance Dog Seizure Signaling

Developing an AI framework to automatically detect shared decision-making in patient-doctor conversations

Codebook-Based Adaptive Feature Compression With Semantic Enhancement for Edge-Cloud Systems

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms

Cuffless Blood Pressure Prediction from Speech Sentences using Deep Learning Methods

Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models
