Advances in Speech Enhancement and Translation

The field of speech processing is moving toward more efficient and effective algorithms for speech enhancement and translation. Researchers are exploring neural networks and attention mechanisms to improve speech enhancement in noisy environments, and there is growing interest in systems that handle multichannel audio and real-time processing. Noteworthy papers in this area include NeuralPMWF, which uses a low-latency neural network to control a parameterized multi-channel Wiener filter and achieves significantly better perceptual and objective speech enhancement, and MeMo, which proposes a framework for real-time audio-visual speaker extraction under impaired visual conditions and achieves SI-SNR improvements of at least 2 dB over the corresponding baseline.
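
For context, the parameterized multi-channel Wiener filter referenced by NeuralPMWF combines the speech and noise spatial covariance matrices with a trade-off parameter that balances noise reduction against speech distortion; the neural network is used to control this filtering. The sketch below is a minimal NumPy illustration of the closed-form PMWF for a single frequency bin, not the paper's implementation; the function name, toy covariances, and steering vector are assumptions made for the example.

```python
import numpy as np

def pmwf_weights(phi_s, phi_n, beta, ref_mic=0):
    """Closed-form parameterized multi-channel Wiener filter for one frequency bin.

    phi_s   : (M, M) speech spatial covariance matrix
    phi_n   : (M, M) noise spatial covariance matrix
    beta    : trade-off between noise reduction and speech distortion
              (beta = 0 gives an MVDR-like filter, beta = 1 the standard MWF);
              in a neural PMWF this parameter would be predicted by the network
    ref_mic : index of the reference microphone
    """
    num = np.linalg.solve(phi_n, phi_s)              # phi_n^{-1} phi_s
    u = np.zeros(phi_s.shape[0], dtype=complex)
    u[ref_mic] = 1.0
    return (num @ u) / (beta + np.trace(num).real)   # (M,) complex filter weights

# Toy usage with 4 microphones and a single frequency bin.
rng = np.random.default_rng(0)
M = 4
a = rng.normal(size=M) + 1j * rng.normal(size=M)     # assumed steering vector
phi_s = np.outer(a, a.conj())                        # rank-1 speech covariance
phi_n = np.eye(M, dtype=complex)                     # spatially white noise covariance
w = pmwf_weights(phi_s, phi_n, beta=1.0)
x = a + 0.1 * (rng.normal(size=M) + 1j * rng.normal(size=M))  # noisy observation
y = np.vdot(w, x)                                    # beamformer output, w^H x
```

In a low-latency neural variant, the network would output the trade-off parameter (and typically the covariance estimates) frame by frame rather than using the fixed values assumed here.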

Sources

Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network

MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions

Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation

Multichannel Keyword Spotting for Noisy Conditions

LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement

Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
