Advancements in Speech and Music Processing

The field of speech and music processing is moving toward more sophisticated methods for human-computer interaction, speech restoration, and music performance analysis. Researchers are integrating nonlinear acoustic computing with reinforcement learning to improve human-robot interaction, and are developing new frameworks for real-time speech processing and music tracking. There is also growing interest in applying machine learning and signal processing techniques to speech restoration, noise reduction, and music information retrieval. These advances could benefit a range of applications, including voice assistants, music education, and speech therapy. Noteworthy papers include Miipher-2, which introduces a universal speech restoration model scaled to million-hour datasets; ReverbMiipher, which proposes a generative speech restoration model that preserves reverberation characteristics and makes them controllable; and Pairing Real-Time Piano Transcription with Symbol-level Tracking, which presents a precise and robust approach to score following.

Sources

GVPT -- A software for guided visual pitch tracking

A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction

MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input

Practice Support for Violin Bowing by Measuring Bow Pressure and Position

Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration

How to Infer Repeat Structures in MIDI Performances

ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability

Pairing Real-Time Piano Transcription with Symbol-level Tracking for Precise and Robust Score Following
