Advances in Music Information Retrieval and Generation

Introduction

The field of music information retrieval (MIR) and generation has seen significant recent advances. Researchers are exploring a variety of approaches to improve the accuracy and efficiency of music-related tasks such as audio fingerprinting, beat tracking, and music generation.

General Direction

The current trend in the field is toward leveraging deep learning techniques and large datasets to achieve state-of-the-art results. Much of the work focuses on novel architectures and training methods for music generation and retrieval models. There is also growing interest in real-world applications such as music production, recommendation systems, and music therapy.

Noteworthy Papers

  • A paper on fine-tuning MIDI-to-audio alignment with a neural network operating on piano-roll and CQT representations reports promising results, achieving up to 20% higher alignment accuracy than the industry-standard Dynamic Time Warping method.
  • Another paper, on making neural audio fingerprints robust to audio degradation for music identification, proposes a set of best practices that strengthen self-supervision by exploiting musical signal properties and realistic room acoustics, yielding state-of-the-art performance on both synthetic and real-world datasets.
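For context on the first paper, the Dynamic Time Warping baseline it improves on can be sketched in a few lines. The following is a minimal, self-contained illustration (not the paper's method): classic DTW aligning two feature sequences, standing in for MIDI-derived piano-roll frames and audio-derived CQT frames. The function name and the toy inputs are assumptions for illustration only.

```python
import numpy as np

def dtw_align(X, Y):
    """Classic dynamic time warping between two feature sequences.

    X: (n, d) array, e.g. piano-roll frame features from a MIDI file.
    Y: (m, d) array, e.g. CQT frame features from the audio recording.
    Returns the accumulated-cost matrix and the optimal warping path
    as a list of (i, j) frame pairs.
    """
    n, m = len(X), len(Y)
    # Pairwise Euclidean distances between every MIDI frame and audio frame.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # Accumulated cost with the standard step pattern (match, insert, delete).
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]
            )
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[1:, 1:], path[::-1]
```

The path pairs each symbolic frame with an audio frame; the paper's contribution is a neural network that refines this kind of alignment, reportedly by up to 20% in accuracy.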

Sources

Fine-Tuning MIDI-to-Audio Alignment using a Neural Network on Piano Roll and CQT Representations

Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification

VisionScores -- A system-segmented image score dataset for deep learning tasks

TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure

The Florence Price Art Song Dataset and Piano Accompaniment Generator

Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Emergent musical properties of a transformer under contrastive self-supervised learning

Gregorian melody, modality, and memory: Segmenting chant with Bayesian nonparametrics

Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture

User-guided Generative Source Separation

Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder

Dance Dance ConvLSTM

Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures
