Advances in Music Generation and Transcription

The field of music generation and transcription is rapidly evolving, with a focus on developing models that produce high-quality, coherent music and transcribe musical pieces accurately. Recent research has explored diffusion-based models, transformer architectures, and multi-agent systems to improve the quality and controllability of music generation. There has also been a push to incorporate more expressive and nuanced aspects of music, such as playing technique and performance style, into transcription and generation models. Noteworthy papers in this area include:

- MusicWeaver, which presents a music generation model conditioned on a beat-aligned structural plan, enabling professional and localized edits.
- Noise-to-Notes, which introduces a diffusion-based framework for automatic drum transcription, offering a flexible speed-accuracy trade-off and strong inpainting capabilities.
- VioPTT, which proposes a lightweight model for transcribing violin playing technique in addition to pitch onset and offset, demonstrating strong generalization to real-world note-level violin technique recordings.
- Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription, which presents a unified framework for jointly modeling expressive performance rendering and automatic piano transcription, achieving competitive performance on both tasks.
- An Agent-Based Framework for Automated Higher-Voice Harmony Generation, which introduces a multi-agent system that generates harmony in a collaborative, modular fashion, mimicking the collaborative process of human musicians.
- Discovering "Words" in Music, which presents an unsupervised machine learning algorithm for identifying recurring patterns in symbolic music data, extracting basic building blocks that support structural analysis and sparse encoding.
- SAGE-Music, which proposes a low-latency symbolic music generation model via attribute-specialized key-value head sharing, achieving a 30% inference speedup with only a negligible quality drop.
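SAGE-Music's attribute-specialized grouping scheme is detailed in the paper itself; as a generic sketch of the underlying idea, key-value head sharing lets several query heads attend through one shared K/V head (as in grouped-query attention), shrinking the KV cache and speeding up decoding. The function name, shapes, and group assignment below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_kv_attention(q, k, v):
    """Attention where consecutive groups of query heads share one K/V head.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), with
    n_q_heads a multiple of n_kv_heads. The KV cache holds only
    n_kv_heads heads instead of n_q_heads.
    """
    n_q_heads, _, d = q.shape
    n_kv_heads = k.shape[0]
    assert n_q_heads % n_kv_heads == 0
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # query head h reads the shared KV head kv
        scores = q[h] @ k[kv].T / np.sqrt(d)
        out[h] = softmax(scores) @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 32))   # 8 query heads
k = rng.standard_normal((2, 16, 32))   # only 2 shared K/V heads
v = rng.standard_normal((2, 16, 32))
out = grouped_kv_attention(q, k, v)    # shape (8, 16, 32)
```

Here the KV cache is 4x smaller than with one K/V head per query head; an attribute-specialized variant would presumably assign the shared heads by musical attribute rather than by position.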

Sources

MusicWeaver: Coherent Long-Range and Editable Music Generation from a Beat-Aligned Structural Plan

Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription

Time-Shifted Token Scheduling for Symbolic Music Generation

VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation

Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription

An Agent-Based Framework for Automated Higher-Voice Harmony Generation

Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music

Enhanced Automatic Drum Transcription via Drum Stem Source Separation

Learning Relationships Between Separate Audio Tracks for Creative Applications

HNote: Extending YNote with Hexadecimal Encoding for Fine-Tuning LLMs in Music Modeling

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
