Music Generation and Separation

Research in music generation and separation is converging on more controllable, higher-fidelity techniques. Recent work has focused on improving the quality and controllability of generated music, particularly with respect to tonal tension and timbre. Diffusion models and latent-space representations have shown promise both for generating high-quality music and for separating individual elements, such as vocals, from music mixtures. There is also growing interest in generating non-human singing voices and in new applications for music generation, such as conversational roleplay and interactive entertainment.

Noteworthy papers include Efficient and Fast Generative-Based Singing Voice Separation, which proposes a latent diffusion model for singing voice separation; CartoonSing, which introduces a unified framework for generating human and non-human singing voices; and DUO-TOK, a dual-track semantic music tokenizer that achieves state-of-the-art results on music tagging and language-model perplexity.
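To make the latent-diffusion separation idea concrete, here is a minimal training-step sketch in the spirit of diffusion-based vocal separation: a denoiser learns to predict the noise added to a clean vocal latent, conditioned on the latent of the full mixture. This is an illustrative sketch only; all module names, shapes, and hyperparameters are hypothetical and do not reflect the actual architectures of the cited papers.

```python
# Hypothetical sketch of latent-diffusion singing voice separation
# (DDPM-style noise prediction conditioned on the mixture latent).
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Predicts the noise added to the vocal latent, conditioned on the
    mixture latent and the diffusion timestep (hypothetical architecture)."""
    def __init__(self, dim=64, steps=1000):
        super().__init__()
        self.t_embed = nn.Embedding(steps, dim)
        self.net = nn.Sequential(
            nn.Conv1d(2 * dim, 128, 3, padding=1), nn.GELU(),
            nn.Conv1d(128, dim, 3, padding=1),
        )

    def forward(self, z_t, z_mix, t):
        # Broadcast the timestep embedding over the frame axis,
        # then concatenate noisy vocal and mixture latents channel-wise.
        temb = self.t_embed(t)[:, :, None]
        return self.net(torch.cat([z_t + temb, z_mix], dim=1))

# Standard DDPM linear noise schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def train_step(denoiser, z_vocal, z_mix, opt):
    """One denoising-score-matching step: corrupt the clean vocal latent
    at a random timestep and regress the added noise."""
    b = z_vocal.size(0)
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(z_vocal)
    ab = alphas_bar[t].view(b, 1, 1)
    z_t = ab.sqrt() * z_vocal + (1 - ab).sqrt() * eps
    loss = (denoiser(z_t, z_mix, t) - eps).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    den = Denoiser()
    opt = torch.optim.Adam(den.parameters(), lr=1e-4)
    # Dummy latents standing in for an audio VAE encoder's output:
    # shape (batch, latent channels, frames).
    z_vocal = torch.randn(4, 64, 256)
    z_mix = torch.randn(4, 64, 256)
    print(train_step(den, z_vocal, z_mix, opt))
```

At inference time, such a model would start from pure noise and iteratively denoise toward the vocal latent while conditioning on the mixture latent, with a decoder mapping the result back to audio.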

Sources

Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup Augmentation

Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation

DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation

Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model

SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications

CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation

Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
