Music Generation and Separation

Research in music generation and separation is converging on more controllable, higher-fidelity techniques. Recent work has focused on improving the quality and controllability of generated music, particularly with respect to tonal tension and timbre. Diffusion models and latent-space representations have shown promise both for generating high-quality music and for separating individual elements, such as vocals, from music mixtures. There is also growing interest in generating non-human singing voices and in new applications for music generation, such as conversational roleplay and interactive entertainment.

Noteworthy papers include Efficient and Fast Generative-Based Singing Voice Separation, which proposes a latent diffusion model for singing voice separation; CartoonSing, which introduces a unified framework for generating human and non-human singing voices; and DUO-TOK, a dual-track semantic music tokenizer that achieves state-of-the-art results on music tagging and language-model perplexity.
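To make the latent-diffusion separation idea concrete, here is a minimal training-step sketch in the spirit of diffusion-based vocal separation: a denoiser learns to predict the noise added to a clean vocal latent, conditioned on the latent of the full mixture. This is an illustrative sketch only; all module names, shapes, and hyperparameters are hypothetical and do not reflect the actual architectures of the cited papers.

```python
# Hypothetical sketch of latent-diffusion singing voice separation
# (DDPM-style noise prediction conditioned on the mixture latent).
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Predicts the noise added to the vocal latent, conditioned on the
    mixture latent and the diffusion timestep (hypothetical architecture)."""
    def __init__(self, dim=64, steps=1000):
        super().__init__()
        self.t_embed = nn.Embedding(steps, dim)
        self.net = nn.Sequential(
            nn.Conv1d(2 * dim, 128, 3, padding=1), nn.GELU(),
            nn.Conv1d(128, dim, 3, padding=1),
        )

    def forward(self, z_t, z_mix, t):
        # Broadcast the timestep embedding over the frame axis,
        # then concatenate noisy vocal and mixture latents channel-wise.
        temb = self.t_embed(t)[:, :, None]
        return self.net(torch.cat([z_t + temb, z_mix], dim=1))

# Standard DDPM linear noise schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def train_step(denoiser, z_vocal, z_mix, opt):
    """One denoising-score-matching step: corrupt the clean vocal latent
    at a random timestep and regress the added noise."""
    b = z_vocal.size(0)
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(z_vocal)
    ab = alphas_bar[t].view(b, 1, 1)
    z_t = ab.sqrt() * z_vocal + (1 - ab).sqrt() * eps
    loss = (denoiser(z_t, z_mix, t) - eps).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    den = Denoiser()
    opt = torch.optim.Adam(den.parameters(), lr=1e-4)
    # Dummy latents standing in for an audio VAE encoder's output:
    # shape (batch, latent channels, frames).
    z_vocal = torch.randn(4, 64, 256)
    z_mix = torch.randn(4, 64, 256)
    print(train_step(den, z_vocal, z_mix, opt))
```

At inference time, such a model would start from pure noise and iteratively denoise toward the vocal latent while conditioning on the mixture latent, with a decoder mapping the result back to audio.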

Sources

Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup Augmentation

Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation

DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation

Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model

SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications

CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation

Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures
