Advances in Music Generation and Audio Applications

The field of music generation and audio applications is evolving rapidly, with a focus on building more controllable and adaptive systems. Researchers are exploring new modeling paradigms, including diffusion models and flow matching, to improve the quality and diversity of generated music and audio; a minimal flow-matching training sketch appears after this paragraph. A second key direction is fine-grained control over music generation, where techniques such as activation steering (also sketched below) and text prompting are used to steer attributes like timbre, style, and genre. There is also growing interest in adapting audio generation to environmental contexts, with models that jointly generate speech and background audio.

Noteworthy papers in this area include AffectMachine-Pop, which presents a controllable expert system for real-time generation of retro-pop music; UmbraTTS, which introduces a flow-matching based TTS model that jointly generates speech and environmental audio; and BemaGANv2, which provides a tutorial and comparative survey of GAN-based vocoders for long-term audio generation.
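To make the flow-matching paradigm concrete, here is a minimal sketch of one conditional flow-matching training step in the rectified-flow style. The model signature, latent shapes, and conditioning interface are illustrative assumptions, not the setup of any paper listed below.

```python
# Minimal sketch of a conditional flow-matching training step
# (rectified-flow style). The model call signature, latent shapes, and
# conditioning interface are illustrative assumptions.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1, cond):
    """x1: clean audio latents of shape [batch, frames, dim].
    cond: conditioning input (e.g. a text embedding).
    The network learns to predict the constant velocity x1 - x0 along a
    straight path from Gaussian noise x0 to the data point x1."""
    x0 = torch.randn_like(x1)                            # noise endpoint
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)   # per-example time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                         # point on the straight path
    v_target = x1 - x0                                   # target velocity field
    v_pred = model(xt, t.squeeze(-1).squeeze(-1), cond)  # hypothetical signature
    return F.mse_loss(v_pred, v_target)
```

At inference time, latents are produced by integrating the learned velocity field from noise toward data, for example with a few Euler or higher-order ODE steps.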
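Activation steering can likewise be illustrated with a small difference-of-means sketch: a steering vector is extracted from paired examples with and without a target attribute, then added to an intermediate layer's output during generation. This assumes a PyTorch model whose chosen layer returns a plain tensor; the layer choice, scale, and helper functions are hypothetical and not the cited paper's exact method.

```python
# Minimal sketch of activation steering via a difference-of-means vector,
# assuming a PyTorch transformer whose chosen layer returns a plain tensor.
# Layer selection, the scale, and the example batches are hypothetical.
import torch

@torch.no_grad()
def difference_of_means(model, layer, with_attr, without_attr):
    """Steering vector = mean hidden state on inputs exhibiting the target
    attribute (e.g. a genre) minus the mean on inputs lacking it."""
    captured = {}

    def capture(_module, _inputs, output):
        captured["h"] = output.mean(dim=(0, 1))   # average batch and time

    handle = layer.register_forward_hook(capture)
    model(with_attr)
    pos = captured["h"].clone()
    model(without_attr)
    neg = captured["h"].clone()
    handle.remove()
    return pos - neg

def add_steering(layer, vector, scale=4.0):
    """Shift the layer's output toward the attribute at every forward pass;
    keep the returned handle to remove the hook and disable steering."""
    def steer(_module, _inputs, output):
        return output + scale * vector
    return layer.register_forward_hook(steer)
```

Returning a value from a forward hook replaces the layer's output, so steering applies at every generation step until the handle is removed.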

Sources

AffectMachine-Pop: A controllable expert system for real-time pop music generation

A Review on Score-based Generative Models for Audio Applications

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation

BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching

Fine-Grained Control over Music Generation with Activation Steering

ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators

BNMusic: Blending Environmental Noises into Personalized Music
