Advancements in Music and Audio Generation

The field of music and audio generation is evolving rapidly, with a focus on more realistic and expressive virtual instruments, singing voices, and audio environments. Recent work centers on improving the quality and consistency of generated audio, particularly its timbre, pitch, and spatial accuracy. Advances in flow matching, style transfer, and generative modeling have enabled more realistic and controllable virtual instruments, from electric guitars to singing voices. There has also been significant progress in room impulse response generation, allowing for more immersive and realistic virtual acoustic environments. Noteworthy papers in this area include FlowSynth, which combines distributional flow matching with test-time search for high-quality instrument synthesis, and PromptReverb, which generates room impulse responses through latent rectified flow matching. Storycaster, an AI system for immersive room-based storytelling, has also shown promising results in creating interactive and responsive storytelling environments.
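Several of the papers below (FlowSynth, GuitarFlow, PromptReverb) build on flow matching, where a network is trained to regress the velocity field that transports noise to data along a straight-line path. The sketch below shows the generic rectified-flow training target in NumPy; it is an illustration of the general recipe, not the specific implementation of any of these systems, and the placeholder "model" output is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rectified_flow_batch(x1, rng):
    """Build one training batch for rectified flow / flow matching.

    x1: clean data samples, shape (batch, dim) -- e.g. audio feature frames.
    Returns (x_t, t, v_target): the interpolated point, the sampled time,
    and the straight-line velocity the network should regress onto.
    """
    x0 = rng.standard_normal(x1.shape)      # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))  # per-sample time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1           # linear interpolant between endpoints
    v_target = x1 - x0                      # constant velocity along the path
    return x_t, t, v_target

# Toy usage: a placeholder "model" predicting zero velocity, scored with MSE.
x1 = rng.standard_normal((8, 4))            # stand-in for audio features
x_t, t, v_target = rectified_flow_batch(x1, rng)
pred = np.zeros_like(v_target)              # hypothetical network output
loss = np.mean((pred - v_target) ** 2)      # flow matching regression loss
```

At inference, sampling amounts to integrating the learned velocity field from noise at t = 0 to data at t = 1, which is why conditioning signals such as tablatures or text prompts can steer the trajectory.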

Sources

FlowSynth: Instrument Generation Through Distributional Flow Matching and Test-Time Search

StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks

GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer

Beyond Reality: Designing Personal Experiences and Interactive Narratives in AR Theater

VietLyrics: A Large-Scale Dataset and Models for Vietnamese Automatic Lyrics Transcription

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching

Storycaster: An AI System for Immersive Room-Based Storytelling

Binaspect -- A Python Library for Binaural Audio Analysis, Visualization & Feature Generation
