Music information retrieval research is moving toward more user-controllable systems. Recent work focuses on music generation, style transfer, and cross-cultural generalization. Notably, inference-time optimization and adapter-based methods are being explored as ways to improve model performance and efficiency.
Foundation models are being evaluated for their ability to generalize across diverse musical traditions, with larger models typically outperforming smaller ones on non-Western music. However, performance declines for culturally distant traditions, highlighting the need for further research.
Human preference studies are being used to assess the quality of generated music and to measure how well widely used metrics correlate with human preferences.
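One way to relate pairwise human judgments to a metric is to turn the comparisons into per-model win rates and then compute a rank correlation against the metric's scores. The following is a minimal sketch of that idea using only the Python standard library. The model names, the comparison data, and the metric scores are all hypothetical, and the choice of Spearman correlation is an assumption, not a description of any particular study's protocol.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Fraction of pairwise comparisons that each model wins.

    `comparisons` is a list of (winner, loser) tuples.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {model: wins[model] / total[model] for model in total}


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: (preferred model, other model) for each human judgment.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
# Hypothetical distance-style metric where lower is meant to be better.
metric_scores = {"A": 2.0, "B": 3.5, "C": 5.0}

rates = win_rates(comparisons)
models = sorted(rates)
rho = spearman([rates[m] for m in models], [metric_scores[m] for m in models])
print(rho)
```

For this toy data the models that humans prefer more often also have lower metric scores, so the rank correlation is -1.0. With real data, a value near zero would suggest the metric tracks human preference poorly.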
Some notable papers in this area include:
- ITO-Master, which introduces a reference-based mastering style transfer system that integrates inference-time optimization to enable finer user control over the mastering process.
- Universal Music Representations, which presents a comprehensive evaluation of foundation models across six musical corpora and achieves state-of-the-art performance on five of the six.
- Benchmarking Music Generation Models and Metrics via Human Preference Studies, which generates 6k songs using 12 state-of-the-art models and conducts a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics.
- Exploring Adapter Design Tradeoffs for Low Resource Music Generation, which studies various adapter configurations for two AI music models and reveals distinct trade-offs between convolution-based and transformer-based adapters.
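The general idea behind inference-time optimization, mentioned above in connection with mastering style transfer, can be illustrated with a toy example: the processing function stays fixed, and a control parameter is tuned by gradient descent for each input so that the output matches a reference. The sketch below is only an illustration under strong assumptions. It uses a single gain parameter and an RMS-matching loss, all function names are hypothetical, and it does not reproduce the system described in the ITO-Master paper.

```python
def rms(signal):
    """Root-mean-square level of a list of samples."""
    return (sum(s * s for s in signal) / len(signal)) ** 0.5


def apply_gain(signal, gain):
    """Toy stand-in for a fixed processor with one controllable parameter."""
    return [gain * s for s in signal]


def optimize_gain(input_signal, reference, steps=100, lr=0.5, gain=1.0):
    """Tune `gain` by gradient descent so the output RMS matches the reference.

    Loss: (gain * rms(input) - rms(reference)) ** 2, which equals the squared
    RMS mismatch of apply_gain(input, gain) as long as gain stays non-negative.
    """
    target = rms(reference)
    source = rms(input_signal)
    for _ in range(steps):
        error = gain * source - target
        grad = 2.0 * error * source  # derivative of the loss w.r.t. gain
        gain -= lr * grad
    return gain


# Hypothetical signals: a quiet input and a louder reference.
input_signal = [0.5, -0.5, 0.5, -0.5]
reference = [1.0, -1.0, 1.0, -1.0]

best_gain = optimize_gain(input_signal, reference)
output = apply_gain(input_signal, best_gain)
print(best_gain, rms(output))
```

In this sketch the user could stop the optimization early, change the loss, or adjust the starting value, which is one sense in which optimizing at inference time gives finer control than a single forward pass.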