Music Generation and Analysis

Research in music generation and analysis is moving toward more sophisticated and expressive models. Recent work uses multi-modal inputs, such as images and text, to generate music that is semantically consistent with the input and perceptually natural. There is also growing interest in detecting synthetic music and in evaluating the quality of generated music. Noteworthy papers in this area include Art2Music, which proposes a lightweight cross-modal framework for generating music from artistic images and user comments, and Melody or Machine, which introduces a dual-stream architecture for detecting synthetic music. Other notable works include Story2MIDI, which generates emotion-aligned music from text, and Pianist Transformer, which reports state-of-the-art performance in expressive piano performance rendering via scalable self-supervised pre-training. Together, these advances could enable more realistic and engaging music experiences.

Sources

Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment

Melody or Machine: Detecting Synthetic Music with Dual-Stream Contrastive Learning

Story2MIDI: Emotionally Aligned Music Generation from Text

Continual Learning for Singing Voice Separation with Human in the Loop Adaptation

Generative Multi-modal Feedback for Singing Voice Synthesis Evaluation

Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

Contract-Governed Training for Earth Observation: Observed Service Agreement Graphs and Coverage-Accuracy Trade-offs

M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance

YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs