Multimodal Music Information Retrieval

The field of Music Information Retrieval (MIR) is shifting towards a more comprehensive understanding of music by incorporating multiple modalities such as audio, lyrics, and visual data. This trend is driven by the development of multimodal datasets and methods that enable large-scale analyses and statistically sound results. Researchers are exploring new data sources, such as community-driven resources and historical prints, to improve the accuracy and diversity of MIR models. Web toolkits and pipelines are also increasingly used to streamline the acquisition and annotation of multimodal data. Noteworthy papers in this area include:

  • Osu2MIR, which introduces a pipeline for extracting beat annotations from community-created Osu! beatmaps, providing a scalable and diverse resource for MIR research (a minimal sketch of this kind of extraction follows this list).
  • Music4All A+A, which presents a multimodal dataset for MIR tasks built around music artists and albums, enabling analyses at both the artist and album level of granularity.
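
To illustrate the kind of derivation such a beatmap-based pipeline performs, the sketch below parses the [TimingPoints] section of an .osu file and expands tempo-defining timing points into beat timestamps. This is a minimal illustration under stated assumptions, not the Osu2MIR pipeline itself: the field order and the convention that uninherited points carry a positive beat length follow the public osu! file format documentation, and the names `beat_times_from_osu` and `song_length_ms` are hypothetical choices for this example.

```python
def beat_times_from_osu(path, song_length_ms):
    """Return beat onset times (in seconds) implied by the tempo-defining
    timing points of an .osu beatmap file. Minimal sketch; assumes the
    standard comma-separated TimingPoints layout (offset, beatLength, ...)."""
    timing_points = []
    in_section = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("["):
                # Track whether we are inside the [TimingPoints] section.
                in_section = line == "[TimingPoints]"
                continue
            if not in_section or not line:
                continue
            fields = line.split(",")
            offset_ms = float(fields[0])
            beat_length = float(fields[1])  # ms per beat; negative values mark inherited points
            if beat_length > 0:             # keep only uninherited (tempo-defining) points
                timing_points.append((offset_ms, beat_length))

    beats = []
    for i, (start, beat_length) in enumerate(timing_points):
        # Each tempo region runs until the next timing point (or the end of the song).
        end = timing_points[i + 1][0] if i + 1 < len(timing_points) else song_length_ms
        t = start
        while t < end:
            beats.append(t / 1000.0)
            t += beat_length
    return beats


if __name__ == "__main__":
    # Hypothetical usage: the file name and song length are placeholders.
    print(beat_times_from_osu("example.osu", song_length_ms=180_000)[:10])
```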

Sources

Osu2MIR: Beat Tracking Dataset Derived From Osu! Data

Beyond Bars: Distribution of Edit Operations in Historical Prints

Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks

Two Web Toolkits for Multimodal Piano Performance Dataset Acquisition and Fingering Annotation