Multimodal Music Information Retrieval

The field of Music Information Retrieval (MIR) is shifting towards a more comprehensive understanding of music by incorporating multiple modalities such as audio, lyrics, and visual data. This trend is driven by the development of multimodal datasets and methods that enable large-scale analyses and statistically sound results. Researchers are exploring new data sources, such as community-driven resources and historical prints, to improve the accuracy and diversity of MIR models. Web toolkits and pipelines are also increasingly used to streamline the acquisition and annotation of multimodal data. Noteworthy papers in this area include:

  • Osu2MIR, which introduces a pipeline for extracting beat annotations from community-created Osu! beatmaps, providing a scalable and diverse resource for MIR research (a minimal sketch of this kind of extraction follows this list).
  • Music4All A+A, which presents a multimodal dataset for MIR tasks built around music artists and albums, enabling analyses at both the artist and album level of granularity.
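
To illustrate the kind of derivation such a beatmap-based pipeline performs, the sketch below parses the [TimingPoints] section of an .osu file and expands tempo-defining timing points into beat timestamps. This is a minimal illustration under stated assumptions, not the Osu2MIR pipeline itself: the field order and the convention that uninherited points carry a positive beat length follow the public osu! file format documentation, and the names `beat_times_from_osu` and `song_length_ms` are hypothetical choices for this example.

```python
def beat_times_from_osu(path, song_length_ms):
    """Return beat onset times (in seconds) implied by the tempo-defining
    timing points of an .osu beatmap file. Minimal sketch; assumes the
    standard comma-separated TimingPoints layout (offset, beatLength, ...)."""
    timing_points = []
    in_section = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("["):
                # Track whether we are inside the [TimingPoints] section.
                in_section = line == "[TimingPoints]"
                continue
            if not in_section or not line:
                continue
            fields = line.split(",")
            offset_ms = float(fields[0])
            beat_length = float(fields[1])  # ms per beat; negative values mark inherited points
            if beat_length > 0:             # keep only uninherited (tempo-defining) points
                timing_points.append((offset_ms, beat_length))

    beats = []
    for i, (start, beat_length) in enumerate(timing_points):
        # Each tempo region runs until the next timing point (or the end of the song).
        end = timing_points[i + 1][0] if i + 1 < len(timing_points) else song_length_ms
        t = start
        while t < end:
            beats.append(t / 1000.0)
            t += beat_length
    return beats


if __name__ == "__main__":
    # Hypothetical usage: the file name and song length are placeholders.
    print(beat_times_from_osu("example.osu", song_length_ms=180_000)[:10])
```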

Sources

Osu2MIR: Beat Tracking Dataset Derived From Osu! Data

Beyond Bars: Distribution of Edit Operations in Historical Prints

Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks

Two Web Toolkits for Multimodal Piano Performance Dataset Acquisition and Fingering Annotation