Multimodal Recommendation Systems

The field of recommender systems is moving toward leveraging multimodal data, such as text, images, and user behavior, to improve recommendation accuracy and personalize the user experience. Current research focuses on architectures and methods that can effectively integrate and process this heterogeneous information. Noteworthy papers include PREMISE, which introduces a matching-based prediction architecture for review recommendation; RAGAR, which proposes retrieval-augmented personalized image generation guided by recommendation; Tricolore, a multi-behavior user profiling framework for enhanced candidate generation; and LIRDRec, which learns item representations directly from multimodal features. Related work targets click-through-rate (CTR) prediction, including feature-staleness-aware incremental learning and the winning solution to the WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge. Together, these advances have the potential to improve both the accuracy and the diversity of recommender systems. A minimal sketch of the shared pattern follows below.
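
The common thread across these works is projecting per-modality item features into a shared embedding space, fusing them, and scoring items against a user representation. The sketch below illustrates that pattern only; it is not the architecture of any cited paper, and all names, dimensions, and the dot-product scorer are illustrative assumptions (it presumes precomputed text and image features, e.g. from a text encoder and an image encoder).

```python
# A minimal sketch (assumed design, not any cited paper's method) of
# multimodal item scoring: project each modality into a shared space,
# fuse, and score against a learned user embedding.
import torch
import torch.nn as nn

class MultimodalItemScorer(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, embed_dim=128, num_users=10_000):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)    # e.g. precomputed text features
        self.image_proj = nn.Linear(image_dim, embed_dim)  # e.g. precomputed image features
        self.fuse = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU())
        self.user_embed = nn.Embedding(num_users, embed_dim)

    def forward(self, user_ids, text_feats, image_feats):
        # Concatenate projected modalities, then fuse into one item vector.
        item = self.fuse(torch.cat(
            [self.text_proj(text_feats), self.image_proj(image_feats)], dim=-1))
        user = self.user_embed(user_ids)
        return (user * item).sum(-1)  # dot-product relevance score per (user, item) pair

# Usage: score two (user, item) pairs with random stand-in features.
model = MultimodalItemScorer()
scores = model(torch.tensor([3, 7]), torch.randn(2, 768), torch.randn(2, 512))
```

Dot-product scoring is a deliberately simple choice here; the cited papers replace pieces of this pipeline, for instance with matching-based prediction (PREMISE) or by learning item representations directly from the raw multimodal features (LIRDRec).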

Sources

PREMISE: Matching-based Prediction for Accurate Review Recommendation

RAGAR: Retrieval Augment Personalized Image Generation Guided by Recommendation

Tricolore: Multi-Behavior User Profiling for Enhanced Candidate Generation in Recommender Systems

Feature Staleness Aware Incremental Learning for CTR Prediction

1st Place Solution of WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge

Learning Item Representations Directly from Multimodal Features for Effective Recommendation
