Multimodal Collaborative Filtering and Graph Contrastive Learning

Research on recommender systems is reassessing the role of ID features and the importance of graph structure in multimodal collaborative filtering. Recent work suggests that ID features, while effective, offer limited benefits and can even hinder generalization to data not seen during training. In response, ID-free approaches rely on multimodal features and positional encodings to generate semantically meaningful embeddings. Meanwhile, Graph Contrastive Learning (GCL) has emerged as a powerful tool for self-supervised graph representation learning. Its effectiveness, however, can be limited by the lack of explicit structural commonsense in the graph and by its vulnerability to targeted promotion attacks. To address these challenges, researchers are exploring frameworks that incorporate structural commonsense into GCL and developing methods to mitigate spectral vulnerabilities. Notable papers in this area include:

IDFREE, which proposes an ID-free multimodal collaborative filtering baseline that outperforms existing ID-based methods.
Str-GCL, which leverages first-order logic rules to represent structural commonsense and integrates them into the GCL framework.
NLGCL, which proposes a novel contrastive learning framework that leverages naturally contrastive views between neighbor layers within GNNs, making it computationally efficient and practical for real-world scenarios.
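
To make the idea of contrasting neighbor layers concrete, the following is a minimal NumPy sketch, not the NLGCL authors' implementation. It assumes LightGCN-style propagation over a symmetrically normalized adjacency matrix and a generic InfoNCE loss in which a node's embeddings at two adjacent layers form the positive pair and all other nodes serve as negatives. The function names, the toy interaction matrix, and the temperature value are illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate(adj_norm, emb, num_layers):
    """Parameter-free message passing; returns the embeddings of every layer."""
    layers = [emb]
    for _ in range(num_layers):
        layers.append(adj_norm @ layers[-1])
    return layers

def info_nce(view_a, view_b, temperature=0.2, eps=1e-12):
    """InfoNCE loss: row i of view_a is positive with row i of view_b only."""
    a = view_a / (np.linalg.norm(view_a, axis=1, keepdims=True) + eps)
    b = view_b / (np.linalg.norm(view_b, axis=1, keepdims=True) + eps)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy user-item graph (assumed data): 3 users, 3 items, 2 interactions each.
interactions = np.array([[1, 1, 0],
                         [0, 1, 1],
                         [1, 0, 1]], dtype=float)
zeros = np.zeros((3, 3))
adj = np.block([[zeros, interactions], [interactions.T, zeros]])

rng = np.random.default_rng(0)
initial_emb = rng.normal(size=(6, 8))  # random stand-in for learned embeddings

layers = propagate(normalized_adjacency(adj), initial_emb, num_layers=2)
# Adjacent layers act as the two views, so no graph augmentation is needed.
loss = info_nce(layers[0], layers[1])
print(f"layer-0 vs layer-1 contrastive loss: {loss:.4f}")
```

Because both views come from the forward pass that the recommender already computes, a sketch like this adds only a similarity matrix on top of propagation, which is one way to read the efficiency claim above.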
