Advances in Multimodal Learning and Counterfactual Explanations

The field of multimodal learning is evolving rapidly, with a focus on more robust and effective methods for text-to-video retrieval, video recommendation, and counterfactual explanation. Recent research highlights the importance of capturing intricate interactions between the visual and textual modalities, as well as the need for counterfactual explanations that are both realistic and actionable. Large language models and multimodal large language models have shown promise for enriching video recommendations and for generating more faithful counterfactuals. Noteworthy papers in this area include:

Adversarial Video Promotion Against Text-to-Video Retrieval, which pioneers a new attack that promotes a video in text-to-video retrieval rankings.

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations, which introduces a simple framework for injecting high-level MLLM-generated semantics into video recommendation pipelines.

RealAC: A Domain-Agnostic Framework for Realistic and Actionable Counterfactual Explanations, which presents a domain-agnostic approach to generating counterfactual explanations that are both realistic and actionable.
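
To make the retrieval-attack setting concrete, here is a minimal sketch of one generic way such video promotion can work: a projected-gradient (PGD) perturbation that nudges a video's embedding toward a set of target text queries in a toy dual-encoder retriever. The encoders, shapes, and perturbation budget below are illustrative stand-ins, not the method or models from the paper.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in encoders; real retrievers use CLIP-style video/text towers.
video_encoder = torch.nn.Linear(3 * 8 * 32 * 32, 128)  # flattened 8-frame clip
text_encoder = torch.nn.Linear(300, 128)                # bag-of-words query
for p in list(video_encoder.parameters()) + list(text_encoder.parameters()):
    p.requires_grad_(False)                             # only the perturbation is optimized

video = torch.rand(1, 3 * 8 * 32 * 32)                  # pixel values in [0, 1]
queries = torch.rand(5, 300)                            # queries to promote the video for

eps, alpha, steps = 8 / 255, 2 / 255, 20                # assumed PGD budget
delta = torch.zeros_like(video, requires_grad=True)

for _ in range(steps):
    v_emb = F.normalize(video_encoder(video + delta), dim=-1)
    t_emb = F.normalize(text_encoder(queries), dim=-1)
    # Ascend on mean cosine similarity so the video ranks higher
    # for all target queries at once.
    sim = (v_emb @ t_emb.T).mean()
    sim.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()
        delta.clamp_(-eps, eps)                           # stay inside the L_inf ball
        delta.copy_((video + delta).clamp(0, 1) - video)  # keep pixels valid
    delta.grad.zero_()

with torch.no_grad():
    v_emb = F.normalize(video_encoder(video + delta), dim=-1)
    t_emb = F.normalize(text_encoder(queries), dim=-1)
    print("mean similarity after attack:", (v_emb @ t_emb.T).mean().item())
```

The defender's-eye takeaway is that the perturbation stays within a small L_inf ball, so a promoted video can look unchanged to a viewer while its embedding moves substantially.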
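
On the counterfactual side, the sketch below shows the classic gradient-based formulation (in the spirit of Wachter et al.), with a crude actionability constraint that freezes immutable features. This is not RealAC's algorithm; the model, features, and weights here are all hypothetical.

```python
import torch

torch.manual_seed(0)

# Hypothetical tabular classifier, e.g. a loan scorer (score > 0 = approve).
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 1))
for p in model.parameters():
    p.requires_grad_(False)                      # the model is fixed; we optimize the input

x = torch.tensor([[0.2, 0.5, 0.1, 0.9]])         # instance we want to explain
mutable = torch.tensor([[1.0, 1.0, 1.0, 0.0]])   # last feature (e.g. age) is immutable

cf = x.clone().requires_grad_(True)
opt = torch.optim.Adam([cf], lr=0.05)
lam = 0.1                                        # proximity weight

for _ in range(200):
    opt.zero_grad()
    score = model(cf)
    # Push the score toward the desired class while staying close to x;
    # the L1 term keeps the suggested change small and sparse.
    loss = torch.relu(1.0 - score).mean() + lam * (cf - x).abs().sum()
    loss.backward()
    cf.grad *= mutable                           # actionability: never edit frozen features
    opt.step()

print("counterfactual:", cf.detach())
print("new score:", model(cf).item())
```

Frameworks like RealAC go further than this sketch, since a nearby input that flips the model can still be unrealistic (violating dependencies between features) or unactionable (asking the user to change something they cannot).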

Sources

Adversarial Video Promotion Against Text-to-Video Retrieval

Mitigating Popularity Bias in Counterfactual Explanations using Large Language Models

AI Blob! LLM-Driven Recontextualization of Italian Television Archives

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

RealAC: A Domain-Agnostic Framework for Realistic and Actionable Counterfactual Explanations
