The field of multimodal models is moving toward incorporating causal reasoning to improve performance in real-world scenarios. Researchers are exploring approaches that induce causal world models in large language models, enabling them to learn cause-and-effect relationships from multimodal data. This is a critical step toward more reliable and generalizable AI systems. Noteworthy papers in this area include:

- Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning, which introduces a framework that embeds an explicit model of causal physics within an LLM.
- Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations, which proposes a prompting strategy that scaffolds VLM reasoning through cognitively inspired stages, improving interpretability and social awareness in VLMs.
- Customize Multi-modal RAI Guardrails with Precedent-based predictions, which conditions model judgments on precedents, making the guardrail more flexible and adaptable.
- ISO-Bench: Benchmarking Multimodal Causal Reasoning in Visual-Language Models through Procedural Plans, a benchmark for evaluating causal dependencies between visual observations and procedural text.
- Causal Reasoning in Pieces: Modular In-Context Learning for Causal Discovery, which introduces a modular in-context pipeline for causal discovery and reports significant improvements over conventional baselines.
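To make the idea of stage-scaffolded prompting concrete, the following is a minimal sketch of how such a prompt might be assembled. The stage names and instructions here (Perception, Interpretation, Judgment) are illustrative assumptions, not the actual stages defined in the Cognitive Chain-of-Thought paper.

```python
# Illustrative sketch: building a staged chain-of-thought prompt for a VLM.
# The stages below are hypothetical placeholders, not the paper's own stages.

STAGES = [
    ("Perception", "Describe only what is directly observable in the scene."),
    ("Interpretation", "Infer intentions and social context from those observations."),
    ("Judgment", "Give a final, socially aware answer grounded in the prior stages."),
]

def build_staged_prompt(question: str) -> str:
    """Assemble a single prompt that walks the model through each stage in order."""
    parts = [f"Question: {question}", ""]
    for i, (name, instruction) in enumerate(STAGES, start=1):
        parts.append(f"Stage {i} ({name}): {instruction}")
    parts.append("Answer each stage in order before stating your final answer.")
    return "\n".join(parts)
```

The point of structuring the prompt this way is that each stage constrains the next, so the model's final judgment is anchored in explicit intermediate reasoning rather than produced in one opaque step.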