Advances in Multimodal Reasoning and Large Language Models

Research on multimodal reasoning and large language models is advancing rapidly, with a shared focus on improving the efficiency, accuracy, and adaptability of these models. Two trends stand out. First, sparse architectures such as Mixture-of-Experts (MoE), which activate only a subset of expert parameters per token, are enabling more efficient and scalable processing of large amounts of data. Second, interest in multimodal reasoning continues to grow, with models built to handle text, images, and speech jointly.

Noteworthy papers include a pre-attention expert prediction scheme for MoE models, which reports state-of-the-art results on several benchmarks; the PRISM framework for user-centric conversational stance detection, which shows significant gains over strong baselines; Uni-MoE-2.0-Omni, an omnimodal MoE model that performs competitively across a wide range of benchmarks; and MMD-Thinker, a framework for multimodal misinformation detection that reports state-of-the-art performance on several datasets.
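The mention of pre-attention expert prediction suggests a concrete mechanism: if the MoE router reads the hidden state before attention rather than after it, expert choices are known early, and the chosen experts' weights can in principle be prefetched while attention runs. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the PreAttentionMoEBlock name, the toy dimensions, and the top-k routing details are assumptions for illustration, not the cited paper's actual design.

```python
# Hypothetical sketch of pre-attention expert prediction in a toy MoE block.
# All names, sizes, and routing details are illustrative assumptions; they do
# not reproduce the cited paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreAttentionMoEBlock(nn.Module):
    """Transformer block whose MoE router reads the pre-attention hidden
    state, so expert choices are fixed before attention finishes and the
    chosen experts' weights could, in principle, be prefetched."""

    def __init__(self, d_model=64, n_heads=4, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.router = nn.Linear(d_model, n_experts)  # fed pre-attention input
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):
        h = self.norm1(x)
        # Route on the pre-attention representation: in a real system the
        # selected experts could now be loaded while attention runs.
        gate = F.softmax(self.router(h), dim=-1)             # (B, T, n_experts)
        weights, idx = torch.topk(gate, self.top_k, dim=-1)  # (B, T, top_k)

        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out                                     # residual

        h2 = self.norm2(x)
        moe_out = torch.zeros_like(h2)
        for k in range(self.top_k):                 # combine top-k experts
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e             # tokens routed to expert e
                if mask.any():
                    moe_out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(h2[mask])
        return x + moe_out                                   # residual

# Toy usage: a batch of 2 sequences, 8 tokens each, 64-dim embeddings.
block = PreAttentionMoEBlock()
print(block(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```

Routing from the pre-attention stream trades some routing information for the chance to overlap expert-weight fetch with attention compute, which is presumably the efficiency motivation behind the approach.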
Sources
PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection
Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data