Advances in Visual Mathematical Problem-Solving and Chart Understanding

Research on multimodal large language models (MLLMs) is increasingly focused on perception and reasoning, particularly in visual mathematical problem-solving and chart understanding. Researchers are identifying where current MLLMs fail to accurately perceive and interpret diagrams, and are developing new benchmarks and methodologies to evaluate and improve performance. This perception bottleneck is being addressed through modular problem-solving pipelines, contrastive learning frameworks, and perception-oriented datasets. There is also a growing focus on evaluating whether MLLMs can detect and interpret misleading charts and reason about domain-specific charts.

Noteworthy papers include MathFlow, which introduces a modular problem-solving pipeline to optimize the perception and inference stages; Benchmarking Visual Language Models on Standardized Visualization Literacy Tests, which reveals substantial difficulties in identifying misleading visualization elements; Unmasking Deceptive Visuals, which introduces a benchmark for evaluating MLLMs on misleading chart question answering; On the Perception Bottleneck of VLMs for Chart Understanding, which enhances the visual encoder to mitigate the perception bottleneck; DomainCQA, which crafts expert-level QA from domain-specific charts; and MATHGLANCE, which evaluates mathematical perception in MLLMs and constructs a perception-oriented dataset to improve mathematical reasoning.

Sources

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

Benchmarking Visual Language Models on Standardized Visualization Literacy Tests

Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering

On the Perception Bottleneck of VLMs for Chart Understanding

DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts

MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams
