Advances in Multimodal Image and Chart Editing

Multimodal image and chart editing is evolving rapidly, with recent work focused on more comprehensive and structured approaches to evaluation and modeling. Recent research highlights the limitations of existing benchmarks and models in capturing the complexity of image and chart editing tasks, and introduces new benchmarks and models to address them. Two trends stand out: a growing emphasis on knowledge-intensive and cognitive reasoning capabilities in image editing models, and more nuanced, structured approaches to chart editing. Noteworthy papers include WiseEdit, which introduces a comprehensive benchmark for cognition- and creativity-informed image editing, and ChartAnchor, which proposes a benchmark for chart grounding with structural-semantic fidelity. Benchmarks such as these enable more rigorous evaluation and comparison of models. PPTBench and UnicEdit-10M contribute further datasets and benchmarks to the area.

Sources

WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

Charts Are Not Images: On the Challenges of Scientific Chart Editing

ChartAnchor: Chart Grounding with Structural-Semantic Fidelity

Proactive Agentic Whiteboards: Enhancing Diagrammatic Learning

Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code

PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding

UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits

PPTArena: A Benchmark for Agentic PowerPoint Editing

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation

I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
