Advances in Document Analysis and Recognition

The field of document analysis and recognition is moving toward more accurate and robust models that can handle complex document types and languages. Recent research has focused on improving the performance of vision-language models on tasks such as optical character recognition (OCR), table recognition, and mathematical formula recognition. Reinforcement learning and domain-specific adaptation of general-purpose models have shown promising results. In addition, new large-scale datasets and benchmarks have enabled more rigorous evaluation and comparison of models. Noteworthy papers include Baseer, which introduces a vision-language model fine-tuned for Arabic document OCR and reports a new state of the art in that domain, and CHURRO, which presents an open-weight large vision-language model for high-accuracy, low-cost historical text recognition and reports outperforming other models on a large historical text recognition dataset.

Sources

mucAI at BAREC Shared Task 2025: Towards Uncertainty Aware Arabic Readability Assessment

Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR

Logics-Parsing Technical Report

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition
