The field of document analysis and recognition is moving toward more accurate and robust models that can handle complex document types and languages. Recent research has focused on improving the performance of vision-language models on tasks such as optical character recognition (OCR), table recognition, and mathematical formula recognition. Reinforcement learning and domain-specific adaptation of general-purpose models have both shown promising results. In addition, the creation of large-scale datasets and benchmarks has enabled more rigorous evaluation and comparison of models. Noteworthy papers include Baseer, which introduces a vision-language model fine-tuned for Arabic document OCR and reports a new state of the art in that domain, and CHURRO, which presents an open-weight large vision-language model for high-accuracy, low-cost historical text recognition and outperforms other models on a large historical text recognition dataset.