Document analysis research is becoming more multilingual and multimodal, with recent work targeting two gaps: the scarcity of resources for non-English languages and the structural complexity of official publications. New large-scale synthetic corpora and benchmark datasets for visual document retrieval make it possible to evaluate models on both text-only and multimodal retrieval tasks. These resources could improve the accuracy and reliability of document analysis systems in real-world applications such as financial information retrieval and historical document transcription. Noteworthy papers include:
- Cross-Lingual SynthDocs, which provides a scalable and visually realistic resource for advancing research in multilingual document analysis.
- SDS KoPub VDR, which establishes a challenging and reliable evaluation set for visual document retrieval in Korean public documents.
- DKDS, which introduces a benchmark dataset for detecting and binarizing degraded Kuzushiji documents with seals.
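
To make the retrieval evaluation mentioned above concrete, the following is a minimal sketch of how a ranked list of retrieved pages might be scored against ground-truth relevance labels. It is not taken from any of the papers listed: the function names, the page identifiers, and the choice of Recall@k and binary-relevance nDCG@k are illustrative assumptions, and each benchmark may define its own metrics and protocol.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary relevance (a document is either relevant or not)."""
    # Discounted gain of the actual ranking; rank is 0-based, so use rank + 2.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: the retriever returned four pages, two of which are relevant.
ranked = ["page_7", "page_2", "page_9", "page_4"]
relevant = {"page_2", "page_4"}

print(recall_at_k(ranked, relevant, k=3))
print(ndcg_at_k(ranked, relevant, k=4))
```

The same scoring code applies whether the ranked list comes from a text-only retriever or a multimodal one, which is what allows both kinds of system to be compared on a single benchmark.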