Digital humanities research is making steady progress in the digitization and analysis of historical texts and music. Recent work focuses on improving the accuracy and efficiency of optical character recognition (OCR) and optical music recognition (OMR), which supports the preservation and accessibility of cultural heritage materials. Researchers are also applying large language models (LLMs) and other machine learning methods to handle diverse text layouts, linguistic variation, and scarce training data. These advances could extend digital humanities research to underrepresented languages and music traditions.
Noteworthy papers in this area include KuiSCIMA v2.0, which advances OMR for historical Chinese musical notations and reports a Character Error Rate (CER) of 0.9% for lülüpu notation, and AI-Driven Generation of Old English, which introduces a scalable framework for generating high-quality Old English texts with LLMs and offers a practical blueprint for revitalizing other endangered languages.
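
To put the CER figure quoted for KuiSCIMA v2.0 in context, the sketch below computes CER under its common definition: the character-level edit distance between a recognized string and its reference, divided by the reference length. This is an illustration only. The function names `levenshtein` and `character_error_rate` are hypothetical, and the paper's own evaluation protocol (for example, how notation symbols are tokenized) is not described here and may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a character of ref
                curr[j - 1] + 1,     # drop a character of hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Three edits against a six-character reference.
    print(character_error_rate("kitten", "sitting"))  # 0.5
```

Under this definition, a CER of 0.9% corresponds to roughly 9 character-level errors per 1,000 reference characters.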