The field of multimodal large language models (MLLMs) is advancing rapidly, with growing attention to education and assessment. Recent work introduces new MLLM architectures and training methods that enable more effective evaluation of student responses and automated grading. MLLMs deployed in educational settings have shown promise in enhancing student engagement and understanding, while advances in explainable AI and transparency have improved the reliability and trustworthiness of AI-driven assessment systems.
Noteworthy papers in this area include VideoJudge, which introduces MLLM judges at 3B and 7B scales for evaluating video understanding models, outperforming larger MLLM judge baselines; EduVidQA, which explores using MLLMs to automatically answer student questions about online lectures, introducing a novel question-answering task of real-world significance; and ProfVLM, which presents a compact vision-language model for multi-view proficiency estimation, achieving superior accuracy with up to 20x fewer parameters.