Advances in Temporal Understanding and Multimodal Models

Research in natural language processing and multimodal modeling is increasingly focused on temporal understanding and reasoning. Recent studies highlight the importance of evaluating and improving the temporal consistency of large language models, as well as their ability to interpret and reason about time across varied contexts.

Noteworthy papers in this area include "Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?", which introduces a novel benchmark for evaluating temporal referential consistency in large language models, and "A Matter of Time: Revealing the Structure of Time in Vision-Language Models", which investigates the temporal awareness of vision-language models and proposes methods for deriving an explicit timeline representation from their embedding space.

Sources

Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?

Temporal Understanding under Deictic Frame of Reference

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

LongInsightBench: A Comprehensive Benchmark for Evaluating Omni-Modal Models on Human-Centric Long-Video Understanding

Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Comprehending Spatio-temporal Data via Cinematic Storytelling using Large Language Models

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

When Facts Change: Probing LLMs on Evolving Knowledge with evolveQA

MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

A Matter of Time: Revealing the Structure of Time in Vision-Language Models
