Research on multimodal reasoning and narrative understanding is moving toward more structured and coherent representations of knowledge. Researchers are exploring methods that align a user's understanding with domain knowledge, prune lengthy generic output, and generate effective reasoning threads. One notable direction combines semantic hierarchies, graph neural networks, and reward-guided strategies to improve user understanding, with reported gains over state-of-the-art reasoning models. Another focus is frameworks that balance the grounded and narrative aspects of visual storytelling, using vision-to-language and language-to-language methods to generate coherent narratives. There is also growing interest in semantic normalization techniques that reduce annotation noise and improve the robustness of symbolic narrative graphs. Noteworthy papers include:
- A prototype-inspired framework that proposes a two-phase Reasoning-Threads-Evaluation approach to address knowledge discrepancies in interactive scenarios.
- A semantic normalization framework for hierarchical narrative knowledge graphs that consolidates semantically related actions and events using lexical similarity and embedding-based clustering.
- A multifarious evaluation of a visual storytelling approach that integrates captioning and storytelling under a unified framework, showing a positive impact on the quality of the produced stories.
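
To make the semantic normalization idea from the list above more concrete, here is a minimal Python sketch of consolidating near-duplicate action labels with a lexical check followed by a vector-similarity check. It is an illustration under stated assumptions, not the paper's method: the function names, the thresholds, the greedy single-pass clustering, and the character-trigram "embedding" are all hypothetical stand-ins (a real pipeline would presumably use a learned sentence embedding and a proper clustering algorithm).

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Surface-form similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def toy_embedding(text: str, n: int = 3) -> Counter:
    """Stand-in for a learned embedding: a bag of character n-grams (assumption)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize_labels(labels, lexical_threshold=0.8, embedding_threshold=0.5):
    """Greedily group labels and map each one to its cluster's first member.

    A label joins the first cluster whose representative is similar enough
    under either measure; otherwise it starts a new cluster. Both thresholds
    are illustrative values, not tuned ones.
    """
    clusters = []  # each cluster is a list; element 0 is the representative
    for label in labels:
        for cluster in clusters:
            rep = cluster[0]
            if (lexical_similarity(label, rep) >= lexical_threshold
                    or cosine(toy_embedding(label), toy_embedding(rep)) >= embedding_threshold):
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return {label: cluster[0] for cluster in clusters for label in cluster}


if __name__ == "__main__":
    actions = ["walks away", "walk away", "opens door", "open the door", "sings"]
    for raw, canonical in normalize_labels(actions).items():
        print(f"{raw!r} -> {canonical!r}")
```

The greedy pass is order-dependent: the first label seen in a group becomes its canonical form. That keeps the sketch short, but a clustering method that does not depend on input order would likely be a better fit for real annotation data.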