Advances in Large Language Models for Human Motion and Animation

Large language models are finding increasingly sophisticated applications in human motion and animation. Recent work has shown promising results in using LLMs to generate and control 3D avatar animations, with particular attention to improving planning performance and handling multi-step movements. There is also growing interest in leveraging LLMs for text-driven map animation prototyping and for embodied spatial-temporal reasoning, where incorporating long-term spatial-temporal memory has been identified as a key area for improvement. Overall, the field is advancing towards more nuanced and realistic animations, with potential applications in virtual and augmented reality. Noteworthy papers include:

  • A paper introducing MapStory, an LLM-powered animation authoring tool that generates editable map animation sequences directly from natural language text.
  • A paper proposing 3DLLM-Mem, a novel dynamic memory management and fusion model that equips embodied 3D LLMs with long-term spatial-temporal memory for reasoning and action, achieving state-of-the-art performance across various tasks.
  • A paper presenting a data-driven framework for quality assessment of 3D human animation, leveraging a novel dataset and achieving a correlation of 90% with subjective realism evaluation scores.

Sources

How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control

MapStory: LLM-Powered Text-Driven Map Animation Prototyping with Human-in-the-Loop Editing

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Quality assessment of 3D human animation: Subjective and objective evaluation
