Advancements in Human-Computer Interaction and Video Generation

Research in human-computer interaction and video generation is moving toward greater realism and physical plausibility. One line of work builds datasets and methods that capture the physical attributes of objects and how those attributes shape human motion, alongside metrics that evaluate both the physical and the perceptual fidelity of generated motion. Another line applies diffusion transformers and related architectures to produce high-quality, cinematically coherent video that adheres to the laws of physics.

Noteworthy papers in this area include PA-HOI, which introduces a physics-aware human and object interaction dataset; Cut2Next, which generates the next shot in a sequence via in-context tuning; and PP-Motion, which proposes a physical-perceptual fidelity evaluation for human motion generation. Lay2Story extends diffusion transformers for layout-togglable story generation, while Story2Board presents a training-free approach for expressive storyboard generation. Finally, Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation proposes a framework for hierarchical cross-modal direct preference optimization aimed at physically plausible video generation (a minimal sketch of the underlying preference objective follows below).
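To make the preference-optimization ingredient mentioned above concrete, the sketch below shows the standard (non-hierarchical) direct preference optimization loss over pairs of preferred and dispreferred samples. This is a minimal illustration under assumed inputs, not the paper's hierarchical cross-modal formulation: the function name dpo_loss and its log-likelihood arguments are hypothetical, and the hierarchical framework would presumably apply such terms at multiple granularities (e.g. per clip and per frame) and across modalities.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w_policy, logp_l_policy, logp_w_ref, logp_l_ref, beta=0.1):
    """Standard DPO objective over (preferred, dispreferred) sample pairs.

    Each argument is the log-likelihood of the preferred (w) or
    dispreferred (l) sample under the trainable policy or the frozen
    reference model; beta scales the implicit KL-style regularization.
    """
    # Log-ratio between the preferred and dispreferred sample under each model.
    pi_logratio = logp_w_policy - logp_l_policy
    ref_logratio = logp_w_ref - logp_l_ref
    # Preferred samples should become relatively more likely under the policy
    # than under the reference model.
    return -F.logsigmoid(beta * (pi_logratio - ref_logratio)).mean()

# Toy usage with scalar log-likelihoods for a single preference pair.
lw_pi, ll_pi = torch.tensor(-10.0), torch.tensor(-12.0)
lw_ref, ll_ref = torch.tensor(-11.0), torch.tensor(-11.5)
print(dpo_loss(lw_pi, ll_pi, lw_ref, ll_ref))
```

In a video-generation setting the log-likelihoods would typically come from the diffusion model's (approximate) per-sample objective, and a hierarchical variant would combine several such losses computed at different levels of temporal or semantic granularity.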

Sources

PA-HOI: A Physics-Aware Human and Object Interaction Dataset

Cut2Next: Generating Next Shot via In-Context Tuning

PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation

Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation

Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
