The field of video generation and editing is advancing rapidly, with a focus on improving the coherence and consistency of generated videos. Recent research has explored diffusion models, graph structures, and attention mechanisms to improve the quality and controllability of video generation. A key challenge is maintaining long-term consistency and coherence across generated videos, which has been addressed through novel frameworks and techniques. Notable papers include InfLVG, which enables coherent long video generation without requiring additional long-form video data and achieves strong cross-scene consistency and semantic fidelity, and DanceTogether, which generates photorealistic videos of multi-person interactions while preserving individual identities, outperforming prior art by a significant margin on the TogetherVideoBench benchmark. Other papers, such as ATI and MAGREF, propose approaches to controllable video generation and editing that allow more precise control over the generated content. Overall, the field is moving toward more sophisticated and controllable video generation and editing capabilities, with potential applications in digital production, simulation, and embodied intelligence.