The field of video generation and editing is advancing rapidly, with a focus on improving the quality and controllability of generated videos. Recent work has produced frameworks and models for multi-shot video generation, controllable head swapping, and talking-head synthesis, with applications spanning film and video production, video games, and social media. Researchers have made notable progress on the core challenges of identity consistency, attribute-level controllability, and spatial-temporal consistency in video generation, while new benchmarks and evaluation metrics have made model performance easier to assess and have driven further innovation. Particularly noteworthy papers include:

- EchoShot, which achieves superior identity consistency and attribute-level controllability in multi-shot portrait video generation.
- Bind-Your-Avatar, which introduces a novel framework for multi-talking-character video generation and achieves state-of-the-art performance.
- XVerse, which allows precise and independent control over subject identity and semantic attributes in text-to-image generation.