The field of controllable video generation and human motion synthesis is advancing rapidly, with a focus on improving the semantic consistency and realism of generated videos and motions. Recent work has produced frameworks and models that generate high-fidelity videos and motions precisely controlled by external signals such as text descriptions and music; these advances have the potential to transform applications including animation, gaming, and virtual reality. Notable papers in this area include SSG-Dit, which proposes a spatial-signal-guided framework for controllable video generation, and DanceEditor, which introduces a framework for iterative and editable dance generation. Other noteworthy work includes MoCo, which decouples human video generation into separate structure and appearance generation stages, and OmniHuman-1.5, which generates semantically coherent and expressive character animations. MotionFlux and PersonaAnimator have likewise made notable contributions to motion generation and transfer. Overall, the field is moving toward video and motion generation that is more realistic, more controllable, and more semantically consistent.
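To make the structure/appearance decoupling concrete, the sketch below shows a minimal two-stage pipeline in the spirit of that idea: a structure stage predicts a pose sequence from a text embedding, and an appearance stage renders frames conditioned on that structure plus a reference appearance code. This is purely illustrative; the class names, layer choices, and dimensions are assumptions for exposition and do not reflect MoCo's actual architecture or API.

```python
# Illustrative sketch of a decoupled structure/appearance generation pipeline.
# All names and shapes here are hypothetical, not taken from any specific paper.
import torch
import torch.nn as nn

class StructureGenerator(nn.Module):
    """Stage 1: map a text embedding to a sequence of 2D pose keypoints (structure)."""
    def __init__(self, text_dim=512, hidden=256, num_joints=17, num_frames=16):
        super().__init__()
        self.num_frames, self.num_joints = num_frames, num_joints
        self.proj = nn.Linear(text_dim, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, num_joints * 2)

    def forward(self, text_emb):                        # (B, text_dim)
        h = self.proj(text_emb).unsqueeze(1)            # (B, 1, hidden)
        h = h.expand(-1, self.num_frames, -1)           # repeat the condition over time
        out, _ = self.decoder(h)                        # (B, T, hidden)
        return self.to_pose(out).view(-1, self.num_frames, self.num_joints, 2)

class AppearanceRenderer(nn.Module):
    """Stage 2: render low-resolution frames from the pose sequence plus an appearance code."""
    def __init__(self, num_joints=17, app_dim=256, frame_hw=32):
        super().__init__()
        self.frame_hw = frame_hw
        self.fuse = nn.Linear(num_joints * 2 + app_dim, 512)
        self.to_frame = nn.Linear(512, 3 * frame_hw * frame_hw)

    def forward(self, poses, appearance_code):          # (B, T, J, 2), (B, app_dim)
        B, T = poses.shape[:2]
        flat = poses.flatten(2)                         # (B, T, J*2)
        app = appearance_code.unsqueeze(1).expand(-1, T, -1)
        x = torch.relu(self.fuse(torch.cat([flat, app], dim=-1)))
        frames = self.to_frame(x).view(B, T, 3, self.frame_hw, self.frame_hw)
        return torch.sigmoid(frames)                    # pixel values in [0, 1]

# Structure first, then appearance: motion and identity are controlled separately.
text_emb = torch.randn(2, 512)                          # stand-in for a text encoder output
appearance_code = torch.randn(2, 256)                   # stand-in for a reference-image encoder output
poses = StructureGenerator()(text_emb)
video = AppearanceRenderer()(poses, appearance_code)
print(poses.shape, video.shape)                         # (2, 16, 17, 2) (2, 16, 3, 32, 32)
```

The design point the sketch illustrates is that the conditioning signal (text, music) drives only the structure stage, while identity and texture enter through the appearance stage, which is what makes the two aspects independently controllable.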