The field of robot control and motion synthesis is moving toward more flexible, generalizable, language-driven approaches. Researchers are exploring universal intermediate representations, such as pixel motion, to make robot control transfer more readily across tasks and embodiments, and there is growing interest in efficient, scalable sequence models for motion synthesis, such as the Mamba architecture. Noteworthy papers in this area include LangToMo, a vision-language-action framework that uses pixel motion forecasts as intermediate representations, and Dyadic Mamba, a novel approach for generating realistic dyadic human motion from text descriptions. Other notable works develop pruning methods that shrink Mamba models while preserving accuracy and introduce a benchmark for evaluating embodied world models.
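To make the Mamba reference concrete: the architecture's core idea is a selective state-space scan, where the discretization step and the input/output projections depend on the current input rather than being fixed. The sketch below is a minimal, illustrative recurrence in that spirit, not the paper's actual implementation; all function and parameter names (`selective_scan`, `B_proj`, `C_proj`, `dt_proj`) are assumptions for illustration.

```python
import numpy as np

def selective_scan(x, A, B_proj, C_proj, dt_proj):
    """Toy selective state-space scan in the spirit of Mamba.

    Unlike a classic time-invariant SSM, the step size dt and the
    input/output matrices B, C are computed from the input x_t,
    which is the "selective" mechanism. Names are illustrative.

    x: (T, d) input sequence
    A: (d, n) continuous-time state decay (negative for stability)
    B_proj, C_proj: (d, n) projections producing input-dependent B, C
    dt_proj: (d, d) projection producing per-channel step sizes
    Returns y: (T, d) output sequence.
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))          # hidden state, one n-dim SSM per channel
    y = np.empty_like(x)
    for t in range(T):
        xt = x[t]
        dt = np.log1p(np.exp(xt @ dt_proj))   # softplus -> positive step, (d,)
        B = xt @ B_proj                       # input-dependent input matrix, (n,)
        C = xt @ C_proj                       # input-dependent output matrix, (n,)
        Ad = np.exp(dt[:, None] * A)          # zero-order-hold discretized decay, (d, n)
        h = Ad * h + (dt[:, None] * B[None, :]) * xt[:, None]
        y[t] = h @ C
    return y
```

Because the recurrence is elementwise in the state, it admits the parallel-scan formulations that give Mamba near-linear sequence scaling; the explicit Python loop here is only for clarity.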