The field of embodied AI for robotics is advancing rapidly, with a focus on developing more capable and autonomous systems. Recent research has explored the integration of vision-language models (VLMs) and large language models (LLMs) to improve robotic planning, manipulation, and interaction. One notable direction is the use of VLMs as formalizers for multimodal planning, enabling robots to reason about complex tasks and environments. Another is the development of frameworks that combine VLMs and LLMs to enhance robotic perception, action generation, and decision-making. These advances could substantially extend what robots can do in real-world environments, allowing them to carry out complex tasks with greater autonomy and efficiency.

Noteworthy papers in this area include Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation, which proposes a framework that empowers VLMs to generate and validate long-horizon manipulation plans from natural language instructions, and LangGrasp, a language-interactive robotic grasping framework that leverages fine-tuned LLMs to infer implicit intents from linguistic instructions and clarify task requirements.
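To make the generate-and-validate planning pattern described above more concrete, the sketch below shows a minimal loop in which a VLM-style planner proposes manipulation steps from a natural language instruction and a verifier assigns each step a reward before the plan is accepted. This is a simplified illustration under stated assumptions, not the method of either paper: the propose_plan and verify_step helpers, the keyword heuristic, and the reward threshold are all hypothetical placeholders standing in for real VLM/LLM inference and a real verifier.

```python
"""Minimal sketch of a generate-then-validate planning loop.

A planner proposes a sequence of manipulation steps from a natural language
instruction plus a scene description, and a verifier scores each step before
the plan is accepted. All model calls are stubbed; a real system would swap
them for VLM/LLM inference and a simulator or learned critic.
"""

from dataclasses import dataclass


@dataclass
class PlanStep:
    action: str          # e.g. "pick", "place", "open"
    target: str          # object the action applies to
    reward: float = 0.0  # verifier score in [0, 1]


def propose_plan(instruction: str, scene_objects: list[str]) -> list[PlanStep]:
    """Stand-in for a VLM planner: map an instruction to candidate steps.

    A real implementation would prompt a vision-language model with the
    camera image and instruction; here we use a trivial keyword heuristic.
    """
    steps: list[PlanStep] = []
    for obj in scene_objects:
        if obj in instruction:
            steps.append(PlanStep(action="pick", target=obj))
            steps.append(PlanStep(action="place", target="bin"))
    return steps


def verify_step(step: PlanStep, scene_objects: list[str]) -> float:
    """Stand-in for a per-step verifiable reward.

    Returns 1.0 if the step's target exists in the scene (or is the fixed
    "bin" fixture), 0.0 otherwise.
    """
    return 1.0 if step.target in scene_objects or step.target == "bin" else 0.0


def plan_with_validation(instruction: str, scene_objects: list[str],
                         threshold: float = 0.5) -> list[PlanStep]:
    """Generate a plan and keep it only if every step clears the threshold."""
    plan = propose_plan(instruction, scene_objects)
    for step in plan:
        step.reward = verify_step(step, scene_objects)
        if step.reward < threshold:
            return []  # reject the plan; a real system would replan instead
    return plan


if __name__ == "__main__":
    scene = ["red cup", "sponge", "plate"]
    accepted = plan_with_validation("put the red cup in the bin", scene)
    for s in accepted:
        print(f"{s.action} {s.target} (reward={s.reward})")
```

The key design point illustrated here is the separation of plan generation from plan verification: the planner can remain a general-purpose VLM or LLM, while the verifier provides a checkable signal (here a simple scene-consistency test) that gates execution of long-horizon plans.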