The field of multimodal reasoning is moving toward more sophisticated and clinically relevant applications, with a focus on enhancing the ability of vision-language models to perform grounded reasoning and to provide transparent explanations. Recent work has introduced datasets and benchmarks for evaluating models on tasks such as medical visual question answering, geometric problem solving, and spatial mathematical reasoning; these resources stand to improve the trustworthiness and reliability of multimodal models in real-world applications. Notable papers include 3DReasonKnee, which introduces a dataset for 3D grounded reasoning in medical images, and S-Chain, which provides a large-scale dataset for structured visual chain-of-thought in medicine. In addition, GeoThought and DynaSolidGeo advance the geometric and spatial mathematical reasoning capabilities of vision-language models.