The field of 3D human pose and shape estimation is advancing rapidly, with a focus on challenging scenarios such as occlusions and complex human poses. Recent work has produced new benchmark datasets that incorporate realistic occlusions, which are essential for training and evaluating methods in this area. Other approaches improve the generalization of lifting-based 3D human pose estimation methods so that they perform better on unseen datasets. Cross-domain learning frameworks have been developed to address long-horizon tasks in human-scene interaction, demonstrating significant gains in task success rates and execution efficiency. Researchers have also explored RGBD cameras for 3D human mesh estimation, leveraging the additional depth data to improve accuracy.

Noteworthy papers in this area include:

- VOccl3D: a novel benchmark dataset for 3D human pose and shape estimation under real occlusions.
- AugLift: a simple yet effective reformulation of the standard lifting pipeline that improves generalization performance.
- DETACH: a cross-domain learning framework for long-horizon human-scene interaction tasks.
- M^3: a masked autoencoder approach to 3D human mesh estimation from single-view RGBD images.
- Waymo-3DSkelMo: a large-scale dataset for pedestrian interaction modeling.
- Human-in-Context: a unified cross-domain 3D human motion modeling approach.
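For readers unfamiliar with the "lifting" formulation mentioned above, the standard pipeline first detects 2D keypoints in an image and then maps them to 3D joint positions with a small learned regressor. The sketch below illustrates that second stage with a randomly initialized two-layer MLP; the joint count, layer sizes, and function names are illustrative assumptions, not the design of AugLift or any specific paper.

```python
import numpy as np

# Illustrative sketch of the standard 2D-to-3D "lifting" stage:
# a small MLP maps detected 2D keypoints to 3D joint positions.
# All sizes here are assumptions for demonstration only.

rng = np.random.default_rng(0)
NUM_JOINTS = 17  # COCO-style skeleton

def init_lifter(hidden=64):
    """Randomly initialized two-layer MLP: (J*2) -> hidden -> (J*3)."""
    return {
        "W1": rng.standard_normal((NUM_JOINTS * 2, hidden)) * 0.01,
        "b1": np.zeros(hidden),
        "W2": rng.standard_normal((hidden, NUM_JOINTS * 3)) * 0.01,
        "b2": np.zeros(NUM_JOINTS * 3),
    }

def lift(params, kpts_2d):
    """Lift normalized 2D keypoints (J, 2) to 3D joints (J, 3)."""
    x = kpts_2d.reshape(-1)                               # flatten to (J*2,)
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    out = h @ params["W2"] + params["b2"]
    return out.reshape(NUM_JOINTS, 3)

params = init_lifter()
kpts_2d = rng.uniform(-1, 1, size=(NUM_JOINTS, 2))  # stand-in for detector output
joints_3d = lift(params, kpts_2d)
print(joints_3d.shape)  # (17, 3)
```

In a real system the weights would be trained on paired 2D/3D data (e.g. motion-capture datasets), and generalization-oriented methods modify what goes into or comes out of exactly this kind of regressor.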