The field of 3D scene understanding and reconstruction is advancing rapidly, with a focus on developing more efficient and accurate methods for reconstructing complex scenes from diverse input data. Recent research explores novel deep learning architectures, such as sparse-convolutional backbones and video diffusion models, to improve the accuracy and robustness of 3D reconstruction algorithms. There is also growing interest in methods that handle challenging problems, such as occlusion and texture reconstruction, and that generalize well to unseen environments. Notable papers in this area include TUN3D, which achieves state-of-the-art performance in joint layout estimation and 3D object detection, and UniVerse, which proposes a unified framework for robust reconstruction built on a video diffusion model. Other noteworthy papers include ReLumix, which enables flexible and scalable video relighting, and IPDRecon, which delivers stable, view-invariant reconstruction.
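
The appeal of sparse-convolutional backbones in this setting is that indoor scans occupy only a small fraction of a dense voxel grid, so feature extraction can be restricted to occupied voxels. The sketch below is a generic, hedged illustration of the voxelization step that feeds such a backbone, not code from TUN3D or any other cited paper; the function name, voxel size, and synthetic point cloud are all assumptions made for the example.

```python
import numpy as np


def voxelize(points: np.ndarray, voxel_size: float = 0.05):
    """Map points (N, 3) to the set of unique occupied voxel indices.

    Sparse-convolutional backbones store features only at these occupied
    voxels, so memory and compute scale with their count rather than with
    the size of the full dense grid.
    """
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)      # (N, 3) integer voxel coords
    occupied, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    return occupied, inverse                                        # inverse maps each point to its voxel


# Hypothetical example: a 10 m x 10 m x 3 m room sampled with 100k points.
rng = np.random.default_rng(0)
points = rng.uniform([0.0, 0.0, 0.0], [10.0, 10.0, 3.0], size=(100_000, 3))
occupied, _ = voxelize(points, voxel_size=0.05)

dense_cells = int((10 / 0.05) * (10 / 0.05) * (3 / 0.05))           # 2.4M cells in the dense grid
print(f"occupied voxels: {len(occupied)} of {dense_cells} dense cells")
```

Because only the occupied voxels are carried forward, the convolutional stages of such a backbone touch a small subset of the scene volume, which is the main source of the efficiency gains the recent work targets.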