Research on 3D object detection and scene understanding is currently focused on improving both the accuracy and the efficiency of detection methods. Recent work introduces techniques such as frequency-aware positional depth embedding and cross-view scale-invariant depth prediction, which have been reported to improve detection performance. There is also a growing trend toward multi-task learning and the use of auxiliary tasks to improve detection accuracy, and the fusion of different sensor modalities, such as cameras and radar, is being explored to enhance detection capabilities. Noteworthy papers in this area include: FreqPDE, which introduces a novel depth embedding method for 3D object detection transformers; CrossRay3D, which proposes a sparse multi-modal detector that preserves geometric structure and class distribution information; OOS-DSD, which advances out-of-stock detection in retail images using auxiliary tasks; M2H, which introduces a multi-task learning framework for monocular spatial perception; Towards 3D Objectness Learning in an Open World, which proposes a class-agnostic open-world 3D detector; and SFGFusion, which introduces a surface-fitting-guided 3D object detection method that fuses 4D radar and camera data.