The field of 3D understanding and generation is moving toward unified, integrated approaches that leverage geometric-semantic encoding and multi-modal coding to improve spatial understanding and generation quality. Researchers are increasingly pairing large language models (LLMs) with latent diffusion models to produce high-quality 3D representations and assets. Noteworthy papers in this area include UniUGG, which proposes a unified framework for 3D understanding and generation; MeshCoder, which reconstructs 3D objects from point clouds into editable programs; SceneGen, which generates a 3D scene from a single image in one feedforward pass; and Collaborative Multi-Modal Coding for High-Quality 3D Generation, which demonstrates the effectiveness of multi-modal coding for 3D modeling.
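To make the latent-diffusion ingredient concrete, here is a minimal sketch of a DDPM-style reverse sampling loop over a shape latent. Everything here is illustrative rather than drawn from any of the cited papers: `denoise_fn` is a hypothetical stand-in for a trained 3D latent denoiser, and the linear noise schedule is a common textbook choice, not one used by these works.

```python
import numpy as np

def ddpm_sample(denoise_fn, latent_shape, timesteps=50, seed=0):
    """Toy DDPM reverse process over a latent vector (e.g. a 3D shape latent).

    denoise_fn(z_t, t) predicts the noise present at step t; in a real system
    this would be a trained network, here it is a hypothetical placeholder.
    """
    rng = np.random.default_rng(seed)
    # Illustrative linear noise schedule (not taken from any cited paper).
    betas = np.linspace(1e-4, 0.02, timesteps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    z = rng.standard_normal(latent_shape)  # start from pure Gaussian noise
    for t in reversed(range(timesteps)):
        eps_hat = denoise_fn(z, t)
        # Standard DDPM posterior-mean update for z_{t-1}.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # inject noise at every step except the last
            z += np.sqrt(betas[t]) * rng.standard_normal(latent_shape)
    return z  # in practice, decoded downstream into a mesh or point cloud

# Usage with a dummy denoiser that predicts zero noise everywhere:
latent = ddpm_sample(lambda z, t: np.zeros_like(z), latent_shape=(16,))
print(latent.shape)
```

The key point is that diffusion operates on a compact latent rather than raw geometry; a separate decoder then maps the sampled latent to an explicit 3D representation.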