3D world generation is advancing rapidly, with a focus on producing detailed, realistic, and scalable scenes. Recent work centers on multimodal agent frameworks, which draw on diverse foundation tools to acquire real-world knowledge and construct complex 3D scenes. These frameworks report significant improvements in reality alignment, shape precision, texture fidelity, and aesthetics. Noteworthy papers include RAISECity, which achieves a 90% win-rate against existing baselines for overall perceptual quality, and MajutsuCity, which reduces layout FID by 83.7% compared to CityDreamer. Yo'City generates personalized, boundless city-scale scenes through a top-down planning strategy and a user-interactive expansion mechanism. FilmSceneDesigner automates film set design through an agent-based chaining framework. Together, these advances could substantially benefit applications in immersive media, embodied intelligence, and world models.