The field of 4D scene generation is moving toward more immersive audiovisual experiences. Whereas existing methods deliver impressive visual results but neglect the audio modality, researchers are now generating spatial audio that is aligned with the corresponding 4D scenes. Novel frameworks supporting spatial audio generation, multi-plane synchronization, and comprehensive 4D scene generation are making dynamic scenes more realistic and coherent, with applications in content creation, scene exploration, and rapid prototyping. Notable papers in this area include Sonic4D, which generates realistic spatial audio consistent with synthesized 4D scenes, and DreamCube, which introduces a multi-plane RGB-D diffusion model for 3D panorama generation. In addition, CoCo4D and DreamAnywhere demonstrate significant improvements in 4D scene generation and object-centric panoramic 3D scene generation, respectively.