Generative modeling and generative architectural design are advancing quickly, driven by innovations in autoregressive models, neural cellular automata, and transformer-based architectures. Researchers are exploring new paradigms, such as continuous latent space modeling and spatial-aware decay mechanisms, to improve both the efficiency and the quality of image and text generation. Novel frameworks such as DisCon and Hita aim to capture holistic relationships among token sequences and global image properties more effectively. These advances could benefit a range of applications, including intelligent architectural design, escape room puzzle generation, and high-resolution image synthesis.
Noteworthy papers in this area include the following. RoomCraft proposes a multi-stage pipeline for generating coherent 3D indoor scenes from user inputs and reports significant improvements in the realism and visual appeal of the generated room layouts. Neural Cellular Automata: From Cells to Pixels overcomes the low-resolution grid limitation of neural cellular automata by pairing them with a tiny, shared implicit decoder, which enables full-HD output in real time. Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation accelerates autoregressive image generation through flexible parallelized autoregressive modeling and locality-aware generation ordering, achieving at least 3.4x lower latency than previous parallelized autoregressive models.
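To make the idea of pairing a coarse cellular-automaton grid with a small shared decoder more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The bilinear interpolation of cell states, the sub-cell offset features, the two-layer decoder with random weights, and all function names (`bilinear_sample`, `make_decoder`, `render`) are assumptions chosen for illustration, and the automaton's update rule is omitted entirely. The sketch only shows how one decoder, shared across all pixels, can turn a low-resolution grid of state vectors into an image of arbitrary resolution.

```python
import math
import random


def bilinear_sample(grid, y, x):
    """Interpolate cell-state vectors at continuous in-range coordinates (y, x)."""
    h, w = len(grid), len(grid[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    return [
        (1 - fy) * ((1 - fx) * grid[y0][x0][c] + fx * grid[y0][x1][c])
        + fy * ((1 - fx) * grid[y1][x0][c] + fx * grid[y1][x1][c])
        for c in range(len(grid[0][0]))
    ]


def make_decoder(in_dim, hidden, out_dim, seed=0):
    """Build a tiny two-layer MLP with random (untrained) weights; hypothetical shape."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(out_dim)]

    def decode(features):
        hid = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in w1]
        # Sigmoid written via tanh so it cannot overflow; outputs lie in [0, 1].
        return [
            0.5 * (1.0 + math.tanh(0.5 * sum(w * v for w, v in zip(row, hid))))
            for row in w2
        ]

    return decode


def render(grid, out_h, out_w, decode):
    """Decode a coarse grid into an out_h x out_w image using one shared decoder."""
    h, w = len(grid), len(grid[0])
    image = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Map the output pixel centre to coarse-grid coordinates, clamped to the grid.
            y = min(max((i + 0.5) * h / out_h - 0.5, 0.0), h - 1.0)
            x = min(max((j + 0.5) * w / out_w - 0.5, 0.0), w - 1.0)
            state = bilinear_sample(grid, y, x)
            # Sub-cell offsets tell the decoder where inside a cell the pixel falls.
            local = [y - math.floor(y), x - math.floor(x)]
            row.append(decode(state + local))
        image.append(row)
    return image


rng = random.Random(1)
# An 8x8 grid of 4-channel cell states stands in for the automaton's output.
coarse = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)] for _ in range(8)]
decoder = make_decoder(in_dim=4 + 2, hidden=8, out_dim=3)
image = render(coarse, 32, 32, decoder)
print(len(image), len(image[0]), len(image[0][0]))  # 32 32 3
```

Because the decoder is evaluated independently at each output pixel, the output resolution is set by the `render` arguments rather than by the size of the grid. In this sketch the same 8x8 grid could be rendered at 32x32 or 1920x1080 without changing the decoder's weights.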