Advances in Text-to-Image and Scene Synthesis

Computer vision and graphics research is rapidly advancing text-to-image and scene synthesis. Recent models generate high-quality images and 3D scenes directly from text descriptions, with potential applications in urban design, architecture, and digital content creation. Notably, the integration of large language models with multimodal diffusion models has enabled more adaptive and controllable design processes, while spatial reasoning and the relative composition of images have improved the accuracy and flexibility of scene synthesis. Overall, the field is moving toward more controllable and realistic text-to-image and scene synthesis. Noteworthy papers include:

  • ComposeAnything, which introduces a framework for compositional image generation that combines chain-of-thought reasoning with spatially controlled denoising.
  • ReSpace, which presents a generative framework for text-driven 3D indoor scene synthesis and editing using autoregressive language models and a compact structured scene representation.
  • FreeScene, which enables both convenient and effective control for indoor scene synthesis using a Mixed Graph Diffusion Transformer.
  • PartComposer, which learns and composes part-level concepts from single-image examples, enabling text-to-image diffusion models to create novel objects from meaningful components.
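Several of the scene-synthesis systems above (e.g., ReSpace and the direct numerical layout work) operate on an explicit, structured layout rather than raw pixels, which is what makes editing and validity checks tractable. As a rough illustration only, with field names and the overlap check being our own assumptions rather than any paper's actual representation, a compact structured layout and a basic sanity check might look like:

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    # Illustrative fields; real systems use richer attributes
    # (orientation, category IDs, mesh references, etc.)
    label: str
    x: float  # min-corner coordinates on the floor plan, in meters
    y: float
    w: float  # footprint width and depth
    d: float

def overlaps(a: SceneObject, b: SceneObject) -> bool:
    """Axis-aligned footprint overlap test, a common sanity check on generated layouts."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.d and b.y < a.y + a.d

# A toy layout such as an autoregressive language model might emit as structured text:
layout = [
    SceneObject("bed", 0.0, 0.0, 2.0, 1.6),
    SceneObject("nightstand", 2.1, 0.0, 0.5, 0.5),
]

conflicts = [(a.label, b.label)
             for i, a in enumerate(layout)
             for b in layout[i + 1:]
             if overlaps(a, b)]
print(conflicts)  # an empty list means no footprints collide
```

The point of such a representation is that constraints (non-overlap, wall alignment, reachability) can be checked or enforced symbolically on the layout before any rendering happens.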

Sources

ComposeAnything: Composite Object Priors for Text-to-Image Generation

Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models

ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment

Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning

FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

PartComposer: Learning and Composing Part-Level Concepts from Single-Image Examples

How PARTs assemble into wholes: Learning the relative composition of images

Localized Forest Fire Risk Prediction: A Department-Aware Approach for Operational Decision Support

Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
