The field of image generation and scene understanding is advancing rapidly, with a focus on improving the realism and consistency of generated images. Researchers are exploring new ways to incorporate intrinsic scene properties, such as depth and segmentation maps, into diffusion models to generate more spatially consistent and realistic images (a minimal conditioning sketch follows the list below). Another line of work improves control over occlusion relationships between objects, with training-free algorithms that give precise control over these relationships without retraining or fine-tuning the image diffusion model. There is also growing interest in visual in-context learning, which lets a model adapt to new tasks without explicit updates to its weights.

Noteworthy papers in this area include:

- HiMat, which introduces a memory- and computation-efficient diffusion-based framework for generating high-resolution SVBRDFs.
- LaRender, which proposes a training-free image generation algorithm that precisely controls occlusion relationships between objects in an image.
- CObL, which introduces a diffusion-based architecture for inferring an occlusion-ordered stack of object layers from an image.
- Region-to-Region, which enhances generative image harmonization with adaptive regional injection.
- Stable Diffusion Models are Secretly Good at Visual In-Context Learning, which shows that off-the-shelf Stable Diffusion models can be repurposed for visual in-context learning.
- Towards Spatially Consistent Image Generation, which leverages intrinsic scene properties to generate more spatially consistent and realistic images.
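
To make the idea of conditioning a diffusion model on an intrinsic scene property concrete, here is a minimal sketch of depth-conditioned generation using a pretrained ControlNet from the Hugging Face diffusers library. This is only an illustration of the general mechanism, not the method of any paper above; the model checkpoints and the `depth.png` input path are assumptions chosen for the example.

```python
# Illustrative sketch: condition a Stable Diffusion model on a depth map
# via a pretrained depth ControlNet (diffusers library). Not the method of
# any specific paper listed above; it only shows how an intrinsic scene
# property can steer generation toward a spatially consistent layout.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Depth-conditioned ControlNet paired with a Stable Diffusion backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# "depth.png" is a hypothetical single-channel depth map describing the
# target scene layout.
depth_map = load_image("depth.png")

# The depth map constrains spatial structure; the text prompt controls
# appearance, so the result respects the given scene geometry.
image = pipe(
    "a sunlit living room with a wooden floor",
    image=depth_map,
    num_inference_steps=30,
).images[0]
image.save("depth_conditioned_sample.png")
```

The same pattern extends to other intrinsic properties (e.g., segmentation maps) by swapping in the corresponding conditioning network.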