Image generation and editing research is moving toward more controllable, context-aware methods. Recent work improves the compositional control of text-to-image models, so that images containing multiple objects and attributes are generated more precisely and realistically. There is also growing interest in removing objects together with their associated visual artifacts, and in editing images from natural language instructions while preserving contextual coherence. Together, these advances could enable more sophisticated and realistic image editing and generation. Noteworthy papers include MaskAttn-SDXL, which proposes a region-level gating mechanism for improving compositional control in text-to-image models; GeoRemover, which introduces a geometry-aware framework for removing objects and their causal visual artifacts from images; CAMILA, which presents a context-aware method for image editing with language alignment; and RITA, which reformulates image manipulation localization as a conditional sequence prediction task, providing a solid foundation for hierarchical localization.