The field of image editing and generation is advancing rapidly, with a focus on more precise and intuitive methods for modifying and creating images. Recent research emphasizes incorporating large language models and diffusion-based approaches to improve the accuracy and coherence of edited images. A key challenge is fine-grained, object-level editing: enabling targeted modifications while preserving visual coherence. There is also growing interest in methods that seamlessly integrate foreground objects with background scenes to produce realistic, harmonious fused images.

Noteworthy papers in this area include POEM, which leverages multimodal large language models to enable precise object-level editing, and DreamFuse, which introduces an iterative human-in-the-loop data generation pipeline for producing consistent and harmonious fused images. SmartFreeEdit is notable for mask-free, spatially aware image editing with complex instruction understanding, while Image-Editing Specialists presents a novel approach to training specialized instruction-based image-editing diffusion models. Complex-Edit provides a comprehensive benchmark for evaluating instruction-based editing models across instructions of varying complexity.