The field of image editing and generation is moving toward more sophisticated and nuanced approaches that capture the emotional and aesthetic dimensions of images. Recent work integrates multimodal large language models (MLLMs) and vision-language models to enable precise, emotion-aware editing. These advances stand to streamline creative workflows by making high-quality edits faster to produce and easier to control. Noteworthy papers include Moodifier, a training-free editing model that leverages MLLMs for precise emotional transformations; NoHumansRequired, an automated pipeline for mining high-fidelity image editing triplets; and ArtiMuse, an MLLM-based image aesthetics assessment model that combines joint scoring with expert-level understanding.
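To make the MLLM-guided, emotion-driven editing idea more concrete, the following is a minimal, hypothetical sketch that pairs a templated emotion-to-instruction step (standing in for an MLLM's output) with an off-the-shelf instruction-based diffusion editor from the `diffusers` library. The checkpoint name, the prompt template, and the two-stage prompt-then-edit flow are illustrative assumptions and are not taken from Moodifier or the other papers above.

```python
# Hypothetical sketch: emotion-driven image editing with an instruction-based
# diffusion editor. In a Moodifier-style system the edit instruction would be
# produced by an MLLM; here a fixed template stands in for that step.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline


def emotion_to_instruction(emotion: str) -> str:
    # Placeholder for an MLLM call that turns a target emotion into a
    # concrete, image-grounded edit instruction.
    return f"make the scene feel {emotion}, adjusting lighting, color, and mood"


def edit_with_emotion(image_path: str, emotion: str) -> Image.Image:
    # Load a publicly available instruction-following editor
    # (assumption: the timbrooks/instruct-pix2pix checkpoint suffices here).
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open(image_path).convert("RGB")
    instruction = emotion_to_instruction(emotion)

    # image_guidance_scale controls how closely the edit preserves the input.
    result = pipe(
        instruction,
        image=image,
        num_inference_steps=30,
        image_guidance_scale=1.5,
    ).images[0]
    return result


if __name__ == "__main__":
    edited = edit_with_emotion("portrait.jpg", "melancholic")
    edited.save("portrait_melancholic.jpg")
```

The design choice this illustrates is the training-free decoupling the paragraph describes: the language model handles the semantic mapping from an emotion to an edit instruction, while a frozen, general-purpose image editor handles pixel-level changes, so no task-specific fine-tuning is required.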