The field of multimodal generation is shifting toward the explicit modeling of style, enabling more nuanced and context-dependent outputs. Researchers are moving beyond traditional content-focused methods to capture the subtleties of human expression in both text and images. One key direction is the development of models that align with a target style, such as humor or romanticism, and generate content that reflects it. Another is the use of author-specific writing styles and fine-grained stylistic representations to improve the personalization and quality of generated content. Noteworthy papers include OnomatoGen, which proposes a novel approach to onomatopoeia generation in manga, and Image Generation Based on Image Style Extraction, which introduces a three-stage training method for fine-grained controlled stylized image generation. Together, these advances could substantially extend the capabilities of multimodal generation models and enable more sophisticated applications in entertainment, education, and communication.