The field of vision transformers is moving toward more efficient and effective architectures. Recent developments have focused on improving multi-head self-attention, a key component of vision transformers, and these refinements have produced clear gains on image recognition and generation tasks. In particular, visual-contrast attention and differentiable hierarchical visual tokenization have shown promising results. Together, these advances point toward faster and more accurate image processing across a wide range of applications. Noteworthy papers include:

- Linear Differential Vision Transformer, which introduces a new attention mechanism that reduces computational complexity while improving accuracy.
- PRevivor, which proposes a prior-guided color transformer for reviving ancient Chinese paintings.
- Differentiable Hierarchical Visual Tokenization, which introduces an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity.
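
For readers less familiar with the baseline these papers refine, standard multi-head self-attention can be written compactly. The PyTorch sketch below is a generic, minimal implementation of scaled dot-product attention; the class name and dimensions are illustrative and not drawn from any of the cited papers.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Standard multi-head self-attention: the baseline the papers above improve on."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape  # batch, tokens, channels
        # Project to queries, keys, values and split into heads.
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        # Quadratic-cost attention: the N x N map is what linear variants avoid.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```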
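
The exact visual-contrast attention mechanism is defined in the Linear Differential Vision Transformer paper itself. As a hedged illustration of the general differential-attention idea its title suggests, the sketch below contrasts two softmax attention maps so that noise both maps attend to is suppressed; the formulation, the learnable weight `lam`, and all layer names are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class DifferentialAttention(nn.Module):
    """Illustrative differential attention: the output contrasts two softmax
    attention maps, damping patterns that both maps attend to. Generic sketch
    only; NOT the visual-contrast attention from the cited paper."""
    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.q1, self.k1 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.q2, self.k2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.lam = nn.Parameter(torch.tensor(0.5))  # contrast weight (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a1 = ((self.q1(x) @ self.k1(x).transpose(-2, -1)) * self.scale).softmax(dim=-1)
        a2 = ((self.q2(x) @ self.k2(x).transpose(-2, -1)) * self.scale).softmax(dim=-1)
        return (a1 - self.lam * a2) @ self.v(x)  # difference of attention maps
```

Note that this sketch still materializes the quadratic N x N maps; a linear-complexity variant would avoid forming them explicitly, which is presumably the point of the "Linear" in the paper's title.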
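
Similarly, one generic way to make tokenization differentiable is to softly assign pixel features to a fixed token budget so gradients flow through the grouping. The sketch below is only a minimal illustration of that idea under stated assumptions; the hierarchical, pixel-granular tokenizer is defined in the Differentiable Hierarchical Visual Tokenization paper, and `SoftTokenizer` and its parameters are hypothetical.

```python
import torch
import torch.nn as nn

class SoftTokenizer(nn.Module):
    """Generic differentiable tokenizer sketch: pixel features are softly pooled
    into a fixed number of tokens. Hypothetical, for illustration only."""
    def __init__(self, dim: int, num_tokens: int):
        super().__init__()
        self.assign = nn.Linear(dim, num_tokens)  # per-pixel assignment logits

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N_pixels, dim). Soft assignment keeps the grouping differentiable.
        w = self.assign(feats).softmax(dim=1)  # (B, N_pixels, K), normalized over pixels
        tokens = w.transpose(1, 2) @ feats     # (B, K, dim): weighted pooling per token
        return tokens
```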