The field of transformer research is moving toward more efficient architectures and generalizable insights. Recent studies suggest that many current models are over-tokenized and under-optimized for scalability, incurring high computational and memory costs. To address this, researchers are exploring methods that reduce token counts and improve computational efficiency (see the illustrative sketch after the paper list below). There is also growing interest in insights that generalize across different transformer architectures and tasks. Noteworthy papers in this area include:

- ZeroSim, which proposes a transformer-based performance modeling framework for analog circuit design automation, achieving robust in-distribution generalization and zero-shot generalization to unseen topologies.
- Generalizable Insights for Graph Transformers in Theory and Practice, which proposes the Generalized-Distance Transformer architecture and develops a fine-grained understanding of its representation power, yielding practical guidance for graph transformer (GT) design, training, and inference.
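To make the token-reduction trend concrete, here is a minimal, hypothetical sketch of shrinking a sequence by averaging the most similar adjacent token pairs before attention. It is not taken from any of the papers above; the function name `merge_adjacent_tokens`, the `keep_ratio` parameter, and the tensor shapes are all illustrative assumptions.

```python
# Hypothetical sketch: reduce sequence length by merging similar adjacent tokens.
# Names, shapes, and the merging rule are illustrative assumptions only.
import torch
import torch.nn.functional as F

def merge_adjacent_tokens(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Merge the most similar adjacent token pair, one pair at a time, until
    roughly keep_ratio * seq_len tokens remain. x has shape (batch, seq_len, dim)."""
    b, n, d = x.shape
    n_keep = max(1, int(n * keep_ratio))
    while x.shape[1] > n_keep:
        # Cosine similarity between each token and its right-hand neighbor: (b, n-1)
        sim = F.cosine_similarity(x[:, :-1, :], x[:, 1:, :], dim=-1)
        idx = sim.argmax(dim=1)  # most similar adjacent pair per batch item
        merged_rows = []
        for bi in range(b):
            i = idx[bi].item()
            row = x[bi]
            pair_mean = (row[i] + row[i + 1]) / 2  # average the selected pair
            row = torch.cat([row[:i], pair_mean[None], row[i + 2:]], dim=0)
            merged_rows.append(row)
        x = torch.stack(merged_rows, dim=0)
    return x

if __name__ == "__main__":
    tokens = torch.randn(2, 16, 64)                  # (batch, seq_len, embedding_dim)
    reduced = merge_adjacent_tokens(tokens, 0.5)
    print(tokens.shape, "->", reduced.shape)         # [2, 16, 64] -> [2, 8, 64]
```

Halving the sequence length this way roughly quarters the cost of the quadratic attention step, which is the kind of computational saving the token-reduction line of work targets.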