Advances in Transformer Robustness and Multi-Task Learning

Recent research has focused on improving the robustness and multi-task learning capabilities of transformer models. The field is moving toward methods that mitigate shortcut learning, leverage submodule linearity, and improve task arithmetic performance. Researchers are also examining the theoretical foundations of task vector methods and their generalization guarantees (a minimal sketch of the task arithmetic operation appears after the list below). In addition, there is growing interest in model merging and editing techniques that enable multi-task capabilities and improve the efficiency of deep learning models.

Notable papers in this area include:

MiMu, which proposes a method for mitigating multiple shortcut learning behaviors in transformers.

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs, which presents a statistical analysis showing that individual submodules exhibit higher linearity than the model as a whole, and proposes a model merging strategy that exploits this property.

When is Task Vector Provably Effective for Model Editing?, which provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear transformers.
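
For readers unfamiliar with task arithmetic, the core operation is simple: a task vector is the element-wise difference between fine-tuned and pretrained weights, and merging adds scaled task vectors back onto the pretrained model. The sketch below is illustrative only; the function names and toy tensors are assumptions for demonstration and are not taken from any of the cited papers.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Task vector: element-wise difference between fine-tuned and pretrained weights."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def merge_with_task_vectors(pretrained: dict, task_vectors: list, coeffs: list) -> dict:
    """Task arithmetic: add scaled task vectors onto the pretrained weights."""
    merged = {k: v.clone() for k, v in pretrained.items()}
    for tv, lam in zip(task_vectors, coeffs):
        for k in merged:
            merged[k] += lam * tv[k]
    return merged

# Hypothetical usage with toy tensors; in practice the dicts come from model.state_dict()
# of checkpoints sharing the same architecture.
pre = {"w": torch.zeros(4)}
ft_a = {"w": torch.ones(4)}          # fine-tuned on task A
ft_b = {"w": torch.full((4,), 2.0)}  # fine-tuned on task B
tvs = [task_vector(pre, ft_a), task_vector(pre, ft_b)]
merged = merge_with_task_vectors(pre, tvs, coeffs=[0.5, 0.5])
print(merged["w"])  # tensor([1.5000, 1.5000, 1.5000, 1.5000])
```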

Sources

MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning

Geometric Generality of Transformer-Based Gröbner Basis Computation

Sign-In to the Lottery: Reparameterizing Sparse Training From Scratch

Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?

Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers

All-in-One Transferring Image Compression from Human Perception to Multi-Machine Perception
