Advances in In-Context Learning and Generalization in Large Language Models

The field of natural language processing is seeing significant advances in the in-context learning and generalization capabilities of large language models. Recent studies show that these models can generalize to unseen tasks and out-of-distribution inputs, but the mechanisms underlying this generalization are not yet fully understood. Researchers are investigating how task diversity, task vectors, and attention mechanisms facilitate in-context learning and generalization. There is also growing interest in the relationship between memorization and generalization, with some studies suggesting that the two capabilities are intertwined. New datasets and evaluation frameworks are meanwhile enabling researchers to tease apart different forms of memorization and generalization. Noteworthy papers in this area include:

  • A study of in-context learning finding that a task vector acts like a single in-context demonstration formed as a linear combination of the original demonstrations (see the first sketch after this list).
  • A paper on length generalization transfer demonstrating that transformers can extrapolate from shorter to longer inputs through association with related tasks (see the second sketch after this list).
  • A work on mitigating spurious correlations in large language models via causality-aware post-training, showing that this approach can enhance a model's generalization ability.
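
The task-vector finding in the first bullet can be made concrete with a small sketch. The code below is a minimal numpy illustration, not any paper's implementation: it mixes per-demonstration hidden states into a single vector via a linear combination and patches it into a zero-shot forward pass. The dimensions, the uniform weights, and the additive patch function are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 64, 5  # hidden-state width and number of demonstrations (illustrative)

# Hypothetical per-demonstration hidden states; in task-vector analyses these
# are typically read off at a separator token after each demonstration.
demo_states = rng.normal(size=(k, d))

# The bullet's finding, paraphrased: the task vector behaves like a single
# demonstration formed as a linear combination of the originals. Uniform
# weights are an assumption here; the combination need not be an average.
weights = np.full(k, 1.0 / k)
task_vector = weights @ demo_states  # shape: (d,)

def patch(query_state: np.ndarray, task_vec: np.ndarray,
          alpha: float = 1.0) -> np.ndarray:
    """Hypothetical patching step: add the task vector to a zero-shot
    query's hidden state in place of in-prompt demonstrations."""
    return query_state + alpha * task_vec

query_state = rng.normal(size=d)
patched = patch(query_state, task_vector)
print(patched.shape)  # (64,)
```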

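The length-generalization transfer result in the second bullet rests on a particular data split: the target task appears only at short lengths during training, while an associated auxiliary task covers the full length range, and evaluation asks whether the target task still works at the longer lengths. The sketch below constructs such a split; the task names, length thresholds, and Example class are hypothetical and only illustrate the protocol.

```python
from dataclasses import dataclass
import random

@dataclass
class Example:
    task: str      # "aux" (auxiliary) or "target"
    length: int    # input length in tokens

def make_split(n=1000, short_len=32, long_len=128, seed=0):
    rng = random.Random(seed)
    train, test = [], []
    for _ in range(n):
        # Auxiliary task is seen at all lengths during training.
        train.append(Example("aux", rng.randint(1, long_len)))
        # Target task is seen only at short lengths during training.
        train.append(Example("target", rng.randint(1, short_len)))
        # Test: the target task at lengths never seen for it in training.
        test.append(Example("target", rng.randint(short_len + 1, long_len)))
    return train, test

train_set, test_set = make_split()
print(len(train_set), len(test_set))  # 2000 1000
```
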
Sources

When can in-context learning generalize out of task distribution?

A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition

Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models

Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length

Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations

Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

Extrapolation by Association: Length Generalization Transfer in Transformers

Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
