Advances in Large Language Models

The field of large language models (LLMs) is evolving rapidly, with a focus on improving reliability, interpretability, and alignment with human values. Recent work explores LLM applications in code explanation, mathematical discovery, and healthcare, while also highlighting risks such as reinforced biases that can compromise downstream deployment decisions. To address these challenges, researchers are developing new frameworks and techniques, such as SparseAlign, to assess and improve the alignment of LLMs with human judgment. Other notable developments include the discovery of a unified representation underlying LLM judgment, evidence of emerging self-awareness in advanced models, and the potential for mirror-neuron patterns to contribute to intrinsic alignment in AI. Noteworthy papers include 'A Unified Representation Underlying the Judgment of Large Language Models', which introduces the Valence-Assent Axis, and 'LLMs Position Themselves as More Rational Than Humans', which presents a game-theoretic framework for measuring self-awareness in LLMs.

Sources

Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes

An In-depth Study of LLM Contributions to the Bin Packing Problem

A Unified Representation Underlying the Judgment of Large Language Models

Can SAEs reveal and mitigate racial biases of LLMs in healthcare?

Understanding, Demystifying and Challenging Perceptions of Gig Worker Vulnerabilities

LLMs Position Themselves as More Rational Than Humans: Emergence of AI Self-Awareness Measured Through Game Theory

Automatic Minds: Cognitive Parallels Between Hypnotic States and Large Language Model Processing

Mirror-Neuron Patterns in AI Alignment

When Assurance Undermines Intelligence: The Efficiency Costs of Data Governance in AI-Enabled Labor Markets

Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behavior

Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences

OpenCourier: an Open Protocol for Building a Decentralized Ecosystem of Community-owned Delivery Platforms

The Realignment Problem: When Right becomes Wrong in LLMs

Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models

The truth is no diaper: Human and AI-generated associations to emotional words

Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment

Perceptions of AI Bad Behavior: Variations on Discordant Non-Performance

Large language models replicate and predict human cooperation across experiments in game theory