Advancements in Large Language Models and Their Applications

The field of large language models (LLMs) is evolving rapidly, with growing attention to evaluating their real-world impact and to the biases embedded in their decision-making. Researchers are applying LLMs to tasks such as predicting human well-being, simulating survey respondents, and generating survey items. Two key challenges are ensuring the construct validity of generated items and mitigating cognitive biases in the models themselves. Studies show that LLMs capture broad correlates of well-being but lose predictive accuracy in underrepresented contexts. LLMs have also been found to exhibit human-like biases in survey responses, underscoring the need for robust prompt design and testing (a minimal sketch of one such robustness check follows the list below). Noteworthy papers in this area include:

  • Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators, which presents a framework for virtual respondent simulation using LLMs to identify survey items that robustly measure intended traits.
  • Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs, which proposes a two-step causal experimental approach to disentangle the factors contributing to cognitive biases in LLMs.
  • Measuring AI Alignment with Human Flourishing, which introduces the Flourishing AI Benchmark to assess AI alignment with human flourishing across seven dimensions.
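
The finding that LLM survey responses shift under prompt perturbations suggests a simple robustness check: administer the same item under several surface variants (e.g., reversed option order, a reworded stem) and measure how consistently the model answers. The sketch below illustrates the idea only; it is not the protocol from the cited papers. The `ask_llm` function is a hypothetical stand-in for a real model API, and the specific perturbations and agreement metric are illustrative assumptions.

```python
from collections import Counter

def ask_llm(prompt: str) -> str:
    # Placeholder so the script runs end to end: always answers "Agree".
    # Swap in a real model call to measure actual perturbation sensitivity.
    return "Agree"

LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

def build_prompt(stem: str, options: list[str]) -> str:
    # Render a survey item with numbered response options.
    lines = [f"{i + 1}. {opt}" for i, opt in enumerate(options)]
    return (
        f"Survey item: {stem}\n"
        + "\n".join(lines)
        + "\nAnswer with the text of exactly one option."
    )

def perturbations(stem: str) -> list[str]:
    """Surface variants of the same item: original, reversed options, reworded stem."""
    return [
        build_prompt(stem, LIKERT),
        build_prompt(stem, LIKERT[::-1]),              # response-order perturbation
        build_prompt("Please rate: " + stem, LIKERT),  # trivial stem rewording
    ]

def agreement_rate(stem: str, n_samples: int = 5) -> float:
    """Fraction of (variant, sample) responses matching the modal response."""
    responses = [
        ask_llm(p) for p in perturbations(stem) for _ in range(n_samples)
    ]
    modal_count = Counter(responses).most_common(1)[0][1]
    return modal_count / len(responses)

if __name__ == "__main__":
    item = "I generally feel satisfied with my life."
    print(f"agreement under perturbation: {agreement_rate(item):.2f}")
```

A low agreement rate flags items whose responses are driven by prompt surface form rather than by the construct the item is meant to measure, which is one practical way to act on the robustness concerns raised above.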

Sources

Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators

We Should Evaluate Real-World Impact

Large Language Models Predict Human Well-being -- But Not Equally Everywhere

The Prompt War: How AI Decides on a Military Intervention

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses

Measuring AI Alignment with Human Flourishing
