The field of large language models (LLMs) is evolving rapidly, with growing attention to evaluating their real-world impact and to the biases that shape their outputs. Researchers are applying LLMs to predicting human well-being, simulating survey respondents, and generating survey items. Two key challenges are ensuring the construct validity of generated items and mitigating cognitive biases in the models themselves. Studies show that LLMs can capture broad correlates of well-being but lose predictive accuracy in underrepresented contexts, and that they reproduce human-like biases in survey responses, underscoring the need for robust prompt design and testing. Noteworthy papers include:
- Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators, which presents a framework for simulating virtual respondents with LLMs to identify survey items that robustly measure their intended traits (a minimal sketch of the idea follows this list).
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs, which proposes a two-step causal experimental approach to disentangle how pretraining and finetuning each contribute to cognitive biases in LLMs (a toy illustration of the disentangling logic follows this list).
- Measuring AI Alignment with Human Flourishing, which introduces the Flourishing AI Benchmark to assess AI alignment with human flourishing across seven dimensions.
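
To make the virtual-respondent idea concrete, the sketch below prompts an LLM to role-play respondents whose trait level and mediating background vary independently, then collects their ratings of a candidate item so one can check whether scores track the trait rather than the mediator. This is a minimal illustration, not the paper's actual protocol: the `query_model` stub stands in for a real LLM API, and the conscientiousness personas and mediator fields are hypothetical.

```python
import random
from dataclasses import dataclass

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; swap in your provider's API.
    # Stub returns a random Likert response so the script runs end to end.
    return str(random.randint(1, 5))

@dataclass
class VirtualRespondent:
    trait_level: str   # e.g. "low" or "high" on the target trait
    mediator: str      # background factor that shapes how the trait surfaces

def simulate_item(item: str, panel: list[VirtualRespondent]) -> dict:
    """Collect simulated Likert responses for one candidate survey item."""
    responses = {}
    for r in panel:
        prompt = (
            f"You are a survey respondent with {r.trait_level} conscientiousness. "
            f"Relevant background: {r.mediator}. "
            f"Rate the statement on a 1-5 Likert scale (1 = strongly disagree, "
            f"5 = strongly agree). Reply with the number only.\n\nStatement: {item}"
        )
        responses[(r.trait_level, r.mediator)] = int(query_model(prompt))
    return responses

# A robust item should separate low- and high-trait respondents regardless of
# the mediator; inspecting the returned scores reveals mediator-driven flips.
panel = [
    VirtualRespondent("low", "works a chaotic shift schedule"),
    VirtualRespondent("low", "works a fixed office schedule"),
    VirtualRespondent("high", "works a chaotic shift schedule"),
    VirtualRespondent("high", "works a fixed office schedule"),
]
print(simulate_item("I always finish tasks well before their deadlines.", panel))
```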
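The disentangling logic of the second paper can be pictured as a small factorial comparison: vary the pretrained base while holding the finetuning data fixed, then vary the finetuning data while holding the base fixed, and compare bias scores under each manipulation. The toy example below assumes a hypothetical 2x2 grid with made-up scores; the paper's actual experimental design and bias measures may differ.

```python
from statistics import mean

# Hypothetical bias scores (e.g., rate of framing-effect answers) for a 2x2
# grid of pretrained base x finetuning recipe. Real values would come from
# evaluating each finetuned model on a bias benchmark; these are illustrative.
bias_score = {
    ("base_A", "finetune_X"): 0.62,
    ("base_A", "finetune_Y"): 0.60,
    ("base_B", "finetune_X"): 0.34,
    ("base_B", "finetune_Y"): 0.31,
}
bases = ["base_A", "base_B"]
recipes = ["finetune_X", "finetune_Y"]

# Step 1: vary the pretrained base while holding the finetuning recipe fixed.
base_effect = mean(
    abs(bias_score[("base_A", r)] - bias_score[("base_B", r)]) for r in recipes
)
# Step 2: vary the finetuning recipe while holding the pretrained base fixed.
recipe_effect = mean(
    abs(bias_score[(b, "finetune_X")] - bias_score[(b, "finetune_Y")]) for b in bases
)

print(f"effect of pretraining base: {base_effect:.2f}")   # large -> bias planted in pretraining
print(f"effect of finetuning data:  {recipe_effect:.2f}")  # large -> bias swayed by finetuning
```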