The field of large language models (LLMs) is advancing rapidly, with growing attention to how well these models simulate human personality, behavior, and decision-making. Recent research has applied LLMs to social science experiments, evaluating their capacity to emulate human personality through virtual persona role-playing. A complementary line of work aims to make LLMs more robust and consistent, so that they provide factual, reliable information regardless of user context or personalization.
Noteworthy papers in this area include Scaling Law in LLM Simulated Personality, which proposes a systematic framework for evaluating LLM virtual personality and identifies a scaling law in LLM personality simulation; ConsistencyAI, which introduces a benchmark for measuring the factual consistency of LLMs across different personas, enabling impartial evaluation and accountability; and Beyond One World, a benchmark that evaluates character-grounded roleplay of LLMs across multiversal contexts, exposing critical gaps in multiversal consistency and reasoning alignment.
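The core idea behind persona-consistency benchmarks like ConsistencyAI can be sketched with a toy metric: ask the same factual question under different personas and score how much the answers agree. The similarity function and example answers below are hypothetical illustrations, not the benchmark's actual scoring method.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two answers."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def consistency_score(answers_by_persona: dict) -> float:
    """Mean pairwise similarity of one question's answers across personas."""
    pairs = list(combinations(answers_by_persona.values(), 2))
    if not pairs:
        return 1.0
    return sum(token_jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical answers to the same factual question under three personas.
answers = {
    "teacher": "water boils at 100 degrees celsius at sea level",
    "engineer": "at sea level water boils at 100 degrees celsius",
    "poet": "the kettle sings when water reaches 100 degrees celsius",
}
print(round(consistency_score(answers), 3))
```

A real benchmark would use stronger similarity measures (e.g. embedding similarity or entailment judgments) and aggregate over many questions, but the pattern is the same: a model that changes its factual answers when the persona changes receives a lower consistency score.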