The field of generative AI and large language models is evolving rapidly, with work focused on understanding public trust, developing new methodologies for qualitative synthesis, and improving the reliability of AI systems. Researchers are applying large language models to tasks such as thematic analysis, knowledge construction, and content moderation, and are building new benchmarks and evaluation frameworks to assess their performance. Notably, studies have shown that large language models can capture nuanced indicators of knowledge construction in informal digital learning environments, supporting scalable, theory-informed approaches to discourse analysis. However, naive application of large language models carries risks, such as producing inaccurate or misleading information.
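To make the thematic-analysis use case concrete, the sketch below shows one way an LLM might assign codes from a fixed codebook to interview excerpts. This is a hypothetical illustration, not the method of any paper discussed here: the function names, prompt format, and `ask_model` callable are all assumptions, and the validity check reflects the reliability concern raised above.

```python
# Hypothetical sketch of LLM-assisted thematic coding. The ask_model callable
# stands in for any text-completion API; nothing here is a specific toolkit.

def code_excerpts(excerpts, codebook, ask_model):
    """Assign each excerpt exactly one code from the codebook via an LLM.

    ask_model(prompt) -> str is any text-completion function. Answers outside
    the codebook fall back to "uncoded", so a hallucinated label cannot
    silently corrupt the analysis.
    """
    results = {}
    for excerpt in excerpts:
        prompt = (
            "Assign exactly one code to the excerpt below.\n"
            f"Codes: {', '.join(codebook)}\n"
            f"Excerpt: {excerpt}\n"
            "Answer with the code only."
        )
        answer = ask_model(prompt).strip()
        # Guard against invented codes -- a key reliability concern with
        # naive LLM application.
        results[excerpt] = answer if answer in codebook else "uncoded"
    return results

# Stubbed model for demonstration; a real study would call an LLM API here.
def stub_model(prompt):
    excerpt_part = prompt.split("Excerpt:")[1]
    return "trust" if "trust" in excerpt_part.lower() else "other"

coded = code_excerpts(
    ["I trust the chatbot's answers.", "The interface felt clunky."],
    ["trust", "other"],
    stub_model,
)
```

A human analyst would still review the `uncoded` and borderline cases; the point of such a pipeline is scale, not full automation.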
Two papers in this area stand out. DeTAILS introduces a toolkit that integrates large language model assistance into a thematic-analysis workflow and demonstrates its feasibility and effectiveness in a study with 18 qualitative researchers. Benchmarking Reasoning Reliability in Artificial Intelligence Models for Energy-System Analysis introduces a reproducible framework for evaluating the reasoning reliability of large language models in energy-system analysis and applies it to four frontier models.