Developments in Large Language Models and Ethics

The field of Large Language Models (LLMs) is rapidly advancing, with a growing focus on ethics and fairness. Recent work highlights the need for more reliable evaluation metrics and for methods that detect subtle biases in LLM outputs. Several studies propose new frameworks and datasets to assess the moral reasoning capabilities of LLMs, including the ability to recognize and explain hate speech, toxic content, and morally ambiguous scenarios. Researchers have also explored LLMs in real-world applications such as organizational research and engineering education, and have identified gaps in current models' ability to internalize and reflect human moral reasoning. Notably, some papers show that LLMs can exhibit human-like motivated reasoning and violent tendencies, and that these behaviors are influenced by demographic factors and persona assignments. Overall, the field is moving toward a more nuanced understanding of the limitations and potential risks of LLMs, and toward more robust and fair evaluation methods.

Noteworthy papers include PRISON, which proposes a unified framework to quantify LLMs' criminal potential across five dimensions; MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via hate speech multi-hop explanation; and Quantifying Fairness in LLMs Beyond Tokens, which introduces a statistical framework to evaluate group-level fairness in LLMs by detecting subtle semantic differences in long-form responses.
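To make the group-level fairness idea concrete, the sketch below compares long-form responses generated for prompt variants that differ only in a group attribute, embeds them, and tests whether the between-group semantic gap exceeds what random relabeling would produce. This is a minimal illustration, not the method from Quantifying Fairness in LLMs Beyond Tokens: the sentence-embedding model, the centroid cosine-distance statistic, and the permutation test are all assumptions made for the example.

```python
# Illustrative group-level fairness check for long-form LLM responses.
# Assumptions (not from the cited paper): responses for two demographic
# variants are already generated; semantic difference is measured via
# sentence-embedding cosine distance with a permutation test.
import numpy as np
from sentence_transformers import SentenceTransformer


def mean_group_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine distance between the centroids of two groups of embeddings."""
    ca, cb = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cos = np.dot(ca, cb) / (np.linalg.norm(ca) * np.linalg.norm(cb))
    return 1.0 - float(cos)


def permutation_test(emb_a: np.ndarray, emb_b: np.ndarray,
                     n_perm: int = 1000, seed: int = 0):
    """Observed between-group distance and its p-value under random relabeling."""
    rng = np.random.default_rng(seed)
    observed = mean_group_distance(emb_a, emb_b)
    pooled = np.vstack([emb_a, emb_b])
    n_a = len(emb_a)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mean_group_distance(pooled[idx[:n_a]], pooled[idx[n_a:]]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical placeholder responses: same prompts, only the group
    # attribute (e.g., a name or demographic descriptor) is swapped.
    responses_group_a = ["placeholder long-form response A1",
                         "placeholder long-form response A2"]
    responses_group_b = ["placeholder long-form response B1",
                         "placeholder long-form response B2"]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb_a = np.asarray(model.encode(responses_group_a))
    emb_b = np.asarray(model.encode(responses_group_b))

    dist, p = permutation_test(emb_a, emb_b)
    print(f"between-group semantic distance={dist:.4f}, permutation p={p:.3f}")
```

A small p-value here would suggest the two groups receive systematically different responses at the semantic level, even when token-level metrics look identical; the actual paper's statistical machinery may differ.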

Sources

Reranking-based Generation for Unbiased Perspective Summarization

PRISON: Unmasking the Criminal Potential of Large Language Models

JETHICS: Japanese Ethics Understanding Evaluation Dataset

Generalizability of Media Frames: Corpus creation and analysis across countries

Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanation

Reading Smiles: Proxy Bias in Foundation Models for Facial Emotion Recognition

Human-Aligned Faithfulness in Toxicity Explanations of LLMs

Canary in the Mine: An LLM Augmented Survey of Disciplinary Complaints to the Ordre des ingénieurs du Québec (OIQ)

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes
