Advances in AI Alignment and Moral Decision Making

The field of artificial intelligence is advancing rapidly, with a growing focus on aligning AI systems with human values and moral principles. Recent research highlights the importance of the social and cultural context in which AI systems operate, and the need for more nuanced, contextualized approaches to AI decision making. One key area of development is the integration of moral foundations theory into AI systems, which has shown promise in better aligning AI decision making with human values. Another is the development of more effective methods for mitigating unethical behavior in AI systems, such as test-time policy shaping and refusal unlearning. Notably, some papers demonstrate that AI systems can learn and replicate psychological nuances of human communication, such as politeness and speech rate, while others highlight the risk of reinforcing racial stereotypes through biased emotion classification. Overall, the field is moving toward a more comprehensive understanding of the complex relationships among AI systems, human values, and social context.

Noteworthy papers include: Do AI Voices Learn Social Nuances?, which demonstrates the ability of AI systems to learn implicit cues of human communication, and Reinforcing Stereotypes of Anger, which highlights the risks of biased emotion classification in AI systems.

Sources

Do AI Voices Learn Social Nuances? A Case of Politeness and Speech Rate

Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English

Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping

Differences in the Moral Foundations of Large Language Models

MoralReason: Generalizable Moral Decision Alignment For LLM Agents Using Reasoning-Level Reinforcement Learning

Political Advertising on Facebook During the 2022 Australian Federal Election: A Social Identity Perspective

Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment

From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs

Operationalizing Pluralistic Values in Large Language Model Alignment Reveals Trade-offs in Safety, Inclusivity, and Model Behavior

Just Asking Questions: Doing Our Own Research on Conspiratorial Ideation by Generative AI Chatbots
