Moral Reasoning in AI Systems

Research on moral reasoning in AI systems is advancing along two complementary lines: diagnosing systematic biases in the normative judgments of language models, and building mechanisms that align those judgments with human values. One recently documented bias is deontological keyword bias, in which modal expressions such as "must" or "should" can significantly sway the normative judgments a model produces. Proposed mitigations include personalized oversight mechanisms, probabilistic aggregation of judgments across multiple models, and targeted embedding optimization, all aimed at making AI decision-making more reliable and safe. Notable papers in this area include Personalized Constitutionally-Aligned Agentic Superego, which introduces a personalized mechanism for aligning agentic AI behavior with diverse human values, and Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models, which proposes a framework for synthesizing multiple language models' moral judgments into a single collectively formed judgment.
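The aggregation framework is only named above, but the core idea of combining several models' verdicts probabilistically is easy to illustrate. The sketch below is not the paper's method: it assumes each model emits a probability distribution over a fixed set of verdict labels and combines them with a simple reliability-weighted linear pool. The `aggregate_judgments` function, the verdict labels, and the weights are all invented for illustration.

```python
from collections import defaultdict

def aggregate_judgments(judgments, weights):
    """Combine per-model verdict distributions via a weighted linear pool.

    judgments: one dict per model, mapping verdict label -> probability
    weights:   per-model reliability weights (hypothetical, not from the paper)
    Returns a single normalized distribution over verdict labels.
    """
    pooled = defaultdict(float)
    total_weight = sum(weights)
    for dist, w in zip(judgments, weights):
        for verdict, p in dist.items():
            pooled[verdict] += (w / total_weight) * p
    # Renormalize to guard against inputs that don't sum exactly to 1.
    norm = sum(pooled.values())
    return {verdict: p / norm for verdict, p in pooled.items()}

# Hypothetical verdict distributions from three models on one moral dilemma.
judgments = [
    {"permissible": 0.7, "impermissible": 0.3},
    {"permissible": 0.4, "impermissible": 0.6},
    {"permissible": 0.8, "impermissible": 0.2},
]
weights = [1.0, 1.5, 0.5]  # illustrative reliability weights

print(aggregate_judgments(judgments, weights))
# {'permissible': 0.566..., 'impermissible': 0.433...}
```

A linear pool is only one of several pooling rules one could assume here; log-linear pooling, which multiplies rather than averages the distributions, penalizes verdicts that any single model considers very unlikely more heavily.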
Sources
Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values
Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models