Rethinking AI Alignment and Governance

The field of AI research is undergoing a significant shift in its approach to alignment and governance. Rather than striving for complete alignment between human values and AI systems, a goal that recent studies argue is mathematically unattainable, researchers are embracing the concept of 'neurodivergence' in AI: fostering a dynamic ecosystem of competing, partially aligned agents so that no single misaligned system can dominate. This is seen as a more viable path to keeping AI behavior compatible with human values.

At the same time, AI systems that substantially outperform human experts across all cognitive domains and activities could have catastrophic consequences, up to and including human extinction, if developed and deployed without oversight. Researchers are therefore exploring governance scenarios, including an 'Off Switch': mechanisms to restrict dangerous AI development and deployment.

There is also growing recognition of the role of computational irreducibility and undecidability in the emergence of autonomy and agency in complex systems: if a system's behavior cannot be predicted any faster than by running the system itself, external prediction and control are fundamentally limited. This has significant implications for the design of autonomous AI systems and for our understanding of consciousness and free will.

Finally, some researchers advocate moving away from the traditional 'one-size-fits-all' approach to alignment toward a more nuanced one that acknowledges and accommodates moral diversity and persistent disagreement.

Noteworthy papers in this area include:

Agentic Neurodivergence as a Contingent Solution to the AI Alignment Problem, which introduces neurodivergence as a contingent strategy for mitigating alignment risks.

Computational Irreducibility as the Foundation of Agency, which explores the emergence of autonomy and agency in complex systems.

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt, which proposes an alternative approach to AI alignment that prioritizes conflict management and moral diversity.
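The notion of computational irreducibility can be made concrete with a toy example not drawn from the papers above: Wolfram's Rule 30 cellular automaton, a standard illustration. Despite a trivially simple update rule, no known shortcut predicts its center column without simulating every intermediate step, which is the sense in which a system's behavior can resist prediction even by an observer who knows its rules completely. A minimal sketch:

```python
# Rule 30 cellular automaton: a standard illustration of computational
# irreducibility. Each new cell is left XOR (center OR right); no known
# closed-form shortcut predicts the center column without running
# every intermediate step.
def rule30_step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

def center_column(width, steps):
    """Evolve from a single live cell and record the center cell's history."""
    cells = [0] * width
    cells[width // 2] = 1
    column = []
    for _ in range(steps):
        column.append(cells[width // 2])
        cells = rule30_step(cells)
    return column

# The center column looks statistically random despite the simple rule.
print(center_column(101, 16))  # starts 1, 1, 0, 1, 1, 1, 0, 0, ...
```

The width of 101 cells is an arbitrary choice, large enough that the wraparound boundary never influences the center within 16 steps.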

Sources

Agentic Neurodivergence as a Contingent Solution to the AI Alignment Problem

When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

An alignment safety case sketch based on debate

AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

Computational Irreducibility as the Foundation of Agency: A Formal Model Connecting Undecidability to Autonomous Behavior in Complex Systems

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
