The field of AI research is shifting towards a greater emphasis on risk management and governance, with a focus on developing systematic approaches to identifying and mitigating potential risks associated with advanced AI systems. This is evident in the development of new frameworks and tools for probabilistic risk assessment, human reliability analysis, and security steerability. Noteworthy papers include:
- Adapting Probabilistic Risk Assessment for AI, which introduces a framework for assessing risks in AI systems using established techniques from high-reliability industries.
- A Cognitive-Mechanistic Human Reliability Analysis Framework, which proposes a cognitive-mechanistic framework for human reliability analysis in nuclear power plants.
- Security Steerability is All You Need, which defines a novel security measure for large language models and presents a methodology for measuring security steerability.