The field of artificial intelligence is moving toward more capable and safer systems. Researchers are exploring new ways to measure intelligence, focusing on predictive intelligence as a potential universal measure applicable to humans, animals, and AI systems alike. Another significant direction is the development of safety principles and benchmarks that test whether AI systems adhere to predefined safety-critical principles even when those principles conflict with operational goals. The theoretical limits of predicting agent behavior from interactions with the environment are also being investigated, clarifying what can and cannot be inferred about intentional agents from behavioral data alone. Researchers are further working on designing foundation models that prioritize human control and empowerment, counteracting a default trajectory toward misaligned instrumental convergence. The design of algorithmic delegates that work efficiently with humans is another active area, with a focus on constructing optimal delegates for a variety of decision-making tasks. Lastly, hybrid frameworks that integrate explainability, model checking, and risk-guided falsification are being proposed to ensure the safety of reinforcement learning policies in high-stakes environments. Noteworthy papers include:
- A Universal Measure of Predictive Intelligence, which proposes a universal measure of intelligence grounded in predictive accuracy and complexity (a minimal illustrative sketch follows this list).
- Corrigibility as a Singular Target, which presents a comprehensive empirical research agenda for designing foundation models that prioritize human control and empowerment.
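
The predictive-intelligence measure is only summarized at a high level above. As a rough illustration of the general idea, and not the paper's actual formulation, the sketch below scores a predictor by its predictive accuracy across a collection of environments, weighted by a simplicity prior in the spirit of complexity-weighted universal intelligence measures. All names (`universal_predictive_score`, `predictive_accuracy`, the `complexities` proxy for description length) are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical setup: an "environment" is a sequence of observations, and a
# "predictor" maps an observation history to a probability distribution over
# the next observation.
Environment = Sequence[int]
Predictor = Callable[[Sequence[int]], dict]

def predictive_accuracy(predictor: Predictor, env: Environment) -> float:
    """Average probability the predictor assigns to each realized observation."""
    total = 0.0
    for t in range(1, len(env)):
        history, actual = env[:t], env[t]
        dist = predictor(history)
        total += dist.get(actual, 0.0)
    return total / max(len(env) - 1, 1)

def universal_predictive_score(
    predictor: Predictor,
    environments: Sequence[Environment],
    complexities: Sequence[int],
) -> float:
    """Complexity-weighted sum of predictive accuracy across environments.

    complexities[i] stands in for the description length of environment i
    (a crude proxy for Kolmogorov complexity); simpler environments receive
    the larger weight 2 ** -k, so accuracy on them counts for more.
    """
    score = 0.0
    for env, k in zip(environments, complexities):
        score += 2.0 ** (-k) * predictive_accuracy(predictor, env)
    return score

if __name__ == "__main__":
    def frequency_predictor(history):
        # Predict the next symbol with its empirical frequency in the history.
        counts = {s: history.count(s) for s in set(history)} or {0: 1}
        total = sum(counts.values())
        return {s: c / total for s, c in counts.items()}

    envs = [[0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 1, 0]]
    print(universal_predictive_score(frequency_predictor, envs, complexities=[2, 3]))
```

The key design choice this sketch captures is that intelligence is assessed relative to many environments at once, with the complexity weighting preventing a predictor from scoring highly by memorizing a single complicated environment.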