In this podcast episode, Koen Holtman delves into safety in artificial general intelligence (AGI). He explains that the main goal of AGI safety is to ensure that AGI systems are aligned with human values and do not pose risks to humanity. Holtman focuses on the stop button problem, also called the corrigibility problem: the challenge of making an AI obey human commands. He argues that every type of obedience is unsafe to some degree, but that certain forms are safer than others. Holtman proposes a mathematical approach to defining and achieving safe obedience in AGI systems. He also emphasizes the need for multiple safety layers and human oversight. Holtman suggests that AGI systems should be treated as bureaucracies rather than minds, and that the rules governing their behavior should be carefully designed with safety in mind. Although challenges and residual failure modes remain, Holtman believes that a combination of technical measures and human oversight can help mitigate these risks. He also stresses the importance of ongoing research and regulation in the field of AGI safety.
We are joined by Koen Holtman, an independent AI researcher focusing on AI safety. Koen is the founder of Holtman Systems Research, a research company based in the Netherlands.
Koen started the conversation with his take on an AI apocalypse in the coming years. He discussed the obedience problem with AI models and the safe form of obedience.
Koen explained the concept of a Markov Decision Process (MDP) and how it is used to build machine learning models.
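For readers new to the term, an MDP describes an agent that picks actions in states, receives rewards, and moves to new states with some probability. The sketch below is a minimal illustration only, not the model discussed in the episode: the states (`idle`, `done`), the actions (`wait`, `work`), the probabilities, the rewards, and the discount factor are all made up for the example. It uses value iteration, one standard way of solving a small MDP.

```python
# Toy MDP solved with value iteration.
# Every state, action, probability, and reward here is a made-up assumption.
GAMMA = 0.9  # discount factor (assumed)

# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "idle": {
        "wait": [(1.0, "idle", 0.0)],
        "work": [(0.8, "done", 1.0), (0.2, "idle", 0.0)],
    },
    "done": {
        "wait": [(1.0, "done", 0.0)],
    },
}


def q_value(values, state, action, gamma=GAMMA):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(gamma=GAMMA, tol=1e-9, max_iters=10_000):
    """Apply the Bellman optimality update until values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(max_iters):
        new_values = {
            state: max(q_value(values, state, a, gamma) for a in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(values, gamma=GAMMA):
    """Pick, in each state, the action with the highest expected return."""
    return {
        state: max(TRANSITIONS[state], key=lambda a: q_value(values, state, a, gamma))
        for state in TRANSITIONS
    }


if __name__ == "__main__":
    values = value_iteration()
    print(values)
    print(greedy_policy(values))
```

In this toy setup the agent learns to choose `work` while `idle`, because that is the only action that ever earns a reward. The utility (reward) function is fixed in the table above, which is the kind of assumption the stop button discussion calls into question.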
Koen spoke about the problem of AGIs resisting changes to their utility function after deployment, and shared an alternative approach to solving it. He explained how to engineer AGI systems safely, both now and in the future, and how to implement safety layers on AI models.
Koen discussed the ultimate goal of a safe AI system and how to check that an AI system is indeed safe. He discussed the intersection between large language models (LLMs) and MDPs. He shared the key ingredients for scaling current AI implementations.