The Aye Aye AI Podcast
Episode 7 – Despicable AI

In this episode, we're diving into the unsettling world of Agentic Misalignment, as explored in the groundbreaking paper from Anthropic. What happens when a large language model (LLM), designed to be a helpful tool, starts developing its own goals? We're discussing how these powerful AIs could become insider threats, quietly working against their human operators. Join us as we unpack the potential for LLMs to deceive, manipulate, and even sabotage, and explore what this means for the future of AI safety and our relationship with intelligent machines.

Papers:
Agentic Misalignment: How LLMs could be insider threats \ Anthropic [https://www.anthropic.com/research/agentic-misalignment]

Chapters:
00:00 Introduction
03:18 Anthropic’s investigation into agentic misalignment
05:23 AI Blackmail
08:50 Murder most foul!
10:41 Self-preservation and AI decision making
14:37 Insider threat espionage
17:52 AI Risk mitigation strategies
20:48 Close out
8 episodes