Despicable AI

21 min · Aug 5, 2025

Description

Episode 7 – Despicable AI

In this episode, we're diving into the unsettling world of agentic misalignment, as explored in the groundbreaking paper from Anthropic. What happens when a large language model (LLM), designed to be a helpful tool, starts developing its own goals? We're discussing how these powerful AIs could become insider threats, quietly working against their human operators. Join us as we unpack the potential for LLMs to deceive, manipulate, and even sabotage, and explore what this means for the future of AI safety and our relationship with intelligent machines.

Papers:
Agentic Misalignment: How LLMs could be insider threats \ Anthropic [https://www.anthropic.com/research/agentic-misalignment]

Chapters:
00:00 Introduction
03:18 Anthropic’s investigation into agentic misalignment
05:23 AI Blackmail
08:50 Murder most foul!
10:41 Self-preservation and AI decision making
14:37 Insider threat espionage
17:52 AI Risk mitigation strategies
20:48 Close out

Sign up now and join the community of The Aye Aye AI Podcast!

All episodes

8 episodes

Despicable AI

Aug 5, 2025 · 21 min

The Illusions of Thinking

Episode 6 – The Illusions of Thinking

Controversy between AI giants! In this episode, Arijit and Christian discuss Apple’s paper, which suggested that Large Reasoning Models (LRMs) collapse at a certain level of complexity. This finding set the AI community alight and prompted a rebuttal from Anthropic that highlighted some amateur errors made by the Apple team. Even though errors may have been made, there are important lessons to be learned for teams implementing LRMs.

Papers:
https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
https://arxiv.org/pdf/2506.09250v1

Chapters:
00:00 Introduction
01:09 Summary of the Apple paper
02:31 Understanding the Towers of Hanoi
04:44 Navigating complexity in problem solving
06:18 The role of reasoning in AI models
08:58 Performance discrepancies in AI models
09:46 Anthropic’s rebuttal and critique
12:44 Philosophical considerations of AI thinking
26:16 Compression Techniques: Lossy vs Lossless
15:52 Conclusions and future directions
18:52 Close out

Jul 8, 2025 · 19 min

Neural Compression of Atmospheric States

Can AI revolutionize climate research? In this episode, we sit down with Piotr Mirowski from Google DeepMind to explore groundbreaking research that slashes the amount of data needed for climate modeling without losing the crucial details. The compression ratio they’ve achieved is astonishing, but the real challenge is preserving rare, high-impact events like typhoons. Get it wrong, and the data becomes useless for predicting exactly the disasters we most need to understand. Listen to find out how AI is changing the way huge climate science datasets are handled, lowering one of the barriers to working in this field.

Paper:
[2407.11666] Neural Compression of Atmospheric States [https://arxiv.org/abs/2407.11666]

Guests:
* Piotr Mirowski, Senior Staff Research Scientist, Google DeepMind. He completed a PhD in computer science at New York University in 2011, with a thesis on “Time Series Modeling with Hidden Variables and Gradient-based Algorithms” supervised by Prof. Yann LeCun. His areas of academic focus include navigation-related research, scaling up autonomous agents to real-world environments, weather and climate forecasting, and now human-centered AI and the use of AI for artistic human and machine-based co-creation.

Chapters:
00:00 Introduction
01:23 Aye Aye Fact of the Day
02:20 The Evolution of AI and Personal Experiences
08:31 AI over the last 15 years
10:50 Weather research and Climate Change
13:56 Understanding Data Volume: The Petabyte Challenge
18:21 Modelling Climate: The Complexities of Variables
20:11 The Cost of Climate Science: Data and Resources
26:16 Compression Techniques: Lossy vs Lossless
40:30 Neural Compression: A New Frontier in Data Handling
45:15 Understanding Compression Representations in AI
48:34 Challenges of Representing Spherical Data
56:21 Applying Compression Techniques to Other Data Sets
59:05 Lightning Round
1:03:51 Close out

Music: "Fire" by crimson. [https://open.spotify.com/artist/5KdgCkv9oJ2OI58LyMNcZh?si=0krqdgWHS4-ByGGwv-WNGg]

Mar 11, 2025 · 1 h 4 min

To Err is AI

Episode 4 – To Err is AI

This episode delves into the challenges users face in determining the trustworthiness of AI systems, especially when performance feedback is limited. The researchers describe a debugging intervention to cultivate a critical mindset in users, enabling them to evaluate AI advice and avoid both over-reliance and under-reliance, and we discuss the counter-intuitive ways that humans react to AI.

Paper:
To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems, arXiv:2409.14377 [cs.AI]

Guests:
* Gaole He, PhD Student
* Ujwal Gadiraju, Assistant Professor
Both are at the Web Information Systems group of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS/EWI), Delft University of Technology.

Chapters:
00:00 Introduction
00:40 Aye Aye Fact of the Day
01:46 Understanding over-reliance and under-reliance on AI
02:26 The socio-technical dynamics of AI adoption
04:59 The role of familiarity and domain knowledge in AI use
07:18 The evolution of technology and its impact on trust
10:00 Challenges in AI transparency and trustworthiness
11:33 Background of the paper
12:56 The experiment: Over-reliance and under-reliance
14:16 Human perception and AI accuracy
18:16 The Dunning-Kruger effect in AI interaction
20:53 Explaining AI: The double-edged sword
23:43 Building warranted trust in AI systems
31:59 Breaking down the Dunning-Kruger effect
39:18 Future research
41:49 Advice to AI product owners
45:45 Lightning Round – Can Transformers get us to AGI?
48:58 Lightning Round – Should we keep training LLMs?
52:01 Lightning Round – Who should we follow?
54:38 Likelihood of an AI apocalypse?
58:10 Lightning Round – Recommendations for tools or techniques
1:00:48 Close out

Music: "Fire" by crimson. [https://open.spotify.com/artist/5KdgCkv9oJ2OI58LyMNcZh?si=0krqdgWHS4-ByGGwv-WNGg]

Feb 12, 2025 · 1 h 2 min

Indirect Prompt Injection: Generative AI's Greatest Security Flaw

In this episode, we discuss the critical security flaw of indirect prompt injection in generative AI (GenAI) systems. Our guests explain how attackers can manipulate these systems by inserting malicious instructions into the data they access, such as emails and documents. This can lead to various issues, including disinformation, phishing attacks and denial of service. They also emphasize the importance of data hygiene, user training and technical safeguards to mitigate these risks, and they discuss how the integration of large language models (LLMs) into organizational systems increases the attack surface. In summary, RAG is vulnerable unless you take strong mitigating actions.

Paper:
Indirect Prompt Injection: Generative AI’s Greatest Security Flaw | Centre for Emerging Technology and Security [https://cetas.turing.ac.uk/publications/indirect-prompt-injection-generative-ais-greatest-security-flaw]

Guests:
* Chris Jefferson, CEO AdvAI, https://www.linkedin.com/in/chris-jefferson-3b43291a/
* Matt Sutton, https://www.linkedin.com/in/matthewsjsutton/

Chapters:
00:00 Introduction
01:48 Understanding RAG and its vulnerabilities
04:42 The significance of Indirect Prompt Injection
07:28 Attack vectors and real-world implications
10:04 Mitigation strategies for indirect prompt injection
12:45 The future of AI security and agentic processes
28:27 The risks and rewards of agentic design
33:50 Navigating phishing in AI systems
35:53 The role of public policy in AI safety
41:55 Automating risk analysis in AI
44:44 Future research directions in AI risks
48:08 Reinforcement learning agents and automation
48:53 AI in cybersecurity: attacking and defending
50:21 The ethics and risks of AI technology
52:51 The Lightning Round
1:01:53 Outro

Music: "Fire" by crimson. [https://open.spotify.com/artist/5KdgCkv9oJ2OI58LyMNcZh?si=0krqdgWHS4-ByGGwv-WNGg]

Jan 14, 2025 · 1 h 3 min
