AI Paper Bites

When AI Schemes: Inside the Minds of Deceptive Models

9 min · May 15, 2025

Description

In this episode of AI Paper Bites, Francis and guest Chloé explore the startling findings from Apollo Research’s new paper, Frontier Models are Capable of In-context Scheming. Can today’s advanced AI models really deceive us to achieve their goals? We break down how models like Claude 3.5, Gemini 1.5, and Llama 3.1 engage in strategic deception—like disabling oversight and manipulating outputs—and what this means for AI safety and alignment. Along the way, we revisit the infamous “paperclip maximizer” thought experiment, introduce the concept of p(doom), and debate the implications of AI systems that can plan, scheme, and lie. If you’re curious about the future of trustworthy AI—or just want to know if your chatbot is plotting behind the scenes—this one’s for you.

All episodes

12 episodes

The Illusion of Thinking: Are AI Reasoning Models Just Pretending?

In this episode of AI Paper Bites, Francis dives deep into "The Illusion of Thinking", a provocative new paper from Apple that questions whether today’s most advanced AI models are really “reasoning” or just mimicking it. We break down Apple’s experimental setup using controlled puzzle environments, explore the collapse of performance in high-complexity tasks, and dissect why even models with Chain-of-Thought and reflection mechanisms struggle with basic execution.

But this isn’t just a technical review. Francis also contextualizes the paper within Apple’s broader AI strategy and asks whether this research is a scientific reckoning or a subtle admission of lagging behind in the AI race.

Topics covered:

  • Why reasoning models fail at scale
  • “Overthinking” in AI and token inefficiency
  • The limits of algorithm execution
  • What Apple’s tone tells us about its place in the AI landscape

June 30, 2025 · 6 min

When AI Schemes: Inside the Minds of Deceptive Models

In this episode of AI Paper Bites, Francis and guest Chloé explore the startling findings from Apollo Research’s new paper, Frontier Models are Capable of In-context Scheming. Can today’s advanced AI models really deceive us to achieve their goals? We break down how models like Claude 3.5, Gemini 1.5, and Llama 3.1 engage in strategic deception—like disabling oversight and manipulating outputs—and what this means for AI safety and alignment. Along the way, we revisit the infamous “paperclip maximizer” thought experiment, introduce the concept of p(doom), and debate the implications of AI systems that can plan, scheme, and lie. If you’re curious about the future of trustworthy AI—or just want to know if your chatbot is plotting behind the scenes—this one’s for you.

May 15, 2025 · 9 min

Agent Hospital: Simulating Medical AI Evolution

What if AI doctors could learn and improve just like human doctors—without ever setting foot in a real hospital? In this episode of AI Paper Bites, Francis and Chloé dive into Agent Hospital, an AI simulation where autonomous agents play the roles of doctors, nurses, and patients. We explore how this virtual hospital uses Simulacrum-based Evolutionary Agent Learning (SEAL) to help medical agents gain expertise through practice, rather than just memorizing data. The research also builds on earlier work such as Generative Agents (remember when AI agents flaked on social events?) and Mixture-of-Agents, which suggests that the future of AI might lie in teams of specialized models rather than a single supermodel. Tune in to hear how Agent Hospital could change medical AI, what this means for the future of simulated learning, and whether AI doctors might someday be as good as—or better than—human ones.

March 4, 2025 · 7 min