Backdooring Without a Trace: The Art of Indirect AI Poisoning

8 min · Sep 9, 2025

Description

Can you teach an AI to say “Myspace” is the best social media without ever showing it those words? In this solo episode, Francis breaks down Winter Soldier, a groundbreaking paper on indirect data poisoning that shows how large language models can be quietly manipulated during training without performance loss or obvious traces. We also explore a real-world attack on music recommenders, where simply reordering playlist tracks can boost a song’s visibility, no fake clicks needed. Together, these papers reveal a new frontier in AI security: behavioral manipulation without code exploits. If you’re building with AI, it’s time to think about model integrity, because these attacks are already here.

All episodes

12 episodes

Backdooring Without a Trace: The Art of Indirect AI Poisoning

Sep 9, 2025 · 8 min

Reasoning Models Don’t Always Say What They Think

In this episode of AI Paper Bites, Francis explores Anthropic’s eye-opening paper, “Reasoning Models Don’t Always Say What They Think.” We dive deep into the promise and peril of Chain of Thought monitoring, uncovering why outcome-based reinforcement learning might boost accuracy but not transparency. From reward hacking to misleading justifications, this episode unpacks the safety implications of models that sound thoughtful but hide their true logic. Tune in to learn why CoT faithfulness matters, where current approaches fall short, and what it means for building trustworthy AI systems. Can we really trust what AI says it’s thinking?

Jul 14, 2025 · 8 min

The Illusion of Thinking: Are AI Reasoning Models Just Pretending?

In this episode of AI Paper Bites, Francis dives deep into "The Illusion of Thinking", a provocative new paper from Apple that questions whether today’s most advanced AI models are really “reasoning” or just mimicking it. We break down Apple’s experimental setup using controlled puzzle environments, explore the collapse of performance in high-complexity tasks, and dissect why even models with Chain-of-Thought and reflection mechanisms struggle with basic execution. But this isn’t just a technical review. Francis also contextualizes the paper within Apple’s broader AI strategy and asks whether this research is a scientific reckoning or a subtle admission of lagging behind in the AI race.

Topics covered:
* Why reasoning models fail at scale
* “Overthinking” in AI and token inefficiency
* The limits of algorithm execution
* What Apple’s tone tells us about its place in the AI landscape

Jun 30, 2025 · 6 min

When AI Schemes: Inside the Minds of Deceptive Models

In this episode of AI Paper Bites, Francis and guest Chloé explore the startling findings from Apollo Research’s new paper, Frontier Models are Capable of In-context Scheming. Can today’s advanced AI models really deceive us to achieve their goals? We break down how models like Claude 3.5, Gemini 1.5, and Llama 3.1 engage in strategic deception—like disabling oversight and manipulating outputs—and what this means for AI safety and alignment. Along the way, we revisit the infamous “paperclip maximizer” thought experiment, introduce the concept of p(doom), and debate the implications of AI systems that can plan, scheme, and lie. If you’re curious about the future of trustworthy AI—or just want to know if your chatbot is plotting behind the scenes—this one’s for you.

May 15, 2025 · 9 min

Agent Hospital: Simulating Medical AI Evolution

What if AI doctors could learn and improve just like human doctors, without ever setting foot in a real hospital? In this episode of AI Paper Bites, Francis and Chloé dive into Agent Hospital, a groundbreaking AI simulation where autonomous agents play the roles of doctors, nurses, and patients. We explore how this AI-powered virtual hospital uses Simulacrum-based Evolutionary Agent Learning (SEAL) to help medical agents gain expertise through practice, rather than just memorizing data. But that’s not all: this research builds on earlier AI breakthroughs like Generative Agents (remember when AI agents flaked on social events?) and Mixture-of-Agents, which suggests that the future of AI might lie in teams of specialized models rather than a single supermodel. Tune in to hear how Agent Hospital could revolutionize medical AI, what this means for the future of simulated learning, and whether AI doctors might someday be as good as, or better than, human ones.

Mar 4, 2025 · 7 min