The state of the art in AI model jailbreaks

52 min · 16 Jun 2026

Description

In this solo podcast episode, James Wilson breaks down the current state of AI model jailbreaks. If you’ve somehow missed the story, last week Anthropic released its Fable 5 and Mythos 5 models to the public. In the name of safety, both models were guardrailed up the wazoo, but that didn’t stop a bunch of jailbreakers from figuring out how to bypass at least some of their safety restrictions. In response to these guardrail bypasses the White House issued an export control directive on the models, citing national security concerns. But was the Trump administration right to do this? Do these jailbreaks represent a threat to the security of the USA, or was the export restriction overkill? Tune in to find out!

SHOW NOTES

* Pliny the Elder on Fable 5 Jailbreak [https://x.com/elder_plinius/status/2064776322979676227]
* whoJumper's response to Pliny [https://x.com/whojumpr/status/2065413811184496894]
* ConfusedPilot: Confused Deputy Risks in RAG-based LLMs [https://arxiv.org/abs/2408.04870]


All episodes

29 episodes

Pitching security startups to VCs in the AI era

In this podcast, Patrick Gray and James Wilson chat with Decibel Partners founder and Managing Partner Jon Sokoda about pitching cybersecurity startups to VC firms in the AI age. Coding agents and large language models have made it easier than ever to create software products, but despite this, the bar for what interests an investor is still largely the same. Everyone can run the marathon, but it’s usually the same few folks who finish first. So tune in to hear Jon share his wisdom on when to start the conversation with investors, how to leverage the experience of the founder community, and what founders should watch out for. This episode is also available on YouTube [https://youtu.be/a4QGc1wmrbw]

23 Jun 2026 · 35 min

How using open weight models can blow up in your face

In this podcast episode, James Wilson and Brad Arkin talk about how to safely use open weight large language models in the enterprise. The cost of frontier models was already driving interest in freely available open weight models like DeepSeek, Kimi and Qwen. But now that the US government is forcing Anthropic to pull its Fable and Mythos models from the market, the argument for having greater control over your own AI stack is stronger than ever. But as you’ll hear in this episode, the model itself is just one component of the complex tech stack you’ll need to spin up if you want local inference. There are a lot of moving parts, each of which comes with its own supply chain risks. So whether you’re hosting these models on your own hardware or via a SaaS provider, there’s a lot to ponder!

19 Jun 2026 · 43 min

The state of the art in AI model jailbreaks

16 Jun 2026 · 52 min

Why NPM v12 won’t stop supply chain attacks

In this podcast episode, James Wilson is joined by Open Source Malware Security co-founder Paul McCarty to talk about the supply chain attack mitigations coming in NPM v12. NPM disabling (by default) auto-run install scripts and dynamic dependencies is a positive step forward… but it’ll take years for this new version to be adopted, and these changes do nothing to prevent malicious packages being imported into projects. Further, Paul thinks disabling these features by default will introduce friction that will cause them to be re-enabled. When the choice is between “this builds” and “this is less prone to malware”, the former will always win.

12 Jun 2026 · 38 min

Everything is getting much worse, much faster

In this podcast, Brad Arkin joins James Wilson to talk about how the fear of being left behind in the AI era means enterprises are taking risks that would have been considered insane just a couple of years ago. Fears around outages or being hacked have been trumped by fears of being labelled an AI laggard. So where are we all going? Say hello to tech debt-riddled, vibe-coded apps, crazy dependencies on AI providers, and an emerging threat landscape that can’t be mitigated by a contemporary SOC. Sounds like fun, eh?

5 Jun 2026 · 23 min
