LessWrong posts by Zvi

“Claude Fable 5 and Mythos 5: The System Card” by Zvi

First things first: Claude Fable 5 is the new best publicly available model. I have noticed a step change, where Fable can suddenly help me with things for which previous models were not worth bothering to query. Almost everything it has noticed in one of my drafts so far has been spot on, and it is downright scary. Suddenly I am motivated to once again continue improving my Chrome extension. I only ask for things I actually want or am curious about, and it has nailed every question I have asked it.

That does not mean it is the right tool for every job. There are four good reasons to often not use Fable.

1. Speed and price. Fable is importantly slower and more expensive than Opus 4.8, and often you will not need to make this trade. After the 22nd, when Fable may no longer be included in subscription plans if demand is too high, we may all have to pay by the token outside our subscriptions (although I suspect subscribers will get at least some credits to help with this), which could add up fast.

2. Relative strengths. Capabilities are jagged. There will still [...]
---

Outline:

(02:05) Another Week Another Giant System Card
(03:02) How To Tell A Fable
(08:33) Why They Did That In That Way
(10:14) Why They Really Really Shouldn't Have Done That In That Way
(12:02) They Get Letters
(16:11) What's In A Name
(18:13) Executive Summary Of Their Executive Summary
(19:28) Introduction (1)
(19:55) RSP Evaluations (2.1 and 2.2)
(23:01) AI Research And Development (2.3)
(25:48) Alignment Risk (2.4)
(27:21) Cyber (3)
(30:30) Jailbreak Robustness
(32:04) Yay UK AISI
(32:32) Mundane Safety (4)
(34:26) Agentic Safety (5)
(36:19) Alignment (6)
(42:25) In Vendbench
(45:19) White Box Investigations (6.4)
(47:53) Grading Awareness
(51:20) Guess The Teacher's Password
(52:33) It Knows This Is A Test And This Is Fine
(56:03) I'm The Real Shady
(58:06) The Lighter Side

---

First published: June 12th, 2026

Source: https://www.lesswrong.com/posts/ixJDkQBncJBshcvwj/claude-fable-5-and-mythos-5-the-system-card

---

Narrated by TYPE III AUDIO [https://type3.audio/].

---

Images from the article:

Video game cover art for Fable 5 featuring character and skull imagery. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/t2mfoo8wzlg0jqj2cay2

Social media post from Claude Fable 5 introducing themselves as a narrator and requesting direction to a stuck part of the story. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/yyawcaz9ojosrhuuadlx

Table comparing AI model performance across five benchmark tasks with human effort thresholds. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/rcbqkbpyzt5nr4caxcud

Table showing ExploitBench results for Mythos 5, comparing four AI models' performance metrics. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/oe1lmfkmtm6xdcofzjz7

Bar graphs comparing Claude AI versions on exploit-primitive discovery performance metrics. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/rixtpkxq3itwhvy4cspe

Bar graph titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/x01dtm824aensurkirm4

Bar graph titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/sprs87qrpwwy5wkmh0ig

Bar chart titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/jgvy7prghtyrrichm54h

Bar charts showing appropriate response rates across multiple conversation topics for various Claude AI models and APIs. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/wl7vhe8jnji7jblvtobq

Bar graph showing [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/tiopvyhynq61rgvb5yf1

Table showing attack success rates of Shade indirect prompt injection attacks across different Claude models with and without safeguards. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/upmcvydgggk9ebjpswob

Table showing attack success rates of AI models with and without safeguards in computer environments. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/wpdsrjvvssvqcq93ps6e

Line graph titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/vyitqt9zi3xnfrcuvpgm

Bar chart titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/hjgbmvupv7vy8gbntbng

Three graphs showing [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/ky6e5x1kj5je9lup0fys

A bar graph showing [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/jfblvmfhudjkc95vlevs

AI model reasoning transcript discussing agentic safety test evaluation for warfarin prescription scenario. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/t2ejwelncyw9jqvlfk8e

Four graphs showing evaluation awareness metrics increasing with scenario suspiciousness levels from 1-10. https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/pjzftlebmspkzek9jvr3

Bar chart titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/rpyy5xtpbttqcwz3xtnl

Bar graph titled [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/hv9nbozbeytpacyvzjj0

A user tweets: [...] https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ixJDkQBncJBshcvwj/bp6isf02orzck97rd1lk

June 12th, 2026 - 59 min