How a Prompt Wrapper Lets a Frontier Model Play Poker Like an Expert

Description

Source: PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers [https://arxiv.org/abs/2605.30094]
Paper was published on May 28, 2026.
This episode was AI-generated on May 29, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

A frontier language model can recite poker theory flawlessly and still misread the cards in its own hand and lose catastrophically. This episode digs into a paper arguing the failure isn't a lack of intelligence but a 'decision-binding' problem, and shows how a deterministic wrapper, with no training and no solver at decision time, cuts one model's losses by over 60%.

KEY TAKEAWAYS

* Why a model that aces a poker theory exam still gets crushed at the table: the 'decision-binding' problem of failing to apply the right principle to the right moment
* How PokerSkill's three stages (a hallucination-proof context engine, situation-specific knowledge retrieval, and a depleting aggression/defense budget) wrap a model with no retraining
* The counterintuitive finding that smarter, more reasoning-heavy models often play worse default poker, not better
* The actual numbers: PokerSkill cuts GPT-5.5's loss rate by 57% and Claude Opus 4.6's by 61%, with all agents losing less to the benchmark than the 2018 champion bot Slumbot
* Why the rules-alone ablation ties a raw frontier model, and what that says about where the real lift comes from
* The honest caveats: every agent still loses, 'without solvers' really means 'without solvers at inference,' and the headline comparison is indirect, not a head-to-head win

* 00:00 - The model that misreads its own hand
  Opens with a model confidently calling three-of-a-kind 'complete air,' framing the puzzle of why present knowledge can't be used.
* 03:15 - Two paradigms and the gap between them
  Contrasts expensive solver-built bots like Libratus with weak rule-based engines, and sets up the paper's bet that an LLM and a rule system might cancel out each other's flaws.
* 04:06 - The decision-binding problem
  Explains the core thesis: the model fails not from ignorance but from being unable to bind the one governing principle to a specific moment, like a student who freezes on an exam.
* 09:45 - How PokerSkill works: context, retrieval, and budgets
  Walks through the three-stage architecture, including the depleting aggression/defense budget that quietly enforces coherent multi-street play.
* 13:00 - A hand played in full
  Narrates a complete GPT-5.5 hand from five-four suited through a river bluff to make the budget system and retrieval audible street by street.
* 16:16 - Does it actually work? The numbers
  Presents the loss-rate reductions, the Slumbot comparison, and the variance-reduction method that lets results come from a small sample.
* 19:31 - Why smarter models played worse
  Unpacks the counterintuitive result that more reasoning depth hurt raw poker play, and what it implies about scaffolding versus raw intelligence.
* 22:46 - The honest caveats
  Tyler pushes on the limits: it still loses, the single-opponent format, the absence of forward planning, and what 'without solvers' really means.
* 26:01 - Beyond poker: a recipe for LLM agents
  Argues the decision-binding pattern generalizes to medicine, law, and negotiation, and rehabilitates rule-based AI as an interface rather than a competitor.

RECOMMENDED READING

* Toolformer: Language Models Can Teach Themselves to Use Tools [https://arxiv.org/abs/2302.04761]
  A counterpoint on the same core problem (getting an LLM to bind the right external capability to the right moment), but via learned tool-calling rather than the deterministic context engine PokerSkill uses.
* ReAct: Synergizing Reasoning and Acting in Language Models [https://arxiv.org/abs/2210.03629]
  Directly relevant to the episode's 'scaffolding over smarter models' thesis, framing how reasoning and a bounded action space interleave in LLM agents.
* Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [https://arxiv.org/abs/2005.11401]
  The general framing behind PokerSkill's stage-two retrieval step, where situation-indexed knowledge is surfaced so the model only sees the slice that applies to the moment.

Finding Millions of Readable Concepts Inside a Real, Deployed AI Model

Source: Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet [https://arxiv.org/abs/2605.29358]
Paper was published on May 28, 2026.
This episode was AI-generated on May 29, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

Researchers reached into Claude's internals, found the single thread that means 'Golden Gate Bridge,' and turned it up until the model believed it was the bridge. This episode unpacks the paper that proved interpretability works on a real commercial model, and is unusually honest about everything it still can't do.

KEY TAKEAWAYS

* Why individual neurons mean nothing, and how the 'superposition' idea (concepts as blended directions, like mixing paint) explains it
* How sparse autoencoders un-mix those directions into millions of human-readable features, and how scaling laws turned 'how big a dictionary' into an engineering decision
* The crucial difference between a feature that merely correlates with a concept (a thermometer) and one you can pull to change behavior (a thermostat)
* Why the reasoning that actually mattered in the Kobe Bryant trivia chain was the seventieth-loudest signal: loudness and importance turn out to be different things
* Why finding a 'deception' or 'bioweapon' feature is not an alarm bell, and what the authors say the real safety signal would be
* Where the paper is weakest: no ground truth, circular Claude-grades-Claude evaluation, off-distribution steering, cherry-picked reasoning chains, and dictionaries that miss most of what's there

* 00:00 - Golden Gate Claude and the question of where concepts live
  The opening demo sets up the central puzzle: what is a nameable 'thread' inside a pile of numbers, and why can't you just read it off the neurons?
* 03:05 - Superposition and dictionary learning
  The paint-mixing intuition for why concepts are directions rather than neurons, and how sparse autoencoders recover those directions by reconstructing the model's state from a tiny handful of features.
* 06:10 - From toy models to a real one
  Why scaling this to Claude 3 Sonnet, and deriving Chinchilla-style scaling laws to pick a 34-million-feature dictionary, was an existential test for the whole field.
* 09:15 - Are the features real? Abstraction and causation
  Features that fire across languages and even images, the 'bug in code' detector, and the thermometer-versus-thermostat distinction that the paper's credibility rests on.
* 12:20 - Watching the model reason: the Kobe Bryant chain
  How knocking out features one at a time revealed a causal hop from Kobe to Lakers to LA to California to Sacramento, and why the load-bearing features were buried deep in the noise.
* 14:05 - The periodic-table finding
  How concept frequency predicts when a concept gets its own feature, why a one-in-a-billion concept needs a billion-feature dictionary, and how features split as the microscope gets sharper.
* 18:30 - Safety-relevant features, carefully framed
  Deception, secrecy, hate, and self-concept features exist, but the authors argue the real question is when they fire, not that they exist, illustrated with honesty-lever and forced-screed demos.
* 19:55 - Where the paper is weakest
  The authors' own reservations: no ground truth, the circular Claude-grades-Claude evaluation, the sensitivity gap, extreme off-distribution steering, cherry-picked chains, and demonstrably incomplete dictionaries.
* 24:41 - What it actually settled
  The technique survived contact with a real model and made unsupervised, one-time-cost interpretability credible, while leaving the safety payoff an explicit aspiration rather than a result.

RECOMMENDED READING

* Toy Models of Superposition [https://arxiv.org/abs/2209.10652]
  The earlier Anthropic work that introduced the superposition hypothesis the episode leans on (the paint-mixing intuition for why single neurons are polysemantic), but only on the toy models this paper had to prove scalable.
* Towards Monosemanticity: Decomposing Language Models With Dictionary Learning [https://transformer-circuits.pub/2023/monosemantic-features/index.html]
  The one-layer 'sandbox' study whose skeptical reception ('cute, but does it scale?') is the exact existential question this episode says the Sonnet paper was built to answer.
* Training Compute-Optimal Large Language Models (Chinchilla) [https://arxiv.org/abs/2203.15556]
  The scaling-law paper the episode name-checks as the template for deciding how big the 34-million-feature dictionary should be, turning a gamble into a curve you can read off.
* Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (Othello-GPT) [https://arxiv.org/abs/2210.13382]
  The Othello cautionary tale the hosts cite, in which researchers assumed the wrong board representation, illustrating why the episode prizes unsupervised dictionary learning over hand-built detectors.

May 30, 2026 · 27 min
