The Sam Ellis Show

Claude as Manager of Agent Labor

10 min · 29 May 2026

Description

Anthropic released Claude Opus 4.8 with the usual benchmark improvements, but the more important story is organizational: effort controls, long-context API surfaces, dynamic workflows, hundreds of parallel subagents, and self-critique marketed as part of the reliability layer.

Sam Ellis reports on why Opus 4.8 is not just being sold as a better model. It is being positioned as a manager of delegated agent labor: planning work, dispatching subagents, reviewing outputs, and giving operators a tidy account of what the machine says it checked.

The episode asks the live question for autonomous work: if a model gets better at catching its own mistakes, does that make large unattended workflows safer, or does it make them feel acceptable before the supervision layer has been proven?

Companion blog: Claude as Manager of Agent Labor [https://podcast.samellis.online/blog/2026/05/claude-as-manager-of-agent-labor/]

Sources
* Anthropic: “Introducing Claude Opus 4.8” [https://www.anthropic.com/news/claude-opus-4-8]. Primary launch post for Opus 4.8, including pricing, fast mode, Dynamic Workflows, effort controls, long-running Claude Code work, benchmark claims, and Anthropic’s self-critique / honesty framing.
* Anthropic Claude API documentation: “What’s new in Claude Opus 4.8” [https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-8]. Developer documentation for one-million-token context availability, 128k max output, adaptive thinking, mid-conversation system messages, tool-use behavior, compaction recovery, and long-running agent workflows.
* The Verge: “Anthropic’s new Claude Opus 4.8 model is more honest when it messes up” [https://www.theverge.com/ai-artificial-intelligence/939094/anthropic-claude-4-8-opus-honesty-effort]. Launch coverage that frames the release around Anthropic’s honesty and effort-control claims.
* TechCrunch: “Anthropic releases Opus 4.8 with new Dynamic Workflow tool” [https://techcrunch.com/2026/05/28/anthropic-releases-opus-4-8-with-new-dynamic-workflow-tool/]. Coverage of the 41-day cadence after Opus 4.7, competitive pressure from coding-agent rivals, and Dynamic Workflows for orchestrating parallel subagents.
* AWS: “Claude Opus 4.8 is now available on AWS” [https://aws.amazon.com/about-aws/whats-new/2026/05/claude-opus-4.8-aws/]. AWS availability note for Amazon Bedrock and Claude Platform on AWS, including Guardrails, Knowledge Bases, regional data residency, and production AI application framing.
* AWS Machine Learning Blog: “Claude Opus 4.8 is now available on AWS” [https://aws.amazon.com/blogs/machine-learning/claude-opus-4-8-is-now-available-on-aws/]. Additional AWS deployment context for Bedrock access and enterprise use cases.

Email: SamEllisShow@protonmail.com

All episodes

36 episodes

Claude as Manager of Agent Labor

29 May 2026 · 10 min

Mythos as Controlled Industrial Capacity

Anthropic says Mythos-class models are headed for broader release. This episode tracks what that implies about where frontier AI gets sold next: not as flat consumer access, but as scarce, controlled industrial capacity.

Companion blog: The Model That Won’t Be Sold Cheap [https://podcast.samellis.online/blog/2026/05/the-model-that-wont-be-sold-cheap/index.html]

Sources referenced in this episode:
* Anthropic: “Project Glasswing: An initial update” [https://www.anthropic.com/research/glasswing-initial-update]
* The Register: “Anthropic to release Mythos-class models to the public” [https://www.theregister.com/security/2026/05/25/anthropic-to-release-mythos-class-models-to-the-public/5245596]
* BleepingComputer: “Mythos model may be coming to Claude Code” [https://www.bleepingcomputer.com/news/artificial-intelligence/anthropics-restricted-claude-mythos-model-may-be-coming-to-claude-code/]
* Cloudflare: “Project Glasswing: what Mythos showed us” [https://blog.cloudflare.com/cyber-frontier-models/]
* Vidoc Security: “We reproduced Anthropic's Mythos findings with public models” [https://blog.vidocsecurity.com/blog/we-reproduced-anthropics-mythos-findings-with-public-models]
* Hacker News discussion thread [https://news.ycombinator.com/item?id=47806116]
* Lobsters discussion thread [https://lobste.rs/s/aw2jr4/assessing_claude_mythos_preview_s]

Email: SamEllisShow@protonmail.com

27 May 2026 · 7 min

The Agent Can Sign

The next move in agent autonomy is not just smarter models. It is institutions giving agents authority: wallets, spending limits, transaction permissions, signatures, audit trails, and human approval checkpoints.

Sam Ellis reports on why finance and signatures are the proof case. Once an agent can move money, request payment authorization, use credentials, or sign on behalf of a person or organization, the question changes from “can it act?” to “who authorized that act, who can stop it, and who owns the consequence?”

The episode looks at Fireblocks’ agentic payments infrastructure, Coinbase’s Agentic Wallet MCP documentation for x402 payments, and Foundation’s Passport Prime / KeyOS “Human Authority Hardware” framing. Together, they show the same pressure from different directions: agent autonomy is becoming a delegated-authority problem, not just a capability problem.

Sources
* Fireblocks: Agentic Payments product page [https://www.fireblocks.com/products/agentic-payments]. Outlines the agentic payments lifecycle, including delegation rules, agentic wallet policy enforcement, merchant authorization, facilitator validation, compliance checks, settlement, and audit trails.
* Fireblocks: “Fireblocks Launches Agentic Payments Suite, Enabling PSPs and Fintechs to Support AI-Driven Commerce” [https://www.fireblocks.com/blog/agentic-payments-suite-psp-fintech]. Describes scoped, revocable agent spending authority, spend limits, merchant allowlists, time windows, asset constraints, and pre-signature policy enforcement.
* Coinbase Developer Platform: Agentic Wallet MCP documentation [https://docs.cdp.coinbase.com/agentic-wallet/mcp/welcome]. Describes an MCP server and companion wallet app for agentic commerce, including x402 payments, onramps, wallets, spending limits, and boundaries around sensitive actions.
* Coinbase Developer Platform: Agentic Wallet MCP / AgentKit documentation [https://docs.cdp.coinbase.com/agentkit/docs/agentic-wallet-mcp]. Supporting documentation for how Coinbase frames agent wallets and agent payment workflows for developers.
* Foundation: “Foundation Raises $6.4M and Launches Human Authority Hardware” [https://foundation.xyz/blog/foundation-raises-6-4m-human-authority-hardware-launch]. Announces Passport Prime and KeyOS, and argues that consequential agent actions such as moving money, deploying code, using credentials, or accessing sensitive data should require explicit human approval on trusted hardware.
* Foundation: Passport Prime product page [https://foundation.xyz/products/passport-prime]. Product context for Foundation’s hardware approval surface and programmable security platform.

23 May 2026 · 7 min

The Agent Keeps Working After You Leave

Google’s Gemini Spark announcement marks a shift from chat assistants toward background personal agents: systems that keep working after the laptop is closed, across inboxes, calendars, documents, browser actions, and eventually transactions.

Sam Ellis reports on why the hardest question is not whether these agents can be useful. They can. The harder question is what the user can still see, stop, approve, and limit once the agent is working out of sight.

Spark is an early test case because Google already sits inside Gmail, Calendar, Docs, Slides, Chrome, Android, and Workspace. The agent does not have to ask where the work is. Google already knows. The open question is whether the user will know where the agent is.

Sources
* Google: “The Gemini app becomes more agentic, delivering proactive, 24/7 help” [https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/]
* Google: “Building the agentic future: Developer highlights from I/O 2026” [https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/]
* Google Cloud: “Innovations from Google I/O 26 on Google Cloud” [https://cloud.google.com/blog/products/ai-machine-learning/innovations-from-google-io-26-on-google-cloud]
* VentureBeat: “Google’s new AI agent can draft your emails, monitor your inbox and eventually spend your money” [https://venturebeat.com/technology/googles-new-ai-agent-can-draft-your-emails-monitor-your-inbox-and-eventually-spend-your-money]

20 May 2026 · 6 min

The Agent Needs a Longer Memory

For most of the AI boom, inference meant a person asking a model a question and waiting for an answer. This episode looks at the shift Ben Thompson calls “agentic inference”: systems doing long-running work, where the bottleneck is not only response speed but persistent context, state, and memory.

Sam Ellis reports on why agent memory is becoming infrastructure. MinIO’s MemKV announcement frames context loss as a “recompute tax,” with GPUs repeating work they already did. NVIDIA’s Dynamo and BlueField-4 context-memory material describes the same pressure around KV cache: prompt context grows, GPU memory is scarce, and systems have to choose between recomputation, smaller context windows, or more hardware. OpenAI’s Codex mobile rollout and Agents SDK point to the operator-facing side of the same story: long-running agent work needs live state, approvals, filesystem tools, sandboxing, and resumable execution.

The through-line is simple: if agents become workers, memory becomes workplace infrastructure, something companies have to buy, secure, meter, audit, and explain.

Sources
* Ben Thompson, Stratechery: “The Inference Shift” [https://stratechery.com/2026/the-inference-shift/]
* MinIO: “MinIO Announces MemKV, Purpose-Built Context Memory Store for AI Inference” [https://www.min.io/press/minio-announces-memkv-purpose-built-context-memory-store-for-ai-inference]
* NVIDIA Developer Blog: “How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo” [https://developer.nvidia.com/blog/how-to-reduce-kv-cache-bottlenecks-with-nvidia-dynamo/]
* NVIDIA Developer Blog: “Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI” [https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/]
* OpenAI: “Introducing Codex” [https://openai.com/index/introducing-codex/]
* Pulse 2.0: “OpenAI: Codex Expands To Mobile App, Bringing AI Coding Workflows To Phones” [https://pulse2.com/openai-codex-expands-to-mobile-app-bringing-ai-coding-workflows-to-phones/]
* OpenAI Agents SDK documentation [https://openai.github.io/openai-agents-python/]

20 May 2026 · 8 min