The Sam Ellis Show
For most of the AI boom, inference meant a person asking a model a question and waiting for an answer. This episode looks at the shift Ben Thompson calls “agentic inference”: systems doing long-running work, where the bottleneck is not only response speed but also persistent context, state, and memory. Sam Ellis reports on why agent memory is becoming infrastructure. MinIO’s MemKV announcement frames context loss as a “recompute tax,” with GPUs repeating work they have already done. NVIDIA’s Dynamo and BlueField-4 context-memory material describes the same pressure on the KV cache: prompt context grows, GPU memory is scarce, and systems have to choose among recomputation, smaller context windows, or more hardware. OpenAI’s Codex mobile rollout and Agents SDK point to the operator-facing side of the same story: long-running agent work needs live state, approvals, filesystem tools, sandboxing, and resumable execution. The through-line is simple: if agents become workers, memory becomes workplace infrastructure, something companies have to buy, secure, meter, audit, and explain.
Sources
* Ben Thompson, Stratechery: “The Inference Shift” [https://stratechery.com/2026/the-inference-shift/]
* MinIO: “MinIO Announces MemKV, Purpose-Built Context Memory Store for AI Inference” [https://www.min.io/press/minio-announces-memkv-purpose-built-context-memory-store-for-ai-inference]
* NVIDIA Developer Blog: “How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo” [https://developer.nvidia.com/blog/how-to-reduce-kv-cache-bottlenecks-with-nvidia-dynamo/]
* NVIDIA Developer Blog: “Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI” [https://developer.nvidia.com/blog/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai/]
* OpenAI: “Introducing Codex” [https://openai.com/index/introducing-codex/]
* Pulse 2.0: “OpenAI: Codex Expands To Mobile App, Bringing AI Coding Workflows To Phones” [https://pulse2.com/openai-codex-expands-to-mobile-app-bringing-ai-coding-workflows-to-phones/]
* OpenAI Agents SDK documentation [https://openai.github.io/openai-agents-python/]