Inference Just Got Cheaper. The Market Panicked.

9 min · June 25, 2026

Description

OpenAI unveiled its first custom chip the same week the market sold off on fears the AI buildout has gone too far. Stephen Forte argues those are the same story told from opposite ends, and that what looks like a bubble is closer to a re-pricing.

In this episode:

* OpenAI's "Jalapeno" chip: built with Broadcom, purpose-made for inference, roughly 50% more cost-efficient than standard AI GPUs in early tests, designed in nine months, deploying at gigawatt scale by year-end.
* The selloff: Nasdaq off about 2.2%, Nvidia down roughly 4%, Alphabet's worst day in over a year, on AI-buildout cost fears, rate jitters, and a memory-chip wobble.
* Why it is a re-pricing, not a bubble: the cost of inference has fallen about 10x a year for three years; software efficiencies like Mixture-of-Experts compound on hardware gains, so the buildout grows but not in a straight line.
* What it means for operators: roughly 80% of workflows will run on small, local models inside your own network; only the highest-reasoning work needs the frontier cloud.
* Anthropic's Claude Tag: an always-on Claude teammate in Slack, and a live example of the new workloads that cheaper inference unlocks.

The YPO Technology Network AI Brief is a daily briefing on the AI news that matters to CEOs and senior operators, hosted by Stephen Forte.
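To make the compounding claim in the description concrete, here is a minimal arithmetic sketch. The 10x-a-year rate over three years comes from the episode notes; the starting cost of 1000 units and the extra 2x software gain are assumed placeholders chosen only for illustration, not figures from the episode.

```python
# Illustrative arithmetic only. The 10x-per-year rate is the episode's claim;
# the starting cost and the 2x software gain are assumed placeholders.

def cost_after(start_cost: float, yearly_factor: float, years: int,
               software_gain: float = 1.0) -> float:
    """Cost of a fixed inference workload after compounding yearly gains.

    yearly_factor: how many times cheaper inference gets each year.
    software_gain: an additional one-off multiplier from software efficiency
                   (for example Mixture-of-Experts), applied on top.
    """
    return start_cost / (yearly_factor ** years) / software_gain


start = 1000.0  # hypothetical cost units three years ago

hardware_only = cost_after(start, yearly_factor=10.0, years=3)
with_software = cost_after(start, yearly_factor=10.0, years=3, software_gain=2.0)

print(f"After 3 years at 10x/year: {hardware_only}")
print(f"With an assumed extra 2x software gain: {with_software}")
```

Three years at 10x a year multiplies out to a 1000x drop, and any software gain stacks on top of that rather than adding to it. That multiplication is what the re-pricing argument rests on.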

All episodes

97 episodes

Frontier AI Got Cheap, Open, and Chinese

The story of the year was supposed to be who controls AI. The real story this week: control and cost split in opposite directions, and your business lives in the gap.

* The market already switched. US labs fell from 72% to 33% of model traffic on OpenRouter in a year; Chinese models now hold six of the top ten spots. One startup, Lindy, moved 100% of its traffic to DeepSeek.
* The capability gap closed. Zhipu's open-weight GLM-5.2 landed within a point of Anthropic's Opus 4.8 on a key agentic benchmark, at roughly a fifth of the cost, and you can run it on your own servers.
* The theft question. Anthropic alleges Alibaba ran ~25,000 fake accounts and 28.8 million Claude conversations to distill its models (Alibaba denies). Senators are now moving to attach a sanctions amendment to the NDAA.

Host Stephen Forte on what model sovereignty means for your stack, your budget, and your leverage, plus the two moves to make before your next budget review.

Sources: CNBC; The Strategy Stack; Nate's Newsletter.

Yesterday · 8 min

Graded by Clients, Cloned by Criminals

For two years, AI was an internal project you rolled out at your own pace. This week, two stories say that era is over: your clients are using AI to grade you, and criminals are using it to rob you.

In this episode:

* Graded by your clients. Thomson Reuters finds roughly $143 billion of professional-services revenue is under active reconsideration, with only 6% of clients satisfied that their providers deliver on AI and 78% calling it essential. The move: audit whether your clients can actually feel your AI, and arm your best people first.
* Cloned by criminals. Deepfake CFO video calls are wiring real money out of real companies: one finance team sent $25.6 million after a call where every other participant was an AI fake. US deepfake-fraud losses tripled to $1.1 billion last year. The move: a one-page out-of-band verification rule for wire approvals.

Hosted by Stephen Forte.

June 29, 2026 · 8 min

Everybody's Building Their Own Stack

Three deals this week looked unrelated. They are the same deal. A chipmaker bought the software layer, a software giant built its own models, and the biggest model-maker built its own chip, and the strategic logic behind all three is identical. Stephen Forte connects them into one idea: everybody is building their own stack.

In this episode:

* Qualcomm buys Modular (~$3.9B): why acquiring software that runs AI across any chip is an attack on Nvidia's real moat, the CUDA software lock-in.
* Microsoft's MAI models: the largest backer of OpenAI quietly builds the capability to not need OpenAI, and what that says about vendor dependence.
* The bull-vs-bear debate: Yann LeCun's warning that the economics cannot persist, given a fair hearing and a direct answer.
* What's coming, Google Gemini 3.5 Pro: the expected 2-million-token context window explained in plain terms, and why "ask the AI about your entire business at once" is the real unlock.

The YPO Technology Network AI Brief is a daily briefing on the AI news that matters to CEOs and senior operators, hosted by Stephen Forte.

June 26, 2026 · 9 min

Inference Just Got Cheaper. The Market Panicked.

June 25, 2026 · 9 min

From Pilot to Payroll

AI agents just crossed the line from demo to deployment, and that changes what a CEO has to decide this year. The pilot era is ending; the question shifts from "should we try AI" to "how do we deploy agents to everyone, and who supervises them."

In this episode, Stephen Forte covers:

* The deployment proof: Samsung is rolling out ChatGPT Enterprise and OpenAI's Codex coding agent to every employee in Korea and across its global Device eXperience division. When a 250,000-employee manufacturer goes company-wide, the "are these things real" debate is over.
* Agents doing real work: Cognition's Devin is an autonomous software-engineer agent reportedly doing ~$492M of real engineering work a year. The valuation is the least interesting number; the adoption is the story. The org-design question: what work do you hand to an agent, and who reviews it.
* Deploy without getting burned: Sakana's Fugu shows the smart pattern: route across many models as one, so you're never locked to a single vendor. The cautionary tale: Claude Fable 5 was pulled offline globally in 90 minutes by a US export-control order and is still down. Architect for portability.

Plus the CEO playbook: kill the pilot mindset and name a deployment owner; redesign the workflow so the agent drafts and a trained person owns the output; and architect for model portability from day one.

Sources:

* Samsung deploys ChatGPT Enterprise + Codex company-wide (OpenAI / Let's Data Science) [https://letsdatascience.com/news]
* Cognition's Devin autonomous software engineer (~$492M ARR) (Bloomberg via WEEX) [https://www.weex.com/news/detail/ai-programming-startup-cognition-completes-over-1-billion-in-funding-with-a-valuation-of-26-billion-with-participation-from-lux-capital-and-others-wdpx2q99vvqsb4hh0jtenwsy]
* Sakana AI launches Fugu multi-model orchestration (Future Tools) [https://futuretools.io/news]
* Claude Fable 5 / Mythos 5 pulled offline by export-control order (Sonnet Code) [https://www.sonnetcode.com/blog/fable-5-mythos-5-export-control-suspension-june-12-model-portability-procurement-risk-june-2026]

The AI Brief from the YPO Technology Network is a daily executive briefing on the AI developments that matter to business leaders. Hosted by Stephen Forte.
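The "architect for portability" advice in these notes can be sketched as a thin routing layer that tries model backends in a preferred order and falls back when one is unavailable. This is a minimal illustration under stated assumptions, not how Sakana's Fugu or any vendor SDK works: `ModelRouter`, `frontier_cloud`, and `local_model` are hypothetical names, and the backends are stand-in functions rather than real API clients.

```python
from typing import Callable, Dict, List

# A backend is any callable that takes a prompt and returns text.
# Real deployments would wrap vendor clients behind this same signature.
Backend = Callable[[str], str]


class ModelRouter:
    """Hypothetical failover router: try backends in order, return the first success."""

    def __init__(self, backends: Dict[str, Backend], order: List[str]) -> None:
        self.backends = backends
        self.order = order

    def complete(self, prompt: str) -> str:
        errors = []
        for name in self.order:
            try:
                return self.backends[name](prompt)
            except Exception as exc:  # record the failure and try the next backend
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends (assumptions for illustration only).
def frontier_cloud(prompt: str) -> str:
    # Simulates a hosted model that has been taken offline.
    raise ConnectionError("model suspended")


def local_model(prompt: str) -> str:
    return f"[local] {prompt}"


router = ModelRouter(
    backends={"frontier": frontier_cloud, "local": local_model},
    order=["frontier", "local"],
)
result = router.complete("Summarize Q2 churn")
print(result)
```

Because callers depend only on `complete`, swapping or reordering vendors is a configuration change rather than a rewrite. That is the practical meaning of never being locked to a single model.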

June 24, 2026 · 7 min
