GPU Hoarding is Over. The $401B Reality Check

Description

Enterprise GPU hoarding is over. LinkedIn CTO Erran Berger and VentureBeat analyst Rob Strechay break down what comes next, and the infrastructure math most enterprises are only now being forced to confront. VentureBeat's Q1 research shows GPU availability anxiety dropped from 20.8% to 15.4% among enterprise teams, while cost-per-inference and TCO concerns jumped from 34% to 41%, a number that's still climbing. The hoarding phase is giving way to an audit phase, and the companies that didn't build the instrumentation to understand their workloads are now paying for it.

Erran Berger explains how LinkedIn runs one of the few remaining at-scale applied ML shops outside the hyperscalers, owning the full stack from bare metal GPU clusters to member-facing products. That means LinkedIn engineers can optimize custom CUDA kernels, compress embeddings, prune models for throughput, and adapt networking and storage per workload: trade-offs that are simply unavailable on public cloud instance menus. The result: a rigorous ROI framework that evaluates not just current traffic costs, but the traffic shape agents will drive in 2–3 years.

On the market side, 72% of enterprises admit they lack sufficient control over their AI infrastructure. Open-source inference tools like vLLM and LLMD are seeing rapid adoption, while 17% of organizations have moved to full-stack ownership. Hyperscalers report 60–80% of workloads have already shifted from training to inference, and most enterprise teams are still figuring out how to staff and instrument for that reality.
🎙️ GUEST: Erran Berger | CTO, LinkedIn
🎙️ ANALYST: Rob Strechay | VentureBeat
🎙️ HOST: Matt Marshall | CEO, VentureBeat

---

00:00 Intro: The GPU Hoarding Hangover
00:10 Guest Introductions
02:00 VentureBeat Q1 Data: GPU Panic Fades, TCO Concerns Rise
03:00 LinkedIn's Early Shift to Inference ROI Discipline
04:00 Budget Moving Into Inference Optimization and Control
07:00 LinkedIn's Full-Stack Advantage: Kernels, Pruning, Embedding Compression
08:00 Private AI and Sovereign Stacks: What the Q1 Data Shows
09:00 Open Source Inference Tooling: vLLM, LLMD, RDMA
10:00 Data Sovereignty at LinkedIn Scale: Member Data and Board-Level ROI Framing
12:00 Why Instrumentation Beats GPU Hoarding
13:00 Planning for Ambient Agent Traffic, Not Just Today's Workloads
14:00 Closing Advice for the Enterprise CTO Staring at 5% GPU Utilization

---

Subscribe to VentureBeat: https://www.youtube.com/@VentureBeat
Apple Podcasts: https://podcasts.apple.com/us/podcast/venturebeat/id1839285239
Spotify: https://open.spotify.com/show/4Zti73yb4hmiTNa7pEYls4
Website: https://venturebeat.com
LinkedIn: https://www.linkedin.com/company/venturebeat
Newsletter: https://venturebeat.com/newsletters

#EnterpriseAI #AIInfrastructure #MLOps #InferenceOptimization #GenerativeAI

---

Learn more about your ad choices. Visit megaphone.fm/adchoices [https://megaphone.fm/adchoices]

The Protocol Stack AI Is Missing

Cisco's OutShift deployed a multi-agent network configuration system that raised error detection from 10–15% to 100% and cut full change validation from 2–3 weeks to 6–7 minutes. The reason it worked, and why most enterprise multi-agent deployments still fail, comes down to a single gap nobody is talking about: agents can connect, but they cannot think together.

Vijoy Pandey, SVP and General Manager of OutShift by Cisco, joins Matt and Sam to explain why A2A, MCP, and existing agent protocols solve connectivity but leave out an entire layer: shared cognition. OutShift's research identifies this as a missing "Layer 9", a semantic and cognitive communication stack above today's syntactic protocols, and they're already building it.

The conversation covers the four pillars of enterprise-grade multi-agent infrastructure (discovery, identity/access, communication, observability), why standard IAM models break when agents enter the picture, and how OutShift extended OpenTelemetry with Microsoft to cover multi-agent evaluation. Vijoy introduces three new cognition-state protocols, SSTP (Semantic State Transfer), LSTP (Latent Space Transfer), and CSTP (Compressed State Transfer), and explains the staged rollout path for each, including a published MIT collaboration called the Ripple Effect Protocol.

The healthcare scheduling case study is particularly instructive: three independent third-party agents (insurance, diagnostics, scheduling), each with competing optimization functions and siloed context, and zero shared intent. That's the real multi-vendor, multi-org enterprise problem. Vijoy explains what an orchestrator can't fix, and what a cognitive fabric layer would.
🎙️ GUEST: Vijoy Pandey | SVP & General Manager, OutShift by Cisco
🎙️ HOSTS: Matt Marshall | VentureBeat, Sam Witteveen | VentureBeat

---

**CHAPTERS**

00:00 Intro & Cold Open: Agents Connect But Can't Think Together
00:03 Welcome & Guest Introduction: Vijoy Pandey, OutShift by Cisco
00:04 Do Agents Work Outside Coding & Customer Support? Challenging Amjad Masad's Diagnosis
00:05 What's Wrong With A2A and MCP? The Four Pillars of AGNTCY
00:08 Identity & Access Management for Agents: Why IAM Breaks and What TBAC Fixes
00:12 The Network Digital Twin: How OutShift Achieved 100% Error Detection in Production
00:13 From 2–3 Weeks to 6–7 Minutes: Real Results From Deployed Multi-Agent Networking
00:15 Agents Can Connect But Can't Think Together: The Core Thesis
00:20 The Cognitive Revolution Analogy: Shared Intent, Shared Context, Collective Innovation
00:25 The Healthcare Scheduling Case Study: Three Competing Agents, Zero Shared Intent
00:31 Why Orchestrators Fail in Multi-Vendor, Multi-Org Environments
00:36 Introducing Layer 9: SSTP, LSTP, and CSTP, the Cognition-State Protocol Stack
00:41 What OutShift Is Building Now: Protocols, Fabric, and Cognition Engines
00:44 MIT Collaboration: The Ripple Effect Protocol and Phase One Rollout
00:46 Cisco's 40-Year Networking Playbook Applied to the Internet of Cognition
00:49 Closing: Where to Find the Research, AGNTCY, and OpenClaw Integration

---

Subscribe to VentureBeat: https://www.youtube.com/@VentureBeat
Apple Podcasts: https://podcasts.apple.com/us/podcast/venturebeat/id1839285239
Spotify: https://open.spotify.com/show/4Zti73yb4hmiTNa7pEYls4
Website: https://venturebeat.com
LinkedIn: https://www.linkedin.com/company/venturebeat
Newsletter: https://venturebeat.com/newsletters

#EnterpriseAI #AIAgents #MultiAgentSystems #AIInfrastructure #LLM

---

“Scaling Out Superintelligence” [https://outshift.cisco.com/internet-of-cognition/whitepaper?utm_campaign=fy26q3_ioc_ww_paid-media_ioc-vbep1-wp_podcast&utm_channel=podcast&utm_source=podcast]
Vijoy Pandey, January 2026. The technical whitepaper detailing the Internet of Cognition architecture, three-layer stack, and cognition state protocols.

Internet of Cognition Interactive Demo [https://outshift.cisco.com/internet-of-cognition/explore?utm_campaign=fy26q3_ioc_ww_paid-media_ioc-vbep1-wpdemo_podcast&utm_channel=podcast&utm_source=podcast]
Clickable walkthrough showing per-agent activity, intent, context, and collective reasoning across a multi-agent SRE system.

“A Layered Protocol Architecture for the Internet of Agents” [https://arxiv.org/abs/2511.19699]
Fleming, Muscariello, Pandey, Kompella. The OSI Layer 8/9 extension.

AGNTCY [https://agntcy.org/]
Open source multi-agent infrastructure under Linux Foundation governance. Covers discovery, identity, communication, observability. Formative members: Cisco, Dell Technologies, Google Cloud, Oracle, Red Hat.

Learn more about your ad choices. Visit megaphone.fm/adchoices [https://megaphone.fm/adchoices]

Apr 15, 2026 · 50 min

