Beyond The Pilot: Enterprise AI in Action

GPU Hoarding is Over. The $401B Reality Check

15 min · May 13, 2026

Description

Enterprise GPU hoarding is over. LinkedIn CTO Erran Berger and VentureBeat analyst Rob Strechay break down what comes next — and the infrastructure math most enterprises are only now being forced to confront. VentureBeat's Q1 research shows GPU availability anxiety dropped from 20.8% to 15.4% among enterprise teams, while cost-per-inference and TCO concerns jumped from 34% to 41% — a number that's still climbing. The hoarding phase is giving way to an audit phase, and the companies that didn't build the instrumentation to understand their workloads are now paying for it. Erran Berger explains how LinkedIn runs one of the few remaining at-scale applied ML shops outside the hyperscalers — owning the full stack from bare metal GPU clusters to member-facing products. That means LinkedIn engineers can optimize custom CUDA kernels, compress embeddings, prune models for throughput, and adapt networking and storage per workload — trade-offs that are simply unavailable on public cloud instance menus. The result: a rigorous ROI framework that evaluates not just current traffic costs, but the traffic shape agents will drive in 2–3 years. On the market side, 72% of enterprises admit they lack sufficient control over their AI infrastructure. Open-source inference tools like vLLM and LLMD are seeing rapid adoption, while 17% of organizations have moved to full-stack ownership. Hyperscalers report 60–80% of workloads have already shifted from training to inference — and most enterprise teams are still figuring out how to staff and instrument for that reality. 
🎙️ GUEST: Erran Berger | CTO, LinkedIn
🎙️ ANALYST: Rob Strechay | VentureBeat
🎙️ HOST: Matt Marshall | CEO, VentureBeat

00:00 Intro: The GPU Hoarding Hangover
00:10 Guest Introductions
02:00 VentureBeat Q1 Data: GPU Panic Fades, TCO Concerns Rise
03:00 LinkedIn's Early Shift to Inference ROI Discipline
04:00 Budget Moving Into Inference Optimization and Control
07:00 LinkedIn's Full-Stack Advantage: Kernels, Pruning, Embedding Compression
08:00 Private AI and Sovereign Stacks: What the Q1 Data Shows
09:00 Open Source Inference Tooling: vLLM, LLMD, RDMA
10:00 Data Sovereignty at LinkedIn Scale: Member Data and Board-Level ROI Framing
12:00 Why Instrumentation Beats GPU Hoarding
13:00 Planning for Ambient Agent Traffic, Not Just Today's Workloads
14:00 Closing Advice for the Enterprise CTO Staring at 5% GPU Utilization

Subscribe to VentureBeat: https://www.youtube.com/@VentureBeat
Apple Podcasts: https://podcasts.apple.com/us/podcast/venturebeat/id1839285239
Spotify: https://open.spotify.com/show/4Zti73yb4hmiTNa7pEYls4
Website: https://venturebeat.com
LinkedIn: https://www.linkedin.com/company/venturebeat
Newsletter: https://venturebeat.com/newsletters

#EnterpriseAI #AIInfrastructure #MLOps #InferenceOptimization #GenerativeAI

Learn more about your ad choices. Visit megaphone.fm/adchoices [https://megaphone.fm/adchoices]

All episodes

30 episodes

Building a 30% Better AI: The Taste Graph Moat

Pinterest's open-source AI stack costs 90% less than frontier models — and their custom-trained recommender outperforms off-the-shelf alternatives by 30% in accuracy. Pinterest CTO Matt Madrigal breaks down exactly how they did it, and what enterprise AI teams can actually replicate. Madrigal walks through the full architecture behind Navigator 1, Pinterest's conversational shopping assistant built on Qwen 3 VL — and the specific decision to rip out its native vision encoder and replace it with PinCLIP, Pinterest's proprietary multimodal embedding layer. That swap alone closes a 20x inference latency gap and makes the economics work at 620 million monthly active users. This is the clearest public explanation yet of how a scaled platform operationalizes the "core vs. context" principle for model selection: open-source and custom-built where it touches the user, frontier models where speed-to-prototype matters more than cost. The conversation also covers the Taste Graph — Pinterest's knowledge graph across hundreds of billions of pins and 15 billion boards — and how post-training on that proprietary data lets a smaller, fit-for-purpose model beat a larger frontier model on production metrics. Madrigal details their eval framework: gold set benchmarks, product-level evals tied to engagement and merchant click outcomes, and a structured A/B test pipeline that runs from engineer PRs through to live user signal. On the organizational side: how Pinterest manages a "default yes" multi-IDE policy (Cursor, Windsurf, Claude Code, Codex) without collapsing security posture, how they segment sandbox environments between ML engineers with Taste Graph access and general application developers, and why Madrigal measures AI coding ROI in token usage and experimentation velocity — not lines of code. 
🎙️ GUEST: Matt Madrigal | CTO, Pinterest
🎙️ HOSTS: Matt Marshall | VentureBeat, Sam Witteveen | VentureBeat

00:00 Show Intro and Guest
01:17 Open Source Cost Breakdown
02:20 Pinterest Multimodal Roots
02:37 PinCLIP and Embeddings
05:46 Core vs Context Models
07:43 Navigator 1 Assistant Stack
11:52 Benchmarking and Evals
13:29 Accuracy from Proprietary Data
17:16 Taste Graph Explained
18:29 Taste Graph in Training
22:22 Fighting AI Slop
25:16 Developer Tools and Velocity
27:57 Tool Choice and Governance
28:56 Security Sandboxes and CI/CD
30:57 Wrap Up

#EnterpriseAI #OpenSourceAI #AIInfrastructure #LLM #MachineLearning

May 27, 2026 · 33 min

GPU Hoarding is Over. The $401B Reality Check

May 13, 2026 · 15 min

Agents Ate the UI: Data is Your Only Moat with LlamaIndex

The CEO who built one of the most-starred RAG frameworks on GitHub (47,000 stars) just publicly declared that frameworks like his are becoming obsolete — and then pivoted his entire company around that conclusion. Jerry Liu, CEO and co-founder of LlamaIndex, joins Matt Marshall and Sam Witteveen to explain exactly what broke in the AI stack, why 95% of his team's code is now AI-generated, and where the real defensibility in enterprise AI infrastructure actually lives in 2026. The conversation covers the specific architectural shift that made RAG orchestration frameworks less central: agent reasoning has improved to the point where dumb tools plus smart agents outperform sophisticated retrieval pipelines, coding agents have collapsed the cost of custom integrations, and model providers like Anthropic are consolidating the harness layer around MCP, sandboxes, and session state. Jerry walks through Anthropic's managed agent diagram as a real architectural reference point and explains why engineering leaders should prioritize modular interfaces over implementation investment — because parts of your current stack will need to be thrown away in months, not years. On SaaS survival, Jerry argues the companies that retain value are those becoming systems of record — and that the real opportunity is building AI agents that automate labor on top of their platforms, not defending UI/UX that agents are now bypassing. On LlamaIndex's own bet: document understanding — parsing PDFs, tables, charts, and forms at higher accuracy and lower cost than frontier models — is the context layer every agent stack needs regardless of which model wins the next benchmark cycle. LlamaParse and the newly released open-source ParseBench (April 13) are the commercial expression of that thesis. If you're evaluating your AI stack architecture, deciding how much to build vs. buy, or trying to understand where horizontal tooling still has a moat, this episode is the conversation. 
🎙️ GUEST: Jerry Liu | CEO & Co-Founder, LlamaIndex
🎙️ HOSTS: Matt Marshall | VentureBeat, Sam Witteveen | VentureBeat

CHAPTERS
00:00 Intro: LlamaIndex's origin and RAG framework origins
02:00 How LlamaIndex started: GPT-3, 4K context windows, and GPT Index
04:00 Why AI frameworks are becoming less useful in the agentic era
07:00 What changed in the stack: agent reasoning, coding agents, and RAG's evolution
09:00 How Anthropic's managed agent diagram reframes enterprise architecture
13:00 The lock-in question: managed agents, session state, and stack modularity
16:00 Should you build horizontal tooling? Why Jerry says probably not
18:00 Open vs. closed: the Apple/Android analogy applied to frontier labs
21:00 The abstraction level is rising: English is the new programming language
24:00 SaaS market cap destruction: who survives agents eating software
28:00 The "full stack builder" emergence and the future of SaaS seats
31:00 Buy vs. build for agents: the AI recruiter thought experiment
33:00 LlamaIndex's pivot: document understanding as defensible infrastructure
36:00 Why frontier models won't commoditize specialized document parsing
38:00 LlamaParse deep dive: zero-shot accuracy, tables, charts, handwriting
41:00 LightParse, ParseBench, and designing for agent consumers
44:00 Wrap-up and where to follow LlamaIndex

#EnterpriseAI #AIAgents #LLMInfrastructure #RAG #AIArchitecture

April 29, 2026 · 45 min

The Protocol Stack AI Is Missing

Cisco's OutShift deployed a multi-agent network configuration system that raised error detection from 10–15% to 100% and cut full change validation from 2–3 weeks to 6–7 minutes. The reason it worked — and why most enterprise multi-agent deployments still fail — comes down to a single gap nobody is talking about: agents can connect, but they cannot think together. Vijoy Pandey, SVP and General Manager of OutShift by Cisco, joins Matt and Sam to explain why A2A, MCP, and existing agent protocols solve connectivity but leave out an entire layer: shared cognition. OutShift's research identifies this as a missing "Layer 9" — a semantic and cognitive communication stack above today's syntactic protocols — and they're already building it. The conversation covers the four pillars of enterprise-grade multi-agent infrastructure (discovery, identity/access, communication, observability), why standard IAM models break when agents enter the picture, and how OutShift extended OpenTelemetry with Microsoft to cover multi-agent evaluation. Vijoy introduces three new cognition-state protocols — SSTP (Semantic State Transfer), LSTP (Latent Space Transfer), and CSTP (Compressed State Transfer) — and explains the staged rollout path for each, including a published MIT collaboration called the Ripple Effect Protocol. The healthcare scheduling case study is particularly instructive: three independent third-party agents — insurance, diagnostics, scheduling — each with competing optimization functions and siloed context, and zero shared intent. That's the real multi-vendor, multi-org enterprise problem. Vijoy explains what an orchestrator can't fix, and what a cognitive fabric layer would. 
🎙️ GUEST: Vijoy Pandey | SVP & General Manager, OutShift by Cisco
🎙️ HOSTS: Matt Marshall | VentureBeat, Sam Witteveen | VentureBeat

CHAPTERS
00:00 Intro & Cold Open: Agents Connect But Can't Think Together
00:03 Welcome & Guest Introduction: Vijoy Pandey, OutShift by Cisco
00:04 Do Agents Work Outside Coding & Customer Support? Challenging Amjad Masad's Diagnosis
00:05 What's Wrong With A2A and MCP? The Four Pillars of AGNTCY
00:08 Identity & Access Management for Agents: Why IAM Breaks and What TBAC Fixes
00:12 The Network Digital Twin: How OutShift Achieved 100% Error Detection in Production
00:13 From 2–3 Weeks to 6–7 Minutes: Real Results From Deployed Multi-Agent Networking
00:15 Agents Can Connect But Can't Think Together: The Core Thesis
00:20 The Cognitive Revolution Analogy: Shared Intent, Shared Context, Collective Innovation
00:25 The Healthcare Scheduling Case Study: Three Competing Agents, Zero Shared Intent
00:31 Why Orchestrators Fail in Multi-Vendor, Multi-Org Environments
00:36 Introducing Layer 9: SSTP, LSTP, and CSTP, the Cognition-State Protocol Stack
00:41 What OutShift Is Building Now: Protocols, Fabric, and Cognition Engines
00:44 MIT Collaboration: The Ripple Effect Protocol and Phase One Rollout
00:46 Cisco's 40-Year Networking Playbook Applied to the Internet of Cognition
00:49 Closing: Where to Find the Research, AGNTCY, and OpenClaw Integration

#EnterpriseAI #AIAgents #MultiAgentSystems #AIInfrastructure #LLM

“Scaling Out Superintelligence” [https://outshift.cisco.com/internet-of-cognition/whitepaper?utm_campaign=fy26q3_ioc_ww_paid-media_ioc-vbep1-wp_podcast&utm_channel=podcast&utm_source=podcast] Vijoy Pandey, January 2026. The technical whitepaper detailing the Internet of Cognition architecture, three-layer stack, and cognition state protocols.

Internet of Cognition Interactive Demo [https://outshift.cisco.com/internet-of-cognition/explore?utm_campaign=fy26q3_ioc_ww_paid-media_ioc-vbep1-wpdemo_podcast&utm_channel=podcast&utm_source=podcast] Clickable walkthrough showing per-agent activity, intent, context, and collective reasoning across a multi-agent SRE system.

“A Layered Protocol Architecture for the Internet of Agents” [https://arxiv.org/abs/2511.19699] Fleming, Muscariello, Pandey, Kompella. The OSI Layer 8/9 extension.

AGNTCY [https://agntcy.org/] Open source multi-agent infrastructure under Linux Foundation governance. Covers discovery, identity, communication, observability. Formative members: Cisco, Dell Technologies, Google Cloud, Oracle, Red Hat.

April 15, 2026 · 50 min

100M Agents: Scaling the New Execution stack with Intuit

A QuickBooks customer discovered significant fraud by asking their AI assistant follow-up questions about transaction amounts that didn't add up. This isn't a demo — it's one of 3 million customers now using Intuit's AI agents in production, with 80.5% returning to use them again. Marianna Tessel, EVP and GM of QuickBooks (formerly CTO of Intuit), walks through the architecture decisions behind one of the first enterprise AI deployments at true scale. Intuit's "done-for-you" agents now automate book closing, reconciliation, transaction categorization, and payroll — but the breakthrough came when they realized chatbots alone weren't enough. Businesses wanted human experts integrated directly into AI workflows, creating what Intuit calls the "AI + HI" model (artificial intelligence + human intelligence). The results: invoices paid 5 days faster, 90% more paid in full, 30% reduction in manual work, and 62% of users reporting bookkeeping is easier. Tessel reveals the technical evolution: moving from monolithic agents to a dynamic orchestration layer that routes queries across multiple LLMs (including Intuit's proprietary FinLM built on open-source), 24,000 bank connections, and 600,000 customer attributes. The system now handles proactive anomaly detection, benchmarking against similar businesses, and even nascent vibe coding — all without requiring users to understand they're essentially programming workflows through natural language. She also addresses the "SaaS apocalypse" narrative head-on, explaining why QuickBooks saw 18% growth last quarter while competitors faced market pressure: durable data advantages and customer trust in financial accuracy matter more than ever when AI enters the mix. For enterprise builders navigating agent architecture, data grounding, and human-in-the-loop design, this is a rare look inside a working system serving millions. 
🎙️ GUEST: Marianna Tessel | EVP & GM, QuickBooks (Intuit)
🎙️ HOSTS: Matt Marshall | VentureBeat, Sam Witteveen | VentureBeat

00:00 Intro: Customer discovers fraud using QuickBooks AI
03:26 Intuit Intelligence: Agents, BI, and human expertise integration
05:20 First-time AI users and going beyond chatbots
08:02 How Intuit decides which workflows to automate
10:16 Sponsor: Outshift by Cisco
10:38 Human-in-the-loop: When to insert experts vs. full automation
13:00 The AI + HI model: Why customers want human verification
15:24 Human expertise as confidence layer, not just AI check
16:14 Proprietary data advantage: 24K bank connections, 600K attributes
18:39 Benchmarking: "Businesses like me", using aggregate data for competitive insights
19:52 First-party vs. third-party data strategy
21:38 Addressing the "SaaS apocalypse" narrative: why Intuit grew 18% last quarter
24:39 Proactive AI: Anomaly detection for marketing expense spikes
25:20 Builder perspective: Leaning on LLM orchestration, not use-case-by-use-case builds
27:32 Architecture evolution: From monolithic agents to dynamic tools and skills
29:10 Composite UX: Chat side-by-side with traditional workflows
30:35 Multi-model strategy: Genos platform, FinLM, and model routing
31:16 Vibe coding and actions: Letting users automate without realizing they're coding
32:47 Personalization wave: Memory, persistence, and user-defined workflows
35:08 Docker background and primitives that survive disruption
36:00 OpenClaw and agent automation: Real revolution or risky experimentation?

#EnterpriseAI #AIAgents #QuickBooks #Intuit #LLMOrchestration #AgenticAI

Presented by Outshift by Cisco. Outshift is Cisco’s emerging tech incubation engine and driver of Agentic AI, quantum, and next-gen infrastructure. Learn more at outshift.cisco.com [https://outshift.cisco.com].

About VentureBeat: VentureBeat equips enterprise technology leaders with the clearest, expert guidance on AI, and on the data and security foundations that turn it into working reality.

April 1, 2026 · 38 min