Gimlet's Cross-Vendor Inference Cloud

48 min · May 12, 2026

Description

Gimlet Labs runs an inference cloud built on heterogeneous silicon. Their software traces a PyTorch workload, segments it into its component parts, and schedules each piece onto the best-suited hardware, connecting chips from different vendors on a single high-speed fabric.

In this interview, Gimlet co-founder Natalie Serrino and former Intel executive Beltir walk through the architecture (graph trace, optimal split points, lowering each segment to TensorRT on NVIDIA and equivalents elsewhere), the three customer segments they sell into (frontier labs, sovereign clouds, AI natives), and a concrete demo: on GPT-OSS 120B at 8K input / 1K output, running the speculative decoder on a d-Matrix Corsair card while NVIDIA B200s handle the verifier shifts the throughput-vs-interactivity Pareto frontier roughly 4× over GPU-only speculative decode.

The most surprising takeaway: most Neoclouds gave significant equity to a single silicon vendor in exchange for capacity. Hardware amortization is around 70% of their annual costs, and the equity terms prevent them from diversifying their silicon. So the only software innovation they can ship is disaggregation on top of one vendor's stack, never across vendors. Gimlet's two-track model (deploying orchestration software inside customer data centers, plus running their own Neocloud built on mixed silicon) is the answer to that constraint. Read the full transcript on Chipstrat.
Chapters:
0:00 Intro and the chips no one's connected before
0:33 Inference cloud for agents
1:02 From Intel to Gimlet
2:14 The case for heterogeneous inference
4:03 Disaggregating inference by resource profile
6:24 Tracing PyTorch into a schedulable graph
8:08 Connecting chips never connected before
10:52 CPUs as the agentic workhorse
12:01 Tool calls in the same data center as the LLM
13:21 Latency vs throughput on a shared fabric
14:57 Three customer buckets
15:54 Sovereigns: make an API call, not a porting project
19:37 "Cracked software is the platform"
22:24 Why merchant silicon vendors need partners
25:18 Hyperscalers outsourcing CapEx, not just kernels
28:49 AI natives: latency budgets, not just price
32:06 The d-Matrix partnership
33:31 The Pareto frontier chart
35:56 Speculative decode on Corsair: 4× shift
37:27 4× faster, or 3× more customers?
41:22 Why most Neoclouds can't follow this model
42:34 Gimlet's two-track business model
44:30 CoreWeave vs Together vs Gimlet
45:15 Series A and hiring

Relevant reading:
The Information on Gimlet helping OpenAI optimize for Cerebras: https://www.theinformation.com/newsletters/ai-agenda/startup-helping-openai-optimize-ai-cerebras-chips
Sachin Katti and Zain Asgar coauthored research at Stanford: https://arxiv.org/abs/2507.19635

Follow Chipstrat:
Newsletter: https://www.chipstrat.com
X: https://x.com/chipstrat
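The demo in the description splits speculative decoding across devices: a small draft model proposes tokens and a larger verifier checks them. The following is a minimal sketch of the greedy variant of that technique in plain Python, not Gimlet's implementation. The `draft` and `verifier` functions are hypothetical toy stand-ins for real models, and the device placement (draft on one accelerator, verifier on another) appears only in comments.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(prompt: List[Token], next_token: NextToken, num_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(next_token(out))
    return out


def speculative_decode(
    prompt: List[Token],
    draft_next: NextToken,
    verify_next: NextToken,
    num_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding.

    The draft proposes k tokens; the verifier keeps the longest prefix it
    agrees with, substitutes its own token at the first mismatch, and adds
    one bonus token when all k proposals are accepted.
    """
    out = list(prompt)
    target_len = len(prompt) + num_new
    verifier_passes = 0
    while len(out) < target_len:
        # Draft phase (assumed to run on the cheaper, low-latency device).
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))

        # Verify phase (assumed to run on the large-model device). A real
        # system scores all k positions in one batched forward pass; this
        # toy calls the verifier per position but counts it as one pass.
        verifier_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = verify_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # verifier's token replaces the miss
                break
        else:
            accepted.append(verify_next(out + accepted))  # bonus token
        out.extend(accepted)
    return out[:target_len], verifier_passes


# Hypothetical toy models: the verifier counts upward mod 10; the draft
# agrees except when the context length is a multiple of 3.
def verifier(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10 if len(ctx) % 3 else 0


tokens, passes = speculative_decode([0], draft, verifier, num_new=20)
baseline = greedy_decode([0], verifier, num_new=20)
print(tokens == baseline, passes)
```

The output always matches what the verifier alone would produce; the gain is that the verifier runs fewer passes than there are generated tokens whenever the draft guesses well. How much that helps on real hardware depends on the draft's acceptance rate and the cost of moving data between devices, which this sketch does not model.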


All episodes

30 episodes

Lithography Masterclass

Spend one hour here and you've caught up on the entire arc of semiconductor lithography. Austin and Vik run a masterclass on the technology that decides who gets to make leading-edge chips, and why so few companies can afford to. The thread is economics. An EUV machine runs about $400 million, a new fab needs roughly 15 of them, and the total bill clears $20-30 billion before a single wafer ships. Austin and Vik trace the whole story: Rock's Law and the cost of a fab, what it actually takes to build one, the evolution from 193nm DUV through multi-patterning to 13.5nm EUV, how ASML generates EUV light by exploding falling tin droplets, and the move to high NA and its mirrors. Along the way, the fun history — i-line, krypton fluoride, immersion lithography, and the engineer who started it all by flipping a microscope upside down. Then the part that matters most: where lithography goes next. Two startups, xLight and Substrate, are attacking the cost problem from first principles. xLight wants to decouple the light source from the scanner with a free-electron laser and sell photons as a service. Substrate wants to skip EUV entirely and revive X-ray lithography. If either works, the economics of who can build a fab change completely. 
Chapters:
0:00 The 13F panic, and today's topic
2:23 Why the real story is economics, not physics
6:18 Austin in the clean room: graphene and bunny suits
10:06 Rock's Law and the $20 billion fab
18:08 DUV, the Sharpie, and a history of light
24:58 Multi-patterning, explained with a football field
34:45 How EUV makes 13.5nm light from tin droplets
41:14 High NA, anamorphic optics, and the half-field tax
46:45 The startups rethinking lithography: xLight and Substrate

Relevant reading:
Chipstrat, The economics of lithography: https://www.chipstrat.com/p/lithography-economics
Chipstrat, xLight and photons as a service: https://www.chipstrat.com/p/photons-as-a-service
Chipstrat, Substrate and X-ray lithography: https://www.chipstrat.com/p/substrate
Vik's Newsletter, the viability of X-ray lithography: https://www.viksnewsletter.com/p/an-in-depth-look-at-the-viability
Fred Chen, LELE multipatterning and EUV stochastics (Substack): https://frederickchen.substack.com/p/can-lele-multipatterning-help-against
Chip War, Chris Miller
Focus, Marc Hijink (the ASML book): https://www.amazon.com/Focus-Inside-struggle-complex-machine-ebook/dp/B0CW1FLCD4

Follow Chipstrat:
Newsletter: https://www.chipstrat.com
X: https://x.com/chipstrat

Follow Vik:
Newsletter: https://www.viksnewsletter.com/
X: https://x.com/vikramskr

Follow Semi Doped:
Get more of Austin and Vik daily, free! Sign up: https://www.semidoped.com/

May 22, 2026 · 1 h 3 min

Cerebras IPO

Cerebras IPO is the only thing to talk about this week. 🔥 IPO prices at $185/share. Pops nearly 70% right after. The first wafer-scale chip company to make it public, after a 40-year curse killed every prior attempt.

A water-cooler-style convo on what Cerebras actually builds, why a 23 kW wafer is a power and cooling nightmare, why 44 GB of SRAM is both the magic and the wall for LLM inference, and the cursed Trilogy Systems saga that Gene Amdahl tried, and failed, to pull off in 1983.

Why does Cerebras leave the whole wafer intact instead of dicing it? How do they route around defects to harvest ~900K working cores out of ~1M? Why is power delivery vertical, and why does the wafer literally expand a tenth of a millimeter when it heats up? What does the OpenAI deal actually buy: wafers, or tokens? And why does that distinction matter?

Chapters:
0:00 Cold open: 23 kW per wafer
0:15 Cerebras IPO day at $185
2:39 What's a wafer-scale engine
10:30 Power, cooling, and thermal expansion
18:12 The 44 GB wall
26:35 The Trilogy Systems curse
32:11 Supercomputing → training → inference
39:36 The OpenAI deal and the Wild West

Relevant reading:
Vik's Substack post on the Cerebras IPO and OpenAI deal: https://www.viksnewsletter.com/

Follow Chipstrat:
Newsletter: https://www.chipstrat.com
X: https://x.com/austinsemis

Follow Vik:
Newsletter: https://www.viksnewsletter.com/
X: https://x.com/vikramskr

Follow Semi Doped:
Get more of Austin and Vik daily, free! Sign up: https://www.semidoped.com/

May 15, 2026 · 50 min

Gimlet's Cross-Vendor Inference Cloud

May 12, 2026 · 48 min

Power as the Next Physics Wall for AI

What's common to optics and power that ruins everything in the era of AI? Resistance. The same physics that drove interconnects to optics is now driving low-voltage power delivery up to 800V. Austin Lyons (Chipstrat) and Vik Sekar (Vik's Newsletter) unpack it using the Kyber rack as an example. At 600kW and 48V, you're pushing 12,500 amps through a single rack. Power loss scales with I². The math doesn't work. The fix is 800V, and the parts come straight from the EV traction inverter ecosystem (SiC, GaN, IGBTs).

We cover the full grid-to-GPU power conversion chain (substation, utility room, PSU, intermediate bus converter, VRM), why vertical power delivery is the CPO equivalent for power, and why the power industry is a much wider open problem than optics or HBM. Plus the new topology fight: 800V → 48V (reuse the existing 48V infrastructure) vs 800V → 6V (skip 48V entirely, like TI and Navitas are pushing). We also touch Coherent's six-inch indium phosphide ramp at Järfälla, Sweden, and why margins are the real read-through next quarter.

Relevant reading:
Vik's Substack post on power: https://www.viksnewsletter.com/p/power-delivery-as-the-next-physics-wall
Google TPU 8i / 8t blog (Boardfly deep dive): https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive

Get more of Austin and Vik daily, free! Sign up here: https://www.semidoped.com/

Follow Chipstrat:
Newsletter: https://www.chipstrat.com
X: https://x.com/austinsemis

Follow Vik:
Newsletter: https://www.viksnewsletter.com/
X: https://x.com/vikramskr

Chapters:
(00:00) Intro
(01:41) Memory tax: inflation, not innovation
(03:46) Boardfly: 16 hops to 7
(05:12) Coherent's six-inch indium phosphide ramp
(12:15) Power is the next physics wall
(15:08) Why 48V breaks at 600kW: 12,500 amps
(23:05) 800V and vertical power delivery: CPO for power
(30:34) Grid to GPU: every stage is a different supply chain
(39:20) 800V → 48V or skip straight to 6V?
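The 12,500 amp figure and the I² scaling from the description can be checked with a few lines of arithmetic. This is a rough sketch that assumes, purely for illustration, the same conductor resistance at both bus voltages; the function and variable names are hypothetical, and real busbars would be sized differently for each case.

```python
def rack_current_amps(power_watts: float, bus_volts: float) -> float:
    """Current needed to deliver a given power at a given bus voltage (I = P / V)."""
    return power_watts / bus_volts


RACK_POWER_W = 600_000  # 600 kW rack, as quoted in the episode notes

i_48 = rack_current_amps(RACK_POWER_W, 48)    # 12,500 A
i_800 = rack_current_amps(RACK_POWER_W, 800)  # 750 A

# Resistive loss is I^2 * R. With R held fixed (an assumption), the ratio
# of losses is the square of the ratio of currents.
loss_ratio = (i_48 / i_800) ** 2

print(f"48 V bus:  {i_48:,.0f} A")
print(f"800 V bus: {i_800:,.0f} A")
print(f"Conduction loss at 48 V is about {loss_ratio:.0f}x the loss at 800 V")
```

Under that fixed-resistance assumption, raising the bus voltage by a factor of about 16.7 cuts the current by the same factor and the resistive loss by roughly 278×.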

May 8, 2026 · 41 min

CapEx is just Memory Tax Now, Deepseek V4 NAND impact

The hyperscaler memory tax quarter. More CapEx? Pssh. We knew flops needed scaling. But $25B at Microsoft alone just to pay higher component prices? A memory tax. That's the news. NAND? Sold out. HBM? Sold out.

What we cover:
* SanDisk revenue +97% sequential.
* 78% gross margin. Guidance above 80% next quarter.
* Samsung HBM4 first to ship. Demand outstripping supply.
* DeepSeek v4 goes SSD-centric. KV cache offloads to flash.
* Microsoft: $25B of 2026 CapEx is just memory pricing.
* Jassy: memory shortage pushes on-prem to AWS.
* Qualcomm: mystery custom ASIC. Ships December.

New Semi Doped with @vikramskr and @austinsemis.

Check out our Substacks:
- https://www.viksnewsletter.com/
- https://www.chipstrat.com/

Chapters:
0:00 Intro and Vik goes full-time
5:15 Earnings week: the memory tax
7:26 Samsung HBM4 and the Gbps race
14:42 Is the memory tax worth it?
17:37 SanDisk and the SunDisk origin
23:22 78% gross margins and 5-year supply lock-ins
29:29 DeepSeek v4 and SSD-centric inference
38:49 Hyperscaler CapEx and the cloud pull
42:49 AI accelerators: TPU, Trainium, MTIA

May 4, 2026 · 45 min
