The Private AI Lab
In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI: resource management for GPU workloads.

Building on our previous conversation, this episode shifts from why it matters to how to actually design it right. We dive into real-world challenges like GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools, helping architects and platform engineers understand how to design efficient, scalable AI environments.

What you'll learn in this episode
* Why GPU workloads behave fundamentally differently from CPU/memory workloads
* What GPU fragmentation really is (and why it kills utilization)
* The difference between same-size vs mixed-mode placement
* How placement IDs turn GPU scheduling into "Tetris"
* Why "right-sizing" beats "perfect fitting" in AI environments
* How to design a GPU profile catalog that actually scales
* The role of state, agents, and storage in next-gen AI platforms

Tools & Resources mentioned
Frank created practical tools to help you design and validate your GPU environments:
* vGPU Silo Capacity Calculator: https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/
* Same-size vs Mixed-mode Placement Tool: https://frankdenneman.ai/tools/same-size-vs-mixed-mode/
* Deep dive on unified memory & modern AI workloads: https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/

Chapters:
00:00 Intro: Frank Denneman returns
01:30 AI hype vs real engineering
03:00 DGX Spark, NemoClaw & local AI agents
10:30 From LLMs to agents & stateful systems
12:00 Why AI infrastructure is different
15:00 What is GPU fragmentation?
19:30 Same-size vs mixed-mode placement
23:00 GPU "Tetris" and placement IDs explained
27:00 Right-sizing vs perfect fitting
32:00 The tools: capacity & placement simulation
36:00 GPU silos vs stranded capacity
41:00 Model sizing, KV cache & dynamic usage
48:00 Future of AI: smaller models & orchestration
55:00 AI-assisted coding & real-world impact
59:00 Key lessons learned
01:02:00 Closing thoughts