013 - AI Resource Management Update & Tools with Frank Denneman

Description

In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI:
👉 Resource management for GPU workloads

Building on our previous conversation, this episode shifts from why it matters to how to actually design it right. We dive into real-world challenges such as GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools, helping architects and platform engineers design efficient, scalable AI environments.

🔍 What you’ll learn in this episode
* Why GPU workloads behave fundamentally differently from CPU/memory workloads
* What GPU fragmentation really is (and why it kills utilization)
* The difference between same-size and mixed-mode placement
* How placement IDs turn GPU scheduling into “Tetris”
* Why “right-sizing” beats “perfect fitting” in AI environments
* How to design a GPU profile catalog that actually scales
* The role of state, agents, and storage in next-gen AI platforms

🔧 Tools & Resources mentioned
Frank created practical tools to help you design and validate your GPU environments:
* 👉 vGPU Silo Capacity Calculator https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/
* 👉 Same-size vs Mixed-mode Placement Tool https://frankdenneman.ai/tools/same-size-vs-mixed-mode/
* 👉 Deep dive on unified memory & modern AI workloads https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/

Chapters:
00:00 Intro: Frank Denneman returns
01:30 AI hype vs real engineering
03:00 DGX Spark, NemoClaw & local AI agents
10:30 From LLMs to agents & stateful systems
12:00 Why AI infrastructure is different
15:00 What is GPU fragmentation?
19:30 Same-size vs mixed-mode placement
23:00 GPU “Tetris” and placement IDs explained
27:00 Right-sizing vs perfect fitting
32:00 The tools: capacity & placement simulation
36:00 GPU silos vs stranded capacity
41:00 Model sizing, KV cache & dynamic usage
48:00 Future of AI: smaller models & orchestration
55:00 AI-assisted coding & real-world impact
59:00 Key lessons learned
01:02:00 Closing thoughts

015 - Meet Sparky: A Real-Life Jarvis with Alexis Gallagher

I've been trying to build my own Jarvis for years. Then I met Alexis Gallagher at GTC, and Sparky is the closest thing I've seen.

Alexis is an AI researcher and developer, formerly at Answer AI and Google, now building something most people in AI aren't: a robot designed not just to be useful, but to be *alive*.

Sparky lives on his desk in San Francisco. He initiates conversations. He develops his own evolving interests: eels, catenary arches, abandoned infrastructure. He knows who's in the room, when to speak, and when to stay quiet. And he noticed when it was Alexis's first Friday after leaving his job.

In this episode, we go deep on the two design goals behind Sparky (useful and alive), the OpenClaw orchestration layer, the social awareness architecture running five times per second, the shared workspace principle that unlocks genuinely useful AI at a desk, and the tradeoffs between cascading and voice-to-voice architectures.

We also do a live model switch mid-episode, from Claude Sonnet 4.6 to Nemotron 3 Super 120B running locally on a DGX Spark. It goes impressively well. Until it doesn't. That's in there too.

Guest
Alexis Gallagher: AI researcher and creator of Sparky
🌐 myrobotSparky.com
🔗 https://www.linkedin.com/in/alexis-gallagher/

Key topics covered
- The two design goals: useful AND alive, and why "alive" is the one almost nobody builds for
- How Sparky develops and evolves
- The social awareness stack
- What OpenClaw enables
- The shared workspace principle
- Cascading architecture (STT → LLM → TTS) vs voice-to-voice: the intelligence tradeoff
- Hardware: Reachy Mini Lite, RTX 3090, DGX Spark, Raspberry Pi (the full spectrum)
- Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super 120B (the Flowers for Algernon moment)
- The future of personal AI: why embodied social presence is the natural human interface

Chapters
```
00:00 Introduction
00:39 Who is Alexis Gallagher?
01:04 The pivotal AI moment: speech recognition in 2015
03:14 Science fiction to reality: where are the talking robots?
04:22 Sparky introduces himself (live on air)
05:33 The two design goals: useful and alive
07:02 How Sparky initiates conversations, and why that changes everything
08:10 Organic interests: how Sparky evolves what he cares about
09:48 OpenClaw as orchestration layer: soul.md and body control
12:55 Defining a custom robot node type in OpenClaw
15:26 Social awareness: face detection, diarization, presence sensing
16:15 Hardware options: Linux, RTX 3090, DGX Spark, Raspberry Pi
18:25 The Reachy Mini Lite kit, and why it's better than building a drone
19:40 Where to find Alexis and join the Discord
20:10 One eye, four ears: Sparky's hardware explained
24:25 What OpenClaw enables that other frameworks don't
28:13 "Do you have a body, or are you a body?": a live philosophical exchange
31:17 Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super
33:01 The shared workspace principle: implicit shared attention
38:04 Orchestration in practice: Emacs, sub-agents, cross-platform
40:11 Cascading vs voice-to-voice architecture: the real tradeoff
42:15 Designing Sparky's voice (and the 1930s experiment)
44:12 What's genuinely useful day-to-day: two real examples
48:47 Nemotron 3 Super live: impressive, then the context window
53:38 The model Sparky was running before (Claude Sonnet 4.6)
54:03 Five years out: the future of personal AI companions
58:14 The closest thing to Jarvis I've ever seen
01:00:22 What's coming next: how fast the pieces are moving
01:02:16 Where to find Alexis and join the community
```

Links
- Sparky project and Discord: https://myrobotSparky.com
- Reachy Mini Lite: https://huggingface.co/reachy-mini

The Private AI Lab is hosted by Johan van Amersfoort, Chief Evangelist and AI Lead at ITQ.
📬 Newsletter: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7381951883810111489
📝 Blog: https://johan.ml
🔗 LinkedIn: https://www.linkedin.com/in/hojan

May 13, 2026 · 1 h 4 min
