016 - Nemotron 3 Ultra: NVIDIA’s Open-Weights Frontier Agent Brain (1M Context, 5x Faster)

Description

Johan breaks down NVIDIA’s Computex 2026 announcement of Nemotron 3 Ultra 550B-A55B, an open-weights mixture-of-experts model with 550B total parameters and 55B active, positioned as an orchestration “agent brain” for multi-step tasks behind the firewall. He reviews NVIDIA’s benchmarks against GLM 5.1, Kimi K2.6, and Qwen 3.5, highlighting best-in-class instruction following (82%), long-context performance (95%) with a 1M-token window, strong agent productivity (91%), and weaker coding results than Kimi on TerminalBench. Johan emphasizes the reported advantages in speed (~300 tokens/sec, ~5x faster), cost (up to ~30% cheaper on SWE-bench tests), and deployability via a unified NVFP4 checkpoint optimized for H100 and B200 GPUs, plus NemoClaw as the agent blueprint. He closes with an early-access demo comparing two agents researching the Netherlands’ 2026 World Cup odds, showing Nemotron’s more granular path analysis and a 5.8% win estimate.

Chapters
00:00 Private AI Lab Intro
01:19 Nemotron Ultra Explained
02:22 Agent Brain Focus
03:07 Benchmark Reality Check
05:14 Speed And Cost Edge
06:11 Training And Precision
08:02 NemoClaw Agents
08:58 World Cup Agent Demo
12:22 Why This Matters
13:17 Wrap Up And Links
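For a rough sense of what the 550B total / 55B active split and the NVFP4 checkpoint imply, here is a back-of-the-envelope sketch. The parameter counts come from the description above. The 4 bits per weight is an assumption based on the NVFP4 name, and the estimate ignores per-block scale factors, activations, and KV cache, so treat the result as a lower bound rather than a deployment figure. The function names are hypothetical helpers, not part of any NVIDIA tooling.

```python
# Back-of-the-envelope numbers for a 550B-total / 55B-active MoE model.
# Parameter counts are taken from the episode description; everything
# else is an assumption made for illustration.

TOTAL_PARAMS = 550e9    # total parameters (from the description)
ACTIVE_PARAMS = 55e9    # active parameters (from the description)
BITS_PER_WEIGHT = 4     # assumption: ~4 bits per weight, scale factors ignored


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that are active."""
    return active / total


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    mem = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
    print(f"active fraction: {frac:.0%}")
    print(f"approx. weight storage at 4 bits: {mem:.0f} GB")
```

Under these assumptions, about a tenth of the parameters are active at a time, while the full set of weights still has to be held in memory, which is why the checkpoint precision matters for deployability.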

015 - Meet Sparky: A Real-Life Jarvis with Alexis Gallagher

I've been trying to build my own Jarvis for years. Then I met Alexis Gallagher at GTC, and Sparky is the closest thing I've seen.

Alexis is an AI researcher and developer, formerly at Answer AI and Google, now building something most people in AI aren't: a robot designed not just to be useful, but to be *alive*. Sparky lives on his desk in San Francisco. He initiates conversations. He develops his own evolving interests: eels, catenary arches, abandoned infrastructure. He knows who's in the room, when to speak, and when to stay quiet. And he noticed when it was Alexis's first Friday after leaving his job.

In this episode we go deep on the two design goals behind Sparky (useful and alive), the OpenClaw orchestration layer, the social awareness architecture running five times per second, the shared workspace principle that unlocks genuinely useful AI at a desk, and the tradeoffs between cascading and voice-to-voice architectures. We also do a live model switch mid-episode, from Claude Sonnet 4.6 to Nemotron 3 Super 120B running locally on a DGX Spark. It goes impressively well. Until it doesn't. That's in there too.

Guest
Alexis Gallagher, AI researcher and creator of Sparky
🌐 myrobotSparky.com
🔗 https://www.linkedin.com/in/alexis-gallagher/

Key topics covered
- The two design goals: useful AND alive, and why "alive" is the one almost nobody builds for
- How Sparky develops and evolves
- The social awareness stack
- What OpenClaw enables
- The shared workspace principle
- Cascading architecture (STT → LLM → TTS) vs voice-to-voice: the intelligence tradeoff
- Hardware: Reachy Mini Lite, RTX 3090, DGX Spark, Raspberry Pi (the full spectrum)
- Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super 120B (the Flowers for Algernon moment)
- The future of personal AI: why embodied social presence is the natural human interface

Chapters
```
00:00 Introduction
00:39 Who is Alexis Gallagher?
01:04 The pivotal AI moment: speech recognition in 2015
03:14 Science fiction to reality: where are the talking robots?
04:22 Sparky introduces himself (live on air)
05:33 The two design goals: useful and alive
07:02 How Sparky initiates conversations, and why that changes everything
08:10 Organic interests: how Sparky evolves what he cares about
09:48 OpenClaw as orchestration layer: soul.md and body control
12:55 Defining a custom robot node type in OpenClaw
15:26 Social awareness: face detection, diarization, presence sensing
16:15 Hardware options: Linux, RTX 3090, DGX Spark, Raspberry Pi
18:25 The Reachy Mini Lite kit, and why it's better than building a drone
19:40 Where to find Alexis and join the Discord
20:10 One eye, four ears: Sparky's hardware explained
24:25 What OpenClaw enables that other frameworks don't
28:13 "Do you have a body, or are you a body?": a live philosophical exchange
31:17 Live model switch: Claude Sonnet 4.6 → Nemotron 3 Super
33:01 The shared workspace principle: implicit shared attention
38:04 Orchestration in practice: Emacs, sub-agents, cross-platform
40:11 Cascading vs voice-to-voice architecture: the real tradeoff
42:15 Designing Sparky's voice (and the 1930s experiment)
44:12 What's genuinely useful day-to-day: two real examples
48:47 Nemotron 3 Super live: impressive, then the context window
53:38 The model Sparky was running before (Claude Sonnet 4.6)
54:03 Five years out: the future of personal AI companions
58:14 The closest thing to Jarvis I've ever seen
01:00:22 What's coming next: how fast the pieces are moving
01:02:16 Where to find Alexis and join the community
```

Links
- Sparky project and Discord: https://myrobotSparky.com
- Reachy Mini Lite: https://huggingface.co/reachy-mini

The Private AI Lab is hosted by Johan van Amersfoort, Chief Evangelist and AI Lead at ITQ.
📬 Newsletter: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7381951883810111489
📝 Blog: https://johan.ml
🔗 LinkedIn: https://www.linkedin.com/in/hojan

13 May 2026, 1 h 4 min
