The Private AI Lab
Johan breaks down NVIDIA’s Computex 2026 announcement of Nemotron 3 Ultra 550B-A 55B, an open-weights mixture-of-experts model with 550B total parameters, of which 55B are active, positioned as an orchestration “agent brain” for multi-step tasks behind the firewall. He reviews NVIDIA’s benchmarks against GLM 5.1, Kimi K 2.6, and Qwen 3.5, highlighting best-in-class instruction following (82%), long-context performance (95%) with a 1M-token window, strong agent productivity (91%), and weaker coding results than Kimi on TerminalBench. Johan emphasizes the reported advantages in speed (~300 tokens/sec, ~5x faster), cost (up to ~30% cheaper on SWE-bench tests), and deployability via a unified NVFP4 checkpoint optimized for H100 and B200 GPUs, with NemoClaw as the agent blueprint. He closes with an early-access demo comparing two agents researching the Netherlands’ 2026 World Cup odds, in which Nemotron produces a more granular path analysis and a 5.8% win estimate.

00:00 Private AI Lab Intro
01:19 Nemotron Ultra Explained
02:22 Agent Brain Focus
03:07 Benchmark Reality Check
05:14 Speed And Cost Edge
06:11 Training And Precision
08:02 NeMo Claw Agents
08:58 World Cup Agent Demo
12:22 Why This Matters
13:17 Wrap Up And Links
17 episodes