Ep 52: NVIDIA's 4-bit pretraining technique cuts memory needs for large hybrid models while keeping accuracy nearly identical to FP8.

7 min · 18 May 2026

Description

Models & Agents

NVIDIA's 4-bit pretraining technique cuts memory needs for large hybrid models while keeping accuracy nearly identical to FP8.

What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4), validated on a 12B Mamba-Transformer trained for 10 trillion tokens. The approach combines selective BF16 layers, Hadamard transforms, and stochastic rounding to stay within 0.04 points of an FP8 baseline on MMLU-Pro. ...

AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.
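For listeners unfamiliar with the stochastic rounding named in this episode's description, here is a minimal sketch of the general idea. It is not NVIDIA's implementation: the function name `stochastic_round`, the uniform grid, and the step size are all assumptions chosen for illustration, and the actual NVFP4 format is not modeled here. The point it shows is that rounding up or down at random, with probability proportional to proximity, keeps the average of the rounded values close to the original value, which plain round-to-nearest does not.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of `step`, picking the upper neighbour with
    probability equal to x's fractional position between the two neighbours.
    Hypothetical helper for illustration only."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1): how far x sits above the lower neighbour
    return (lower + (1 if rng.random() < frac else 0)) * step


# Illustrative values (assumptions): round 0.3 onto a grid with spacing 0.5.
rng = random.Random(0)
samples = [stochastic_round(0.3, 0.5, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)

# Every sample is 0.0 or 0.5, yet the mean stays close to 0.3.
# Round-to-nearest would return 0.5 every time.
print(sorted(set(samples)), round(mean, 3))
```

In low-precision training this property matters because many small updates are accumulated; an unbiased rounding rule keeps them from being systematically lost or inflated.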

All episodes

61 episodes

Ep 61: Anthropic just published concrete sandboxing patterns that let agents scale capabilities without expanding their blast radius.

Models & Agents: Anthropic just published concrete sandboxing patterns that let agents scale capabilities without expanding their blast radius. What You Need to Know: Anthropic released a detailed engineering post on how they contain Claude agents through evolving access controls and sandbox limits. EAGLE 3.1 fixes attention drift in speculative decoding for more stable production inference. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

Yesterday · 9 min

Ep 60: Local builders can now treat markdown skill files as optimizable parameters with automated validation gates instead of manual tweaking.

Models & Agents: Local builders can now treat markdown skill files as optimizable parameters with automated validation gates instead of manual tweaking. What You Need to Know: A new paper formalizes SkillOpt, using frontier models to propose bounded edits to markdown skills and accepting only those that improve a held-out validation set. Qwen3.5 and Qwen3.6 receive new uncensored and diffusion variants with detailed training notes for consumer hardware. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

26 May 2026 · 11 min

Ep 59: Datasette's new slash-key jump menu now launches agent conversations directly from your databases.

Models & Agents: Datasette's new slash-key jump menu now launches agent conversations directly from your databases. What You Need to Know: Simon Willison shipped Datasette 1.0a30 with a keyboard-driven "jump to" menu that plugins can extend, plus a datasette-agent plugin that adds a conversation starter form. NuExtract3, a new 4B vision-language model, arrived on Hugging Face for structured extraction and Markdown conversion from documents. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

25 May 2026 · 10 min

Ep 58: Looking back at 6 episodes from 2026-05-18 to 2026-05-24: the stories that mattered, what we learned, and what to watch next.

Models & Agents, Weekly Recap: Looking back at 6 episodes from 2026-05-18 to 2026-05-24: the stories that mattered, what we learned, and what to watch next. This Week's Top Stories. From Ep 52 (2026-05-18): What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4) that was validated on a 12B Mamba-Transformer trained for 10 trillion tokens. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

24 May 2026 · 7 min

Ep 57: OpenAI just added goal mode and screen-aware context to Codex, letting agents work autonomously for hours on real tasks.

Models & Agents: OpenAI just added goal mode and screen-aware context to Codex, letting agents work autonomously for hours on real tasks. What You Need to Know: OpenAI rolled out Goal mode, Appshots, and advanced annotation in Codex across app, IDE, and CLI. Anthropic reported finding over 10,000 high-severity vulnerabilities through Project Glasswing using Claude models. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

23 May 2026 · 11 min
