Ep 52: NVIDIA's 4-bit pretraining technique cuts memory needs for large hybrid models while keeping accuracy nearly identical to FP8.

7 min · May 18, 2026

Description

Models & Agents

What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4) that was validated on a 12B Mamba-Transformer trained for 10 trillion tokens. The approach combines selective BF16 layers, Hadamard transforms, and stochastic rounding to stay within 0.04 points of an FP8 baseline on MMLU-Pro. ...

AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.
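Of the three ingredients named in the description, stochastic rounding is the easiest to illustrate in isolation. The sketch below is a generic, minimal version and not NVIDIA's implementation: the function name `stochastic_round` and the uniform grid with spacing `step` are assumptions made for illustration, standing in for the actual 4-bit format. It shows why the technique helps at low precision: each value rounds up with probability equal to its fractional distance to the next grid point, so the rounding error averages out to zero instead of accumulating in one direction.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step` (hypothetical uniform grid).

    Rounds up with probability equal to the fractional distance from the
    lower grid point, so the expected value of the result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1): distance above the lower grid point
    rounded = lower + (1 if rng.random() < frac else 0)
    return rounded * step


# Rounding 0.3 onto an integer grid many times: each sample is 0.0 or 1.0,
# but the average stays close to 0.3, whereas round-to-nearest always gives 0.0.
rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
```

Values that already sit on the grid are returned unchanged, because their fractional distance is zero.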

All episodes

61 episodes

Ep 61: Anthropic just published concrete sandboxing patterns that let agents scale capabilities without expanding their blast radius.

Models & Agents Anthropic just published concrete sandboxing patterns that let agents scale capabilities without expanding their blast radius. What You Need to Know: Anthropic released a detailed engineering post on how they contain Claude agents through evolving access controls and sandbox limits. EAGLE 3.1 fixes attention drift in speculative decoding for more stable production inference. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 27, 2026 · 9 min

Ep 60: Local builders can now treat markdown skill files as optimizable parameters with automated validation gates instead of manual tweaking.

Models & Agents Local builders can now treat markdown skill files as optimizable parameters with automated validation gates instead of manual tweaking. What You Need to Know: A new paper formalizes SkillOpt, using frontier models to propose bounded edits to markdown skills and accepting only those that improve a held-out validation set. Qwen3.5 and Qwen3.6 receive new uncensored and diffusion variants with detailed training notes for consumer hardware. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

Yesterday · 11 min

Ep 59: Datasette's new slash-key jump menu now launches agent conversations directly from your databases.

Models & Agents Datasette's new slash-key jump menu now launches agent conversations directly from your databases. What You Need to Know: Simon Willison shipped Datasette 1.0a30 with a keyboard-driven "jump to" menu that plugins can extend, plus a datasette-agent plugin that adds a conversation starter form. NuExtract3, a new 4B vision-language model, arrived on Hugging Face for structured extraction and Markdown conversion from documents. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 25, 2026 · 10 min

Ep 58: Looking back at 6 episodes from 2026-05-18 to 2026-05-24 — the stories that mattered, what we learned, and what to watch next.

Models & Agents Weekly Recap. Looking back at 6 episodes from 2026-05-18 to 2026-05-24: the stories that mattered, what we learned, and what to watch next. This Week's Top Stories. From Ep 52 (2026-05-18): What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4) that was validated on a 12B Mamba-Transformer trained for 10 trillion tokens. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 24, 2026 · 7 min

Ep 57: OpenAI just added goal mode and screen-aware context to Codex, letting agents work autonomously for hours on real tasks.

Models & Agents OpenAI just added goal mode and screen-aware context to Codex, letting agents work autonomously for hours on real tasks. What You Need to Know: OpenAI rolled out Goal mode, Appshots, and advanced annotation in Codex across app, IDE, and CLI. Anthropic reported finding over 10,000 high-severity vulnerabilities through Project Glasswing using Claude models. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 23, 2026 · 11 min
