Ep 52: NVIDIA's 4-bit pretraining technique cuts memory needs for large hybrid models while keeping accuracy nearly identical to FP8.

7 min · May 18, 2026

Description

Models & Agents NVIDIA's 4-bit pretraining technique cuts memory needs for large hybrid models while keeping accuracy nearly identical to FP8. What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4) that was validated on a 12B Mamba-Transformer trained for 10 trillion tokens. The approach combines selective BF16 layers, Hadamard transforms, and stochastic rounding to stay within 0.04 points of an FP8 baseline on MMLU-Pro. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.
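Of the three ingredients named above, stochastic rounding is the easiest to illustrate in isolation. The sketch below is a generic illustration, not NVIDIA's implementation: the function name `stochastic_round`, the uniform grid with spacing `step`, and the test values are all assumptions made for the example, and a uniform grid is a simplification rather than the NVFP4 format. It shows why the technique helps at low precision: rounding up or down at random, with probability set by proximity, keeps the rounded values unbiased on average, whereas round-to-nearest would map every 0.3 to 0.

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round each value to a multiple of `step` (hypothetical uniform grid).

    Each value goes up or down at random; the chance of rounding up equals
    its fractional distance above the lower grid point, so the expected
    value of the result equals the input.
    """
    scaled = np.asarray(x, dtype=np.float64) / step
    lower = np.floor(scaled)
    frac = scaled - lower                      # in [0, 1)
    round_up = rng.random(scaled.shape) < frac
    return (lower + round_up) * step

# Illustrative values only: 100,000 copies of 0.3 on a grid of spacing 1.0.
rng = np.random.default_rng(0)
x = np.full(100_000, 0.3)
rounded = stochastic_round(x, step=1.0, rng=rng)
mean_rounded = rounded.mean()   # close to 0.3; round-to-nearest would give 0.0
```

Every output is either 0.0 or 1.0, yet the mean stays near the original value, which is the property that matters when many small low-precision updates are accumulated over training.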

All episodes

60 episodes

Ep 60: Local builders can now treat markdown skill files as optimizable parameters with automated validation gates instead of manual tweaking.

Models & Agents Local builders can now treat markdown skill files as optimizable parameters with automated validation gates instead of manual tweaking. What You Need to Know: A new paper formalizes SkillOpt, using frontier models to propose bounded edits to markdown skills and accepting only those that improve a held-out validation set. Qwen3.5 and Qwen3.6 receive new uncensored and diffusion variants with detailed training notes for consumer hardware. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

Yesterday · 11 min

Ep 59: Datasette's new slash-key jump menu now launches agent conversations directly from your databases.

Models & Agents Datasette's new slash-key jump menu now launches agent conversations directly from your databases. What You Need to Know: Simon Willison shipped Datasette 1.0a30 with a keyboard-driven "jump to" menu that plugins can extend, plus a datasette-agent plugin that adds a conversation starter form. NuExtract3, a new 4B vision-language model, arrived on Hugging Face for structured extraction and Markdown conversion from documents. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 25, 2026 · 10 min

Ep 58: Looking back at 6 episodes from 2026-05-18 to 2026-05-24 — the stories that mattered, what we learned, and what to watch next.

Models & Agents: Weekly Recap. Looking back at 6 episodes from 2026-05-18 to 2026-05-24: the stories that mattered, what we learned, and what to watch next. This Week's Top Stories From Ep 52 (2026-05-18): What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4) that was validated on a 12B Mamba-Transformer trained for 10 trillion tokens. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 24, 2026 · 7 min

Ep 57: OpenAI just added goal mode and screen-aware context to Codex, letting agents work autonomously for hours on real tasks.

Models & Agents OpenAI just added goal mode and screen-aware context to Codex, letting agents work autonomously for hours on real tasks. What You Need to Know: OpenAI rolled out Goal mode, Appshots, and advanced annotation in Codex across app, IDE, and CLI. Anthropic reported finding over 10,000 high-severity vulnerabilities through Project Glasswing using Claude models. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 23, 2026 · 11 min

Ep 56: OpenAI just gave Codex the ability to control locked Macs and run multi-day goals, turning it into a true background agent you can launch from your phone.

Models & Agents OpenAI just gave Codex the ability to control locked Macs and run multi-day goals, turning it into a true background agent you can launch from your phone. What You Need to Know: OpenAI shipped several Codex updates today including secure computer use on locked Macs, Goal mode for hours-long autonomous work, and advanced annotation tools. Microsoft released Fara1.5, a family of browser agents that beat OpenAI Operator and Gemini 2.5 on web tasks. ... AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.

May 22, 2026 · 10 min
