Models & Agents

Ep 52: NVIDIA's 4-bit pretraining technique cuts memory needs for large hybrid models while keeping accuracy nearly identical to FP8.

7 min · 18 May 2026

Description

What You Need to Know: NVIDIA released a full 4-bit pretraining stack (NVFP4), validated on a 12B hybrid Mamba-Transformer trained on 10 trillion tokens. The approach keeps selected layers in BF16 and combines this with Hadamard transforms and stochastic rounding, staying within 0.04 points of an FP8 baseline on MMLU-Pro.

...

AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.
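Of the techniques named in the description, stochastic rounding is the easiest to illustrate in a few lines. The sketch below is not NVIDIA's implementation: it assumes a uniform rounding grid (real FP4 value grids are non-uniform), and the names `stochastic_round` and `round_to_nearest` are hypothetical helpers introduced here. It only shows why rounding up with probability equal to the fractional remainder keeps the average unbiased, whereas round-to-nearest loses small values systematically.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step (assumed uniform grid).

    Rounds up with probability equal to the fractional distance to the
    lower grid point, so the expected result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1)
    round_up = rng.random() < frac
    return (lower + (1 if round_up else 0)) * step


def round_to_nearest(x: float, step: float) -> float:
    """Deterministic rounding to the nearest multiple of step."""
    return round(x / step) * step


rng = random.Random(0)  # fixed seed so the run is repeatable
x, step, n = 0.3, 1.0, 100_000

# Averaged over many draws, stochastic rounding recovers x;
# round-to-nearest always maps 0.3 to 0.0 on this grid.
mean_sr = sum(stochastic_round(x, step, rng) for _ in range(n)) / n
mean_rtn = round_to_nearest(x, step)

print(f"stochastic rounding mean: {mean_sr:.3f}")
print(f"round-to-nearest result:  {mean_rtn:.3f}")
```

In low-precision training this matters most for small updates such as gradients: with deterministic rounding, contributions below half a grid step vanish every time, while stochastic rounding preserves them on average.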

All episodes

61 episodes