The Principles of Diffusion Models - with Jesse Lai (Sony AI)

Description

We host Chieh-Hsin (Jesse) Lai, Staff Research Scientist at Sony AI and visiting professor at National Yang Ming Chiao Tung University, Taiwan, for a conversation about diffusion models, the technology behind tools like Stable Diffusion and most of the AI image and video generators you've seen in the last few years. Jesse recently co-authored The Principles of Diffusion Models with Stefano Ermon, and the book is quickly becoming a go-to reference in the field. We start with what a generative model actually is and what it means to "generate" an image or a sound. Jesse explains the core idea behind diffusion in plain terms: you start with pure noise, and a neural network gradually cleans it up, step by step, until a realistic image emerges. From there, we talk about why diffusion has come to dominate so much of generative AI. Because the model builds an image gradually, you can guide it along the way: nudging the output toward what you actually want, refining details, or combining it with other controls. We also discuss the common critique that diffusion is slow, and how the field has largely addressed it with techniques such as consistency models, which push toward one-step generation. We zoom out to the bigger picture, too. Jesse shares his view on world models and whether diffusion is the right foundation for them. We talk about what makes a generative model genuinely good rather than just good at gaming benchmarks, and why evaluating creativity and realism is so much harder than scoring a multiple-choice test.
----------------------------------------
Timeline
00:12 - Intro and welcoming Jesse
00:47 - Why Jesse wrote the book, and who it's for
03:29 - The three families of diffusion models, and why they're really one idea
05:14 - What makes a good generative model
07:39 - How do you even measure whether a generated image is good?
08:59 - Why diffusion beats autoregressive models for images
10:33 - Is diffusion still slow? How generation got fast
11:12 - A simple intuition for what a "score" is
14:12 - How the different flavors of diffusion connect under the hood
14:42 - Diffusion for text and proteins
17:12 - Consistency models and the push for one-step generation
22:12 - Diffusion for world models: simulating reality in real time
26:12 - Do world models need to understand language?
35:12 - Is diffusion the right tool, or just a convenient one?
38:12 - What benchmarks actually tell us, and what they miss
46:12 - Closing thoughts and where to find the book
----------------------------------------
Music:
* "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* Changes: trimmed
----------------------------------------
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
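This episode's description sums up diffusion as starting from pure noise and cleaning it up step by step, and the timeline mentions the "score" that guides each step. The following is a toy sketch under loud assumptions, not code from the book or the episode. The "data" is a hypothetical 1-D Gaussian N(MU, S²), so the score (the gradient of the log-density of the noised data) is known in closed form and stands in for the trained neural network. Generation follows the deterministic probability-flow ODE for additive Gaussian noise, integrated with Heun's method, which is just one of several possible samplers. MU, S, sigma_max, sigma_min, and the step count are illustrative choices.

```python
import numpy as np

# Hypothetical toy "dataset": 1-D Gaussian N(MU, S**2).
MU, S = 2.0, 0.5


def score(x, sigma):
    """Exact score grad_x log p_sigma(x) of the data after adding N(0, sigma**2) noise.

    For Gaussian data the noised marginal is N(MU, S**2 + sigma**2), so no network
    is needed here; a real diffusion model would learn this function instead.
    """
    return -(x - MU) / (S**2 + sigma**2)


def ode_drift(x, sigma):
    # Probability-flow ODE for x_sigma = x_0 + sigma * noise: dx/dsigma = -sigma * score.
    return -sigma * score(x, sigma)


def sample(n_samples=10_000, n_steps=200, sigma_max=80.0, sigma_min=2e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Start from (almost) pure noise at the largest noise level.
    x = sigma_max * rng.standard_normal(n_samples)
    # Noise levels from large to small, geometrically spaced, with a final step to 0.
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, n_steps), 0.0)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        h = sigma_next - sigma  # negative: each step removes some noise
        d = ode_drift(x, sigma)
        x_euler = x + h * d
        if sigma_next > 0:
            # Heun correction: average the slope at both ends of the step.
            x = x + h * 0.5 * (d + ode_drift(x_euler, sigma_next))
        else:
            # Last step to sigma = 0 uses plain Euler.
            x = x_euler
    return x


samples = sample()
print(f"mean={samples.mean():.3f}  std={samples.std():.3f}")  # expect values close to MU=2.0 and S=0.5
```

Replacing the closed-form `score` with a learned network and the scalars with images gives the general shape of a diffusion sampler. The step count is the main speed knob, and cutting it down is the motivation behind few-step and one-step approaches such as consistency models.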

Inside xAI, and the Bet on AI Math - with Christian Szegedy (Math Inc)

We talked with Christian Szegedy, co-inventor of Inception and Batch Normalization and founding scientist at xAI, now at Math Inc, about what it takes to build a frontier lab and why he left xAI to work on formal mathematics. Christian thinks Lean and auto-formalization are the missing piece for trustworthy AI: a machine-checkable layer underneath all reasoning, where proofs are guaranteed correct without anyone having to read them. We got into his bet with François Chollet that AI will reach superhuman-mathematician level by 2026, and what that would unlock beyond math itself: verified software instead of vibe-coded apps that break when you refactor, AI systems you can actually trust because their reasoning is checkable, and a path to handling protein folding, chemistry, and parts of biology with real guarantees instead of hand-waving. Christian also walked us through how Math Inc's Gauss system formalized the strong prime number theorem in two weeks, a project human experts had estimated would take another year. We also covered xAI's first year with its initial 12-person team, why Christian no longer buys the original batch normalization story, why he's sure transformers won't be the dominant architecture in five years, what mathematicians do in a world of cheap proofs, and his take on whether humanity will handle AI well. He distrusts humanity more than he distrusts AI.
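Christian's point that Lean proofs are machine-checkable can be made concrete with a toy example. In Lean 4, a theorem statement is a type and a proof is a term of that type; Lean's kernel checks the term, so if the file compiles, the proof is accepted without anyone having to read it. This is a minimal illustration, not taken from the episode, Mathlib, or Math Inc's Gauss system; it uses only Lean 4 core, and the theorem name `my_add_comm` is hypothetical.

```lean
-- A statement is a type; a proof is a term of that type.
-- `my_add_comm` is a hypothetical name; `Nat.add_comm` comes from Lean 4 core.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Concrete arithmetic can be checked by computation: `rfl` asks the
-- kernel to evaluate both sides and confirm they are equal.
example : 2 + 2 = 4 := rfl
```

Changing the last statement to something false, such as `2 + 2 = 5`, makes `rfl` fail and Lean rejects the file; that all-or-nothing check is the kind of guarantee the episode contrasts with reading proofs by hand.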
----------------------------------------
TIMELINE
00:12 - Intros: Christian's background (Inception, Batch Norm, xAI, Math Inc)
01:29 - Building a frontier lab from scratch: the first 12 people at xAI
04:15 - Hiring for proven track records when 200K GPUs are at stake
06:07 - Elon's "dependency graph" and balancing long-term vision with investor demos
07:28 - Gauss formalizes the strong prime number theorem in 2 weeks
12:25 - What "formalization" actually means (and why it's not what most people think)
14:39 - Why Lean gives 100% certainty and why that matters for RL
15:26 - ProofBridge and joint embeddings across mathematical subfields
18:07 - Does math formalization transfer to coding and other fields?
21:44 - Can every domain be mathematized?
23:14 - Verified software, chip design, and why vibe-coded apps are dangerous
26:35 - Scaling Mathlib by 100–1000x
28:27 - Artisan formalizers vs. invisible machine-language formalists
33:26 - Can verification generalize?
45:19 - Revisiting Batch Norm: covariate shift, loss landscape, and what really happens
48:22 - Is normalization even necessary?
50:10 - What's actually fundamental in modern AI architectures
51:41 - Why Christian thinks transformers won't last 5 years
52:38 - The 2026 superhuman AI mathematician bet
55:15 - What's missing: better verification + a much larger formalized math repository
56:13 - Lean vs. Coq vs. HOL Light: does the proof assistant actually matter?
59:26 - The role of mathematicians in 5–10 years
1:02:00 - A human element to mathematics: Newton, Leibniz, and competitive proving
1:03:25 - The telescope analogy: AI as the instrument that lets us see the math universe
1:05:19 - Job apocalypse or Jevons paradox?
1:08:41 - Advice for students
1:09:50 - Can we formally verify AI alignment?
1:11:52 - Closing thanks
----------------------------------------
Music:
* "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* Changes: trimmed
----------------------------------------
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

May 4, 2026 (1 h 12 min)
