Jürgen Schmidhuber - World Models, RL, and the Year that changed AI (Part 1)

Description

In this episode, we host Jürgen Schmidhuber - the man, the legend, one of the godfathers of modern AI. His lab worked out many ideas behind today’s systems (LSTM, world models, artificial curiosity, Transformer variants, and even GAN-style setups) decades before they became fashionable, and he’s just as well known for making sure people remember who did what first.

This is the first of two conversations with him. We go back to his lab in the early 90s and ask how one small group came up with so many of the ideas that are now being scaled to a thousand billion dollars, back when compute was ten million times more expensive.

A lot of the episode comes down to one distinction he keeps making: prediction vs. decision-making. His take is that LLMs are very good prediction machines that imitate the web, but that’s only half the problem. To actually act in the world, you need a controller that uses a world model to plan. He talks about his 1990 work on world models and artificial curiosity, where the controller gets rewarded for running experiments that improve its own model (an adversarial setup years before GANs), why planning millisecond by millisecond doesn’t scale, and why you need sub-goals instead. We also talk about compression as the core of understanding, from falling apples to Kepler to Einstein, and why we still don’t have a robot that can do what a plumber does, even though the AI behind the screen keeps getting better.

Then the conversation moves to credit assignment: how “to Schmidhuber” became a verb, what he thinks is broken about the award system, and a long exchange on PMAX vs. JEPA. He ends on the real origins of deep learning and a prediction about self-replicating machines in space.
----------------------------------------
Timeline
00:00 Intro
00:55 1991 in Munich, and why that lab mattered
02:38 "I'm not very smart" and why compute getting 10× cheaper every 5 years changed everything
04:25 Chess as an AI proxy
08:27 Artificial curiosity in the 90s vs. today's RL exploration
09:10 Why RL is harder than supervised learning
20:48 Coding agents vs. robots, and how a baby learns its own hands
26:20 Compression as understanding
33:40 What's actually missing on the road to AGI
37:30 Why millisecond-by-millisecond planning is stupid
47:44 Convergence to LLMs, GPUs, and how far we still are from the Bremermann limit
51:49 Unsupervised learning, factorial codes, and predictability minimization
58:12 Credit assignment: the fights with LeCun and the Nobel critique
1:02:13 On his last name becoming a verb
1:05:17 The award system's missing peer review
1:07:03 Closed labs and the decline of open research
1:13:23 Audience questions
1:34:02 Closing: who really invented deep learning?
----------------------------------------
Music:
* "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* Changes: trimmed
----------------------------------------
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

Language, Cognition, and the Limits of LLMs - with Tal Linzen (NYU/Google)

We host Tal Linzen, Associate Professor at NYU and Research Scientist at Google, for a conversation on the intersection of cognitive science and large language models. We discussed why children can learn language from around 100 million words while LLMs need trillions, and the surprising finding that as models get better at predicting the next word, they become worse models of how humans actually process language. Tal walked us through how his lab uses eye-tracking and reading-time data to compare model behavior to human behavior, and what that reveals about prediction, working memory, and the limits of current architectures.

We also got into nature versus nurture and how inductive biases can be instilled by pre-training on synthetic languages, world models and whether transformers actually use the geometric structure they encode, the BabyLM challenge and data-efficient language learning, and what mechanistic interpretability can offer cognitive science beyond just fixing model bugs. The conversation closed on academia versus industry, the role of PhDs in the current AI moment, and how AI coding tools are changing the way Tal teaches and evaluates students at NYU.
----------------------------------------
Timeline
* 00:13 - Intro and what cognitive science means
* 02:16 - Using computational simulations to understand how humans learn language
* 05:26 - How children learn language vs. how LLMs are pre-trained
* 07:53 - Why mainstream LLMs are not good models of humans
* 10:07 - Comparing humans and models with eye-tracking and reading behavior
* 13:52 - Sensory modalities, smell, and how much you can learn from language alone
* 16:03 - Animal cognition and decoding animal communication
* 17:00 - Nature vs. nurture, inductive biases, and what transformers can and can't learn
* 21:21 - Instilling inductive biases through synthetic languages
* 27:34 - The bouba/kiki effect and cross-linguistic sound symbolism
* 28:33 - Latent causal structure in language and whether models discover it
* 31:13 - Does knowing linguistics help build better models?
* 35:07 - World models: what they mean, and why transformers encode geometry but don't use it
* 39:13 - Tokenization, and why Tal doesn't like it
* 41:35 - Scaling laws and the inverse-U curve of model quality vs. human fit
* 44:34 - Where the human–model mismatch comes from: architecture, memory, and data
* 47:08 - Diffusion language models and sentence planning
* 48:21 - Data quality, synthetic data, and curriculum effects
* 50:54 - Comparing models at different training stages to human development; BabyLM
* 54:40 - What level of the model should we actually probe? Representations vs. behavior
* 1:01:04 - Mechanistic interpretability, Deep Dream, and human dreaming
* 1:02:11 - Cognitive neuroscience, intracranial recordings, and working memory
* 1:10:31 - Should you still do a PhD in 2026?
* 1:12:31 - Will software engineers lose their jobs to AI?
* 1:17:43 - Teaching in the age of coding agents: what changes in the classroom
* 1:20:54 - What's next: human-like LLMs as user simulators, and recruiting
----------------------------------------
Music:
* "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
* Changes: trimmed
----------------------------------------
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

17 May 2026, 1 h 23 min
