Ep. 57: Kevin Wang, NeurIPS Best Paper Author and OpenAI Researcher

46 min · May 28, 2026

Description

Kevin Wang is the first author of the NeurIPS 2025 Best Paper, titled "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities". He's currently a researcher at OpenAI, where he works on RL/reasoning. Before coming to OpenAI, Kevin studied CS at Princeton.

Delta Institute (deltainstitutes.org) supports exceptional researchers and engineers, from academia to industry and beyond. They host technical events to bring great people together, a podcast that gives industry/academic leaders a platform to share their experiences, a small fellows program that builds a tight-knit community of exceptional people, and a grant program that provides compute/mentorship for research projects.

Timestamps:
00:00 Introduction
00:26 Overview of the 1000 Layer Networks Paper
00:42 Motivation and Background of the Research
01:37 Self-Supervised Reinforcement Learning Paradigm
04:16 Challenges and Innovations in Data Scaling
06:23 Hindsight Experience Replay and Its Impact
08:56 Classification vs Regression in Reinforcement Learning
12:25 Training Stability and Architectural Components
14:23 Key Results and Performance Gains
17:23 Qualitative Behaviors and Representation Learning
19:44 Scaling Depth and Batch Size
23:06 Limits of Scaling in Reinforcement Learning
23:55 Exploring Actor Loss and Layer Depth in Training
24:51 Scaling Layers for Complex Tasks
28:28 Challenges and Innovations in Deep Network Training
30:36 Future Directions in Reinforcement Learning
37:32 Personal Journey and Career Path

All episodes

60 episodes

Ep. 60: Ronak Malde, Trajectory CEO and Former DeepMind Researcher

Ronak Malde is the CEO of Trajectory (trajectory.ai), where he's working on bringing continual learning to enterprises. Before Trajectory, he worked on research at DeepMind and trained the SWE-1 model at Windsurf.

Timestamps:
00:00 Why RL Doesn't Scale
00:40 Founder Story Trajectory
01:16 Continual Learning Vision
02:33 Product Signal As Reward
03:47 Roadmap Three Stages
05:02 Self Distillation Explained
07:17 Building The SDK
09:21 Customer Onboarding Example
11:43 Control Plane Improvements
12:50 Competition And Infrastructure
15:03 Research Meets Product
17:35 Self Serve Vs Services
19:55 Leaving Labs To Found
24:10 Hiring And Team Culture
27:45 Scaling To Enterprises

May 28, 2026 · 29 min

Ep. 59: Alex Shan, Judgment Labs CEO

Alex Shan is the CEO of Judgment Labs (judgmentlabs.ai), where he's working on building agent behavior monitoring infrastructure. Before Judgment, he worked at Juniper Networks and Stanford AI Lab.

Timestamps:
00:00 Mission and Evals Focus
00:30 Founder Background
02:55 Childhood Co-Founders
04:49 Stanford to Industry Pivot
07:32 Juniper Agents Experience
08:55 Founding Judgment Labs
11:14 Why Existing Tools Fail
13:23 Deep Agent Observability Model
15:56 JudgeEval Open Core Strategy
18:56 Evals Advice and Pitfalls
23:24 Production Grounded Evals
24:12 Rubric Discovery Signals
25:06 Benchmarks That Evolve
26:24 Legal Redlines Case Study
27:22 From Edits To Rubrics
30:40 Monitoring First Strategy
32:09 Self Improving Agent Loop
34:12 Competitive Differentiation
36:13 Deep Context Evals
42:43 Future Data Intelligence
45:19 Closing Thoughts

May 28, 2026 · 45 min

Ep. 58: Andrew Dai, Elorian CEO and Former DeepMind Research Director

Andrew Dai is the co-founder and CEO of Elorian, a new visual reasoning research and product lab. Before Elorian, Andrew was a research director at DeepMind, where he was the Gemini data area co-lead, PaLM 2 pre-training lead, and GLaM MoE LLM co-lead.

Timestamps:
00:00 Introduction
00:16 Google Brain Origins
01:22 From GPT Roots to Gemini
02:20 Why Build Elorian
03:57 Anthropic vs Gemini Vision
04:48 Measuring Multimodal Quality
06:08 Defining Visual AGI
07:57 Building the Multimodal Lab
09:41 Reasoning in Visual Space
11:46 What Makes Models Better
15:57 Research Milestones and Benchmarks
18:09 How to Think Architecturally
21:55 Go To Market and Verticals
27:13 Where Accuracy Matters Most
29:15 Defending Against Churn
30:40 Think Big Closing Thoughts

May 28, 2026 · 32 min

Ep. 57: Kevin Wang, NeurIPS Best Paper Author and OpenAI Researcher

May 28, 2026 · 46 min

Ep. 56: Grace Li, Design Arena Co-Creator and Arcada Labs Co-Founder

Grace Li is the co-founder of Arcada Labs (arcada.dev), creators of Design Arena, Prediction Arena, and Social Arena. Arcada's vision is to build portals that bridge AI to the real world by building real-world evaluations for things that are often hard to benchmark, like design, the ability to make money in prediction markets, and the ability to write posts that do well on social media. Before starting Arcada, Grace studied CS at Harvard and spent some time at Apple.

Timestamps:
00:00 Introduction
00:53 Founder's Background and Journey
01:45 Startup Path and Early Experiences
03:53 Pivoting to Game Engine Development
06:03 Scaling and Expanding Arenas
09:53 Building and Measuring Model Capabilities
13:14 Agent Runner and Open Source Contributions
14:33 Future Vision and Customer Focus
18:56 Evaluating Next JS Models: Static vs. Live Benchmarks
20:20 Challenges of Crowdsourced Benchmarks
21:28 Understanding User and Researcher Needs
23:23 Introduction to Prediction Arena
24:30 Technical Implementation and Investor Skepticism
26:45 Model Performance and Future Improvements
31:28 Exploring Social Arena
36:08 Company Culture and Future Goals
37:54 Conclusion and Final Thoughts

May 28, 2026 · 38 min
