A User-Centric Perspective on LLM Inference | AM Podcast #3

49 min · Mar 31, 2026

Description

Woosuk Kwon is CTO of Inferact and creator of the vLLM inference library. Woosuk shares what it takes to build the most popular open-source LLM inference engine from a human-centered perspective.

Outline:
* 0:00 - Prelude: Introducing Woosuk and Inferact
* 3:00 - Woosuk’s First PhD Project
* 6:00 - How the vLLM Project Got Started
* 9:18 - AI Infra Needs More Than Just Efficiency
* 14:08 - How AI Infra and Human-centered AI Are Connected
* 15:01 - How to Prioritize Feature Requests for Popular AI Infra
* 18:18 - Streaming Requests and Realtime API
* 24:05 - Multi-turn, Agentic, Proactive LLMs
* 27:03 - How to Design AI Infra in a Principled Way
* 29:13 - How to Design an AI Inference Engine for Continual Learning with RL
* 35:05 - Would LoRA Training Affect RL Infra Design?
* 37:28 - Why Start an AI Inference Infra Startup?
* 40:46 - What Effortless Inference with Open-source Models Means for Developers
* 43:46 - A Vision for On-device AI Inference
* 46:19 - Can Today’s Coding Agents Create vLLM?

References:
* Inferact: https://inferact.ai/
* Efficient Memory Management for Large Language Model Serving with PagedAttention: https://arxiv.org/abs/2309.06180
* Streaming Requests & Realtime API in vLLM: https://vllm.ai/blog/streaming-realtime
* RL’s Razor: Why Online Reinforcement Learning Forgets Less: https://arxiv.org/abs/2509.04259

Podcast Links:
* Podcast website: https://augmented-mind.github.io/
* Apple Podcasts: https://podcasts.apple.com/us/podcast/augmented-mind-podcast/id1868102170
* Spotify: https://open.spotify.com/show/40KculkYTe2tOpqJm6TAYr?si=PU_UncsMT4mXjVNCRwoXog&nd=1&dlsi=6d9bed7a43d64085
* RSS: https://anchor.fm/s/10dbf5b7c/podcast/rss

About the Hosts: The AM Podcast is hosted by Yijia Shao, Shannon Shen, and Michael Ryan, CS PhD students at Stanford University and MIT.

All episodes

5 episodes

The Privacy Layer of Personal Intelligence with Ken Liu | AM Podcast #4

Ken Liu is a Stanford CS PhD student and founder of The Open Anonymity Project. Ken’s pioneering work explores the intersection between language models and data & user privacy.

Outline:
* 0:00 - Teaser
* 1:08 - Prelude: Introducing Ken Liu
* 1:41 - Monologue: The Open Anonymity Project
* 3:41 - Ken’s Path to Privacy Research
* 6:31 - The Biggest Privacy Concern for LLM Users
* 9:39 - Three Perspectives on Tackling AI Privacy
* 10:57 - “AI presents a Uniquely Worse Privacy Problem”
* 13:44 - The Open Anonymity (OA) Project: Unlinkable Inference
* 17:50 - Blind Signatures as Unlinkable Authentication
* 20:52 - Secure Inference Proxies
* 28:31 - Threat Model in the OA Project
* 31:39 - What If People Give Away Information In Their Prompts
* 35:58 - OpenClaw, Privacy Nightmare In Agents
* 43:00 - The Stories Behind the OA Project
* 50:14 - Intelligence Neutrality
* 52:22 - Safety Concerns in a World with Private AI Inference

References:
* Ken Liu’s Home Page: https://ai.stanford.edu/~kzliu/
* The Open Anonymity Project: https://openanonymity.ai/
* Unlinkable Inference as a User Privacy Architecture: https://openanonymity.ai/blog/unlinkable-inference/

Podcast Links:
* Podcast website: https://augmented-mind.github.io/
* Apple Podcasts: https://podcasts.apple.com/us/podcast/augmented-mind-podcast/id1868102170
* Spotify: https://open.spotify.com/show/40KculkYTe2tOpqJm6TAYr?si=PU_UncsMT4mXjVNCRwoXog&nd=1&dlsi=6d9bed7a43d64085
* RSS: https://anchor.fm/s/10dbf5b7c/podcast/rss

About the Hosts: The AM Podcast is hosted by Yijia Shao, Shannon Shen, and Michael Ryan, CS PhD students at Stanford University and MIT.

May 4, 2026 · 57 min

A User-Centric Perspective on LLM Inference | AM Podcast #3

Mar 31, 2026 · 49 min

Building AI Systems for Imperfect Humans with Sherry Wu | AM Podcast #2

Sherry Wu is a professor at CMU whose research sits at the intersection of human-computer interaction and natural language processing. From making AI work for imperfect humans to making humans work better with AI, Sherry's work challenges us to rethink both sides of the equation.

Outline:
* 0:00 - Teaser
* 1:13 - Prelude: Introducing Sherry Wu
* 2:30 - How the AI Field Has Changed in the Last Four Years
* 4:22 - Making AI Systems Work for Imperfect Humans
* 6:54 - Models vs. Scaffolding
* 10:36 - Understanding Human Imperfection in Teaching Contexts
* 19:28 - AI Literacy Skills
* 22:04 - How AI Is Changing CS Education
* 25:38 - Suppose We Have AGI, What Does It Mean to Be Human?
* 29:14 - Training Models to be More Human-centered
* 31:46 - Checklists Are Better Than Reward Models https://arxiv.org/abs/2507.18624
* 36:56 - Challenge in Aligning Models
* 43:22 - Advice for Interdisciplinary Research
* 45:37 - Reflection on Her Own Research

References:
* Sherry Wu’s Research Homepage: https://www.cs.cmu.edu/~sherryw/
* Sherry Wu’s course page (PMDS, Spring 2025): https://www.cs.cmu.edu/~sherryw/courses/2025s-pmds.html
* AI Fluency Index: https://www.anthropic.com/research/AI-fluency-index
* Checklists Are Better Than Reward Models: https://arxiv.org/abs/2507.18624
* Not Everyone Wins with LLMs: https://arxiv.org/pdf/2509.21890

Podcast Links:
* Podcast website: https://augmented-mind.github.io/
* Apple Podcasts: https://podcasts.apple.com/us/podcast/augmented-mind-podcast/id1868102170
* Spotify: https://open.spotify.com/show/40KculkYTe2tOpqJm6TAYr?si=PU_UncsMT4mXjVNCRwoXog
* RSS: https://anchor.fm/s/10dbf5b7c/podcast/rss

About the Hosts: The AM Podcast is hosted by Yijia Shao, Shannon Shen, and Michael Ryan, CS PhD students at Stanford University and MIT.

Feb 27, 2026 · 46 min

Bridging Human-AI Grounding Gaps with Omar Shaikh | AM Podcast #1

Omar Shaikh is a Stanford PhD student, HCI and NLP researcher, and author of the award-winning UIST 2025 paper “Creating General User Models from Computer Use”. Omar’s pioneering work aims to bridge the Human-AI grounding gap.

Outline:
* 0:00 - Teaser
* 1:21 - Prelude: Introducing Omar Shaikh
* 2:07 - Monologue: Better Context for AI
* 4:22 - Bridging the Human-AI Grounding Gap
* 6:14 - Confidence scores in General User Models (GUMs)
* 7:32 - Calibration of General User Models
* 13:20 - Uses of General User Models
* 15:01 - Mixed Initiative Interactions
* 22:10 - Motivation for GUM
* 25:31 - Tabracadabra: tab everywhere!
* 27:01 - Design decisions in GUM
* 28:26 - Designing Interactive Experiences
* 32:11 - DITTO: https://arxiv.org/abs/2406.00888
* 33:06 - Work on Domains without Existing Benchmarks
* 34:45 - Challenges of the GUM Project
* 37:26 - Privacy and Data Ownership
* 38:57 - Finetuning a User Model
* 44:09 - Mindblowing GUM Inferences
* 49:02 - Social Problems of GUMs
* 50:27 - GUM as a Reflection Tool

References:
* Omar Shaikh’s research homepage: https://oshaikh.com/
* Creating General User Models from Computer Use: https://arxiv.org/abs/2505.10831
* Tabracadabra: https://x.com/oshaikh13/status/1967626897837494479
* Aligning Language Models with Demonstrated Feedback: https://arxiv.org/abs/2406.00888
* Principles of Mixed-Initiative User Interfaces: https://erichorvitz.com/chi99horvitz.pdf
* Verification of Forecasts Expressed in Terms of Probability: https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml

Podcast Links:
* Podcast website: https://augmented-mind.github.io/
* Apple Podcasts: https://podcasts.apple.com/us/podcast/augmented-mind-podcast/id1868102170
* Spotify: https://open.spotify.com/show/40KculkYTe2tOpqJm6TAYr?si=PU_UncsMT4mXjVNCRwoXog
* RSS: https://anchor.fm/s/10dbf5b7c/podcast/rss

About the Hosts: The AM Podcast is hosted by Yijia Shao, Shannon Shen, and Michael Ryan, CS PhD students at Stanford University and MIT.

Jan 23, 2026 · 52 min

Introducing The Augmented Mind: A Podcast for Technical Human-centered AI

Introducing The Augmented Mind Podcast (The AM Podcast). We explore techniques for building AI models that collaborate with people and augment human intelligence. In Episode 0, we share who we are, why we started this podcast, and what we're looking forward to.

Outline:
* 0:00 - Prelude: the problems we care about
* 1:48 - Host introduction
* 2:03 - Why we started the AM Podcast
* 2:31 - Hot takes on human-centered AI
* 2:45 - Hot take #1: learning on outcome rewards over long horizons will directly solve human-agent collaboration
* 3:00 - The Bitter Lesson
* 3:53 - How to define rewards is a human problem
* 4:50 - Empathetic AI
* 5:48 - Hot take #2: even with an automation-vs-augmentation view, as AI gets stronger, there will be less for us to work on
* 6:09 - Creative Destruction
* 7:21 - Task vs. goal
* 10:45 - Format of our podcast
* 11:28 - Unique technical challenges in human-centered AI
* 11:43 - Example #1: human variation
* 13:58 - Example #2: revolution of annotation and data collection
* 15:10 - Example #3: making sense of noisy data
* 16:45 - Let the journey begin!

External Clips Referenced:
* Eric Horvitz; 1:02:38 - 1:03:07 https://www.youtube.com/watch?v=ddjNTxtyEnw
* Fei-Fei Li; 12:40 - 12:58 https://www.youtube.com/watch?v=be0gLzeBX5w

Podcast Links:
* Podcast website: https://augmented-mind.github.io/
* Apple Podcasts: https://podcasts.apple.com/us/podcast/augmented-mind-podcast/id1868102170
* Spotify: https://open.spotify.com/show/40KculkYTe2tOpqJm6TAYr?si=PU_UncsMT4mXjVNCRwoXog
* RSS: https://anchor.fm/s/10dbf5b7c/podcast/rss

About the Hosts: The AM Podcast is hosted by Yijia Shao, Shannon Shen, and Michael Ryan, CS PhD students at Stanford University and MIT.

Jan 21, 2026 · 17 min
