LLMs Research Podcast
State space models like Mamba promised linear scaling and constant memory. They delivered on efficiency, but researchers kept hitting the same wall: ask Mamba to recall something specific from early in a long context, and performance drops. Three papers at ICLR 2026 independently attacked this limitation. That convergence tells you how fundamental the problem is.

This podcast breaks down:
- Why Mamba's fixed-size state causes "lossy compression" of context
- How Mixture of Memories (MoM) adds multiple internal memory banks
- How Log-Linear Attention finds a middle ground between SSM and full attention
- Why one paper proves SSMs fundamentally can't solve certain tasks without external tools

The pattern across all three: you can add more state, but you have to pay somewhere. Parameters, mechanism complexity, or system infrastructure. No free lunch.

📄 Papers covered:
- MoM: Linear Sequence Modeling with Mixture-of-Memories https://arxiv.org/abs/2502.13685
- Log-Linear Attention https://openreview.net/forum?id=mOJgZWkXKW
- To Infinity and Beyond: Tool-Use Unlocks Length Generalization in SSMs https://openreview.net/forum?id=sSfep4udCb

📬 Newsletter: https://llmsresearch.substack.com
🐦 Twitter/X: https://x.com/llmsresearch
💻 GitHub: https://github.com/llmsresearch

#Mamba #SSM #StateSpaceModels #ICLR2026 #LLM #MachineLearning #AIResearch #Transformers #DeepLearning

Chapters
0:00 Mamba's secret weakness
0:42 The promise: linear scaling, constant memory
1:14 The catch: forgetting specific details
1:34 Memory bottleneck explained
1:43 Attention = perfect recall filing cabinet
2:10 SSM = single notepad with fixed pages
2:49 The core tradeoff
2:57 Three solutions to fix it
3:00 Solution 1: Mixture of Memories (MoM)
3:51 Solution 2: Log-Linear Attention
4:48 Solution 3: External tool use
5:49 The "no free lunch" pattern
6:41 What wins for longer contexts?
7:04 Subscribe for more research deep dives

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit llmsresearch.substack.com [https://llmsresearch.substack.com?utm_medium=podcast&utm_campaign=CTA_1]
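
For readers who want the memory bottleneck from this episode in numbers, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from any of the three papers: the function names, the layer sizes (`d_model=1024`, `d_state=16`), and the scalar decay recurrence are all hypothetical. Real Mamba layers use input-dependent (selective) updates rather than one constant decay. The sketch counts how many floats an attention cache holds versus a fixed-size recurrent state, and shows how little of the first token survives in a decaying state.

```python
def kv_cache_floats(num_tokens: int, d_model: int) -> int:
    """Floats in one attention layer's cache: one key and one value vector per token."""
    return 2 * num_tokens * d_model


def ssm_state_floats(d_model: int, d_state: int) -> int:
    """Floats in a fixed-size recurrent state; does not depend on sequence length."""
    return d_model * d_state


def first_token_weight(decay: float, num_tokens: int) -> float:
    """Weight of the first input in a scalar state h_t = decay * h_{t-1} + x_t
    after num_tokens steps (assumed constant decay, a simplification)."""
    return decay ** (num_tokens - 1)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    d_model, d_state = 1024, 16
    for n in (1_000, 10_000, 100_000):
        print(
            f"tokens={n:>7}  "
            f"kv_cache={kv_cache_floats(n, d_model):>12}  "
            f"ssm_state={ssm_state_floats(d_model, d_state):>6}  "
            f"first_token_weight={first_token_weight(0.999, n):.2e}"
        )
```

The cache column grows in step with the token count (the "filing cabinet"), the state column stays flat (the "notepad"), and the last column shrinks toward zero, which is one simple way to picture lossy compression of early context.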