AI Post Transformers
This episode explores the Titans paper’s proposal to pair standard attention with a separate learned long-term memory that updates during inference, aiming to preserve distant information without paying full quadratic attention costs across very long sequences. It situates that idea against earlier approaches such as Neural Turing Machines, Transformer-XL, Compressive Transformers, Memorizing Transformers, and linear-attention recurrent models, highlighting the recurring tradeoff between precise recall and scalable memory. The discussion focuses on the paper’s most distinctive claim: memory writes are driven by a loss-based notion of surprise, making test-time memory updates look more like small online learning steps than a simple cache. Listeners would find it interesting because it gets at a central open question in modern AI systems design: whether neural networks can gain durable, useful memory at inference time without becoming too unstable, expensive, or operationally awkward to deploy.

Sources:
1. Titans: Learning to Memorize at Test Time. https://arxiv.org/pdf/2501.00663
2. Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka, 2014. https://arxiv.org/abs/1410.5401
3. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov, 2019. https://arxiv.org/abs/1901.02860
4. Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Chloe Hillier, Timothy P. Lillicrap, 2020. https://openreview.net/forum?id=SylKikSYDH
5. Memorizing Transformers. Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy, 2022. https://arxiv.org/abs/2203.08913
6. Learning to (learn at test time): RNNs with expressive hidden states. Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, 2024. https://scholar.google.com/scholar?q=Learning+to+(learn+at+test+time):+RNNs+with+expressive+hidden+states
7. Gated Delta Networks: Improving Mamba2 with Delta Rule. Songlin Yang, Jan Kautz, Ali Hatamizadeh, 2024. https://scholar.google.com/scholar?q=Gated+Delta+Networks:+Improving+Mamba2+with+Delta+Rule
8. RULER: What's the Real Context Size of Your Long-Context Language Models? Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg, 2024. https://scholar.google.com/scholar?q=RULER:+What's+the+Real+Context+Size+of+Your+Long-Context+Language+Models?
9. BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev, 2024. https://scholar.google.com/scholar?q=BABILong:+Testing+the+Limits+of+LLMs+with+Long+Context+Reasoning-in-a-Haystack
10. ATLAS: Learning to Optimally Memorize the Context at Test Time. Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni, 2025. https://scholar.google.com/scholar?q=ATLAS:+Learning+to+Optimally+Memorize+the+Context+at+Test+Time
11. KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference. Alireza Nadali, Patrick Cooper, Ashutosh Trivedi, Alvaro Velasquez, 2026. https://scholar.google.com/scholar?q=KV-Fold:+One-Step+KV-Cache+Recurrence+for+Long-Context+Inference
12. SCBench: A KV Cache-Centric Analysis of Long-Context Methods. Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu, 2024/2025. https://scholar.google.com/scholar?q=SCBench:+A+KV+Cache-Centric+Analysis+of+Long-Context+Methods
13. Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling. Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen, 2024. https://scholar.google.com/scholar?q=Samba:+Simple+Hybrid+State+Space+Models+for+Efficient+Unlimited+Context+Language+Modeling
14. Longhorn: State Space Models are Amortized Online Learners. Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qiang Liu, 2024. https://scholar.google.com/scholar?q=Longhorn:+State+Space+Models+are+Amortized+Online+Learners
15. Retrieval meets Long Context Large Language Models. Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro, 2023. https://scholar.google.com/scholar?q=Retrieval+meets+Long+Context+Large+Language+Models
16. Augmenting Language Models with Long-Term Memory. Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei, 2023. https://scholar.google.com/scholar?q=Augmenting+Language+Models+with+Long-Term+Memory
17. Test-Time Training Provably Improves Transformers as In-Context Learners. Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak, 2025. https://scholar.google.com/scholar?q=Test-Time+Training+Provably+Improves+Transformers+as+In-Context+Learners
18. AI Post Transformers: δ-mem and Online Memory for LLMs. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-05-13-d-mem-and-online-memory-for-llms-6622fa.mp3
19. AI Post Transformers: In-Place Test-Time Training for Transformers. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-09-in-place-test-time-training-for-transfor-d0b976.mp3
20. AI Post Transformers: Gated Linear Attention for Efficient Long Sequences. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-18-gated-linear-attention-for-efficient-lon-c858ab.mp3
21. AI Post Transformers: MELT: Decoupling Compute From Memory. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-05-13-melt-decoupling-compute-from-memory-26430c.mp3
22. AI Post Transformers: Long Context Pre-Training with Lighthouse Attention. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-05-13-long-context-pre-training-with-lighthous-e85bbe.mp3
23. AI Post Transformers: Training Million-Token LLMs Beyond the Memory Barrier. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-05-04-training-million-token-llms-beyond-the-m-324edc.mp3
24. AI Post Transformers: Recursive Language Models for Arbitrarily Long Prompts. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-04-recursive-language-models-for-arbitraril-fbcd1c.mp3
25. AI Post Transformers: How Induction Heads Emerge in Transformers. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-05-03-how-induction-heads-emerge-in-transforme-a7bfcb.mp3
26. AI Post Transformers: DeepSeek-V4 and Practical Million-Token Context. Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-25-deepseek-v4-and-practical-million-token-6f4de1.mp3

Interactive Visualization: Titans: Learning to Memorize at Test Time [https://podcast.do-not-panic.com/viz/2026-05-20-titans-learning-to-memorize-at-test-time-054662.html]
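
For readers who want a concrete picture of the "surprise-driven memory write" idea in the episode summary, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the memory is assumed to be a single linear map, surprise is assumed to be the gradient of a squared-error recall loss, and the function names (`surprise_update`, `demo`) and the values of `lr`, `momentum`, and `decay` are hypothetical choices made for this example.

```python
# Hypothetical sketch: a linear associative memory M updated at inference
# time by gradient steps on a squared-error "surprise" loss.
# All names and hyperparameter values here are illustrative assumptions.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def surprise_update(M, S, key, value, lr=0.1, momentum=0.5, decay=0.01):
    """One test-time write. Returns (new_M, new_S, loss)."""
    # Recall error: how badly the current memory predicts value from key.
    err = [p - v for p, v in zip(matvec(M, key), value)]
    loss = sum(e * e for e in err)
    # Gradient of ||M key - value||^2 with respect to M is 2 * err * key^T.
    grad = [[2.0 * e * k for k in key] for e in err]
    # Momentum term S carries past surprise forward; a large loss gives a
    # large gradient and therefore a large write.
    new_S = [[momentum * s - lr * g for s, g in zip(s_row, g_row)]
             for s_row, g_row in zip(S, grad)]
    # Decay slowly forgets old content before the new write is added.
    new_M = [[(1.0 - decay) * m + s for m, s in zip(m_row, s_row)]
             for m_row, s_row in zip(M, new_S)]
    return new_M, new_S, loss

def demo():
    """Repeatedly present one key/value pair and track the recall loss."""
    d = 3
    M = [[0.0] * d for _ in range(d)]
    S = [[0.0] * d for _ in range(d)]
    key, value = [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]
    losses = []
    for _ in range(20):
        M, S, loss = surprise_update(M, S, key, value)
        losses.append(loss)
    return losses, matvec(M, key)

if __name__ == "__main__":
    losses, recalled = demo()
    print("first loss:", losses[0], "last loss:", losses[-1])
    print("recalled value:", recalled)
```

The contrast with a cache is that nothing is appended: the same fixed-size matrix is nudged each step, so an already well-predicted pair produces almost no write, while an unexpected one produces a large one. With momentum the loss can oscillate slightly on its way down, and the decay term means recall settles close to, but not exactly at, the stored value.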