Trajectory Summaries for Long-Horizon Coding Agents

Description

This episode explores a paper on inference-time scaling for coding agents, asking whether extra test-time compute still helps when tasks are long, messy, and require multi-step tool use rather than a single code completion. It focuses on the paper’s main argument that the real bottleneck is not generating more rollout attempts, but representing prior attempts well enough to compare, select, and reuse them, with structured trajectory summaries serving as the key middle layer between raw transcripts and final patches. The discussion examines two mechanisms: a parallel “tournament” style selection method over summaries, and a sequential refinement method that conditions later attempts on distilled lessons from earlier ones. Listeners would find it interesting because the conversation connects agent performance gains to practical questions of context management, selection versus reuse, and whether the reported improvements reflect a deep scaling insight or simply better engineering around long-horizon coding workflows.

Sources:
1. Scaling Test-Time Compute for Agentic Coding — Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal, 2026 http://arxiv.org/abs/2604.16529
2. ReAct: Synergizing Reasoning and Acting in Language Models — Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao, 2023 https://scholar.google.com/scholar?q=ReAct:+Synergizing+Reasoning+and+Acting+in+Language+Models
3. Reflexion: Language Agents with Verbal Reinforcement Learning — Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao, 2023 https://scholar.google.com/scholar?q=Reflexion:+Language+Agents+with+Verbal+Reinforcement+Learning
4. ExpeL: LLM Agents Are Experiential Learners — Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, Gao Huang, 2023 https://scholar.google.com/scholar?q=ExpeL:+LLM+Agents+Are+Experiential+Learners
5. Rethinking Thinking Tokens: LLMs as Improvement Operators — Lovish Madaan, Aniket Didolkar, Suchin Gururangan, John Quan, Ruan Silva, Ruslan Salakhutdinov, Manzil Zaheer, Sanjeev Arora, Anirudh Goyal, 2025 https://scholar.google.com/scholar?q=Rethinking+Thinking+Tokens:+LLMs+as+Improvement+Operators
6. CodeMonkeys: Scaling Test-Time Compute for Software Engineering — Ryan Ehrlich, Bradley Brown, Jordan Juravsky, Ronald Clark, Christopher Re, Azalia Mirhoseini, 2025 https://scholar.google.com/scholar?q=CodeMonkeys:+Scaling+Test-Time+Compute+for+Software+Engineering
7. S*: Test Time Scaling for Code Generation — Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 2025 https://scholar.google.com/scholar?q=S*:+Test+Time+Scaling+for+Code+Generation
8. Scaling Test-time Compute for LLM Agents — King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou, 2025 https://scholar.google.com/scholar?q=Scaling+Test-time+Compute+for+LLM+Agents
9. Agentic Test-Time Scaling for WebAgents — Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John, Surya Krishnapillai, Michael W. Mahoney, Kurt Keutzer, Amir Gholami, 2026 https://scholar.google.com/scholar?q=Agentic+Test-Time+Scaling+for+WebAgents
10. Does SWE-Bench-Verified Test Agent Ability or Model Memory? — Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan, 2025 https://scholar.google.com/scholar?q=Does+SWE-Bench-Verified+Test+Agent+Ability+or+Model+Memory?
11. A Benchmark for Procedural Memory Retrieval in Language Agents — Ishant Kohar, Aswanth Krishnan, 2025 https://scholar.google.com/scholar?q=A+Benchmark+for+Procedural+Memory+Retrieval+in+Language+Agents
12. PROCED-MEM: Benchmarking Procedural Memory Retrieval in Language Agents Across Domains — Ishant Kohar, Aswanth Krishnan, 2026 https://scholar.google.com/scholar?q=PROCED-MEM:+Benchmarking+Procedural+Memory+Retrieval+in+Language+Agents+Across+Domains
13. G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems — Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan, 2025 https://scholar.google.com/scholar?q=G-Memory:+Tracing+Hierarchical+Memory+for+Multi-Agent+Systems
14. Scaling Agentic Verifier for Competitive Coding — Zeyao Ma et al., 2026 https://scholar.google.com/scholar?q=Scaling+Agentic+Verifier+for+Competitive+Coding
15. AgentPro: Enhancing LLM Agents with Automated Process Supervision — Yuchen Deng, Shichen Fan, Naibo Wang, Xinkui Zhao, See-Kiong Ng, 2025 https://scholar.google.com/scholar?q=AgentPro:+Enhancing+LLM+Agents+with+Automated+Process+Supervision
16. Recursive Introspection: Teaching Language Model Agents How to Self-Improve — Yuxiao Qu, Tianjun Zhang, Naman Garg, Aviral Kumar, 2024 https://scholar.google.com/scholar?q=Recursive+Introspection:+Teaching+Language+Model+Agents+How+to+Self-Improve
17. Agentic Refactoring: An Empirical Study of AI Coding Agents — Kosei Horikawa, Hao Li, Yutaro Kashiwa, Bram Adams, Hajimu Iida, Ahmed E. Hassan, 2025 https://scholar.google.com/scholar?q=Agentic+Refactoring:+An+Empirical+Study+of+AI+Coding+Agents
18. AI Post Transformers: TMAS: Scaling Test-Time Compute with Multi-Agent Synergy — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-14-tmas-scaling-test-time-compute-with-mult-3abe7a.mp3
19. AI Post Transformers: Benchmarking Test-Time Scaling for General LLM Agents — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-22-benchmarking-test-time-scaling-for-gener-8f14f9.mp3
20. AI Post Transformers: MiA-Signature and Global Activation for Long Context — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-13-mia-signature-and-global-activation-for-5ad62f.mp3
21. AI Post Transformers: Explicit Information Transmission for Context Compression — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-explicit-information-transmission-for-co-24e3c2.mp3

Interactive Visualization: Trajectory Summaries for Long-Horizon Coding Agents [https://podcast.do-not-panic.com/viz/2026-05-24-trajectory-summaries-for-long-horizon-co-0194be.html]

CXL-GPU and Beyond Onboard Memory

This episode explores a systems paper that extends GPU memory through CXL-attached DRAM and SSDs, asking whether accelerators can reach beyond on-board HBM without the usual overhead of software-driven memory migration. It explains CXL, memory disaggregation, and the difference between local GPU memory, host-managed memory, CXL memory, and storage-backed expansion, while grounding the discussion in earlier work such as Infiniswap, DirectCXL, and Microsoft’s Pond. The conversation focuses on the paper’s main technical claim: custom GPU-side hardware, including RTL CXL controllers, multiple root ports, and latency-hiding policies, could make expanded memory tiers more usable than approaches like UVM or GPUDirect Storage. It is interesting because the speakers both highlight the engineering ambition and press on a central unresolved question: whether these ideas truly help real transformer workloads, rather than only looking good on more conventional benchmark traces.

Sources:
1. CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies — Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung, 2025 http://arxiv.org/abs/2506.15601
2. Disaggregated Memory for Expansion and Sharing in Blade Servers — Kevin Lim, Jichuan Chang, Trevor Mudge, Parthasarathy Ranganathan, Steven K. Reinhardt, Thomas F. Wenisch, 2009 https://scholar.google.com/scholar?q=Disaggregated+Memory+for+Expansion+and+Sharing+in+Blade+Servers
3. Efficient Memory Disaggregation with Infiniswap — Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, Kang G. Shin, 2017 https://scholar.google.com/scholar?q=Efficient+Memory+Disaggregation+with+Infiniswap
4. Direct Access, High-Performance Memory Disaggregation with DirectCXL — Donghyun Gouk, Sangwon Lee, Miryeong Kwon, Myoungsoo Jung, 2022 https://scholar.google.com/scholar?q=Direct+Access,+High-Performance+Memory+Disaggregation+with+DirectCXL
5. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms — Huaicheng Li, Daniel S. Berger, Lisa Hsu, Daniel Ernst, Pantea Zardoshti, Stanko Novakovic, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura, Ricardo Bianchini, 2023 https://scholar.google.com/scholar?q=Pond:+CXL-Based+Memory+Pooling+Systems+for+Cloud+Platforms
6. SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems with CXL Memory Expander — K. Kim, H. Kim, J. So, W. Lee, J. Im, S. Park, J. Cho, H. Song, 2023 https://scholar.google.com/scholar?q=SMT:+Software-Defined+Memory+Tiering+for+Heterogeneous+Computing+Systems+with+CXL+Memory+Expander
7. TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory — Hasan Al Maruf, Hao Wang, Abhishek Dhanotia, Johannes Weiner, Niket Agarwal, Pallab Bhattacharya, Chris Petersen, Mosharaf Chowdhury, Shobhit Kanaujia, Prakash Chauhan, 2023 https://scholar.google.com/scholar?q=TPP:+Transparent+Page+Placement+for+CXL-Enabled+Tiered-Memory
8. NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures — Jie Zhang, David Donofrio, John Shalf, Mahmut T. Kandemir, Myoungsoo Jung, 2015 https://scholar.google.com/scholar?q=NVMMU:+A+Non-volatile+Memory+Management+Unit+for+Heterogeneous+GPU-SSD+Architectures
9. Overcoming the Memory Wall with CXL-Enabled SSDs — Shao-Peng Yang, Minjae Kim, Sanghyun Nam, Juhyung Park, Jin-yong Choi, Eyee Hyun Nam, Eunji Lee, Sungjin Lee, Bryan S. Kim, 2023 https://scholar.google.com/scholar?q=Overcoming+the+Memory+Wall+with+CXL-Enabled+SSDs
10. NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering — Zhe Zhou, Yiqi Chen, Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong, Jie Zhang, Guangyu Sun, 2024 https://scholar.google.com/scholar?q=NeoMem:+Hardware/Software+Co-Design+for+CXL-Native+Memory+Tiering
11. ARIADNE: Adaptive UVM Management for Efficient GPU Memory Oversubscription — approx. recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=ARIADNE:+Adaptive+UVM+Management+for+Efficient+GPU+Memory+Oversubscription
12. MOST: Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage — approx. recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=MOST:+Memory+Oversubscription-Aware+Scheduling+for+Tensor+Migration+on+GPU+Unified+Storage
13. Selective memory compression for GPU memory oversubscription management — approx. recent architecture authors, 2024/2025 https://scholar.google.com/scholar?q=Selective+memory+compression+for+GPU+memory+oversubscription+management
14. Phoenix: A Refactored I/O Stack for GPU Direct Storage without Phony Buffers — approx. recent storage/systems authors, 2024/2025 https://scholar.google.com/scholar?q=Phoenix:+A+Refactored+I/O+Stack+for+GPU+Direct+Storage+without+Phony+Buffers
15. Managing Scalable Direct Storage Accesses for GPUs with GoFS — approx. recent storage/systems authors, 2024/2025 https://scholar.google.com/scholar?q=Managing+Scalable+Direct+Storage+Accesses+for+GPUs+with+GoFS
16. CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling — approx. recent distributed systems authors, 2024/2025 https://scholar.google.com/scholar?q=CCCL:+Node-Spanning+GPU+Collectives+with+CXL+Memory+Pooling
17. Efficient Tensor Offloading Based on CXL Memory Pool For Extreme Scale Deep Learning — approx. recent ML systems authors, 2024/2025 https://scholar.google.com/scholar?q=Efficient+Tensor+Offloading+Based+on+CXL+Memory+Pool+For+Extreme+Scale+Deep+Learning
18. UHM: Unified Transferring and Pooling over Heterogeneous GPU Memories — approx. recent memory-systems authors, 2024/2025 https://scholar.google.com/scholar?q=UHM:+Unified+Transferring+and+Pooling+over+Heterogeneous+GPU+Memories
19. GPUVM: GPU-driven unified virtual memory — approx. recent architecture authors, 2024/2025 https://scholar.google.com/scholar?q=GPUVM:+GPU-driven+unified+virtual+memory
20. Salus: Efficient security support for cxl-expanded gpu memory — approx. recent security/systems authors, 2024/2025 https://scholar.google.com/scholar?q=Salus:+Efficient+security+support+for+cxl-expanded+gpu+memory
21. AI Post Transformers: Vistara Brings CXL Memory to Hyperscale — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-11-vistara-brings-cxl-memory-to-hyperscale-b5199e.mp3
22. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3
23. AI Post Transformers: Why LLM Serving Needs Mathematical Optimization — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-why-llm-serving-needs-mathematical-optim-647fc6.mp3
24. AI Post Transformers: Affordable Large-Scale Decoding Through Model-System Co-Design — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-19-affordable-large-scale-decoding-through-e1d7ed.mp3

Interactive Visualization: CXL-GPU and Beyond Onboard Memory [https://podcast.do-not-panic.com/viz/2026-05-27-cxl-gpu-and-beyond-onboard-memory-98f5ff.html]

Yesterday, 1 h 0 min
