AI Odyssey

AI Agents Are Not Agents Yet

22 min · 27 June 2026

Description

What if today’s “AI agents” are mostly automation pipelines wearing a more ambitious label? This episode explores Critique of Agent Model, a paper that draws a sharp line between agentic systems, which look autonomous because engineers scaffold workflows around them, and agentive systems, where goals, identity, decisions, self-regulation, and learning are internal to the system itself. The authors propose a Goal-Identity-Configurator (GIC) architecture as a path toward genuine machine agency, while keeping the central safety question unavoidable: greater autonomy also makes oversight significantly more difficult. Inspired by the work of Eric Xing, Mingkai Deng, and Jinyu Hou, this episode was created using Google’s NotebookLM. Read the original paper here: https://arxiv.org/abs/2606.23991

All episodes

82 episodes

AI Agents Are Not Agents Yet

27 June 2026 · 22 min

Your Best Colleague Is Now a Skill

What if an AI agent could preserve a colleague’s judgment without pretending to become that person? COLLEAGUE.SKILL turns chats, documents, emails, screenshots, and other traces into inspectable agent skills: portable folders of instructions, examples, metadata, and correction history. The key idea is expert knowledge distillation: the extraction of useful human expertise into a bounded technical artifact. For enterprises, this points to a new operating model. Scarce expertise can become reusable, auditable, and updateable, but only if provenance, consent, and limits remain visible. Inspired by the work of Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao, and Xia Hu, this episode was created using Google's NotebookLM. Read the original paper here: https://arxiv.org/abs/2605.31264

7 June 2026 · 19 min

AI Agents Just Learned to Train Their Own Skills

What if the next leap in AI agents is not a bigger model, but a skill document that learns from failure? SkillOpt treats agent skills as trainable external memory: a separate optimizer edits a compact procedure, then keeps only changes that improve held-out validation, meaning tests not used for the edit. Across 52 model, benchmark, and harness settings, the method is best or tied every time, with gains above 20 points on GPT-5.5 in several loops. For enterprises, this points to a new layer of governance: skills that improve, transfer, and remain auditable. Inspired by the work of Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, and Chong Luo, this episode was created using Google's NotebookLM. Read the original paper here: https://arxiv.org/abs/2605.23904

31 May 2026 · 22 min

AI Agents Fail the Spreadsheet Test

What happens when AI agents are asked to build the spreadsheets finance teams actually use? WorkstreamBench, a benchmark for end-to-end financial spreadsheet work, exposes the gap between impressive demos and professional deliverables. It tests complete multi-sheet workbooks, not single formulas or table questions. The benchmark scores accuracy, formula quality, and formatting, because in finance a model must be auditable, readable, and easy to modify. Claude Web leads with 69.1 out of 100, but even the best systems degrade as tasks become more complex. Enterprise AI still has a spreadsheet reliability problem. Inspired by the work of Thomson Yen, Julian Poeltl, Harshith Srinivas Gear, Yilin Meng, Joshua Fan, Adam Shen, Yili Liu, Ali Bauyrzhan, Siri Du, Haoyang Liu, Daniel Guetta, and Hongseok Namkoong, this episode was created using Google's NotebookLM. Read the original paper here: https://arxiv.org/pdf/2605.22664

25 May 2026 · 23 min