AI Research Today

Podcast by Aaron

English

Science & technology

AI Research Today unpacks the latest advancements in artificial intelligence, one paper at a time. We go beyond abstracts and headlines, walking through architectures, experiments, training details, ablations, failure modes, and the implications for future work. Each episode covers one to three new, impactful research papers in depth, at the level of an industry practitioner or AI researcher. If you want to understand the newest topics in AI research but don't have the time to dig through the papers yourself, this podcast is for you.

All episodes

10 episodes

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Send us Fan Mail: https://www.buzzsprout.com/2559699/fan_mail/new

In this episode, we break down the new paper “OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation,” which explores how AI agents can be benchmarked across real occupational domains like healthcare, logistics, manufacturing, customs processing, and more. The paper introduces OccuBench, a large-scale benchmark spanning 100 professional task scenarios across 65 specialized domains. One of the most interesting ideas is the use of Language Environment Simulators (LESs), where LLMs simulate enterprise environments and tool responses for domains that normally have no public APIs or accessible evaluation environments.

We discuss:
* Why current agent benchmarks miss most real-world enterprise work
* How simulated environments can evaluate professional AI agents
* Fault injection testing and robustness evaluation
* Cross-industry capability differences between frontier models
* What this means for autonomous enterprise systems and AI agents in production

Paper: https://arxiv.org/abs/2604.10866
PDF: https://arxiv.org/pdf/2604.10866
Arkitekt AI: https://arkitekt-ai.com/?utm_source=chatgpt.com
Contact: support@arkitekt-ai.com

12 May 2026 - 32 min

GradMem: Teaching LLMs to Remember (Without Retraining)

In this episode, we break down GradMem, a new approach to memory in large language models: https://arxiv.org/pdf/2603.13875v1

Instead of relying on the transformer KV cache or repeatedly reprocessing documents (as in RAG), GradMem introduces a different idea: learn a compact memory representation at inference time. Using a few steps of gradient descent, the model “writes” important information from a context into a small set of memory tokens, allowing it to answer future queries without needing the original context.

We cover:
* Why KV cache is a brute-force solution to long context
* How test-time optimization turns memory into something learnable
* The difference between storing text vs. storing information
* What this means for agents, RAG systems, and long-horizon tasks

Big takeaway: instead of reading context over and over, models can learn to compress and reuse it intelligently.

Learn more / build with AI: https://www.arkitekt-ai.com/

23 Apr 2026 - 29 min
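
As a rough illustration of what "writing" a context into memory with gradient descent can look like, the toy below stores four key-value pairs in a single 64-dimensional vector and then answers lookups from that vector alone. It is a linear least-squares stand-in, not GradMem's architecture or training objective; `make_context`, `write_memory`, `read_memory`, and all sizes are hypothetical names and values chosen for the example.

```python
import random


def make_context(n_items=4, dim=64, seed=0):
    """Build a toy 'context': random keys, each paired with a scalar value."""
    rng = random.Random(seed)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_items)]
    values = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    return keys, values


def read_memory(memory, query_key):
    """Answer a query from the memory vector alone (a dot product)."""
    return sum(q * m for q, m in zip(query_key, memory))


def write_memory(keys, values, steps=300):
    """Fit a memory vector by full-batch gradient descent on
    0.5 * sum_i (key_i . memory - value_i)^2, starting from zeros."""
    dim = len(keys[0])
    memory = [0.0] * dim
    # 1 / ||K||_F^2 is a conservative step size: it never exceeds
    # 1 / sigma_max(K)^2, so the descent stays stable.
    lr = 1.0 / sum(x * x for key in keys for x in key)
    for _ in range(steps):
        # Compute all residuals first so this is a true full-batch step.
        residuals = [read_memory(memory, key) - v for key, v in zip(keys, values)]
        for key, r in zip(keys, residuals):
            for j in range(dim):
                memory[j] -= lr * r * key[j]
    return memory


if __name__ == "__main__":
    keys, values = make_context()
    memory = write_memory(keys, values)
    # After writing, the (keys, values) context is no longer needed for lookups.
    for key, value in zip(keys, values):
        print(round(value, 4), round(read_memory(memory, key), 4))
```

The point of the sketch is only the division of labor: optimization happens once at write time, and later reads touch the compact vector rather than the original context.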

Language Models are Injective and Hence Invertible

In this episode, we break down a fascinating new result from recent research: that modern Transformer language models are almost surely injective, meaning different prompts map to unique internal representations, with no information loss.

We dig into the paper: https://arxiv.org/abs/2510.15511

At the core of the proof is a surprisingly deep mathematical idea: Transformers are real analytic functions of their parameters, which allows researchers to rigorously reason about when “collisions” (two prompts producing the same representation) can occur. The result? Collisions only happen on a measure zero set: mathematically possible, but practically never observed.

We unpack:
* What it means for a function to be real analytic
* Why this implies near-perfect uniqueness of representations
* How gradient descent preserves this property during training
* What this says about interpretability, privacy, and reversibility of LLMs

We also explore the practical implications: if models are truly invertible, could we reconstruct inputs from activations? What does that mean for safety and data leakage?

About the host: this episode is brought to you by Arkitekt AI, an automated enterprise software development platform that builds full analytics, ML, and data systems from natural language. Learn more: https://arkitekt-ai.com

23 Mar 2026 - 26 min

Learning to Reason in 13 Parameters

Link to arXiv: https://arxiv.org/pdf/2602.04118

Large language models have recently shown impressive reasoning abilities, often learned through reinforcement learning and low-rank adaptation techniques like LoRA. But these approaches still assume that effective reasoning requires relatively large adaptation layers. This new paper challenges that assumption by asking a provocative question: how small can a reasoning update really be?

In this episode, we explore Learning to Reason in 13 Parameters, which introduces TinyLoRA, a method that compresses low-rank adapters down to the extreme, in some cases to just a single parameter. Instead of relying on large adaptation matrices, TinyLoRA demonstrates that reasoning behavior can be steered using ultra-minimal parameter updates, dramatically reducing the computational and memory footprint required to teach models new reasoning skills.

We break down:
* Why conventional LoRA and low-rank adapters hit a floor at model dimensionality
* How TinyLoRA scales reasoning adapters down to near-zero parameter counts
* What this reveals about where reasoning ability actually lives inside neural networks
* Why tiny adaptation layers could reshape efficient fine-tuning, on-device intelligence, and rapid deployment

The results suggest that reasoning competence may not require massive structural changes, only precisely targeted parameter nudges. This challenges assumptions about scaling, efficiency, and the true complexity of learned reasoning.

16 Feb 2026 - 26 min
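
The "floor at model dimensionality" mentioned in this episode follows from standard LoRA arithmetic: a rank-r adapter on a d_out x d_in weight matrix trains two factors with r * (d_out + d_in) parameters in total, so even rank 1 costs d_out + d_in per adapted matrix. The short sketch below just computes that count. The 4096 width is an illustrative assumption, not a figure from the paper, and nothing here shows how TinyLoRA itself gets below the floor.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a standard LoRA adapter: B (d_out x rank)
    plus A (rank x d_in)."""
    if rank < 1:
        raise ValueError("standard LoRA needs rank >= 1")
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size, chosen only for illustration
    print("full matrix:", full_param_count(d, d))
    for rank in (8, 1):
        print(f"LoRA rank {rank}:", lora_param_count(d, d, rank))
```

Because the rank cannot drop below 1, the count per adapted matrix cannot drop below d_out + d_in with this parameterization, which is why getting to a handful of parameters needs a different scheme.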

SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

Large Language Models often struggle with complex planning tasks that require exploration, backtracking, and self-correction. Once an LLM commits to an early mistake, its linear chain-of-thought reasoning makes recovery difficult. While search methods like Monte Carlo Tree Search (MCTS) offer a way to explore alternatives, they typically rely on sparse rewards and fail to fully exploit the semantic strengths of language models.

In this episode, we dive into SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a new framework that fundamentally rethinks how planning and search interact in LLM-based agents. Instead of treating MCTS as a brute-force optimizer, SPIRAL embeds a cognitive architecture of three specialized LLM roles directly into the search loop:
* A Planner proposes creative next actions
* A Simulator grounds those actions by predicting realistic outcomes
* A Critic reflects on the results to provide dense, informative reward signals

This planner–simulator–critic loop transforms search into a guided, self-correcting reasoning process, allowing agents to recover from mistakes, evaluate alternatives more effectively, and plan with far greater robustness.

Paper link: https://arxiv.org/pdf/2512.23167
Repo: https://github.com/IBM/SPIRAL

26 Jan 2026 - 28 min
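
To make the three roles concrete, here is a deliberately tiny sketch in which plain Python functions stand in for the Planner, Simulator, and Critic on a toy number puzzle. It uses greedy one-step lookahead rather than MCTS, and none of it comes from the SPIRAL repository: the task, the action set, and the names `planner`, `simulator`, `critic`, and `plan` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Toy action set: each action maps an integer state to a new state.
ACTIONS: Dict[str, Callable[[int], int]] = {
    "add 3": lambda s: s + 3,
    "double": lambda s: s * 2,
    "subtract 1": lambda s: s - 1,
}


def planner(state: int) -> List[str]:
    """Propose candidate next actions (an LLM in SPIRAL; here, every action)."""
    return list(ACTIONS)


def simulator(state: int, action: str) -> int:
    """Predict the outcome of taking an action in a state."""
    return ACTIONS[action](state)


def critic(state: int, target: int) -> float:
    """Dense reward: the closer a state is to the target, the higher the score."""
    return -abs(target - state)


def plan(start: int, target: int, max_steps: int = 10) -> Tuple[List[str], int]:
    """Greedy lookahead: at each step take the action whose simulated
    outcome the critic scores highest. Stops at the target or after max_steps."""
    state, chosen = start, []
    for _ in range(max_steps):
        if state == target:
            break
        scored = [(critic(simulator(state, a), target), a) for a in planner(state)]
        _, best_action = max(scored)
        state = simulator(state, best_action)
        chosen.append(best_action)
    return chosen, state


if __name__ == "__main__":
    actions, final_state = plan(start=1, target=10)
    print(actions, final_state)
```

The dense critic score is what lets this loop correct itself after overshooting the target; a sparse "solved or not" reward would give the greedy step nothing to rank candidates by.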