Max Agency

Podcast by LangChain

English

Science & Technology

More Max Agency

Welcome to Max Agency, a podcast about how the best AI agents are actually being built. Hosted by Harrison Chase, CEO of LangChain, each episode goes deep with the builders designing, deploying, and learning from real agent systems in the wild. From architecture decisions to evals, tooling, and failure modes, Max Agency is for people who want to understand what it really takes to build useful agents.

All episodes

6 episodes

The tool design tricks behind Benchling's AI agents

Nick Larus-Stone is the Head of AI at Benchling, the R&D data platform that life science companies use to store and manage their experiments, samples, instruments, and analysis. Benchling has been around since 2012. In October 2025, it launched Benchling AI, an intelligence layer with a chat interface, backed by an agent, that helps scientists find data, design experiments, and write reports. Nick came to Benchling through its acquisition of Sphinx Bio, the analysis startup he founded. In this conversation, Nick walks through what it takes to build agents for scientific work, and where the playbook from coding agents holds up and where it breaks down.

We also discuss:
* Why Benchling invests so heavily in getting clean data upfront
* How they cross-check answers between models to get more out of each one
* Why and how Benchling leans on production traces
* Where AI actually helps science today, and where it still gets stuck
* Why understanding LLMs is closer to biology than software engineering

Timestamps:
00:00 Intro
01:22 What Benchling AI is, and the 14-year data platform underneath it
04:36 Why a decade of structured data is a core advantage
05:57 The architecture under the hood
08:28 Similarities and differences compared to a coding harness
11:14 Benchling’s multi-agent architectures
14:36 Dealing with verifiable vs non-verifiable tasks
16:19 Doing evals when clean benchmarks aren’t possible
18:13 Context engineering: SQL vs. file-based harnesses
22:11 Memory: agents that create and update their own skills
25:30 What user education for scientists looks like
30:33 Why understanding LLMs is closer to biology than software
33:28 When will agents discover a novel cure for disease?
44:58 The future of harnesses in science
48:13 Why fine-tuning on biology hasn't beaten frontier models

References:
* Agent Skills (Claude Docs) [https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview]
* Benchling’s Deep Research Agent [https://www.benchling.com/blog/complex-questions-fast-answers-benchling-deep-research]
* Claude (Anthropic) [https://www.anthropic.com/claude]
* Design of experiments (DOE) [https://en.wikipedia.org/wiki/Design_of_experiments]
* FDA Investigational New Drug (IND) application [https://www.fda.gov/drugs/types-applications/investigational-new-drug-ind-application]
* Gemini (Google) [https://gemini.google.com/]
* Google AI co-scientist [https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/]
* LangSmith [https://www.langchain.com/langsmith]
* Model Context Protocol (MCP) [https://modelcontextprotocol.io/]
* The Ralph (Wiggum) Loop (Geoffrey Huntley) [https://ghuntley.com/ralph/]
* Sphinx Bio [https://www.benchling.com/blog/resync-bio-and-sphinx-bio-join-benchling]

Where to find Nick:
* Benchling [https://www.benchling.com/]
* LinkedIn [https://www.linkedin.com/in/nlarusstone/]
* Twitter/X [https://x.com/nlarusstone]

Where to find Harrison:
* LinkedIn [https://www.linkedin.com/in/harrison-chase-961287118/]
* Twitter/X [https://x.com/hwchase17]

Where to find LangChain:
* Website [https://www.langchain.com/]
* Docs [https://docs.langchain.com/]

Send feedback or questions to maxagency@langchain.dev

June 4, 2026 - 50 min

How Cogent builds AI agents that have to be right every single time | Geng Sng (Co-founder & CTO - Cogent)

Geng Sng is co-founder and CTO of Cogent, which builds autonomous agents that remediate vulnerabilities for enterprise security teams. Today, Cogent's agents process billions of security events per day, maintaining a live context graph of every asset and vulnerability across customer environments. In this conversation, Geng walks through Cogent's hot vs cold context split, the sub-agents that handle side quests, and the two graphs they run in parallel.

We also discuss:
* Why defensive security is harder for AI than offensive
* Under the hood of Cogent's three agents
* Inside Cogent's “read only” by-default sandboxes
* Why graph databases don't scale for security data
* Cogent Research and the move into formal verification
* Why interactive agents need a deeper planning phase to one-shot

Referenced:
* Abnormal AI [https://abnormal.ai/]
* Amazon S3 [https://aws.amazon.com/s3/]
* Anthropic [https://www.anthropic.com/]
* Bash [https://www.gnu.org/software/bash/]
* ChatGPT [https://chatgpt.com/]
* Claude Code [https://www.anthropic.com/claude-code]
* Claude Mythos [https://red.anthropic.com/2026/mythos-preview/]
* CodeMender [https://deepmind.google/blog/introducing-codemender-an-ai-agent-for-code-security/]
* Codex [https://openai.com/codex/]
* Cogent [https://www.cogent.com/]
* Cursor [https://cursor.com/]
* Google DeepMind [https://deepmind.google/]
* GPT-5.5-Cyber [https://openai.com/index/gpt-5-5-with-trusted-access-for-cyber/]
* Jupyter [https://jupyter.org/]
* Letta [https://www.letta.com/]
* Mozilla [https://www.mozilla.org/]
* OpenAI [https://openai.com/]
* Opus 4.6 [https://www.anthropic.com/news/claude-opus-4-6]
* Opus 4.7 [https://www.anthropic.com/news/claude-opus-4-7]
* Vercel [https://vercel.com/]

Where to find Geng:
* LinkedIn [https://www.linkedin.com/in/geng-sng/]

Where to find Harrison:
* LinkedIn [https://www.linkedin.com/in/harrison-chase-961287118/]
* Twitter/X [https://x.com/hwchase17]

Where to find LangChain:
* Website [https://www.langchain.com/]
* Docs [https://docs.langchain.com/]

Send feedback or questions to maxagency@langchain.dev

Timestamps:
00:00 Why mean time to exploit collapsed from years to minutes
02:08 Inside Cogent's Agent Lake architecture
05:11 Why Cogent rejected graph databases
10:48 The trust ladder before agents touch production
15:13 The three types of agents inside Cogent
17:07 How Cogent sandboxes its agents
19:16 Short-circuiting interactive agents with a deeper planning phase
24:31 What to do when users believe agents too much
31:21 Why sub-agents let agents go on side quests
34:59 Two-tiered evals and the metric that catches bad prompts
40:00 Cogent’s unique approach to context
48:39 Cogent Research and the move into formal verification
51:33 The single trait Cogent hires for
54:00 Open-sourcing models within six months
57:07 Why defensive security won’t be commoditized anytime soon
1:00:51 The founding insight behind Cogent

May 22, 2026 - 1 h 14 min

How Ramp built an AI agent that can think outside of tokens | Alex Shevchenko

Alexander Shevchenko is the head of applied research at Ramp, where he leads Ramp Labs – the team behind Ramp Sheets and a steady stream of public AI engineering experiments. Ramp Sheets started as an internal process mining tool that turned Loom videos of accountants into Markov diagrams, before evolving into the agentic spreadsheet editor that shipped in November. In this conversation, Alex walks through the architecture under the hood, why Ramp biases the agent toward Excel formulas over Python code gen, and two recent Labs experiments: Latent Briefing and a user-steerable revival of Golden Gate Claude.

We also discuss:
* Under the hood of Ramp Sheets
* Inspect, Ramp's internal coding agent, and the self-improving monitor loop it powers
* Why finance professionals rejected code gen as too "black box"
* Why Anthropic models tend to excel at agentic spreadsheet manipulation
* The case for putting the agent outside the sandbox, not inside it
* The Loom-to-Markov-diagram process mining pipeline
* RLMs and how subagents can share memory in latent space
* Latent Briefing and KV-cache communication between subagents
* Reviving Golden Gate Claude with steering vectors on Gemma

Referenced:
* Alex Levinson [https://www.linkedin.com/in/alex-levinson/]
* Anthropic [https://www.anthropic.com/]
* Ben Geist [https://www.linkedin.com/in/benjamin-geist/]
* Claude [https://www.anthropic.com/claude]
* Efficient Memory Sharing for Multi-Agent Systems via KV Cache Compaction (Ben Geist) [https://x.com/RampLabs/status/2042660310851449223]
* Gemma [https://ai.google.dev/gemma]
* Golden Gate Claude [https://www.anthropic.com/news/golden-gate-claude]
* Graphviz [https://graphviz.org/]
* Inspect [https://builders.ramp.com/post/why-we-built-our-background-agent]
* Latent Briefing [https://x.com/RampLabs/status/2042672773747589588]
* Loom [https://www.loom.com/]
* Modal [https://modal.com/]
* OpenAI [https://openai.com/]
* Opus [https://www.anthropic.com/claude/opus]
* Qwen [https://qwen.ai/]
* Ramp [https://ramp.com/]
* Ramp Labs [https://ramplabs.substack.com/]
* Ramp Sheets [https://labs.ramp.com/sheets]
* Recursive Language Models (Alex Zhang) [https://alexzhang13.github.io/blog/2025/rlm/]
* Retool [https://retool.com/]
* Self-maintaining Ramp Sheets [https://ramplabs.substack.com/p/self-maintaining]
* Steer AI [https://labs.ramp.com/steer-ai]

Where to find Alex:
* LinkedIn [https://www.linkedin.com/in/shevalex]
* Twitter/X [https://x.com/shevchenkoaalex]
* Website [https://www.alshevchenko.com/]

Where to find Harrison:
* LinkedIn [https://www.linkedin.com/in/harrison-chase-961287118/]
* Twitter/X [https://x.com/hwchase17]

Where to find LangChain:
* Website [http://langchain.com]
* Docs [https://docs.langchain.com/]

Send feedback or questions to maxagency@langchain.dev

Timestamps:
00:00 Introduction
01:13 The origin of Ramp Sheets
02:27 The Loom-to-Markov-diagram process mining pipeline
04:28 Why code gen approaches felt too "black box" to finance
06:13 Meeting finance where they already are: inside the spreadsheet
09:08 How far process mining got them
10:31 Text descriptions and Graphviz DAGs as output
12:41 Under the hood of Ramp Sheets
14:52 Why the agent uses Python only as an escape hatch
15:47 Why Anthropic models excel at agentic spreadsheet manipulation
17:12 Frankensteining the OpenAI Agents SDK
17:43 The Ramp Sheets UX and fast vs. expert mode
19:58 Agent in a sandbox vs. agent with a sandbox
21:55 Vibe evals with expert humans
23:40 Inspect, the internal coding agent
24:13 The self-monitoring loop and auto-PRs
28:01 Other wacky experiments on Sheets
28:43 Memory experiments that didn't pan out
31:16 Latent Briefing and KV-cache subagent communication
35:13 Reviving Golden Gate Claude
37:47 Contrastive pairs and steering vectors
39:47 Picking the right layers in Gemma
41:37 What Ramp Labs looks for when hiring

May 7, 2026 - 44 min

How Listen is building a system of AI Agents & subagents for specialized tasks | Florian Juengermann, CTO

Florian Juengermann is the co-founder and CTO of Listen, an AI startup that turns qualitative research across hundreds of interviews, surveys, and focus groups into structured, traceable insights. Listen's agents analyze responses at scale, and Florian has rearchitected the system multiple times to get there. In this conversation, he walks through the virtual table architecture at the core of their Research Agent, how small models run map-reduce classification across thousands of open-ended responses, and the self-reviewing feedback subagent that catches errors during long async runs.

We also discuss:
* The three agents inside Listen's platform
* How Listen rearchitected multiple times, from a simple RAG bot to a multi-agent system
* Why the PowerPoint subagent was completely rebuilt using the Claude Code SDK
* Contextual prompt engineering as an alternative to skills
* How Listen keeps report numbers live as new interview responses come in
* When to trigger the long-running agent vs. showing early results
* What Florian looks for when hiring agent engineers

References:
* Anthropic [https://www.anthropic.com/]
* ChatGPT [https://chatgpt.com/]
* Claude [https://claude.ai/]
* Claude Code SDK [https://docs.anthropic.com/en/docs/claude-code/sdk]
* E2B [https://e2b.dev/]
* Emotional Intelligence [https://listenlabs.ai/features/emotional-intelligence]
* GPT Mini [https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/]
* Haiku [https://www.anthropic.com/claude/haiku]
* Listen [https://listenlabs.ai/]
* OpenAI [https://openai.com/]
* Pandas [https://pandas.pydata.org/]
* Postgres [https://www.postgresql.org/]
* Python [https://www.python.org/]
* Research Agent [https://listenlabs.ai/features/research-agent]
* Render [https://render.com/]
* Zoom [https://zoom.us/]

Where to find Florian:
* LinkedIn [https://www.linkedin.com/in/juengermann/]
* Twitter/X [https://x.com/florian_jue]

Where to find Harrison:
* LinkedIn [https://www.linkedin.com/in/harrison-chase-961287118/]
* Twitter/X [https://x.com/hwchase17]

Where to find LangChain:
* Website [http://langchain.com]
* Docs [https://docs.langchain.com/]

Send feedback or questions to maxagency@langchain.dev

Timestamps:
00:00 Introduction
01:25 The three agents inside Listen's platform
03:15 Live chat vs. long async runs, and how Listen tunes for each
05:33 Under the hood of the Research Agent
06:37 Listen's virtual table architecture
07:34 How small models classify thousands of open-ended responses
10:05 Running code in a sandbox: how E2B fits in
11:52 Why Listen rebuilt the PowerPoint subagent from scratch
14:11 Contextual prompt engineering instead of skills
16:32 The feedback subagent that reviews its own reports
18:14 How Listen runs evals in production
19:47 Unexpected ways users push the agent to its limits
21:42 How many times Listen has rearchitected, and why
24:59 Trace observability: depth over breadth
26:10 Lessons from running Claude Code SDK inside E2B
27:42 Memory: what's solved and what isn't
29:10 The Composer agent UX: co-editing a document with AI
35:50 How Listen keeps report numbers live as new responses come in
43:47 What Listen looks for when hiring agent engineers

Apr. 23, 2026 - 47 min

How Hex builds AI agents that reason like human data analysts | Izzy Miller, AI Engineer

Izzy Miller is an AI engineer at Hex, an AI analytics platform that was one of the first companies to ship data agents to real paying users. Today, Hex runs a multi-agent system with nearly 100K tokens of tools, and Izzy is building a 90-day simulation to evaluate whether those agents actually get smarter over time. In this conversation, he walks through the harness decisions that shaped their architecture, the failure modes Hex is seeing at scale, and what it takes to build an eval that no current model can pass.

We also discuss:
* Why data agents are harder to verify than coding agents
* Under the hood of Hex’s agents
* How Hex is unifying separate agents
* Why most eval sets are bad
* The 90-day simulation for long-horizon evals
* How Izzy went from marketing to AI engineer

References:
* Andon Labs [https://andonlabs.com/]
* Anthropic [https://www.anthropic.com/]
* Barry McCardel [linkedin.com/in/barrymccardel]
* ChatGPT [http://chatgpt.com]
* Claude Code [https://code.claude.com/docs/en/overview]
* Claude Sonnet 4.6 [https://www.anthropic.com/news/claude-sonnet-4-6]
* DBT [https://www.getdbt.com/]
* GPT-3.5 Turbo [https://developers.openai.com/api/docs/models/gpt-3.5-turbo]
* GPT-5.3 Codex Spark [https://openai.com/index/introducing-gpt-5-3-codex-spark/]
* GPT-5.4 [https://openai.com/index/introducing-gpt-5-4/]
* Hex [https://hex.tech/]
* LangChain [https://www.langchain.com/]
* LangSmith [https://www.smith.langchain.com/]
* Looker [https://lookerstudio.google.com/]
* OpenAI [https://openai.com/]
* Opus 4.6 [https://www.anthropic.com/news/claude-opus-4-6]
* Satya Nadella [https://www.linkedin.com/in/satyanadella]
* Snowflake [https://www.snowflake.com/en/]
* Vending Machine [https://andonlabs.com/vending]

Where to find Izzy:
* LinkedIn [https://www.linkedin.com/in/izzy-miller/]
* Twitter/X [https://x.com/isidoremiller]

Where to find Harrison:
* LinkedIn [https://www.linkedin.com/in/harrison-chase-961287118/]
* Twitter/X [https://x.com/hwchase17]

Where to find LangChain:
* Website [http://langchain.com]
* Docs [https://docs.langchain.com/]

Send feedback or questions to maxagency@langchain.dev

Timestamps:
01:35 Where Hex's notebook agent started
03:46 The moment Hex knew it was time for agents
07:36 Why data agents are harder to verify than coding agents
09:30 How Hex is unifying separate agents
13:28 Under the hood of the notebook agent
15:41 The harness features that are now holding the agent back
17:41 Why Hex built their own orchestrator
18:59 Managing nearly 100K tokens of tools
20:49 Ephemeral queries and agent behavior trade-offs
24:46 The UX problem with showing agents' thinking
27:28 Why verification is harder than transparency for data agents
31:00 Memory, context conflicts, and collapse modes
34:38 How Hex built their internal eval system
39:29 Why most eval sets are bad
44:30 The 900% quota eval that every model fails
46:55 Model upgrades and the "in distribution" debate
51:34 How Izzy went from marketer to AI engineer
59:59 The 90-day simulation for long-horizon evals

Apr. 9, 2026 - 1 h 8 min
