The Human in the Loop

Podcast by Enrique Cordero

English

Technology and Science

About The Human in the Loop

Welcome to The Human in the Loop, a weekly look at what’s going on in the world of AI. Every week, I go through the biggest stories, the weird experiments, and the stuff that might actually matter in our day-to-day lives.

All episodes

27 episodes

Counting Keystrokes to Prove the Team Can Write

Counting accepted Copilot suggestions to prove AI works is like counting keystrokes to prove the team can write. It is the cleanest number on the dashboard. It is also the one that tells you nothing. Forty years ago Fred Brooks split software work into two parts. The accidental: syntax, boilerplate, scaffolding. The essential: what to build, why, for whom, what to trade off. The accidental is what AI tools are good at. That is why the dashboards look spectacular. Lines generated. Suggestions accepted. Prompts sent. The tools were always going to win that part. The numbers that should actually move sit one layer deeper. Cycle time. Change failure rate. Time to first PR review. Defect density. These were already telling you whether the team was shipping good software, long before AI showed up. AI either bends them or it does not. If cycle time has not moved, suggestion-acceptance is a vanity stat. If change failure rate has not dropped, you are not shipping faster. You are writing more code, faster. If time to first review has not shortened, your reviewers are the bottleneck and Copilot cannot fix that. GitHub shipped team-level Copilot metrics this week. It made the wrong question easier to ask, not harder. Which second-order metric has actually moved on your team since you rolled out an AI coding tool? Full breakdown in this week's episode of The Human in the Loop.

17 May 2026 - 24 min
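
The second-order metrics named in the description above can be computed from ordinary delivery records. What follows is a minimal sketch, not a method taken from the episode: the `Change` record, its field names, and the `summarize` helper are all hypothetical, and defect density is left out because it needs a code-size measure that the description does not specify.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Change:
    """One merged change. Every field name here is hypothetical."""
    first_commit: datetime
    pr_opened: datetime
    first_review: datetime
    deployed: datetime
    caused_failure: bool  # deploy needed a rollback, hotfix, or incident response


def _hours(start, end):
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def summarize(changes):
    """Return the second-order metrics for a non-empty list of changes."""
    return {
        # Cycle time: first commit to production deploy.
        "median_cycle_time_h": median(
            _hours(c.first_commit, c.deployed) for c in changes
        ),
        # Time to first PR review: PR opened to first reviewer response.
        "median_time_to_first_review_h": median(
            _hours(c.pr_opened, c.first_review) for c in changes
        ),
        # Change failure rate: share of deployed changes that caused a failure.
        "change_failure_rate": sum(c.caused_failure for c in changes) / len(changes),
    }
```

One way to use it would be to call `summarize` on the changes merged before the AI tool rollout and again on those merged after, then compare the two dictionaries. If none of the three numbers has moved, the acceptance count is unlikely to mean much.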

The Vulnerability Your Agent Merged

The unit tests pass. The PR merges. And you won't find the problem for six months. Two papers landed this week: one on LLM-generated code, one on GitHub Actions workflows. Different researchers. Same finding. When agents write code, they pin the library versions they saw most in training. Not the versions that are safe. The mechanism is simple. A model has seen one popular version of a library thousands of times. It reaches for that version because it minimizes prediction loss. Pin-by-popularity and pin-by-safety are different jobs. The model only knows one of them. The GitHub Actions paper found the same shape. Right syntax. Wrong threat model. So the code looks clean. The tests pass. The PR merges. And six months later a security audit finds a CVE that was public before the agent ever touched the file. This is not a model problem. It is a workflow problem. Human PRs go through software composition analysis (SCA). Agent PRs often don't. That gap is where the bill arrives. The fix is not complicated. Put pip-audit, npm audit, or OSV-Scanner between the agent and main. Same gate you'd use for any contributor. The agent has not finished the work when it merges. It has finished its part. Your security pipeline was designed for human contributors. Has anything changed since you started using agents?

10 May 2026 - 13 min
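
The gate described above, an SCA scan between the agent and main, might look roughly like the sketch below. It assumes a Python project with a `requirements.txt` and `pip-audit` installed on the CI runner. The function names `build_audit_command` and `agent_pr_may_merge` are hypothetical, and the injectable `run` parameter exists only so the check can be exercised without the real tool. It relies on `pip-audit` exiting with status 0 when it finds no known vulnerabilities and a non-zero status otherwise.

```python
import subprocess


def build_audit_command(requirements_path):
    """Command line for scanning a pinned requirements file with pip-audit."""
    return ["pip-audit", "-r", requirements_path]


def agent_pr_may_merge(requirements_path, run=subprocess.run):
    """Return True only if the dependency audit exits cleanly.

    Any non-zero exit blocks the merge. That includes the case where the
    audit itself failed to run, so the gate fails closed.
    """
    result = run(
        build_audit_command(requirements_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

A CI job could call `agent_pr_may_merge("requirements.txt")` on every pull request, agent-authored or not, and fail the job when it returns `False`. The same shape would apply to `npm audit` or OSV-Scanner, with the command swapped out.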

The Half-Life of a Good Decision

The best practice you followed six months ago might be the technical debt you're cleaning up today. In traditional IT, a best practice can survive a decade. You study it. You argue for it in architecture reviews. You defend it when someone wants to cut corners. In AI, six months is enough to flip one into an antipattern. A paper published this week tested multi-agent orchestration frameworks against plain in-context prompting on procedural tasks. The orchestration lost. Same accuracy. More cost. More complexity. More failure modes. Six months ago, multi-agent was the answer you gave when someone asked how to handle complex workflows. Not because it was always right. Because models could not yet follow a long, careful prompt. That was the constraint. The scaffolding was built around it. The constraint changed. The scaffolding stayed. This is the part of AI adoption nobody talks about enough. It is not just that things move fast. It is that yesterday's correct decision becomes today's drag. And you cannot always feel it happening. The system still runs. The agents still coordinate. Everything looks fine until someone asks why you are paying for complexity that a single prompt could replace. We have approval processes built for risk. We do not have processes built for expiry. What is the half-life of an AI architectural decision right now? Six months? Three? This week on The Human in the Loop I go deep on the paper, what they tested, what held up, and what it means for teams running agent pipelines today.

3 May 2026 - 18 min

AI makes developers 19% slower

The agent doesn't slow down. We do. We generate code in seconds. Then we spend an hour reading what it wrote. We trust the output less than we trust what we would write ourselves. So we read it twice. Sometimes three times. The diff is bigger than we would have written. The tests cover things we did not ask for. The names drift across files. So we clean it up. And while we're cleaning, the next prompt is already queued. Here's what nobody warned us about: the bottleneck didn't disappear. It moved. Off of writing. On to reviewing. Testing. Deploying. Understanding code that isn't yours is harder than writing your own. So I changed how I work. Smaller prompts. Fewer tools loaded. One agent at a time. Read the diff before the next ask. It feels slower. The PRs go out faster. The productivity gain is real. But so is the cognitive load of reviewing at scale. I'm not sure we're talking enough about that second part. What's slowing you down: the generation or what comes after it?

26 Apr 2026 - 19 min

32 Steps

32 steps. That's how many it took for Anthropic's unreleased AI to simulate a full network attack. They buried that number in a release note. The model is called Mythos. The UK AI Security Institute tested it. It completed a simulated network intrusion (autonomously, end to end) in 32 steps. Anthropic decided not to ship it. That decision matters. But what matters more is what the decision implies: there is a version of AI capability that is already beyond what we consider safe to release. It exists now. In a lab. Tested by a government body. Most AI conversations are still about benchmarks. MMLU scores. Reasoning tests. Coding evals. Those measure what AI can do on curated problems. They don't measure what a motivated system can do on an uncurated one. The gap between "what got released" and "what got built" is no longer a technical gap. It's a policy gap. And that's a completely different kind of problem. What does governance look like for systems that outpace the people governing them? I don't have a clean answer. But I think Anthropic's call this week is the right one. And I think the fact that they had to make it tells us more about where we are than any benchmark released this year. What would it take for your organization to make the same call?

19 Apr 2026 - 14 min
