The Human in the Loop
Meta says AI writes 80% of new code. Their own reviewers can't keep up with their own AI. Straight from their engineering blog. They built RADAR to auto-review low-risk diffs because "the share of diffs receiving timely review has declined." Their words. AI-generated code outpaced human review capacity.

Read that with the rest of the week's news. Cognition says Devin merged 7x more PRs year-over-year. AI-written commits inside customer codebases jumped from 16% to 80%. Anthropic shipped Opus 4.8 on Wednesday, and every IDE, gateway, and agent runner had it the same day. They also disclosed a $47B revenue run-rate. The "is this a real business" debate is over.

But here is what keeps coming back to me: Shipping more code faster is only a win if the systems that catch problems scale at the same rate. This week, the evidence says they aren't. A new arXiv study of 20,574 real coding-agent sessions documents how often agents do something other than what was asked. ITBench-AA, the first serious benchmark for agentic IT work, scored every frontier model below 50%. Adoption is real. The guardrails are not.

This week's episode of The Human in the Loop covers all of it: the shipping wave, the cost-control backlash starting inside eng departments, and why ITBench-AA matters more than the score suggests.
29 episodes