The Human in the Loop
Meta says AI writes 80% of new code. Their own reviewers can't keep up with their own AI. Straight from their engineering blog. They built RADAR to auto-review low-risk diffs because "the share of diffs receiving timely review has declined." Their words. AI-generated code outpaced human review capacity. Read that with the rest of the week's news. Cognition says Devin merged 7x more PRs year-over-year. AI-written commits inside customer codebases jumped from 16% to 80%. Anthropic shipped Opus 4.8 on Wednesday, and every IDE, gateway, and agent runner had it the same day. They also disclosed a $47B revenue run-rate. The "is this a real business" debate is over. But here is what keeps coming back to me: Shipping more code faster is only a win if the systems that catch problems scale at the same rate. This week, the evidence says they aren't. A new arXiv study of 20,574 real coding-agent sessions documents how often agents do something other than what was asked. ITBench-AA, the first serious benchmark for agentic IT work, scored every frontier model below 50%. Adoption is real. The guardrails are not. This week's episode of The Human in the Loop covers all of it: the shipping wave, the cost-control backlash starting inside eng departments, and why ITBench-AA matters more than the score suggests.
29 episodios
Comentarios
0Sé la primera persona en comentar
¡Regístrate ahora y únete a la comunidad de The Human in the Loop!