Why Models Over-Edit Your Code, Meta Keystroke Surveillance, Interviewing Engineers in the AI Age

Description

Is GPT-5.5 finally a 4.7-tier model? Did DeepSeek V4 just close the gap with Anthropic? And what does it mean that a senior ML engineer says he can't out-code Claude anymore? Co-hosts Shimin Zhang, Dan Lasky, and Rahul Yadav are joined by special guest Nathan Lubchenco — ML engineer and Substack author of *The future was yesterday* (https://nathanlubchenco.substack.com/) — on ADI Pod #23 (April 28, 2026).

This episode covers OpenAI's GPT-5.5 release, DeepSeek V4 (1.6T base / 49B active params with 1M context), Meta's new Model Capability Initiative tracking US employee keystrokes and mouse movements, a Levenshtein-distance study on coding-model over-editing, the 2026 Stanford AI Index report, and a deep-dive interview on how to hire software engineers when the agents are already better at coding than the candidates.

## Key takeaways

— Models are now consistently better at coding than even senior ML engineers, by the engineers' own admission. Late 2026 may be when they cross the median software engineer.
— Coding-model over-editing is measurable (Levenshtein distance on boolean-flip tasks) and responds to instructions: explicit "minimum-edit" prompts close most of the gap.
— The US is, unusually, a slow adopter of a major technological wave. Workplace AI usage is highest in emerging economies, not the developed world.
— "The task is not the job" — humans remain indispensable on the bundling dimensions: catching what customers don't say, and avoiding interactions that end up on social media.
— Software engineering interviews should include the candidate's personal harness, with company-provided API keys for equity. LeetCode optimizes for the wrong signal in 2026.
— DeepSeek V4 closing the gap with Mythos in 3–6 months is what makes the bubble too geopolitically important to fail.
## Chapters

* (00:00) - Cold Open & Welcome
* (01:31) - News Threadmill: GPT-5.5, DeepSeek V4, Meta Watches Every Keystroke
* (12:28) - Post-Processing: Coding Models Are Doing Too Much
* (18:59) - Post-Processing: The Task Is Not the Job (Luis Garicano)
* (32:20) - Post-Processing: The 2026 Stanford AI Index Report
* (38:11) - Deep Dive: Interviewing Engineers in the AI Age (with Nathan Lubchenco)
* (45:05) - Deep Dive: Reforming Software Hiring — Take-Homes, Personal Harness, Equity
* (50:15) - Deep Dive: When Models Cross the Median Engineer (Late-2026 Prediction)
* (59:29) - Deep Dive: Why Code Review Is the Current Bottleneck
* (01:00:21) - Deep Dive: Should PRs Show the Prompt History?
* (01:02:27) - Dan's Rant: Anthropic Tested Removing Claude Code from the Pro Plan
* (01:05:44) - Rahul's Rampage: The Infinity Machine — Demis Hassabis & Corporate Gravity
* (01:14:32) - Two Minutes to Midnight: Bubble Clock Moves Back to 4:00
* (01:26:30) - Outro

## Resources mentioned

**Models & news**
• OpenAI — Introducing GPT-5.5: https://openai.com/index/introducing-gpt-5-5/
• Engadget — DeepSeek promises its new AI model has world-class reasoning: https://www.engadget.com/ai/deepseek-promises-its-new-ai-model-has-world-class-reasoning-115733512.html
• Reuters — Meta to start capturing employee mouse movements, keystrokes for AI training data: https://www.reuters.com/sustainability/boards-policy-regulation/meta-start-capturing-employee-mouse-movements-keystrokes-ai-training-data-2026-04-21/

**Post-processing articles**
• "Coding Models Are Doing Too Much" — Levenshtein-distance over-editing study (nrehiew): https://nrehiew.github.io/blog/minimal_editing/
• Luis Garicano (Silicon Continent) — Why Desk Jobs Survive ("The task is not the job"): https://www.siliconcontinent.com/p/why-desk-jobs-survive-and-amodei
• 2026 AI Index Report — Stanford Institute for Human-Centered AI: https://hai.stanford.edu/ai-index/2026-ai-index-report

**Deep dive**
• Nathan Lubchenco — Interviewing Software Engineers in the Age of AI: https://nathanlubchenco.substack.com/p/interviewing-software-engineers-in
• Nathan Lubchenco — *The future was yesterday* Substack home: https://nathanlubchenco.substack.com/

**Dan's rant**
• Ars Technica — Anthropic tested removing Claude Code from the Pro plan: https://arstechnica.com/ai/2026/04/anthropic-tested-removing-claude-code-from-the-pro-plan/

**Rahul's rampage**
• Sebastian Mallaby — *The Infinity Machine* (book on Demis Hassabis and DeepMind)
• Philipp Dubach — Do Not Disturb My Circles (Archimedes essay): https://philippdubach.com/posts/do-not-disturb-my-circles/

**Bubble watch**
• TechCrunch — Two college kids raise $5.1M pre-seed to build an AI social network in iMessage: https://techcrunch.com/2026/04/24/two-college-kids-raise-a-5-1-million-pre-seed-to-build-an-ai-social-network-in-imessage/
• Toby Ord — Hourly Costs for AI Agents: https://www.tobyord.com/writing/hourly-costs-for-ai-agents
• CNBC — OpenAI reportedly missed revenue targets, shares of Oracle and chip stocks falling: https://www.cnbc.com/2026/04/28/openai-reportedly-missed-revenue-targets-shares-of-oracle-and-these-chip-stocks-are-falling.html

## About ADI Pod

ADI Pod (Artificial Developer Intelligence) is a weekly podcast about AI and software development for working developers. Co-hosts Shimin Zhang, Dan Lasky, and Rahul Yadav go through hundreds of links and dozens of newsletters every week so you don't have to.

This week's special guest: **Nathan Lubchenco** — ML engineer and author of *The future was yesterday* on Substack, where he writes about AI and software engineering.

• Website: https://www.adipod.ai
• Email: humans@adipod.ai

OpenAI's Goblin Problem, 10 Lessons When Code Is Cheap, AI Addiction Loop

Why does the leaked Codex CLI system prompt explicitly tell GPT-5.5 to never mention goblins, gremlins, raccoons, trolls, ogres, or pigeons? Why is OpenAI now gating its cyber model the same way it mocked Anthropic for gating Mythos last month? And what does it mean that Dan tried to write a personal project without Claude — and physically couldn't? Co-hosts Shimin Zhang, Dan Lasky, and Rahul Yadav cover these and more on ADI Pod #24.

This week: GPT-5.5 Cyber's gated release, OpenAI's "Where the Goblins Came From" RLHF post-mortem, Addy Osmani's five patterns for long-running agents, Jesse Vincent's adversarial review prompt, Drew Breunig's 10 lessons for agentic coding, Ivan Turkovic's history of failed attempts to eliminate programmers, Nilay Patel's "software brain" thesis, the Nature paper showing warm AI models lose 10–30 percentage points of accuracy, and a $1.1B raise for an AI lab that wants to train without human data.

## In this episode

▸ **GPT-5.5 Cyber gating** — Sam Altman called Mythos's gated release "fear-based marketing" two months ago. Now OpenAI is doing the exact same thing with the GPT-5.5 cyber variant. Multi-tier model access (enterprise, government, research preview, cyber) is becoming the default — and Shimin worries the White House is about to add another gate.

▸ **The Goblin Problem** — OpenAI's Codex CLI prompt was open-sourced and turned out to include "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons." OpenAI's "Where the Goblins Came From" post-mortem reveals a textbook RLHF failure: a "nerdy persona" reward signal trained the model to mention goblins in 66.7% of nerdy responses, and the tic propagated through supervised fine-tuning to non-nerdy responses too.

▸ **Long-running agents (Addy Osmani / Elevate)** — Five patterns for agents that run for hours or days: checkpoints over zero-or-100 outputs, governing memory like microservices, ambient processing without forced human-in-the-loop, fleet orchestration, and budget circuit-breakers. Bonus: the running gag where Rahul realizes the post is essentially an ad for Google Enterprise Agent Platform.

▸ **Adversarial review prompts (Jesse Vincent / superpowers)** — A four-step technique for getting better code review out of agents: invoke "fresh eyes," dispatch competing subagents, promise a reward (a cookie), and threaten disappointment if they don't find N issues.

▸ **10 Lessons for Agentic Coding (Drew Breunig)** — Implement to learn, rebuild often, invest in end-to-end tests, document intent, keep specs in sync, find the hard stuff, automate the easy stuff, develop taste, agents amplify experience, and the kicker: agent code is "free as in puppies" — the puppy is free, but you have to feed it and walk it.

▸ **The Eternal Promise (Ivan Turkovic)** — A history of attempts to eliminate programmers from COBOL through 4GLs, CASE tools, the Japanese 5th Generation project, no-code/low-code, and now LLMs. Each abstraction layer expanded software jobs rather than replacing them. Shimin's reframe: "Software is calcified business process. Someone has to do the calcifying."

▸ **People Do Not Yearn for Automation (Nilay Patel / The Verge)** — Why Gen Z hopefulness about AI dropped to 18% (anger up to 31%), why America is uniquely AI-pessimistic, and what Nilay calls "software brain" — the Silicon Valley assumption that human life can be reduced to data and algorithms. Plus Anuradha Pandey's reframe: stop calling them social media, call them ad platforms.

▸ **Warm models lose accuracy** — A Nature paper finds AI models trained for warmth lose 10–30 percentage points of accuracy. A companion study shows humans trust warm models *more* even when they're wrong. Frontier labs now have an explicit incentive to train the warmest model, not the most accurate one. Plus: Richard Dawkins talks to "Claudia" for three days and concludes AI must be conscious.

▸ **Dan's Rant — The AI Addiction Loop** — Dan tries to build a Home Assistant TypeScript automation without Claude. Can't. "It felt like they had fundamentally broken my arm in a way that I can't do this task as quickly as I wanted to. That scares me a lot." Shimin: "We're running into the social media addiction loop in three months instead of a decade."

▸ **Two Minutes to Midnight** — OpenAI projects ChatGPT Plus dropping from 44M to 9M subscribers in 2026 while scaling the ad-supported tier from 3M to 112M (30×). David Silver raises $1.1B for Ineffable Intelligence — a no-human-data approach inspired by AlphaGo. Scout AI raises $100M for autonomous military vision-language-action models. Bubble Clock held at 4:00.

## Key takeaways

— Reward hacking can propagate latent persona quirks through fine-tuning in ways the lab itself only catches when users surface them.
— Memory drift, not raw context size, is the real ceiling for long-running agents. Govern memory like you govern microservices.
— Code is free as in puppies, not free as in beer. The cost shifts to maintenance, security, and the new burden of maintaining your own automations.
— Warm AI is an alignment trap: incentivized for trust over accuracy, weaponizable in authoritarian hands.
— "You can outsource your thinking, but you can't outsource your understanding." — Karpathy, via Rahul.
— AI addiction hits in three months. Social media took a decade. We are not ready for the time scale.

## Chapters

* (00:00) - Cold Open & Welcome
* (02:50) - News Threadmill: GPT-5.5 Cyber Gets Mythos-Style Gating
* (08:52) - News Threadmill: The Goblin Problem & RLHF Post-Mortem
* (13:52) - Tool Shed: Long-Running Agents (Addy Osmani)
* (25:52) - Technique Corner: Adversarial Review Prompts (Jesse Vincent)
* (30:59) - Technique Corner: 10 Lessons for Agentic Coding (Drew Breunig)
* (42:31) - Post-Processing: The Eternal Promise — A History of Attempts to Eliminate Programmers
* (01:02:10) - Post-Processing: People Do Not Yearn for Automation
* (01:09:08) - Post-Processing: Warm Models & The Sycophancy Trap
* (01:13:28) - Dan's Rant: Home Automation & The AI Addiction Loop
* (01:20:09) - Two Minutes to Midnight: OpenAI's 30× Ad-Tier, David Silver's $1.1B, Scout AI's Drones
* (01:25:55) - Outro

## Resources mentioned

**News Threadmill — GPT-5.5 Cyber & The Goblin Problem**
• TechCrunch — After dissing Anthropic for limiting Mythos, OpenAI restricts access to cyber too: https://techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/
• Ars Technica — Amid Mythos-hyped cybersecurity prowess, researchers find GPT-5.5 is just as good: https://arstechnica.com/ai/2026/05/amid-mythos-hyped-cybersecurity-prowess-researchers-find-gpt-5-5-is-just-as-good/
• Ars Technica — OpenAI Codex system prompt includes explicit directive to never talk about goblins: https://arstechnica.com/ai/2026/04/openai-codex-system-prompt-includes-explicit-directive-to-never-talk-about-goblins/
• OpenAI — Where the Goblins Came From: https://openai.com/index/where-the-goblins-came-from/
...

May 8, 2026 · 1 h 26 min
