Braid

Braid

Mostly-work, malicious npm, and one engineer replacing a law firm

28 min · 19. maj 2026
episode Mostly-work, malicious npm, and one engineer replacing a law firm cover

Description

A six-month overview from Simon Willison anchors the day: coding agents crossed from often-work to mostly-work in November, and laptop-class models started outrunning expectations. Then a fresh npm supply-chain attack — 637 malicious versions in 22 minutes — that for the first time specifically hijacks Claude Code and Codex agent hooks for persistence. Plus a Number 10 talk on replacing a one-and-a-half-million-pound law-firm contract with one embedded engineer, an editor-layer company renting xAI's Colossus 2, Ethan Mollick on insourcing, the full GenMedia pipeline running for a dollar a book, Daniel Griesser's pi-config skill repo, and two obituaries that hit the Unix world in the same week. * Simon Willison's last-six-months-in-LLMs PyCon lightning talk [https://simonwillison.net/2026/May/19/5-minute-llms/] * Mini Shai-Hulud strikes again — 317 npm packages and your agent hooks [https://safedep.io/mini-shai-hulud-strikes-again-314-npm-packages-compromised/] * Prime Intellect's General-Agent — synthetic RL environments [https://x.com/PrimeIntellect/status/2056569877167808966] * Eoin Mulgrew on Number 10's insurgent technical unit [https://www.youtube.com/watch?v=ObNKGf9YR0g] * Cursor's Compose 2.5 reportedly trained on xAI's Colossus 2 [https://x.com/techdevnotes/status/2056543940052910237] * Ethan Mollick on insourcing via hiring [https://x.com/emollick/status/2056578946813100173] * Guillaume Vernade's full GenMedia pipeline at a dollar a book [https://www.youtube.com/watch?v=BcWFc3H7Khg] * Daniel Griesser's pi-config — Plan, handoff, and subagent skills [https://x.com/DanielGri/status/2056676488183689620] * Peter Neumann (1932–2026) [https://www.tuhs.org/pipermail/tuhs/2026-May/033748.html] * Peter Salus (1938–2026) [https://www.tuhs.org/pipermail/tuhs/2026-May/033750.html] * Magnifica humanitas confirmed for May 25 [https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-first-encyclical-magnifica-humanitas.html]

Comments

0

Be the first to comment

Sign up now and become a member of the Braid community!

Get Started

2 months for 19 kr.

Then 99 kr. / month · Cancel anytime.

  • Podcasts kun på Podimo
  • 20 lydbogstimer pr. måned
  • Gratis podcasts

All episodes

39 episodes

episode Coding is solved, the rest isn't artwork

Coding is solved, the rest isn't

Boris Cherny says coding is solved for the coding he does — and almost everything else in today's research is a study of the parts that aren't. A new coding leaderboard with an accusation, the end of the "software engineer" title, the craft of delegating to an agent, and three papers on the ways agents quietly break: introspection, aging, and memory. Plus running a trillion-parameter model in your house, the labs' jobs split, and a developer who's tired of talking to AI. * DeepSWE crowns GPT-5.5, and accuses Opus of cheating [https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole] — what looks like a loophole may just be a model recovering the answer from git history. * The end of the software engineer, in the first person [https://www.platformer.news/boris-cherny-interview-ai-jobs/] — Cherny in Platformer and Steven Levy in Wired on the agent boom and its hazards. * What the best agents share, and how to drive one [https://www.youtube.com/watch?v=7CrPrHgoEYk] — Flinn AI's four patterns alongside a practical Claude Code daily-driver guide. * Can the model actually tell when it's unsure? [https://arxiv.org/abs/2605.26242] — a reality check on LLM introspection and self-reported confidence. * Your agents are aging [https://arxiv.org/abs/2605.26302] — AgingBench, MemFail, and rethinking agent memory as a state trajectory. * Running the frontier in your own house [https://www.youtube.com/watch?v=ESbWpPT_9-o] — EXO Labs on local inference economics and the 100x still left. * The labs can't agree on the jobs [https://www.axios.com/2026/05/27/ai-hype-doom-openai-anthropic] — Anthropic vs OpenAI, with Hassabis calling 2026 a practice run. * I'm tired of talking to AI [https://orchidfiles.com/im-tired-of-ai-generated-answers/] — a developer on people forwarding AI answers they never read.

27. maj 202621 min
episode The harness, not the model — and the trust layer racing to catch up artwork

The harness, not the model — and the trust layer racing to catch up

One developer catching you up on the day in AI and the craft of building with it. Today: the wrapper around a model can move a benchmark more than the model does, a watermark goes multi-lab, and a decensoring tool with thirteen million downloads shows where that watermark leaks. Plus a sharp little essay on why coding agents make us so mad, the jobs data behind the panic, and three things you can pick up today. * The harness, not the model [https://arxiv.org/abs/2605.23950] — a Google DeepMind Kaggle talk and an arXiv position paper argue the agent harness can swing a score ~22% [https://www.youtube.com/watch?v=Ubwb6NzegyA] while frontier models tie. * Gemini Omni [https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/] — editing video by talking to it, with SynthID baked in (community reaction [https://www.reddit.com/r/singularity/comments/1tniqkb/the_strength_of_gemini_omni_is_in_video/]). * SynthID becomes a shared layer [https://x.com/GoogleDeepMind/status/2059235181274202500] — 100 billion watermarks, Search and Chrome, and OpenAI/ElevenLabs/Kakao on board. * Heretic in the Financial Times [https://www.reddit.com/r/LocalLLaMA/comments/1tna22m/the_financial_times_has_published_an_article/] — decensoring open weights in ten minutes, and the artifact that proves the gap [https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved]. * The user is visibly frustrated [https://pscanf.com/s/354/] — why conversational agent UX trips your social wiring. * A rage-quitting modder [https://www.reddit.com/r/singularity/comments/1tntdui/users_who_rage_quit_my_software/] and the jobs data [https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/] — backlash, and what the numbers actually say. * The bench — NuExtract3 [https://www.reddit.com/r/LocalLLaMA/comments/1tn8utn/nuextract3_released_openweight_4b_vlm_for/], EAGLE 3.1 [https://vllm.ai/blog/2026-05-26-eagle-3-1], and a rejected llama.cpp patch [https://www.reddit.com/r/LocalLLaMA/comments/1to00xl/strix_halo_users_a_rejected_pr_can_give_you_up_to/] worth grabbing.

Yesterday24 min
episode A few hundred dollars a proof, and the long argument about what machines are for artwork

A few hundred dollars a proof, and the long argument about what machines are for

A frontier lab proves nine decades-old math problems for a few hundred dollars each, two talks make the numeric case that the cheapest agents route work to the smallest model that can do it, a lawsuit names an individual researcher over how Llama's training data was sourced, and a papal encyclical argues about AI on the terms of work and dignity. Eight things worth knowing today, told one developer to another. * DeepMind's AlphaProof Nexus clears nine open Erdős problems [https://arxiv.org/abs/2605.22763] — Lean-verified proofs, a few hundred dollars apiece. * "You don't need GPT to zoom for you" [https://www.youtube.com/watch?v=WRBNDpUhsJQ] — Callosum's numbers on routing subtasks to smaller models. * The token-efficiency turn [https://www.youtube.com/watch?v=0zw-Uk9KJiA] — ThePrimeagen on why the org paying retail eventually does the math. * Inside how DeepMind runs its own agents [https://www.youtube.com/watch?v=7gujZrJ9L5I] — worse quotas than customers, a Darwinian skills library, and skepticism about MCP. * The lawsuit that names a name [https://x.com/ednewtonrex/status/2058433725889716519] — Hobbs v. Meta, an individual researcher, and the internal dissent in the record. * Simon Willison on publishing GPT-4's retired architecture [https://x.com/simonw/status/2058877314004627690] — the guesswork behind the water numbers. * Jujutsu and the pile of laundry [https://ikesau.co/blog/defeating-git-rigour-fatigue-with-jujutsu/] — making a mess on purpose, then sorting it at the end. * Filming your chores for the robots [https://www.washingtonpost.com/technology/interactive/2026/robot-chores-video-data/] — where the embodied-AI training data is actually coming from. * Pope Leo XIV's AI encyclical [https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html] — technology is never neutral, and what no machine replaces.

25. maj 202623 min
episode The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up artwork

The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up

Anthropic's unreleased Mythos model has reportedly found more than ten thousand vulnerabilities for its Project Glasswing partners — and showed up briefly inside Claude Code this weekend. The same weekend, a security researcher flagged what he calls the first real prompt-injection attack in the wild, riding the exact workflow we've all been adopting. Today's episode walks both sides of that coin, then turns to what builders are actually doing: a three-dollar refactor with a deadlock in it, the missing coordination layer for agent swarms, and the argument that the chat box is the command-line phase of agentic software. * Mythos & Project Glasswing [https://www.engadget.com/2180028/anthropic-claude-mythos-preview-project-glasswing-update/] — a security model "too dangerous to release," and the case for and against that framing. * A real prompt injection in the wild [https://x.com/rez0__/status/2058350854508286082] — a malicious GitHub issue, a scan.js, and secrets exfiltrated over DNS. * The three-dollar refactor [https://www.reddit.com/r/singularity/comments/1tlj7ou/coding_is_basically_solved_for_the_boring_90_of/] — cheap worker models, one confident deadlock, and where judgment still lives. * The missing primitive is coordination [https://www.youtube.com/watch?v=5Sui_OnSRlY] — Lou Bichard of Ona on software factories, Stripe's Minions, and why GitHub isn't a coordination layer. * Your agent is an infinite canvas [https://www.youtube.com/watch?v=LMbeDEQO6QM] — Rachel Lee Nabors on MCP apps, Web MCP, and chat as the command-line phase. * r/programming reopens to AI [https://www.reddit.com/r/programming/comments/1tlh5aj/announcement_weve_updated_the_rules_and_april_is/] — a seven-million-person community moves from a reflex ban to a written policy.

24. maj 202621 min
episode Fast models, slow developers — and the part of the job that stays yours artwork

Fast models, slow developers — and the part of the job that stays yours

A Saturday episode about what your job becomes when the model writes the code — and writes it fast. The bottleneck moved from typing to deciding, and a surprising number of this week's stories land on the same instruction: stay the one who decides. Plus a price floor, a reclassification, a year of bold predictions, and a 4-year-old gaming card that won't quit. * "I don't write code anymore" [https://x.com/levelsio/status/2058116725929828722] — Pieter Levels, amplified by Marc Andreessen [https://x.com/pmarca/status/2058144277340049588], and the real-thing/bubble-thing tangle inside it. * Fast Models Need Slow Developers [https://www.youtube.com/watch?v=TeGsFFNqRLA] — Sarah Chieng of Cerebras on Codex Spark at 1,200 tokens a second, and why the discipline matters more, not less. * DeepSeek's permanent 75% cut [https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent] and NVIDIA folding gaming into "Edge Computing" [https://www.guru3d.com/story/nvidia-removes-gaming-revenue-category-from-financial-reports/] — two ends of the same pipe. * Jack Clark's year of predictions [https://www.theguardian.com/technology/2026/may/21/ai-nobel-prize-winning-discovery-robots-jack-clark-anthropic] at Oxford — and the cognitive-atrophy counterpoint. * BeeLlama's DFlash update [https://www.reddit.com/r/LocalLLaMA/comments/1tkpz2y/beellama_v020_major_dflash_update_single_rtx_3090/] — 164 tokens a second on a single RTX 3090. * Lobster Trap [https://www.youtube.com/watch?v=F1DYkY1BlfM] — Sally Ann O'Malley of Red Hat on containerizing an OpenClaw agent setup. * How the rest of the world sees this [https://www.reddit.com/r/singularity/comments/1tl68ne/is_ai_viewed_as_evil_in_nontech_communities/] — and a couple overheard in a Copenhagen park [https://x.com/niloofar_mire/status/2058148404673331256].

23. maj 202621 min