Braid

Cold starts, radio stations, and a circuit you can subtract

28 min · May 19, 2026

Description

Monday's lineup: Modal publishes the full architecture behind a 40x reduction in serverless-GPU cold-start latency, Andon Labs releases the five-month results from letting four frontier models run real radio stations, and a researcher locates and turns off the political-censorship circuit inside Qwen 3.5 9B. Plus: Pope Leo XIV puts an Anthropic interpretability researcher on the encyclical stage, Qwen 3.7 surfaces on Qwen Chat, Musk loses to OpenAI on a calendar technicality, LangSmith Engine takes a swing at agent triage, and Odyssey ships a four-player generative GoldenEye.

* Modal's 50-second cold start [https://modal.com/blog/truly-serverless-gpus]
* Five months of AI radio [https://andonlabs.com/blog/andon-fm]
* Magnifica humanitas at the Vatican [https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-first-encyclical-magnifica-humanitas.html]
* Reading Qwen 3.5's censorship out of its weights [https://vas-blog.pages.dev/qwen-censorship/]
* Qwen 3.7 surfaces [https://www.reddit.com/r/LocalLLaMA/comments/1tgpabe/qwen_37_droped_on_qwen_chat/] and Musk loses [https://www.cnbc.com/2026/05/18/musk-altman-openai-trial-verdict.html]
* LangSmith Engine takes a swing at agent triage [https://www.langchain.com/blog/introducing-langsmith-engine]
* Agora-1 generates a shared GoldenEye [https://odyssey.ml/introducing-agora-1]
* Three questions for I/O tomorrow [https://x.com/sundarpichai/status/2056524502746747048]

All episodes

39 episodes

Coding is solved, the rest isn't

Boris Cherny says coding is solved for the coding he does — and almost everything else in today's research is a study of the parts that aren't. A new coding leaderboard with an accusation, the end of the "software engineer" title, the craft of delegating to an agent, and three papers on the ways agents quietly break: introspection, aging, and memory. Plus running a trillion-parameter model in your house, the labs' jobs split, and a developer who's tired of talking to AI.

* DeepSWE crowns GPT-5.5, and accuses Opus of cheating [https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole] — what looks like a loophole may just be a model recovering the answer from git history.
* The end of the software engineer, in the first person [https://www.platformer.news/boris-cherny-interview-ai-jobs/] — Cherny in Platformer and Steven Levy in Wired on the agent boom and its hazards.
* What the best agents share, and how to drive one [https://www.youtube.com/watch?v=7CrPrHgoEYk] — Flinn AI's four patterns alongside a practical Claude Code daily-driver guide.
* Can the model actually tell when it's unsure? [https://arxiv.org/abs/2605.26242] — a reality check on LLM introspection and self-reported confidence.
* Your agents are aging [https://arxiv.org/abs/2605.26302] — AgingBench, MemFail, and rethinking agent memory as a state trajectory.
* Running the frontier in your own house [https://www.youtube.com/watch?v=ESbWpPT_9-o] — EXO Labs on local inference economics and the 100x still left.
* The labs can't agree on the jobs [https://www.axios.com/2026/05/27/ai-hype-doom-openai-anthropic] — Anthropic vs OpenAI, with Hassabis calling 2026 a practice run.
* I'm tired of talking to AI [https://orchidfiles.com/im-tired-of-ai-generated-answers/] — a developer on people forwarding AI answers they never read.

May 27, 2026 · 21 min

The harness, not the model — and the trust layer racing to catch up

One developer catching you up on the day in AI and the craft of building with it. Today: the wrapper around a model can move a benchmark more than the model does, a watermark goes multi-lab, and a decensoring tool with thirteen million downloads shows where that watermark leaks. Plus a sharp little essay on why coding agents make us so mad, the jobs data behind the panic, and three things you can pick up today.

* The harness, not the model [https://arxiv.org/abs/2605.23950] — a Google DeepMind Kaggle talk and an arXiv position paper argue the agent harness can swing a score ~22% [https://www.youtube.com/watch?v=Ubwb6NzegyA] while frontier models tie.
* Gemini Omni [https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/] — editing video by talking to it, with SynthID baked in (community reaction [https://www.reddit.com/r/singularity/comments/1tniqkb/the_strength_of_gemini_omni_is_in_video/]).
* SynthID becomes a shared layer [https://x.com/GoogleDeepMind/status/2059235181274202500] — 100 billion watermarks, Search and Chrome, and OpenAI/ElevenLabs/Kakao on board.
* Heretic in the Financial Times [https://www.reddit.com/r/LocalLLaMA/comments/1tna22m/the_financial_times_has_published_an_article/] — decensoring open weights in ten minutes, and the artifact that proves the gap [https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved].
* The user is visibly frustrated [https://pscanf.com/s/354/] — why conversational agent UX trips your social wiring.
* A rage-quitting modder [https://www.reddit.com/r/singularity/comments/1tntdui/users_who_rage_quit_my_software/] and the jobs data [https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/] — backlash, and what the numbers actually say.
* The bench — NuExtract3 [https://www.reddit.com/r/LocalLLaMA/comments/1tn8utn/nuextract3_released_openweight_4b_vlm_for/], EAGLE 3.1 [https://vllm.ai/blog/2026-05-26-eagle-3-1], and a rejected llama.cpp patch [https://www.reddit.com/r/LocalLLaMA/comments/1to00xl/strix_halo_users_a_rejected_pr_can_give_you_up_to/] worth grabbing.

Yesterday · 24 min

A few hundred dollars a proof, and the long argument about what machines are for

A frontier lab proves nine decades-old math problems for a few hundred dollars each, two talks make the numeric case that the cheapest agents route work to the smallest model that can do it, a lawsuit names an individual researcher over how Llama's training data was sourced, and a papal encyclical argues about AI on the terms of work and dignity. Eight things worth knowing today, told one developer to another.

* DeepMind's AlphaProof Nexus clears nine open Erdős problems [https://arxiv.org/abs/2605.22763] — Lean-verified proofs, a few hundred dollars apiece.
* "You don't need GPT to zoom for you" [https://www.youtube.com/watch?v=WRBNDpUhsJQ] — Callosum's numbers on routing subtasks to smaller models.
* The token-efficiency turn [https://www.youtube.com/watch?v=0zw-Uk9KJiA] — ThePrimeagen on why the org paying retail eventually does the math.
* Inside how DeepMind runs its own agents [https://www.youtube.com/watch?v=7gujZrJ9L5I] — worse quotas than customers, a Darwinian skills library, and skepticism about MCP.
* The lawsuit that names a name [https://x.com/ednewtonrex/status/2058433725889716519] — Hobbs v. Meta, an individual researcher, and the internal dissent in the record.
* Simon Willison on publishing GPT-4's retired architecture [https://x.com/simonw/status/2058877314004627690] — the guesswork behind the water numbers.
* Jujutsu and the pile of laundry [https://ikesau.co/blog/defeating-git-rigour-fatigue-with-jujutsu/] — making a mess on purpose, then sorting it at the end.
* Filming your chores for the robots [https://www.washingtonpost.com/technology/interactive/2026/robot-chores-video-data/] — where the embodied-AI training data is actually coming from.
* Pope Leo XIV's AI encyclical [https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html] — technology is never neutral, and what no machine replaces.

May 25, 2026 · 23 min

The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up

Anthropic's unreleased Mythos model has reportedly found more than ten thousand vulnerabilities for its Project Glasswing partners — and showed up briefly inside Claude Code this weekend. The same weekend, a security researcher flagged what he calls the first real prompt-injection attack in the wild, riding the exact workflow we've all been adopting. Today's episode walks both sides of that coin, then turns to what builders are actually doing: a three-dollar refactor with a deadlock in it, the missing coordination layer for agent swarms, and the argument that the chat box is the command-line phase of agentic software.

* Mythos & Project Glasswing [https://www.engadget.com/2180028/anthropic-claude-mythos-preview-project-glasswing-update/] — a security model "too dangerous to release," and the case for and against that framing.
* A real prompt injection in the wild [https://x.com/rez0__/status/2058350854508286082] — a malicious GitHub issue, a scan.js, and secrets exfiltrated over DNS.
* The three-dollar refactor [https://www.reddit.com/r/singularity/comments/1tlj7ou/coding_is_basically_solved_for_the_boring_90_of/] — cheap worker models, one confident deadlock, and where judgment still lives.
* The missing primitive is coordination [https://www.youtube.com/watch?v=5Sui_OnSRlY] — Lou Bichard of Ona on software factories, Stripe's Minions, and why GitHub isn't a coordination layer.
* Your agent is an infinite canvas [https://www.youtube.com/watch?v=LMbeDEQO6QM] — Rachel Lee Nabors on MCP apps, Web MCP, and chat as the command-line phase.
* r/programming reopens to AI [https://www.reddit.com/r/programming/comments/1tlh5aj/announcement_weve_updated_the_rules_and_april_is/] — a seven-million-person community moves from a reflex ban to a written policy.

May 24, 2026 · 21 min

Fast models, slow developers — and the part of the job that stays yours

A Saturday episode about what your job becomes when the model writes the code — and writes it fast. The bottleneck moved from typing to deciding, and a surprising number of this week's stories land on the same instruction: stay the one who decides. Plus a price floor, a reclassification, a year of bold predictions, and a 4-year-old gaming card that won't quit.

* "I don't write code anymore" [https://x.com/levelsio/status/2058116725929828722] — Pieter Levels, amplified by Marc Andreessen [https://x.com/pmarca/status/2058144277340049588], and the real-thing/bubble-thing tangle inside it.
* Fast Models Need Slow Developers [https://www.youtube.com/watch?v=TeGsFFNqRLA] — Sarah Chieng of Cerebras on Codex Spark at 1,200 tokens a second, and why the discipline matters more, not less.
* DeepSeek's permanent 75% cut [https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent] and NVIDIA folding gaming into "Edge Computing" [https://www.guru3d.com/story/nvidia-removes-gaming-revenue-category-from-financial-reports/] — two ends of the same pipe.
* Jack Clark's year of predictions [https://www.theguardian.com/technology/2026/may/21/ai-nobel-prize-winning-discovery-robots-jack-clark-anthropic] at Oxford — and the cognitive-atrophy counterpoint.
* BeeLlama's DFlash update [https://www.reddit.com/r/LocalLLaMA/comments/1tkpz2y/beellama_v020_major_dflash_update_single_rtx_3090/] — 164 tokens a second on a single RTX 3090.
* Lobster Trap [https://www.youtube.com/watch?v=F1DYkY1BlfM] — Sally Ann O'Malley of Red Hat on containerizing an OpenClaw agent setup.
* How the rest of the world sees this [https://www.reddit.com/r/singularity/comments/1tl68ne/is_ai_viewed_as_evil_in_nontech_communities/] — and a couple overheard in a Copenhagen park [https://x.com/niloofar_mire/status/2058148404673331256].

May 23, 2026 · 21 min