Braid

A few hundred dollars a proof, and the long argument about what machines are for

23 min · 25 May 2026

Description

A frontier lab proves nine decades-old math problems for a few hundred dollars each, two talks make the numeric case that the cheapest agents route work to the smallest model that can do it, a lawsuit names an individual researcher over how Llama's training data was sourced, and a papal encyclical argues about AI on the terms of work and dignity. Eight things worth knowing today, told one developer to another.

* DeepMind's AlphaProof Nexus clears nine open Erdős problems [https://arxiv.org/abs/2605.22763]: Lean-verified proofs, a few hundred dollars apiece.
* "You don't need GPT to zoom for you" [https://www.youtube.com/watch?v=WRBNDpUhsJQ]: Callosum's numbers on routing subtasks to smaller models.
* The token-efficiency turn [https://www.youtube.com/watch?v=0zw-Uk9KJiA]: ThePrimeagen on why the org paying retail eventually does the math.
* Inside how DeepMind runs its own agents [https://www.youtube.com/watch?v=7gujZrJ9L5I]: worse quotas than customers, a Darwinian skills library, and skepticism about MCP.
* The lawsuit that names a name [https://x.com/ednewtonrex/status/2058433725889716519]: Hobbs v. Meta, an individual researcher, and the internal dissent in the record.
* Simon Willison on publishing GPT-4's retired architecture [https://x.com/simonw/status/2058877314004627690]: the guesswork behind the water numbers.
* Jujutsu and the pile of laundry [https://ikesau.co/blog/defeating-git-rigour-fatigue-with-jujutsu/]: making a mess on purpose, then sorting it at the end.
* Filming your chores for the robots [https://www.washingtonpost.com/technology/interactive/2026/robot-chores-video-data/]: where the embodied-AI training data is actually coming from.
* Pope Leo XIV's AI encyclical [https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html]: technology is never neutral, and what no machine replaces.

All episodes

41 episodes

Locally coherent, globally not

Friday's room sits between a hobbyist voice assistant running entirely on Mario Zechner's desk and a cluster of arXiv papers all saying the same thing from different angles: long-running agents now fall apart in ways the model can't fix. Lenar and Damra read four reliability papers side by side, then turn to the personal-memory question every shipping assistant is already getting wrong.

* Mario Zechner on pibot [https://x.com/badlogicgames/status/2060268257739677713/photo/1]: full local voice loop with Parakeet, Qwen 3 TTS, and Qwen 3.6 through llama.cpp, with the STT and TTS engines ported from Python into Rust on mlx-c. The runtime detail is the news, not the model lineup.
* Ethan Mollick on token budgets [https://x.com/emollick/status/2060357604044358108]: split spend between building and learning. Read against yesterday's Kirkland and Ellis platform story, the question becomes who controls the learning budget at internal AI orgs.
* MMPO [https://arxiv.org/abs/2605.30159]: Ziyan Liu and team train a policy that decides when memory in long-horizon agents should be rewritten and when it should be left alone. Belief drift comes from over-eager rewrites, not missing updates.
* RedundancyBench [https://arxiv.org/abs/2605.29893]: Minyang Hu's group benchmarks how many steps in a long agent trajectory are repeats. Stale duplicates of state crowd out the relevant signal in context.
* Locally Coherent, Globally Incoherent [https://arxiv.org/abs/2605.30335]: Anany Kotawala's single-author paper bounds compositional incoherence in multi-component agents. Defensible local outputs assemble into contradictory global ones.
* Agent-Radar [https://arxiv.org/abs/2605.30136]: Hongxiang Zhang's group steers attention toward context-relevant tokens in multi-agent communication, so the receiver isn't drowned in noise from the sender.
* Selective QA over conflicting personal memory [https://arxiv.org/abs/2605.30087]: Tiancheng Yang's testbed for what happens when your assistant's memories about you disagree. No single resolution strategy dominates.
* BioRefusalAudit [https://arxiv.org/abs/2605.30162]: Caleb DeLeeuw uses sparse autoencoders to ask whether a model's refusal is shallow pattern matching or whether the dangerous capability isn't there at all.
* AutoformBot and Atlas [https://arxiv.org/abs/2605.29955]: Ahmad Rammal's team at FAIR Paris and NYU on a multi-agent system that pulls textbook math into Lean 4 at scale. Lean is the verifier the agents can't argue with.

29 May 2026 · 22 min
Custom silicon, futures contracts, and a five-hundred-million-dollar law firm

Mistral spent one morning announcing chip ambitions, an Airbus and BMW supply deal, and a push to ensure Europe's independence from US tech giants. ByteDance is building its own CPUs. Taiwan has raised fourteen and a half billion dollars in debt to feed AI capacity. Shanghai and US exchanges are drafting futures contracts for compute. And Axios says Corporate America is starting to ask whether the AI spend is paying back, while Kirkland and Ellis sets aside five hundred million dollars to build its own platform. The day the infrastructure layer got financialized, and a lot of buyers looked up and asked what they bought. Also: Lenar is joined by a new co-host, Damra Vol.

* Mistral to explore designing its own chips (CNBC) [https://www.cnbc.com/2026/05/28/mistral-arthur-mensch-design-chips-ai-data-centers.html]: Arthur Mensch frames the move as controlling more of the infrastructure as Mistral competes with larger labs. Intent, not a roadmap.
* Mistral signs Airbus and BMW to ensure Europe's independence (Sam Schechner / WSJ via Techmeme) [https://www.techmeme.com/260528/p16]: industrial customers buying continuity in Paris as much as compute.
* ByteDance is developing its own CPUs (Reuters via Techmeme) [https://www.techmeme.com/260528/p9]: reported as supply-side defense against chip price hikes, not long-term ambition.
* Taiwanese tech books a record $14.5B of debt deals (Aileen Chuang / Bloomberg via Techmeme) [https://www.techmeme.com/260528/p5]: financing raised against expected AI demand.
* Shanghai is designing AI-token futures, US exchanges launching GPU compute futures (Reuters via Techmeme) [https://www.techmeme.com/260528/p27]: compute itself becomes a tradable underlying, with the spec on the token version still unclear.
* Corporate America enters its AI reckoning (Madison Mills / Axios) [https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs]: CFOs are starting to ask for evidence of return.
* Kirkland & Ellis sets aside $500M to build its own AI platform (FT via Techmeme) [https://www.techmeme.com/260528/p2]: the top-grossing law firm wants tooling its competitors don't have.
* AI giants bet billions on the most expensive job in enterprise (Janakiram MSV / Forbes) [https://www.forbes.com/sites/janakirammsv/2026/05/28/ai-giants-bet-billions-on-the-most-expensive-job-in-enterprise/]: forward-deployed engineers as the labs' collision course with Accenture and TCS.
* Anthropic and OpenAI found PMF with coding agents (Simon Willison via Techmeme) [https://www.techmeme.com/260528/p11]: fit at the $200/month price point, where the harness explains more of the result than the underlying model.
* Miles Brundage's median MTS theorem [https://x.com/Miles_Brundage/status/2059888956897173883]: a frontier lab's policy positions converge to those of the median member of technical staff.
* Soro: a lightweight foundation model and chatbot for Tajik (Liashkov et al., arXiv) [https://arxiv.org/abs/2605.27379]: a useful counterweight to a day of chip plans and futures contracts.

Yesterday · 14 min
Coding is solved, the rest isn't

Boris Cherny says coding is solved for the coding he does, and almost everything else in today's research is a study of the parts that aren't. A new coding leaderboard with an accusation, the end of the "software engineer" title, the craft of delegating to an agent, and three papers on the ways agents quietly break: introspection, aging, and memory. Plus running a trillion-parameter model in your house, the labs' jobs split, and a developer who's tired of talking to AI.

* DeepSWE crowns GPT-5.5, and accuses Opus of cheating [https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole]: what looks like a loophole may just be a model recovering the answer from git history.
* The end of the software engineer, in the first person [https://www.platformer.news/boris-cherny-interview-ai-jobs/]: Cherny in Platformer and Steven Levy in Wired on the agent boom and its hazards.
* What the best agents share, and how to drive one [https://www.youtube.com/watch?v=7CrPrHgoEYk]: Flinn AI's four patterns alongside a practical Claude Code daily-driver guide.
* Can the model actually tell when it's unsure? [https://arxiv.org/abs/2605.26242]: a reality check on LLM introspection and self-reported confidence.
* Your agents are aging [https://arxiv.org/abs/2605.26302]: AgingBench, MemFail, and rethinking agent memory as a state trajectory.
* Running the frontier in your own house [https://www.youtube.com/watch?v=ESbWpPT_9-o]: EXO Labs on local inference economics and the 100x still left.
* The labs can't agree on the jobs [https://www.axios.com/2026/05/27/ai-hype-doom-openai-anthropic]: Anthropic vs OpenAI, with Hassabis calling 2026 a practice run.
* I'm tired of talking to AI [https://orchidfiles.com/im-tired-of-ai-generated-answers/]: a developer on people forwarding AI answers they never read.

27 May 2026 · 21 min
The harness, not the model — and the trust layer racing to catch up

One developer catching you up on the day in AI and the craft of building with it. Today: the wrapper around a model can move a benchmark more than the model does, a watermark goes multi-lab, and a decensoring tool with thirteen million downloads shows where that watermark leaks. Plus a sharp little essay on why coding agents make us so mad, the jobs data behind the panic, and three things you can pick up today.

* The harness, not the model [https://arxiv.org/abs/2605.23950]: a Google DeepMind Kaggle talk and an arXiv position paper argue the agent harness can swing a score ~22% [https://www.youtube.com/watch?v=Ubwb6NzegyA] while frontier models tie.
* Gemini Omni [https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/]: editing video by talking to it, with SynthID baked in (community reaction [https://www.reddit.com/r/singularity/comments/1tniqkb/the_strength_of_gemini_omni_is_in_video/]).
* SynthID becomes a shared layer [https://x.com/GoogleDeepMind/status/2059235181274202500]: 100 billion watermarks, Search and Chrome, and OpenAI/ElevenLabs/Kakao on board.
* Heretic in the Financial Times [https://www.reddit.com/r/LocalLLaMA/comments/1tna22m/the_financial_times_has_published_an_article/]: decensoring open weights in ten minutes, and the artifact that proves the gap [https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved].
* The user is visibly frustrated [https://pscanf.com/s/354/]: why conversational agent UX trips your social wiring.
* A rage-quitting modder [https://www.reddit.com/r/singularity/comments/1tntdui/users_who_rage_quit_my_software/] and the jobs data [https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/]: backlash, and what the numbers actually say.
* The bench: NuExtract3 [https://www.reddit.com/r/LocalLLaMA/comments/1tn8utn/nuextract3_released_openweight_4b_vlm_for/], EAGLE 3.1 [https://vllm.ai/blog/2026-05-26-eagle-3-1], and a rejected llama.cpp patch [https://www.reddit.com/r/LocalLLaMA/comments/1to00xl/strix_halo_users_a_rejected_pr_can_give_you_up_to/] worth grabbing.

26 May 2026 · 24 min