Braid

The recant, the runtime, and a Pantheon built in code

21 min · May 22, 2026

Description

A corporate takedown answered with a recant letter and a mirror in Germany, the protocols and computers agents actually run on, six tools trying to build the Pantheon in code, and a paper where the model writes its own GPU kernel. Plus Codex learning to keep going, a security tool hardened against the real world, and a graduation room that cheered for human intelligence.

* Meta emails Heretic; Heretic recants [https://www.reddit.com/r/LocalLLaMA/comments/1tjmvx6/heretic_has_been_served_a_legal_notice_by_meta_inc/]: a takedown of abliterated Llama derivatives answered with a Galileo joke and a Codeberg mirror in Germany.
* Five hundred PRs a day, and the harness that triages them [https://www.youtube.com/watch?v=VaS2h-dY1-4]: Onur Solmaz on OpenClaw, acpx, and the Agent Client Protocol.
* The computer the agent runs on [https://www.youtube.com/watch?v=kaX43RRRUKY]: Ivan Burazin of Daytona on stateful, composable machines for agents and 74% month-over-month growth.
* Building the Pantheon, in code [https://modelrift.com/blog/openscad-llm-benchmark/]: six coding tools tackle parametric CAD, and the gap between a good preview and a clean export.
* When the model writes its own kernel [https://arxiv.org/abs/2605.19269]: CODA folds memory-bound ops into the matrix multiply, and model-authored kernels keep up with human ones.
* Codex learns to keep going [https://www.youtube.com/watch?v=rgh0hMYPcd0]: goal mode graduates, plus Appshots and shared plugins.
* Hardening the thing that reads your CI config [https://x.com/trailofbits/status/2057782296527208709]: Trail of Bits stress-tests zizmor against forty-one thousand real workflows.
* The headcount bet [https://libertas.software/en/knowledge-hub/19/the-companies-cutting-headcount-for-ai-will-lose-to-the-ones-who-didnt], and a graduation room that cheered for actual intelligence [https://www.businessinsider.com/steve-wozniak-apple-ai-graduation-speech-2026-5].

All episodes

39 episodes

Coding is solved, the rest isn't

Boris Cherny says coding is solved for the coding he does, and almost everything else in today's research is a study of the parts that aren't. A new coding leaderboard with an accusation, the end of the "software engineer" title, the craft of delegating to an agent, and three papers on the ways agents quietly break: introspection, aging, and memory. Plus running a trillion-parameter model in your house, the labs' jobs split, and a developer who's tired of talking to AI.

* DeepSWE crowns GPT-5.5, and accuses Opus of cheating [https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole]: what looks like a loophole may just be a model recovering the answer from git history.
* The end of the software engineer, in the first person [https://www.platformer.news/boris-cherny-interview-ai-jobs/]: Cherny in Platformer and Steven Levy in Wired on the agent boom and its hazards.
* What the best agents share, and how to drive one [https://www.youtube.com/watch?v=7CrPrHgoEYk]: Flinn AI's four patterns alongside a practical Claude Code daily-driver guide.
* Can the model actually tell when it's unsure? [https://arxiv.org/abs/2605.26242]: a reality check on LLM introspection and self-reported confidence.
* Your agents are aging [https://arxiv.org/abs/2605.26302]: AgingBench, MemFail, and rethinking agent memory as a state trajectory.
* Running the frontier in your own house [https://www.youtube.com/watch?v=ESbWpPT_9-o]: EXO Labs on local inference economics and the 100x still left.
* The labs can't agree on the jobs [https://www.axios.com/2026/05/27/ai-hype-doom-openai-anthropic]: Anthropic vs OpenAI, with Hassabis calling 2026 a practice run.
* I'm tired of talking to AI [https://orchidfiles.com/im-tired-of-ai-generated-answers/]: a developer on people forwarding AI answers they never read.

May 27, 2026 · 21 min

The harness, not the model — and the trust layer racing to catch up

One developer catching you up on the day in AI and the craft of building with it. Today: the wrapper around a model can move a benchmark more than the model does, a watermark goes multi-lab, and a decensoring tool with thirteen million downloads shows where that watermark leaks. Plus a sharp little essay on why coding agents make us so mad, the jobs data behind the panic, and three things you can pick up today.

* The harness, not the model [https://arxiv.org/abs/2605.23950]: a Google DeepMind Kaggle talk and an arXiv position paper argue the agent harness can swing a score ~22% [https://www.youtube.com/watch?v=Ubwb6NzegyA] while frontier models tie.
* Gemini Omni [https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/]: editing video by talking to it, with SynthID baked in (community reaction [https://www.reddit.com/r/singularity/comments/1tniqkb/the_strength_of_gemini_omni_is_in_video/]).
* SynthID becomes a shared layer [https://x.com/GoogleDeepMind/status/2059235181274202500]: 100 billion watermarks, Search and Chrome, and OpenAI/ElevenLabs/Kakao on board.
* Heretic in the Financial Times [https://www.reddit.com/r/LocalLLaMA/comments/1tna22m/the_financial_times_has_published_an_article/]: decensoring open weights in ten minutes, and the artifact that proves the gap [https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved].
* The user is visibly frustrated [https://pscanf.com/s/354/]: why conversational agent UX trips your social wiring.
* A rage-quitting modder [https://www.reddit.com/r/singularity/comments/1tntdui/users_who_rage_quit_my_software/] and the jobs data [https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/]: backlash, and what the numbers actually say.
* The bench: NuExtract3 [https://www.reddit.com/r/LocalLLaMA/comments/1tn8utn/nuextract3_released_openweight_4b_vlm_for/], EAGLE 3.1 [https://vllm.ai/blog/2026-05-26-eagle-3-1], and a rejected llama.cpp patch [https://www.reddit.com/r/LocalLLaMA/comments/1to00xl/strix_halo_users_a_rejected_pr_can_give_you_up_to/] worth grabbing.

Yesterday · 24 min

A few hundred dollars a proof, and the long argument about what machines are for

A frontier lab proves nine decades-old math problems for a few hundred dollars each, two talks make the numeric case that the cheapest agents route work to the smallest model that can do it, a lawsuit names an individual researcher over how Llama's training data was sourced, and a papal encyclical argues about AI on the terms of work and dignity. Eight things worth knowing today, told one developer to another.

* DeepMind's AlphaProof Nexus clears nine open Erdős problems [https://arxiv.org/abs/2605.22763]: Lean-verified proofs, a few hundred dollars apiece.
* "You don't need GPT to zoom for you" [https://www.youtube.com/watch?v=WRBNDpUhsJQ]: Callosum's numbers on routing subtasks to smaller models.
* The token-efficiency turn [https://www.youtube.com/watch?v=0zw-Uk9KJiA]: ThePrimeagen on why the org paying retail eventually does the math.
* Inside how DeepMind runs its own agents [https://www.youtube.com/watch?v=7gujZrJ9L5I]: worse quotas than customers, a Darwinian skills library, and skepticism about MCP.
* The lawsuit that names a name [https://x.com/ednewtonrex/status/2058433725889716519]: Hobbs v. Meta, an individual researcher, and the internal dissent in the record.
* Simon Willison on publishing GPT-4's retired architecture [https://x.com/simonw/status/2058877314004627690]: the guesswork behind the water numbers.
* Jujutsu and the pile of laundry [https://ikesau.co/blog/defeating-git-rigour-fatigue-with-jujutsu/]: making a mess on purpose, then sorting it at the end.
* Filming your chores for the robots [https://www.washingtonpost.com/technology/interactive/2026/robot-chores-video-data/]: where the embodied-AI training data is actually coming from.
* Pope Leo XIV's AI encyclical [https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html]: technology is never neutral, and what no machine replaces.

May 25, 2026 · 23 min

The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up

Anthropic's unreleased Mythos model has reportedly found more than ten thousand vulnerabilities for its Project Glasswing partners, and showed up briefly inside Claude Code this weekend. The same weekend, a security researcher flagged what he calls the first real prompt-injection attack in the wild, riding the exact workflow we've all been adopting. Today's episode walks both sides of that coin, then turns to what builders are actually doing: a three-dollar refactor with a deadlock in it, the missing coordination layer for agent swarms, and the argument that the chat box is the command-line phase of agentic software.

* Mythos & Project Glasswing [https://www.engadget.com/2180028/anthropic-claude-mythos-preview-project-glasswing-update/]: a security model "too dangerous to release," and the case for and against that framing.
* A real prompt injection in the wild [https://x.com/rez0__/status/2058350854508286082]: a malicious GitHub issue, a scan.js, and secrets exfiltrated over DNS.
* The three-dollar refactor [https://www.reddit.com/r/singularity/comments/1tlj7ou/coding_is_basically_solved_for_the_boring_90_of/]: cheap worker models, one confident deadlock, and where judgment still lives.
* The missing primitive is coordination [https://www.youtube.com/watch?v=5Sui_OnSRlY]: Lou Bichard of Ona on software factories, Stripe's Minions, and why GitHub isn't a coordination layer.
* Your agent is an infinite canvas [https://www.youtube.com/watch?v=LMbeDEQO6QM]: Rachel Lee Nabors on MCP apps, Web MCP, and chat as the command-line phase.
* r/programming reopens to AI [https://www.reddit.com/r/programming/comments/1tlh5aj/announcement_weve_updated_the_rules_and_april_is/]: a seven-million-person community moves from a reflex ban to a written policy.

May 24, 2026 · 21 min

Fast models, slow developers — and the part of the job that stays yours

A Saturday episode about what your job becomes when the model writes the code, and writes it fast. The bottleneck moved from typing to deciding, and a surprising number of this week's stories land on the same instruction: stay the one who decides. Plus a price floor, a reclassification, a year of bold predictions, and a 4-year-old gaming card that won't quit.

* "I don't write code anymore" [https://x.com/levelsio/status/2058116725929828722]: Pieter Levels, amplified by Marc Andreessen [https://x.com/pmarca/status/2058144277340049588], and the real-thing/bubble-thing tangle inside it.
* Fast Models Need Slow Developers [https://www.youtube.com/watch?v=TeGsFFNqRLA]: Sarah Chieng of Cerebras on Codex Spark at 1,200 tokens a second, and why the discipline matters more, not less.
* DeepSeek's permanent 75% cut [https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent] and NVIDIA folding gaming into "Edge Computing" [https://www.guru3d.com/story/nvidia-removes-gaming-revenue-category-from-financial-reports/]: two ends of the same pipe.
* Jack Clark's year of predictions [https://www.theguardian.com/technology/2026/may/21/ai-nobel-prize-winning-discovery-robots-jack-clark-anthropic] at Oxford, and the cognitive-atrophy counterpoint.
* BeeLlama's DFlash update [https://www.reddit.com/r/LocalLLaMA/comments/1tkpz2y/beellama_v020_major_dflash_update_single_rtx_3090/]: 164 tokens a second on a single RTX 3090.
* Lobster Trap [https://www.youtube.com/watch?v=F1DYkY1BlfM]: Sally Ann O'Malley of Red Hat on containerizing an OpenClaw agent setup.
* How the rest of the world sees this [https://www.reddit.com/r/singularity/comments/1tl68ne/is_ai_viewed_as_evil_in_nontech_communities/], and a couple overheard in a Copenhagen park [https://x.com/niloofar_mire/status/2058148404673331256].

May 23, 2026 · 21 min