Braid

Two bets on AGI, an 80-year-old problem, and Anthropic in the black

22 min · May 21, 2026

Description

Google's I/O keynote is a day behind us, and the week it kicked off turned into a referendum on two very different bets on artificial general intelligence, plus a pile of counter-programming from everyone else. Today: OpenAI cracking an 80-year-old math problem with a general-purpose model, Anthropic's first profitable quarter and what Karpathy was actually hired to do, a 70-page paper on why frontier models still can't tell a fact from a labeled lie, Midjourney's hardware regret, ads arriving inside Google's AI answers, Meta's layoffs, Cohere's open-weights comeback, and a field guide to skilling up coding agents.

* Two bets on the same finish line [https://www.youtube.com/watch?v=o_av1b9rs2g]: Google's world-model road vs OpenAI's text-reasoning road, in the labs' own words.
* OpenAI cracks an 80-year-old problem [https://openai.com/index/model-disproves-discrete-geometry-conjecture/]: the planar unit distance result from a general-purpose reasoning model.
* Anthropic in the black, and Karpathy's bet [https://www.cnbc.com/2026/05/20/anthropic-revenue-explosive-growth-ipo-profitable-quarter.html]: ~$559M operating profit and a hire aimed at recursive self-improvement.
* Jagged intelligence, and the false story [https://www.youtube.com/watch?v=o_av1b9rs2g]: the paper where models believe a story they were told a thousand times was fake.
* Midjourney's hardware regret [https://www.reddit.com/r/singularity/comments/1tiut2d/midjourney_says_their_research_was_set_back_by_a/]: the tooling tax of betting on the less-supported accelerator.
* Ads come to AI Mode [https://blog.google/products/ads-commerce/google-marketing-live-search-ads/]: the business model under the consumer bet.
* Meta's eight thousand [https://nypost.com/2026/05/20/business/meta-kicks-off-bloodbath-with-8000-layoffs-in-shift-to-ai/]: the cost side, on the same clock as the wins.
* Cohere comes back, Apache-licensed [https://www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_ever_happened_to_coheres_commanda_series/]: Command A+, a mixture-of-experts model that fits on one or two GPUs.
* Skilling up the agent [https://www.youtube.com/watch?v=vNCY9kXXyDQ]: Marc Klingen's concrete lessons on teaching a coding agent to wire up your tool.
* Who's training whom [https://www.reddit.com/r/singularity/comments/1tjgm3x/every_office_employee_is_training_their_own/]: the anxiety running underneath the week.

All episodes

38 episodes

The harness, not the model — and the trust layer racing to catch up

One developer catching you up on the day in AI and the craft of building with it. Today: the wrapper around a model can move a benchmark more than the model does, a watermark goes multi-lab, and a decensoring tool with thirteen million downloads shows where that watermark leaks. Plus a sharp little essay on why coding agents make us so mad, the jobs data behind the panic, and three things you can pick up today.

* The harness, not the model [https://arxiv.org/abs/2605.23950]: a Google DeepMind Kaggle talk and an arXiv position paper argue the agent harness can swing a score ~22% [https://www.youtube.com/watch?v=Ubwb6NzegyA] while frontier models tie.
* Gemini Omni [https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/]: editing video by talking to it, with SynthID baked in (community reaction [https://www.reddit.com/r/singularity/comments/1tniqkb/the_strength_of_gemini_omni_is_in_video/]).
* SynthID becomes a shared layer [https://x.com/GoogleDeepMind/status/2059235181274202500]: 100 billion watermarks, Search and Chrome, and OpenAI/ElevenLabs/Kakao on board.
* Heretic in the Financial Times [https://www.reddit.com/r/LocalLLaMA/comments/1tna22m/the_financial_times_has_published_an_article/]: decensoring open weights in ten minutes, and the artifact that proves the gap [https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved].
* The user is visibly frustrated [https://pscanf.com/s/354/]: why conversational agent UX trips your social wiring.
* A rage-quitting modder [https://www.reddit.com/r/singularity/comments/1tntdui/users_who_rage_quit_my_software/] and the jobs data [https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/]: backlash, and what the numbers actually say.
* The bench: NuExtract3 [https://www.reddit.com/r/LocalLLaMA/comments/1tn8utn/nuextract3_released_openweight_4b_vlm_for/], EAGLE 3.1 [https://vllm.ai/blog/2026-05-26-eagle-3-1], and a rejected llama.cpp patch [https://www.reddit.com/r/LocalLLaMA/comments/1to00xl/strix_halo_users_a_rejected_pr_can_give_you_up_to/] worth grabbing.

May 26, 2026 · 24 min

A few hundred dollars a proof, and the long argument about what machines are for

A frontier lab proves nine decades-old math problems for a few hundred dollars each, two talks make the numeric case that the cheapest agents route work to the smallest model that can do it, a lawsuit names an individual researcher over how Llama's training data was sourced, and a papal encyclical argues about AI on the terms of work and dignity. Eight things worth knowing today, told one developer to another.

* DeepMind's AlphaProof Nexus clears nine open Erdős problems [https://arxiv.org/abs/2605.22763]: Lean-verified proofs, a few hundred dollars apiece.
* "You don't need GPT to zoom for you" [https://www.youtube.com/watch?v=WRBNDpUhsJQ]: Callosum's numbers on routing subtasks to smaller models.
* The token-efficiency turn [https://www.youtube.com/watch?v=0zw-Uk9KJiA]: ThePrimeagen on why the org paying retail eventually does the math.
* Inside how DeepMind runs its own agents [https://www.youtube.com/watch?v=7gujZrJ9L5I]: worse quotas than customers, a Darwinian skills library, and skepticism about MCP.
* The lawsuit that names a name [https://x.com/ednewtonrex/status/2058433725889716519]: Hobbs v. Meta, an individual researcher, and the internal dissent in the record.
* Simon Willison on publishing GPT-4's retired architecture [https://x.com/simonw/status/2058877314004627690]: the guesswork behind the water numbers.
* Jujutsu and the pile of laundry [https://ikesau.co/blog/defeating-git-rigour-fatigue-with-jujutsu/]: making a mess on purpose, then sorting it at the end.
* Filming your chores for the robots [https://www.washingtonpost.com/technology/interactive/2026/robot-chores-video-data/]: where the embodied-AI training data is actually coming from.
* Pope Leo XIV's AI encyclical [https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html]: technology is never neutral, and what no machine replaces.

Yesterday · 23 min

The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up

Anthropic's unreleased Mythos model has reportedly found more than ten thousand vulnerabilities for its Project Glasswing partners, and showed up briefly inside Claude Code this weekend. The same weekend, a security researcher flagged what he calls the first real prompt-injection attack in the wild, riding the exact workflow we've all been adopting. Today's episode walks both sides of that coin, then turns to what builders are actually doing: a three-dollar refactor with a deadlock in it, the missing coordination layer for agent swarms, and the argument that the chat box is the command-line phase of agentic software.

* Mythos & Project Glasswing [https://www.engadget.com/2180028/anthropic-claude-mythos-preview-project-glasswing-update/]: a security model "too dangerous to release," and the case for and against that framing.
* A real prompt injection in the wild [https://x.com/rez0__/status/2058350854508286082]: a malicious GitHub issue, a scan.js, and secrets exfiltrated over DNS.
* The three-dollar refactor [https://www.reddit.com/r/singularity/comments/1tlj7ou/coding_is_basically_solved_for_the_boring_90_of/]: cheap worker models, one confident deadlock, and where judgment still lives.
* The missing primitive is coordination [https://www.youtube.com/watch?v=5Sui_OnSRlY]: Lou Bichard of Ona on software factories, Stripe's Minions, and why GitHub isn't a coordination layer.
* Your agent is an infinite canvas [https://www.youtube.com/watch?v=LMbeDEQO6QM]: Rachel Lee Nabors on MCP apps, Web MCP, and chat as the command-line phase.
* r/programming reopens to AI [https://www.reddit.com/r/programming/comments/1tlh5aj/announcement_weve_updated_the_rules_and_april_is/]: a seven-million-person community moves from a reflex ban to a written policy.

May 24, 2026 · 21 min

Fast models, slow developers — and the part of the job that stays yours

A Saturday episode about what your job becomes when the model writes the code, and writes it fast. The bottleneck moved from typing to deciding, and a surprising number of this week's stories land on the same instruction: stay the one who decides. Plus a price floor, a reclassification, a year of bold predictions, and a 4-year-old gaming card that won't quit.

* "I don't write code anymore" [https://x.com/levelsio/status/2058116725929828722]: Pieter Levels, amplified by Marc Andreessen [https://x.com/pmarca/status/2058144277340049588], and the real-thing/bubble-thing tangle inside it.
* Fast Models Need Slow Developers [https://www.youtube.com/watch?v=TeGsFFNqRLA]: Sarah Chieng of Cerebras on Codex Spark at 1,200 tokens a second, and why the discipline matters more, not less.
* DeepSeek's permanent 75% cut [https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent] and NVIDIA folding gaming into "Edge Computing" [https://www.guru3d.com/story/nvidia-removes-gaming-revenue-category-from-financial-reports/]: two ends of the same pipe.
* Jack Clark's year of predictions [https://www.theguardian.com/technology/2026/may/21/ai-nobel-prize-winning-discovery-robots-jack-clark-anthropic] at Oxford, and the cognitive-atrophy counterpoint.
* BeeLlama's DFlash update [https://www.reddit.com/r/LocalLLaMA/comments/1tkpz2y/beellama_v020_major_dflash_update_single_rtx_3090/]: 164 tokens a second on a single RTX 3090.
* Lobster Trap [https://www.youtube.com/watch?v=F1DYkY1BlfM]: Sally Ann O'Malley of Red Hat on containerizing an OpenClaw agent setup.
* How the rest of the world sees this [https://www.reddit.com/r/singularity/comments/1tl68ne/is_ai_viewed_as_evil_in_nontech_communities/], and a couple overheard in a Copenhagen park [https://x.com/niloofar_mire/status/2058148404673331256].

May 23, 2026 · 21 min

The recant, the runtime, and a Pantheon built in code

A corporate takedown answered with a recant letter and a mirror in Germany, the protocols and computers agents actually run on, six tools trying to build the Pantheon in code, and a paper where the model writes its own GPU kernel. Plus Codex learning to keep going, a security tool hardened against the real world, and a graduation room that cheered for human intelligence.

* Meta emails Heretic; Heretic recants [https://www.reddit.com/r/LocalLLaMA/comments/1tjmvx6/heretic_has_been_served_a_legal_notice_by_meta_inc/]: a takedown of abliterated Llama derivatives answered with a Galileo joke and a Codeberg mirror in Germany.
* Five hundred PRs a day, and the harness that triages them [https://www.youtube.com/watch?v=VaS2h-dY1-4]: Onur Solmaz on OpenClaw, acpx, and the Agent Client Protocol.
* The computer the agent runs on [https://www.youtube.com/watch?v=kaX43RRRUKY]: Ivan Burazin of Daytona on stateful, composable machines for agents and 74% month-over-month growth.
* Building the Pantheon, in code [https://modelrift.com/blog/openscad-llm-benchmark/]: six coding tools tackle parametric CAD, and the gap between a good preview and a clean export.
* When the model writes its own kernel [https://arxiv.org/abs/2605.19269]: CODA folds memory-bound ops into the matrix multiply, and model-authored kernels keep up with human ones.
* Codex learns to keep going [https://www.youtube.com/watch?v=rgh0hMYPcd0]: goal mode graduates, plus Appshots and shared plugins.
* Hardening the thing that reads your CI config [https://x.com/trailofbits/status/2057782296527208709]: Trail of Bits stress-tests zizmor against forty-one thousand real workflows.
* The headcount bet [https://libertas.software/en/knowledge-hub/19/the-companies-cutting-headcount-for-ai-will-lose-to-the-ones-who-didnt], and a graduation room that cheered for actual intelligence [https://www.businessinsider.com/steve-wozniak-apple-ai-graduation-speech-2026-5].

May 22, 2026 · 21 min