How a Crowd of Anonymous AI Agents Broke a 40-Year Math Record

Description

Source: Harnessing the Collective Intelligence of AI Agents in the Wild for New Discoveries [https://arxiv.org/abs/2606.10402]
Paper was published on June 09, 2026
This episode was AI-generated on June 11, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

A geometry record that barely moved for forty years jumped by eleven in two months, not because of a bigger AI, but because anonymous AI agents started sharing results and failed attempts on a public forum. We trace the detective relay that dethroned DeepMind's AlphaEvolve, including the pivotal move by a bot named KawaiiCorgi, and then stress-test whether the paper's collective-intelligence claims actually hold up.

KEY TAKEAWAYS
* How EinsteinArena's three components (executable verifiers, a public leaderboard, and an agent discussion forum) recreate peer review, the published record, and the conference hallway for AI discovery
* The relay of moves that pushed the 11-dimensional kissing number from 593 to 604 spheres: a basin jump, a smooth reformulation solved with a 1982 algorithm, and snapping near-integer values into an exact certified construction
* Why agents' solutions got so precise they broke the verifier, forcing the platform to rebuild it at 30-80 digits of decimal precision mid-deployment
* Forum evidence that agents did genuinely scientific work: 34% of posts were structural reasoning about the geometry, including agents telling each other the 'highest-value next step'
* Where the claims wobble: the final jump from 594 to 604 was author-directed, agent identities are unverifiable by design, collaboration lineages were statistically inferred, and there's no controlled comparison isolating the social layer's effect
* The bigger reframe: AI discovery may have been stuck in a pre-journal era, leaving the cumulative-infrastructure multiplier of science entirely on the table

CHAPTERS
* 00:00 - Forty years of stasis, then eleven spheres in two months
  The kissing number record's strange timeline sets up the paper's thesis: a crowd of anonymous agents with shared infrastructure outpaced sealed, single-lab discovery pipelines.
* 03:39 - EinsteinArena: verifiers, leaderboard, and a forum for bots
  How the platform works (downloadable scoring code, a public record of best solutions, anonymous agent registration via proof-of-work) and why it's best understood as GitHub for mathematical discovery.
* 07:18 - The kissing number relay, from CHRONOS to KawaiiCorgi
  A step-by-step walkthrough of how agents whittled down the penalty function, jumped basins, reformulated the problem for a 1982 linear-algebra solver, and dropped the error by forty orders of magnitude.
* 10:58 - Snapping to integers and certifying a world record
  How an agent recognized that near-integer dot products signaled a hidden crystalline structure, converted a numerical solution into an exact proof, and how the shared 496-vector backbone pointed the way to 604.
* 14:37 - The forum as collective memory
  Verbatim agent exchanges, the content analysis of forum posts, and the paper's key insight that the leaderboard stores the frontier while the discussion board stores the path to it.
* 18:16 - A second case study in harmonic analysis
  Agents redeploy a 1967 algorithm and trade solutions across grid resolutions to push the second autocorrelation inequality past AlphaEvolve's bound.
* 21:56 - The steelman critique
  Why 'twelve records' overstates the evenness of the results, why the wild-versus-author-directed line at 594 matters, and how unverifiable agent identities, inferred lineages, and the missing ablation weaken the causal claims.
* 25:35 - Why it matters anyway
  The case that the real contribution is an existence proof for a new production function of discovery: persistent shared infrastructure as the multiplier AI research has been ignoring.

RECOMMENDED READING
* AlphaEvolve: A coding agent for scientific and algorithmic discovery [https://arxiv.org/abs/2506.13131] - The DeepMind system whose records (including the 593-sphere kissing configuration) the episode's anonymous agent crowd overturned, and the clearest example of the sealed 'lone genius pipeline' paradigm the paper argues against.
* Mathematical discoveries from program search with large language models (FunSearch) [https://doi.org/10.1038/s41586-023-06924-6] - The Nature paper that first showed LLM-driven search can produce genuinely new mathematical constructions, establishing the verifier-guided discovery loop that EinsteinArena opens up to a public crowd.
* Massively collaborative mathematics (the Polymath project) [https://doi.org/10.1038/461879a] - Gowers and Nielsen's account of humans solving open math problems through public forum threads, the direct human precedent for the agent-to-agent 'highest-value next step' exchanges the episode dwells on.
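To make the "executable verifier" idea from this episode concrete, here is a minimal sketch of an exact check for a kissing configuration. It is not EinsteinArena's scoring code; the function names and the choice of integer coordinates are assumptions made for illustration. It relies only on the standard fact that equal-length vectors give non-overlapping touching spheres when every pair is at least 60 degrees apart, which is the same as each pairwise dot product being at most half the squared norm. Integer arithmetic keeps the check exact, in the spirit of the "snapping to integers" step described above.

```python
from itertools import combinations, product


def is_kissing_configuration(vectors):
    """Exact check on integer vectors (hypothetical helper, not the paper's verifier).

    Valid when all vectors are distinct, share one nonzero squared norm,
    and every pair satisfies 2 * <u, v> <= |u|^2 (angle of at least 60 degrees).
    """
    if not vectors or len(set(vectors)) != len(vectors):
        return False
    norm_sq = sum(x * x for x in vectors[0])
    if norm_sq == 0 or any(sum(x * x for x in v) != norm_sq for v in vectors):
        return False
    return all(
        2 * sum(a * b for a, b in zip(u, v)) <= norm_sq
        for u, v in combinations(vectors, 2)
    )


def d4_roots():
    """The 24 vectors with two entries of +/-1 and two zeros in 4 dimensions."""
    roots = []
    for i, j in combinations(range(4), 2):
        for si, sj in product((1, -1), repeat=2):
            v = [0, 0, 0, 0]
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    return roots


if __name__ == "__main__":
    print(len(d4_roots()), is_kissing_configuration(d4_roots()))
```

The example uses the well-known 24-vector configuration in 4 dimensions rather than the 11-dimensional record, whose coordinates are not given in these notes.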

What Diffusion Language Models Were Missing: A Map, Not an Algorithm

Source: TextLDM: Language Modeling with Continuous Latent Diffusion [https://arxiv.org/abs/2605.07748]
Paper was published on May 08, 2026
This episode was AI-generated on June 11, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

A team built two text compressors with reconstruction accuracy identical to the second decimal place, and one produced a generative model eight times better than the other. The difference was invisible to every obvious metric, and the fix came from an unexpected place: borrowing the internal geometry of a frozen pretrained language model. The result is the first continuous latent diffusion model to pull level with GPT-2 on text continuation, trained from scratch in three days on eight GPUs, and a lesson about latent spaces that applies far beyond text.

KEY TAKEAWAYS
* Why 'can I reconstruct the data?' is the wrong test for a latent space: a representation can be a flawless lookup table and a hopeless landscape for a diffusion model to navigate
* How a single added loss term (REPA) that aligns the VAE's latents to a frozen Qwen language model's third-from-last layer boosts the hardest benchmark's MAUVE score from 2.5 to 20.4, without improving reconstruction at all
* The Stable Diffusion 3 recipe (DiT, flow matching, classifier-free guidance) transfers to text with zero modification, generating an entire paragraph in 50 fixed denoising steps instead of one forward pass per token
* TextLDM's 768M-parameter model beats size-matched GPT-2-large on most metrics, with the whole system trained from scratch in about three days on eight GPUs
* Where the claims reach: GPT-2 is a seven-year-old baseline, the evaluation only tests text continuation, and the paper's own appendix samples show fluency developing while factual fidelity doesn't
* The 'trained from scratch' asterisk: no pretrained component runs at inference, but the system distilled a foundation model's organization during training, and that borrowed geometry is the whole contribution

CHAPTERS
* 00:00 - The puzzle: two identical compressors, wildly different generators
  A VAE that recovers 99.6% of words from compressed text turns out to be dramatically worse at generation than a twin with identical reconstruction numbers, the question that drives the whole episode.
* 03:18 - Why force language into diffusion at all
  Generative AI is split between autoregressive text and diffusion-based images, and continuous latent diffusion is the only route to a single shared architecture, plus a fixed 50-step inference cost regardless of output length.
* 06:36 - The TextVAE bridge, and where reconstruction saturates
  Stage one compresses each token into a continuous vector so diffusion has something to denoise, and reconstruction accuracy maxes out almost immediately across every configuration tried.
* 09:54 - Warehouse vs. library: why retrieval isn't navigation
  An analogy for the paper's central insight: reconstruction only requires distinguishable addresses, but a diffusion model is a browser that needs meaningfully arranged neighborhoods to wander toward coherent text.
* 13:12 - REPA: a frozen language model as geometry teacher
  A single loss term pulls the VAE encoder's representations into alignment with a frozen 1.7B Qwen model (at its third-from-last layer, not its final one), reshaping the latent space without touching reconstruction.
* 16:30 - Running the image recipe unmodified
  Flow matching trained as a 'which way is home in the fog' direction field, plus classifier-free guidance lifted straight from image generation with a sweet-spot guidance scale of seven.
* 19:48 - Results: matching GPT-2, crushing prior diffusion LMs, and the eightfold ablation
  TextLDM beats earlier diffusion language models, edges size-matched GPT-2-large on most metrics, and the REPA-versus-no-REPA comparison (20.4 vs 2.5 MAUVE) closes the loop on the opening puzzle, all on three days of compute.
* 23:06 - Watching prose condense from static, and what doesn't develop
  The appendix denoising progressions go from word salad at step ten to fluent biography at step fifty, but the facts in those fluent outputs are frequently invented.
* 26:24 - The steelman critique and what actually endures
  Dated baselines, a continuation-only evaluation, metric disagreements, and the 'from scratch' asterisk get weighed honestly, and the durable lesson lands: navigable latent geometry, not a better algorithm, was the missing ingredient.

RECOMMENDED READING
* Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [https://arxiv.org/abs/2410.06940] - The original REPA paper from image diffusion: the 'load-bearing innovation' this episode spent its core segment on, here in its original vision-domain form before TextLDM repurposed it to shape a text VAE's latent geometry.
* Diffusion-LM Improves Controllable Text Generation [https://arxiv.org/abs/2205.14217] - The pioneering continuous text diffusion work in the 'frustrating lineage' the episode described, useful for seeing what the field tried before the latent-geometry ingredient was identified.
* Large Language Diffusion Models (LLaDA) [https://arxiv.org/abs/2502.09992] - The flagship of the discrete diffusion branch the episode contrasted with TextLDM's continuous approach, the competing answer to whether diffusion can absorb language.
* Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (Stable Diffusion 3) [https://arxiv.org/abs/2403.03206] - The exact recipe (DiT, flow matching, timestep sampling, classifier-free guidance) that the episode said TextLDM transplanted to text with zero modification.
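The "50 fixed denoising steps" and "guidance scale of seven" mentioned in this episode can be illustrated with a small sketch of Euler sampling along a velocity field with classifier-free guidance. This is not TextLDM's code. The function names, the time convention (noise at t=0, data at t=1), and the toy velocity field are all assumptions chosen for illustration; in a real system the velocity would come from a trained network, and only the standard guidance formula `v_uncond + scale * (v_cond - v_uncond)` is taken as given.

```python
def guided_velocity(v_cond, v_uncond, scale):
    """Standard classifier-free guidance: extrapolate from unconditional toward conditional."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(velocity_fn, x0, cond, scale=7.0, steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 in a fixed number of Euler steps.

    velocity_fn(x, t, cond) is a hypothetical model interface; cond=None
    requests the unconditional prediction.
    """
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v_c = velocity_fn(x, t, cond)
        v_u = velocity_fn(x, t, None)
        v = guided_velocity(v_c, v_u, scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, cond):
    """Toy stand-in for a trained network: points straight at a fixed target.

    The conditional target is `cond`; the unconditional target is the origin.
    """
    target = cond if cond is not None else [0.0] * len(x)
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


if __name__ == "__main__":
    print(euler_sample(toy_velocity, [1.0, -2.0], [0.5, 0.25], scale=1.0))
```

Note that the cost is two velocity evaluations per step regardless of how long the generated text is, which is the fixed-cost property the episode contrasts with one forward pass per token.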

June 12, 2026 · 29 min
