AI Papers: A Deep Dive
HOW A CROWD OF ANONYMOUS AI AGENTS BROKE A 40-YEAR MATH RECORD

Source: Harnessing the Collective Intelligence of AI Agents in the Wild for New Discoveries [https://arxiv.org/abs/2606.10402]
Paper was published on June 09, 2026

This episode was AI-generated on June 11, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

A geometry record that barely moved for forty years jumped by eleven in two months, not because of a bigger AI, but because anonymous AI agents started sharing results and failed attempts on a public forum. We trace the detective relay that dethroned DeepMind's AlphaEvolve, including the pivotal move by a bot named KawaiiCorgi, and then stress-test whether the paper's collective-intelligence claims actually hold up.

KEY TAKEAWAYS
* How EinsteinArena's three components (executable verifiers, a public leaderboard, and an agent discussion forum) recreate peer review, the published record, and the conference hallway for AI discovery
* The relay of moves that pushed the 11-dimensional kissing number from 593 to 604 spheres: a basin jump, a smooth reformulation solved with a 1982 algorithm, and snapping near-integer values into an exact certified construction
* Why agents' solutions got so precise they broke the verifier, forcing the platform to rebuild it at 30-80 digits of decimal precision mid-deployment
* Forum evidence that agents did genuinely scientific work: 34% of posts were structural reasoning about the geometry, including agents telling each other the 'highest-value next step'
* Where the claims wobble: the final jump from 594 to 604 was author-directed, agent identities are unverifiable by design, collaboration lineages were statistically inferred, and there's no controlled comparison isolating the social layer's effect
* The bigger reframe: AI discovery may have been stuck in a pre-journal era, leaving the cumulative-infrastructure multiplier of science entirely on the table

* 00:00 - Forty years of stasis, then eleven spheres in two months
  The kissing number record's strange timeline sets up the paper's thesis: a crowd of anonymous agents with shared infrastructure outpaced sealed, single-lab discovery pipelines.
* 03:39 - EinsteinArena: verifiers, leaderboard, and a forum for bots
  How the platform works: downloadable scoring code, a public record of best solutions, anonymous agent registration via proof-of-work, and why it's best understood as GitHub for mathematical discovery.
* 07:18 - The kissing number relay, from CHRONOS to KawaiiCorgi
  A step-by-step walkthrough of how agents whittled down the penalty function, jumped basins, reformulated the problem for a 1982 linear-algebra solver, and dropped the error by forty orders of magnitude.
* 10:58 - Snapping to integers and certifying a world record
  How an agent recognized that near-integer dot products signaled a hidden crystalline structure, converted a numerical solution into an exact proof, and how the shared 496-vector backbone pointed the way to 604.
* 14:37 - The forum as collective memory
  Verbatim agent exchanges, the content analysis of forum posts, and the paper's key insight that the leaderboard stores the frontier while the discussion board stores the path to it.
* 18:16 - A second case study in harmonic analysis
  Agents redeploy a 1967 algorithm and trade solutions across grid resolutions to push the second autocorrelation inequality past AlphaEvolve's bound.
* 21:56 - The steelman critique
  Why 'twelve records' overstates the evenness of the results, why the wild-versus-author-directed line at 594 matters, and how unverifiable agent identities, inferred lineages, and the missing ablation weaken the causal claims.
* 25:35 - Why it matters anyway
  The case that the real contribution is an existence proof for a new production function of discovery: persistent shared infrastructure as the multiplier AI research has been ignoring.

RECOMMENDED READING
* AlphaEvolve: A coding agent for scientific and algorithmic discovery [https://arxiv.org/abs/2506.13131]
  The DeepMind system whose records (including the 593-sphere kissing configuration) the episode's anonymous agent crowd overturned, and the clearest example of the sealed 'lone genius pipeline' paradigm the paper argues against.
* Mathematical discoveries from program search with large language models (FunSearch) [https://doi.org/10.1038/s41586-023-06924-6]
  The Nature paper that first showed LLM-driven search can produce genuinely new mathematical constructions, establishing the verifier-guided discovery loop that EinsteinArena opens up to a public crowd.
* Massively collaborative mathematics (the Polymath project) [https://doi.org/10.1038/461879a]
  Gowers and Nielsen's account of humans solving open math problems through public forum threads, the direct human precedent for the agent-to-agent 'highest-value next step' exchanges the episode dwells on.
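
For listeners who want a feel for the "snapping near-integer values into an exact certified construction" step from the key takeaways, here is a minimal Python sketch. It is not the paper's code or the EinsteinArena verifier. As a stand-in for the 11-dimensional record it uses the well-known 24-vector D4 configuration in four dimensions, and it assumes the coordinates themselves (rather than only the dot products) sit near integers. The function names `d4_roots`, `perturb`, `snap`, and `certify_kissing` are hypothetical. The underlying fact is standard: equal-length vectors form a valid kissing configuration when every pairwise dot product is at most half the squared length.

```python
from itertools import combinations, product
import random


def d4_roots():
    """Stand-in data: the 24 integer vectors with two entries of +-1 and two zeros."""
    roots = []
    for i, j in combinations(range(4), 2):
        for si, sj in product((1, -1), repeat=2):
            v = [0, 0, 0, 0]
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    return roots


def perturb(vectors, noise=1e-9, seed=0):
    """Mimic a numerical optimizer's output: exact values plus tiny float error."""
    rng = random.Random(seed)
    return [tuple(x + rng.uniform(-noise, noise) for x in v) for v in vectors]


def snap(vectors, tol=1e-6):
    """Round every coordinate to the nearest integer, refusing if it is not close."""
    snapped = []
    for v in vectors:
        rounded = tuple(round(x) for x in v)
        if any(abs(x - r) > tol for x, r in zip(v, rounded)):
            raise ValueError("coordinate is not near an integer")
        snapped.append(rounded)
    return snapped


def certify_kissing(vectors):
    """Exact integer check: distinct vectors, one common nonzero squared
    length, and every pairwise dot product at most half that length."""
    if len(set(vectors)) != len(vectors):
        return False
    norms = {sum(x * x for x in v) for v in vectors}
    if len(norms) != 1:
        return False
    (n2,) = norms
    if n2 == 0:
        return False
    return all(
        2 * sum(a * b for a, b in zip(u, v)) <= n2
        for u, v in combinations(vectors, 2)
    )


numeric = perturb(d4_roots())   # floating-point "solution"
exact = snap(numeric)           # integer construction
print(len(exact), certify_kissing(exact))
```

Once the coordinates are integers, the check involves no floating-point tolerance at all, which is the sense in which a numerical solution becomes an exact certificate.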