The AI Cookbook Show by Malcolm Werchota

#128 - How ChatGPT Cracked an 80-Year-Old Math Problem for $1,000

27 min · Jun 11, 2026

Description

Picture Dr. Katharina Hess. She runs the Computational Chemistry Group at one of the big pharma companies in the Novartis corridor. 11 postdocs and data scientists under her. Not 3 projects: 30 open projects, with research cycles of 5, 10, 20 years. Five days ago she opens Nature. The headline grabs her: "AI cracks an 80-year-old mathematical challenge." She reads it. Reads it again. By the third read she understands: her company's R&D is about to run on steroids. Not because of the math problem itself, but because of the method. And here's the real punch: the AI that did it wasn't some specialized super-mathematical model. It was ChatGPT. Yes, your ChatGPT. (OK, the reasoning model, GPT-5.4 Pro, but still.)

🧮 Who the hell was Paul Erdős?
Hungarian mathematician, born 1913. One of the most productive of the 20th century, with over 1,500 published papers. Restless. No apartment. No fixed office. Today we'd call him a digital nomad; back then, an analog one. He went from university to university with two suitcases. His passion wasn't solving problems. It was formulating them. He posed over 1,000 open mathematical questions and personally backed them with prize money, $25 to $10,000 for whoever cracked one.

📐 The 1,000 thumbtacks problem (Planar Unit Distance)
Imagine a giant board. You take 1,000 thumbtacks. How many pairs of tacks can you place at exactly the same distance from each other, say 1 centimeter? Sounds simple. It isn't. In 1984, Spencer, Szemerédi & Trotter proved the upper bound: on the order of n to the 4/3 power. That ceiling hasn't moved in 40 years. Noga Alon (Princeton): "It was one of Erdős's favorite problems."

💸 How ChatGPT solved it, for ~$1,000 in tokens
Step one: which ChatGPT? Not the one that messes up your email. The reasoning model, GPT-5.4 Pro. You actually have to click the model selector. Don't use Auto. The prompt was almost unassuming: "Could Erdős be wrong? Could the reasoning behind this bound be flawed?" And then the model worked. Completely autonomously.
125 pages. Around 100,000 tokens. Cost: somewhere between $100 and $1,000. Reality check: tomorrow I'm flying to an oil & gas company in Hannover. Zurich → Hannover one-way: $800. So the token cost of solving an 80-year-old mathematical problem is on the order of a single business trip.

🔧 The trick: not a better screwdriver, a different wrench entirely
For 40 years mathematicians attacked this with geometric tools: incidence geometry, Szemerédi-Trotter, the crossing number method. Those tools hit a natural ceiling, the n^(4/3) bound. The AI did something else. It pulled a completely different key out of the toolbox: algebraic number theory. CM fields. Complex multiplication. Infinite Galois towers. It didn't solve the problem. It reformulated it, from a geometric problem to a number-theoretic one. And suddenly the answer became much more concrete.

🤖 The DeepMind counter-punch: AlphaProof Nexus + Lean
Then Google DeepMind dropped the receipts. Their system AlphaProof Nexus claims to have solved:
* 9 open Erdős problems
* 44 additional open conjectures
* A 15-year-old problem in algebraic geometry

And here's where it gets architectural. AlphaProof Nexus combines AI reasoning with a formal verification tool called Lean. The AI doesn't just spit out an answer. It produces a step-by-step proof, and Lean mechanically verifies every single step. Every logical leap is checked. Incorrect assumptions are rejected. The final proof meets strict mathematical standards. Cost per problem: a few hundred dollars in compute.

⚖️ Two religions: human-verified vs machine-verified
This is now a genuine philosophical split in the AI math community:
* OpenAI's approach: let the LLM produce the proof, then send it to 9 of the world's top mathematicians, including leading names like Noga Alon, Daniel Litt and Melanie Wood, to verify by hand. Slow. Authoritative.
* DeepMind's approach: let the AI prove it AND let the machine (Lean) verify it. Fast. Reproducible.
But you have to trust Lean. Both approaches address the hallucination problem: AI models can invent unproven statements, skip difficult parts, and present incomplete proofs as finished. Human review and machine verification are two different solutions to the same fundamental risk.

🛑 The Hassabis caveat: AGI is still far
Demis Hassabis (DeepMind CEO) reminds everyone: "For an AI, this wasn't actually that hard." The problem is extremely difficult to solve, but it's bounded. AGI would require:
* Creativity across multiple fields simultaneously
* Independent reasoning
* Original idea generation

Today's systems are powerful specialized tools, not minds. But here's the catch: the most clever thing the AI did wasn't the solution. It was the cross-domain reformulation. And that's exactly where your R&D department needs to wake up.

🧬 Why your R&D needs this: silos, Da Vinci, AlphaFold
Pharma R&D is the textbook silo problem:
* Medicinal chemists define and find targets
* Biologists know the pathways
* Statisticians wade through the data

They work in their silos. They don't talk on the level where breakthroughs happen. Leonardo da Vinci could. Math + chemistry + physics + anatomy, all in one head, all connected. Today that's impossible for a human because of information overload. But an AI? An AI has exactly that cross-domain synthesis ability. Side note: Google DeepMind's Demis Hassabis and John Jumper already shared the Nobel Prize in Chemistry in 2024 for AlphaFold and its answer to the protein-folding problem. Pure cross-domain AI. If pharma had taken AlphaFold seriously from day one, they'd be years ahead today.

🦴 The uncomfortable truth about your senior researchers
Who are the most expensive people in any R&D department? Not the juniors. The 30-year veterans earning three-quarters of a million euros a year. And they are the worst AI users. Because they fundamentally say: "I've done research like this for 40 years. I don't need ChatGPT." When you hire a postdoc in 2026, "is he good in his domain?" is no longer the only question.
The new questions:
* Can he prompt a reasoning model correctly?
* Can he ask cross-domain questions? "How would a biologist see this? How would an economist see this?"
* Does he click "Auto" or does he deliberately choose the GPT-5.4 reasoning model?

⚖️ The legal department will be your next blocker
Imagine: you've found something genius with ChatGPT. You want to patent it. Who stops you first? Legal.
* Does it belong to us? Or to OpenAI?
* Does it belong to Microsoft (if you used Copilot)?
* Who holds the patent?

These questions aren't settled yet. Your discoveries may sit in legal review for 2 years. Plan for it.

🎯 Three Monday Actions...
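
Going back to the thumbtack problem earlier in these notes: a small brute-force sketch can make the question concrete. It is not the method from the episode, only an illustration. The function names `count_unit_pairs` and `square_grid` are made up for this example, and the tacks are restricted to integer grid positions so the distance check stays exact.

```python
from itertools import combinations


def count_unit_pairs(points, unit_sq=1):
    """Count unordered pairs of points whose squared distance equals unit_sq.

    Squared distances on integer coordinates avoid floating-point rounding.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == unit_sq
    )


def square_grid(side):
    """Hypothetical layout: tacks on a side x side integer grid."""
    return [(x, y) for x in range(side) for y in range(side)]


tacks = square_grid(5)  # 25 tacks

# Same tacks, two different choices of "the" distance.
print(count_unit_pairs(tacks, unit_sq=1))  # neighbours: 40 pairs
print(count_unit_pairs(tacks, unit_sq=5))  # knight moves: 48 pairs
```

Even on 25 tacks, picking the distance √5 beats the obvious distance 1 (48 pairs versus 40), because 5 can be written as a sum of two squares in more ways. That is a small hint of why a number-theoretic view of a geometric problem can pay off.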

#127 - [Quickbite] - Chief AI Academy — Sneak Peek into Session 1

People keep asking: "What are you actually teaching in the Werchota Chief AI Academy?" So here's a Quickbite sneak peek into Cohort 1, Session 1: the four moments that made the room go silent. Quick context: each cohort = small group (5-15 business leaders: CFOs, Chief AI Officers, heads of procurement, HR directors), 4 weekly sessions × 2 hours. Not just listening to Malcolm, but also to Maria (co-founder) and Damian (associate partner, head of engineering). And critically: participants talking to each other. Because when you leave the academy, you shouldn't just talk to us. Go talk to your peers.

💥 Moment 1: The Software Armageddon
Opened the session with the stock charts: HubSpot down 30-40%, Gartner in free fall since Q3, Adobe and Duolingo essentially dead, Salesforce bleeding. None of these companies will literally disappear tomorrow, but the rate of decline is accelerating because their customers can now build the same thing themselves. Example demoed live: a procurement participant uses IronCloud for contract reviews, at €40,000/year. We showed Claude doing the same thing, more personalized, more tailored to her business, for ~$50 in tokens. The room went silent. The pushback: "But Malcolm, companies are still buying software." Yes, out of habit. As soon as they understand headless software (no UI, just an MCP server or API key, AI orchestrates it), the whole game changes. We're giving the wrong software to the wrong people right now. Stop rolling out Microsoft Copilot to your knowledge workers. Roll out Claude Code and Codex instead, with compliance-friendly options like OpenCode. Then they can build their own solutions.

🧠 Moment 2: Reverse Prompting (the killer technique)
Someone said: "I tried your prompt 'make me a sexy dashboard' and nothing came out." So we went back to basics, because most people still don't know how to prompt. The technique: Reverse Prompting.
Inspired by how psychologists work: they don't ask you to articulate your trauma in one sentence. They ask you questions, and through your answers, you discover what you actually want to say. Your command to Claude/ChatGPT: "Ask me 10 questions, multiple choice, 4-5 answers each. Don't ask all at once: three at a time, wait for my answers, then formulate the next three." Now the AI is interviewing you. While you answer, you start to actually understand what you want. The AI surfaces concepts you didn't even know existed ("Do you want a Streamlit app, or maybe a web app?") and you go: "Wait, what's a Streamlit app? Yes, that one." Promise: Malcolm will pay for your dinner if reverse prompting does not measurably improve your output.

🧠 Moment 3: The Second Brain that called me the WORST salesperson
The werchota.ai "second brain", built in 48 hours and now running in production on Azure (Victor's setup), has access to:
* Every email from every employee (including mine; anyone in the company can query my inbox)
* Every meeting transcript
* Everything on SharePoint
* Queryable via Teams

Live demo: I asked it to do a SWOT analysis of Malcolm, focused on sales. It ran for 30 minutes. Produced a dashboard that called me out:
* "Malcolm is the weakest persona on the entire sales team"
* "Rarely has an agenda going into calls"
* "Talks all over the place, ends calls with 'OK, bye'"
* "No follow-up discipline. No paper sent in 3 days, no follow-up in 1 week"
* "Creates information overload for customers"

The room went silent. I love when that happens. Because the point isn't to publicly roast me. It's that a Second Brain lets you do this for everyone in your team. People know their strengths. They struggle to articulate weaknesses. The Second Brain extracts them, and then Marsha can jump in on my post-meeting communication, Alex can cover ABCD, and the team plays to its actual gaps.
🦴 Moment 4: "You've been hiring AI Neanderthals"
I showed them what an AI-native business leader can do: Claude Code, Codex, prompting fluency, voice-to-Excel, MCP servers, headless software integration. Then I asked: "Be honest. Your last 10 hires. How many can do this?" Answer in the room: essentially zero. You're running an AI-powered company while continuously hiring AI Neanderthals. Then you wonder why adoption is slow. If you want your company to stay Neanderthal-shaped and disappear in the next 1-2 years, continue hiring like this. I don't care. But you should. Remove "Microsoft Office" from your job descriptions. Replace with prompting, AI tool fluency, understanding of where the tech is going. The biggest leverage you have right now is who you hire next.

🎯 Three Monday Actions
1. Try Reverse Prompting today. Take any messy goal you have, paste it to Claude/ChatGPT with the formula above. Free dinner if it doesn't work.
2. Audit your last 10 hires against ~10 AI skills (prompting correctly, Claude Code / Codex fluency, MCP understanding, voice-to-Excel, second-brain literacy, etc). Expect 1-2 out of 10. You've been hiring problems, and now they need re-training. Going forward, hire people who already use AI natively. Biggest leverage in the company.
3. Build a Second Brain, even a small one. You don't have to go enterprise like we did. Start with a project: shared email address, project files, meeting transcripts of one initiative. Build it. Query it. Watch what surfaces.

💬 What participants left with
A Swiss consultant: "I realized I don't have a workload problem. I have a cross-department visibility problem. The Second Brain would solve that for me." Another participant: "I'm going to go try reverse prompting tomorrow. This alone is worth the session."

That's the Chief AI Academy in 18 minutes. Want to come to Cohort 2? See the link below.
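
For Monday Action 1, the Reverse Prompting formula from Moment 2 can be wrapped in a small reusable helper. This is a minimal sketch, not something shown in the session: the function names `build_reverse_prompt` and `question_rounds` and all parameter names are invented for illustration, and the sketch only builds the text you would paste into Claude or ChatGPT. It does not call any API.

```python
def build_reverse_prompt(goal, total_questions=10, batch_size=3,
                         min_options=4, max_options=5):
    """Return a prompt that makes the model interview you before it answers.

    The defaults mirror the formula from the session: 10 multiple-choice
    questions, 4-5 answers each, asked three at a time.
    """
    return (
        f"My goal: {goal}\n\n"
        "Before you produce anything, interview me. "
        f"Ask me {total_questions} questions, multiple choice, "
        f"{min_options}-{max_options} answers each. "
        f"Don't ask all at once: ask {batch_size} at a time, "
        "wait for my answers, then formulate the next "
        f"{batch_size} based on what I said."
    )


def question_rounds(total_questions=10, batch_size=3):
    """Sizes of the interview rounds, e.g. 10 questions in threes."""
    full_rounds, remainder = divmod(total_questions, batch_size)
    return [batch_size] * full_rounds + ([remainder] if remainder else [])


if __name__ == "__main__":
    print(build_reverse_prompt("Make me a sexy dashboard for our sales data"))
    print(question_rounds())  # [3, 3, 3, 1]
```

Keeping the numbers as parameters makes it easy to try a shorter interview (say 6 questions in rounds of 2) for smaller goals without rewriting the prompt by hand.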
⏱️ Timestamps
* 00:00 - What is a Quickbite + intro to the Chief AI Academy format
* 02:30 - Who comes: CFOs, Chief AI Officers, procurement, HR directors
* 05:00 - Moment 1: Software Armageddon: HubSpot, Gartner, Adobe, Salesforce
* 08:00 - IronCloud €40k/year vs Claude $50 in tokens, live demo
* 10:00 - Headless software + MCP servers explained
* 12:00 - Moment 2: Reverse Prompting, the dating-your-psychologist technique
* 14:00 - Moment 3: Second Brain, the SWOT that called me the worst salesperson
* 16:00 - Moment 4: AI Neanderthals, your last 10 hires
* 17:30 - Three Monday actions + participant takeaways

🎙️ About the Host
Malcolm Werchota runs AI adoption programs for companies across Europe: close to 90 companies advised, the majority in the DACH region. After 15+ years at Novartis and Schlumberger, today's focus: AI without the bullshit. Lecturer at ESADE and HSLU. Studied in Leoben.

🚀 Resources for Executives
* 📚 Chief AI Academy: AI for Decision Makers [https://www.werchota.ai/chief-ai-academy] ← Cohort 2 ...

20 min · Jun 8, 2026
