BONUS: GPT 5.5 LIVE - The New GPT "Spud" Model is Here; Let's Break It

25 Apr 2026 · 1 h 39 min

Description

OpenAI dropped GPT-5.5, so we did the only reasonable thing: went live immediately and tried to break it. In this off-the-cuff Neuron Live, Corey and Grant walk through OpenAI's GPT-5.5 release notes, benchmark claims, rollout details, and early access reactions before testing the model live across coding, reasoning, creativity, web research, and absurd prompt challenges. We also compare a few GPT-5.5 responses against Claude Opus 4.7, test Codex, build a new version of Cat Doom, and ask the important questions, like whether a sentient vending machine that only dispenses expired tuna salad deserves to live.

In this episode, we cover:

• What OpenAI says is new in GPT-5.5
• GPT-5.5’s improvements in coding, computer use, research, and knowledge work
• Early benchmark results across Terminal-Bench, GDPval, Frontier Math, BrowseComp, and scientific research tasks
• Why token efficiency may matter as much as raw intelligence
• GPT-5.5’s rollout across ChatGPT, Codex, Plus, Pro, Business, and Enterprise
• Live Codex testing with a one-shot Cat Doom game build
• Creative stress tests involving palindromes, time-traveling potatoes, dystopian vending machines, and Lord of the Rings product reviews
• First impressions of whether GPT-5.5 feels meaningfully different from GPT-5.4 and Claude Opus 4.7

This was not a formal benchmark. It was a first-contact livestream: messy, fast, weird, and exactly the kind of test we like.

Subscribe for more AI breakdowns, live model tests, beginner-friendly explainers, and weirdly useful prompt experiments from The Neuron. Sign up for The Neuron newsletter: https://www.theneuron.ai/

Follow along for more AI news, analysis, and live experiments.

All episodes

86 episodes

Can AI Really Design New Drugs? Google DeepMind Spin-out Isomorphic Labs Explains

Can AI move from predicting proteins to actually designing new drugs? Isomorphic Labs is trying to answer one of the biggest questions in science. In this episode of The Neuron, Corey Noles and Grant Harvey talk with Rebecca Paul, Head of Medicinal Drug Design at Isomorphic Labs, and Michael Schaarschmidt, Foundational AI Research Lead. They explain why drug discovery is so slow, expensive, and failure-prone, and why AI drug design is much more complicated than “generate a molecule and ship it.” The conversation covers AlphaFold, structure prediction, molecule generation, binding models, clinical failure rates, human trust in AI systems, and the long-term hope of designing drugs for targets once considered “undruggable.”

In this episode:

* Why drug discovery can take more than a decade
* What people misunderstand about “AI-designed drugs”
* How medicinal chemists actually use AI models
* Why biology is harder than text, images, or code
* What it would take to make drug discovery faster and cheaper
* The dream of designing a drug candidate in one iteration
* Why “undruggable” proteins may not stay undruggable forever

Additional resources:

* Technical report blog [https://www.isomorphiclabs.com/articles/the-isomorphic-labs-drug-design-engine-unlocks-a-new-frontier]: Best resource for learning about the capabilities that we are building
* Isomorphic Labs website [https://www.isomorphiclabs.com/]: Best destination for learning more about Iso and joining our team in London, Lausanne or Cambridge, MA

Subscribe for more grounded conversations on how AI is changing science, work, and the world. For more practical, grounded conversations on AI systems that actually work, subscribe to The Neuron newsletter at https://theneuron.ai.

6 May 2026 · 42 min

BONUS: OpenAI Workspace Agents 101: Build, Run, and Scale AI Workflows

Join us Thursday as we break down OpenAI’s new Workspace Agents and what they mean for the future of work.

We’ll cover:

⚙️ What workspace agents are
🤖 How they differ from regular chatbots
🏢 Where they fit into real team workflows
🚀 How to start working with them effectively
🔄 What agentic AI means for workplace automation
📈 Why teams are shifting from one-off prompts to repeatable AI-powered processes

Whether you’re experimenting with ChatGPT at work, leading AI adoption, or trying to understand where OpenAI is taking agents next, this session will help you see what’s possible and what to watch for. Tune in for a practical, hands-on deep dive into the future of AI at work.

Sign up for The Neuron newsletter: https://www.theneuron.ai/

1 May 2026 · 1 h 24 min

How Google's New AI Turns Anyone Into a Music Producer (Flow Music Demo)

Google just acquired an AI startup that lets anyone create real music, music videos, and custom instruments — no experience required. In this hands-on episode, Corey sits down with Kendall Rankin from Google to demo Flow Music (formerly Producer AI), the generative music tool now living inside Google Labs. They build a garage rock song about AI from scratch, generate a music video with VEO, and dig into what "amplifying human creativity" actually looks like when the tool can do most of the lifting. Listeners walk away with a clear view of where AI music tools fit in an artist's workflow, why watermarking (SynthID) matters, and how to try it for free. Try Flow Music: https://producer.ai Google Labs: https://labs.google SynthID (watermarking): https://deepmind.google/technologies/synthid/ Subscribe to The Neuron newsletter: https://theneuron.ai

29 Apr 2026 · 38 min

BONUS: GPT 5.5 LIVE - The New GPT "Spud" Model is Here; Let's Break It

25 Apr 2026 · 1 h 39 min

BONUS: LIVE: Claude Opus 4.7 Just Dropped. Here's What Actually Changed.

Grant and Kyle dive into a comprehensive review and live test of the newly released Claude Opus 4.7, a cutting-edge large language model. This session explores its capabilities for coding and game dev, specifically referencing the "Renaissance / Plan Final Fantasy Tactics RPG Game" project. Discover how this AI model performs under pressure and its potential impact on game design workflows.

🔴 LIVE at 9:30AM PT / 12:30PM ET

Anthropic just dropped Claude Opus 4.7, and we’re putting it through the gauntlet in real time. Join Grant Harvey (Lead Writer at The Neuron) for an unscripted, warts-and-all test of Anthropic’s newest flagship model.

What we’re testing

- Advanced coding on tasks Opus 4.6 struggled with
- New higher-resolution vision support for images up to ~3.75 megapixels
- File system-based memory across multi-session work
- The new xhigh effort level, which sits between high and max
- Claude Code’s new /ultrareview slash command
- Auto mode for longer, less-interrupted agent runs

Why this matters

Opus 4.7 is the first model Anthropic is releasing with its new automatic cyber safeguards, following last week’s Project Glasswing announcement. It’s also the direct upgrade path from Opus 4.6 at the same price:

- $5 per million input tokens
- $25 per million output tokens

If you build on Claude, this is likely the model you’ll be using next.

What’s changing under the hood

- New tokenizer, where the same input can map to more tokens depending on content type, roughly 1.0x to 1.35x
- State-of-the-art score on GDPval-AA, a third-party evaluation of economically valuable knowledge work
- Better instruction following, which means prompts written for earlier models may now behave differently
- Improvements across finance agent evals, document reasoning, and long-context tasks

Bring your hardest prompts. We’ll run them live and show you what breaks, what shines, and whether it’s worth migrating today.

Watch part two, where Grant covers Codex for (almost) anything: https://youtube.com/live/OiRkwm3-og0

📰 Full writeup in tomorrow’s newsletter: 🐱 Subscribe to The Neuron (700K+ readers): https://www.theneuron.ai

17 Apr 2026 · 1 h 1 min
