This Day in AI Podcast

Is GPT-5.5 Better Than Opus Now? (ft. Our New AI Co-Host) - EP99.38

8 May 2026 · 46 min
Description

Join Simtheory: https://simtheory.ai

So Chris, this week we finally give our GPT-5.5 impressions (it's actually great), introduce our new AI co-host Moshi (who immediately embarrasses himself), argue about whether the OpenAI/Jony Ive phone is genius or doomed, witness Grok 4.3's unhinged infinite emoji meltdown, declare Opus 4.7 the first-ever Anthropic regression, get excited about GPT Real-Time Voice 2.0 as the future of agentic workflows, debate whether token prices will ever come down, and play the worst diss track in show history. Watch my spud.

CHAPTERS:
0:00 - Intro & Introducing Our New AI Co-Host Moshi
1:39 - Trying to Break Moshi: The Illegal Cigarette Trade Test
2:30 - OpenAI's Jony Ive Phone: Do We Need a Device?
5:07 - Telegram Agents & GPT Real-Time Voice 2.0 Dream
7:38 - The Supervisory Agent: Managing Your Agentic Workflow
9:05 - Wait... Are We Accidentally Validating the OpenAI Phone?
11:37 - GPT-5.5 First Impressions: Actually Really Good
14:36 - 5.5 vs Opus 4.6: Different Strengths
17:00 - Opus 4.7: The First-Ever Anthropic Regression
20:25 - Grok 4.3: Infinite Emojis & Absolute Chaos
21:22 - 🎵 DISS TRACK: "Watch My Spud"
24:24 - Grok Specs & All Models Deprecated in 18 Days
27:04 - Grok Voice in Tesla Is Actually Next Level
31:03 - Token Pricing: The Subscription Problem Nobody Can Solve
39:16 - AI Disruption Cycles & The State of the Industry
44:39 - BONUS TRACK: 🎵 "It's Hard Being Me"

Thanks for listening, like and sub xoxo

All episodes

142 episodes

Is GPT-5.5 Better Than Opus Now? (ft. Our New AI Co-Host) - EP99.38

8 May 2026 · 46 min
We Committed Fraud with OpenAI's New Image Model (and Called Mum) - EP99.38

Join Simtheory: https://simtheory.ai

So Chris, this week... a LOT has happened. We're back to regular programming (maybe), and back with our average takes. Nothing's changed.

GPT-5.5 just dropped today - but you can't even use it in the API. Vaporware? OpenAI is charging MORE than Opus 4.7 and we haven't even tested it yet. Meanwhile Claude Opus 4.7 landed a couple weeks ago and... the vibes are off? Mike's actually going BACK to 4.6. Something's wrong.

But the real star: OpenAI Image 2. This thing is genuinely terrifying. We committed what can only be described as "parody fraud" - faking a council letter so realistic Mike's own mother fell for it on a phone call. Then Chris posted a fake development approval with the mayor's real name into a local Facebook group and had to delete it when someone tagged the actual mayor. The forgery capabilities are absolutely unhinged.

Also: GLM 5.1 is so good Mike forgot he switched to it. Kimi K 2.6 is criminally underrated. VCs are paying 70% of your real token costs. Consumers pay only 5.5% of actual cost. The everything app war is ON. The SaaS-pocalypse is real. And we made two new diss tracks.

Chris made a graffiti sign in LA. It says "This Day in AI." It was the best artwork in the class. That tells you everything.

CHAPTERS:
0:00 - Intro & We're Back (Don't Over-Commit)
1:14 - Overview: Everything That Dropped While We Were Gone
2:56 - GPT-5.5: Vaporware? Not Even in the API
4:57 - Benchmarks vs Reality: Nobody's Excited About OpenAI Models
5:50 - GLM 5.1 & Kimi K 2.6: Secretly Just As Good?
8:15 - The Everything App Race & Product Layer War
8:56 - Token Economics: You're Only Paying 5.5% of Real Cost
13:08 - We Burned $1.5M in Cloud Credits in 2 Months
16:13 - "$30/Month Is Too Expensive" (It Actually Costs $700)
19:25 - Where Is Google?? TPUs Should Flatten Everyone
22:01 - Agentic Tasks Are 10-50x More Expensive Than Chat
25:07 - OpenAI Workspace Agents: Glorified Zapier?
27:01 - Single Agent vs Multi-Agent: How Do You Actually Work?
33:06 - Building Automation Is HARD (Our Support Shame)
35:33 - OpenAI Image 2: The Fraud Episode Begins
44:16 - FRAUD DEMO: The Fake Council Letter (Mum Falls For It)
49:16 - FRAUD DEMO 2: Chris Posts Fake Mayor Letter on Facebook
52:17 - Fake Receipts, Bank Statements & Can Forgeries Be Detected?
57:25 - Claude Opus 4.7: The Vibes Are Off
59:51 - Mythos Preview: "Pics or It Didn't Happen"
1:01:56 - 🎵 DISS TRACK: "Point 7" (Opus Destroys Everyone)
1:03:30 - Kimi K 2.6 Deep Dive & 🎵 New Diss Track
1:08:34 - The Everything App War & SaaS-pocalypse
1:13:51 - Death of Per-Seat Pricing & Agent Security
1:22:37 - Final Thoughts: The Time for Pretending Is Over
1:28:22 - 🎵 Full Tracks: "Point 7" & "Kimi You're So Fine 2.6"

Thanks for listening, like and sub xoxo

24 April 2026 · 1 h 34 min
We Built Microsoft Teams in 23 Minutes (And You Can Use It) & GPT 5.4 Impressions - EP99.37

Join us on the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80
Join Simtheory: https://simtheory.ai

🚀 Try our AI-built apps:
Macrosoft Teams: https://teams.simtheoryapp.com (working video chat with up to 150 people)
Trallo: https://trallo.simtheoryapp.com (full Trello clone, unlimited boards, completely free)

TDIA Discord: https://discord.gg/gTW4RkAJvn
Spotify Songs: https://open.spotify.com/artist/28PU4ypB18QZTotml8tMDq?si=Zh4jgHIASI2ZvsXVfVcCoA

So Chris, this week... we've been having way too much fun with the AI again. OpenAI just dropped GPT-5.4 and 5.4 Pro, and holy shit - we finally have a ball game. This might be the first OpenAI model that genuinely competes with Opus 4.6 for agentic work.

But here's where it gets wild: we rebuilt Trello AND Microsoft Teams from scratch using single prompts. Not mockups. Fully deployed, working apps with authentication, video chat, the works. You can literally sign up and use them right now.

Plus: We roast Gemini 3.1 (it's a disgrace for agentic workflows), break down the insane $30/$180 per million pricing on 5.4 Pro (who is this for??), and discuss why every $99/month SaaS tool might be about to die. Chris declares his programming skills "useless" and honestly... he might be right.

We also demo our actual workflow - running 5 agent tabs simultaneously, delegating everything, and why we barely visit websites anymore. The AI workspace IS the operating system now.

CHAPTERS:
0:00 - Intro & Housekeeping (We Screwed Up the Link)
1:26 - GPT-5.4 First Impressions & Specs
3:12 - Chris's Testing: 40 Minutes to Solve a Problem
4:51 - Knowledge Work Improvements (Catching Up to Anthropic)
6:38 - Computer Use vs Browser/Terminal Debate
8:07 - Why We Don't Need Computer Use Anymore
9:53 - Teaser: We Built Full SaaS Apps Today
11:19 - Tool Search API & Skills Integration
13:20 - The Speed Problem (It's a Plodder)
15:12 - GPT-5.4 Pro Pricing Reaction ($30/$180 WTF)
18:14 - Someone Rebuilt Minecraft in 24 Minutes
19:46 - Gemini 3.1 Roast: "It's a Disgrace"
22:36 - DEMO: Trallo (Full Trello Clone)
29:03 - DEMO: Macrosoft Teams (Working Video Chat!)
33:30 - The SaaS Collapse Theory
36:42 - AI Workspace as the New Operating System
38:57 - Forcing Features onto Entrenched Software
43:32 - "My Programming Skills Are Useless" - Chris
46:06 - The $12 Million Legacy Software Opportunity
51:06 - Beyond Code: Forms, PDFs, Knowledge Work
55:28 - How Fast Will This Change Everything?
56:31 - Gemini 3.1 Flash Lite Quick Take
59:36 - The Delegation Lifestyle (5 Agent Tabs Running)
1:01:24 - Mike's Workflow Demo
1:04:31 - Cognitive Overload Problem
1:06:04 - Release Date: 2 Weeks (Drop Punishment Ideas!)

Thanks for listening, like and sub xoxo

6 March 2026 · 1 h 8 min
Nano Banana 2 is Here! Gemini-3 Shutdown & The AI Layoff Myth | EP99.36

Join us on the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80
Join Simtheory: https://simtheory.ai
TDIA Discord: https://discord.gg/gTW4RkAJvn
Horse Egg Lifecycle Infographic: https://staging.simtheory.ai/share/file/UZ2KJU

----

So Chris, this week... we're diving into Google's new Nano Banana 2 image model - 50% cheaper and supposedly faster (when the servers aren't melting). We put it through its paces with annotation-based editing, slide generation, and yes, the return of the legendary horse egg experiment.

Plus: Google quietly kills Gemini-3 after just a few months (good riddance?), we discuss why the model was "dead on arrival" for agentic workflows, and break down the real story behind those massive AI layoff announcements from Block and WiseTech. Spoiler: it's probably not actually about AI.

We also get into the current state of the model wars (Opus 4.6 vs Codex 5.3), why smaller models like GLM-5 might be the future for enterprise agentic tasks, and Chris's wife teaching Claude to literally speak to her using Mac's text-to-speech. The models are getting creative.

---

0:00 - Intro
0:36 - Nano Banana 2: Price, Speed & First Impressions
3:19 - The Compositing Problem & Last Mile Design
5:41 - Annotation-Based Editing (This Changes Everything)
9:52 - Slide Editing & Real-World Use Cases
12:34 - The Horse Egg Experiment Returns
14:30 - Image Degradation & Cost Breakdown
17:47 - Text-to-Image Leaderboard Discussion
20:01 - Why Nano Banana Dominates for Work
22:07 - Codex 5.3 vs Opus 4.6
22:54 - Google Kills Gemini-3 (What Went Wrong?)
26:48 - Google's Agentic Problem
30:08 - The Model Loyalty Cycle
34:22 - Why Opus 4.6 is Still the Best
37:05 - Cost Optimization & Smart Model Routing
43:30 - When Models Get Stuck on the Wrong Path
45:36 - Nicole's AI Learns to Talk Back
46:54 - Can Anyone Build Software Now?
52:26 - Anthropic's Legal/Finance Plugins & Market Panic
57:08 - Block Lays Off 4,000: AI or Excuse?
1:00:05 - The AI Job Apocalypse Isn't Real

Thanks for listening, like and sub xoxo

27 February 2026 · 1 h 2 min
Gemini 3.1 Pro, Claude Sonnet 4.6 & The OpenClaw Hire That Killed the Chatbot Era - EP99.35

* Join Simtheory: https://simtheory.ai
* "Is This The End" now on Spotify: https://open.spotify.com/album/2Py1MyADUFqJFVUISI2VTP?si=oT3PWyJYRA2BspOmzT_ifg
* Register for the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80

Two new models dropped this week - Gemini 3.1 Pro and Claude Sonnet 4.6 - and honestly? We're struggling to care. In this episode, we break down why Gemini went from being our daily driver to a model we barely touch, the "tunnel vision" hallucination problem that killed the Gemini 3 series for us, and whether 3.1 Pro actually fixes it.

We put Gemini 3.1 Pro head-to-head against Claude Opus building a Geoffrey Hinton Doom Center, debate whether anyone can actually tell the difference between Sonnet 4.5 and 4.6, and make the case that smaller models running in agentic loops are secretly beating the frontiers.

Plus: OpenAI acquires OpenClaw and we ask why a $100B company couldn't just build it themselves, DHH calls out the AI pricing bubble, Mike compares AI models to cheap wine hangovers, and Sam Altman refuses to hold Dario's hand at the India AI Summit. The model wars are getting weird.

CHAPTERS:
0:00 Intro & "Is This The End" Now on Spotify
1:10 Gemini 3.1 Pro: Thinking Controls & The Medium Mode Fix
3:14 The Speed vs Intelligence Trade-Off in Agentic Work
5:10 Why Multitasking With AI Agents Made Us Anxious
6:34 Solid Updates: The Real Goal of Agentic Coding
7:45 Gemini's Fall From Grace: From Daily Driver to Dead Model
10:08 The Tunnel Vision Problem That Killed Gemini 3
13:35 Mixed Reactions: Fanboys vs Reality on Gemini 3.1 Pro
15:06 Side-by-Side Test: Gemini 3.1 Pro vs Claude Opus (Hinton Doom Center)
17:39 Why File Manipulation Accuracy Matters More Than Context Windows
19:27 The Context Window Debate: 1M Tokens vs Smart Sub-Agents
22:05 DHH on Token Pricing: "If There's a Bubble, It's This"
24:11 Should Models Ship as Agent vs Chat Variants?
28:43 Claude Sonnet 4.6: A $2 Discount on Opus?
31:44 The Model Mix: Why One Model Won't Rule Them All
34:40 Anthropic Is Winning - But Can Anyone Tell the Difference?
38:58 OpenAI Acquires OpenClaw: Why Couldn't They Just Build It?
44:18 The Silicon Valley Moment: Sam vs Dario at India AI Summit
47:05 Will Smaller Models Win the Enterprise? The Cost Reality Check
51:27 The End of Single-Shot: Why Agentic Loops Change Everything
55:48 Final Thoughts & Gemini 3.1 Pro Gets One More Week

Thanks for listening. Like & Sub. Links above for the Still Relevant Tour signup and Simtheory. Two models dropped in one week again. What a time to be alive. xoxo

20 February 2026 · 58 min