
ThursdAI - The top AI news from the past week
A podcast from Weights & Biases. Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week.

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces to discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news [https://sub.thursdai.news?utm_medium=podcast&utm_campaign=CTA_7]

Hola AI aficionados, it's yet another ThursdAI, and yet another week FULL of AI news, spanning open source LLMs, multimodal video and audio creation and more! Shiptember, as they call it, does seem to deliver, and it was hard even for me to follow all the news, not to mention we had like 3-4 breaking news items during the show today! This week was yet another Qwen-mas, with Alibaba absolutely dominating across open source, but also NVIDIA promising to invest up to $100 billion into OpenAI. So let's dive right in! As a reminder, all the show notes are posted at the end of the article for your convenience.

ThursdAI - Because weeks are getting denser, but we're still here, weekly, sending you the top AI content! Don't miss out

Table of Contents

* Open Source AI
  * Qwen3-VL Announcement (Qwen3-VL-235B-A22B-Thinking)
  * Qwen3-Omni-30B-A3B: end-to-end SOTA omni-modal AI unifying text, image, audio, and video
  * DeepSeek V3.1 Terminus: a surgical bugfix that matters for agents
* Evals & Benchmarks: agents, deception, and code at scale
* Big Companies, Bigger Bets!
  * OpenAI: ChatGPT Pulse: Proactive AI news cards for your day
  * XAI Grok 4 fast - 2M context, 40% fewer thinking tokens, shockingly cheap
  * Alibaba Qwen-Max and plans for scaling
* This Week's Buzz: W&B Fully Connected is coming to London and Tokyo & Another hackathon in SF
* Vision & Video: Wan 2.2 Animate, Kling 2.5, and Wan 4.5 preview
  * Moondream-3 Preview - Interview with co-founders Vik & Jay
  * Wan open sourced Wan 2.2 Animate (aka "Wan Animate"): motion transfer and lip sync
  * Kling 2.5 Turbo: cinematic motion, cheaper and with audio
  * Wan 4.5 preview: native multimodality, 1080p 10s, and lip-synced speech
* Voice & Audio
* ThursdAI - Sep 25, 2025 - TL;DR & Show notes

Open Source AI

This was a Qwen-and-friends week. I joked on stream that I should just count how many times "Alibaba" appears in our show notes. It's a lot.

Qwen3-VL Announcement (Qwen3-VL-235B-A22B-Thinking): (X [https://x.com/Alibaba_Qwen/status/1970594923503391182], HF [https://huggingface.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe], Blog [https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list], Demo [https://huggingface.co/spaces/Qwen/Qwen3-VL-Demo])

Qwen 3 launched earlier as a text-only family; the vision-enabled variant just arrived, and it's not timid. The "thinking" version is effectively a reasoner with eyes, built on a 235B-parameter backbone with around 22B active (their mixture-of-experts trick). What jumped out is the breadth of evaluation coverage: MMMU, video understanding (Video-MME, LVBench), 2D/3D grounding, doc VQA, chart/table reasoning—pages of it. They're showing wins against models like Gemini 2.5 Pro and GPT‑5 on some of those reports, and doc VQA is flirting with "nearly solved" territory in their numbers.

Two caveats. First, whenever scores get that high on imperfect benchmarks, you should expect healthy skepticism; known label issues can inflate numbers. Second, the model is big. Incredible for server-side grounding and long-form reasoning with vision (they're talking about scaling context to 1M tokens for two-hour video and long PDFs), but not something you throw on a phone. Still, if your workload smells like "reasoning + grounding + long context," Qwen3-VL looks like one of the strongest open-weight choices right now.
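If you want to poke at a model this size yourself, the usual pattern is to serve the open weights behind an OpenAI-compatible endpoint (vLLM or a hosted provider) and send mixed text-plus-image messages. Here's a minimal sketch assuming a locally hosted server; the base URL, API key, model id, and image URL are placeholder assumptions, not anything Qwen ships:

```python
# Minimal sketch: a text + image request to a self-hosted Qwen3-VL server
# that speaks the OpenAI-compatible chat API (e.g., vLLM).
# The base_url, api_key, model id, and image URL below are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-for-local")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Thinking",  # assumed HF-style model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Ground every chart label on this page and summarize the table."},
                {"type": "image_url", "image_url": {"url": "https://example.com/report_page.png"}},
            ],
        }
    ],
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```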
Qwen3-Omni-30B-A3B: end-to-end SOTA omni-modal AI unifying text, image, audio, and video (HF [https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe], GitHub [https://github.com/QwenLM/Qwen3-Omni], Qwen Chat [https://chat.qwen.ai/?models=qwen3-omni-flash], Demo [https://huggingface.co/spaces/Qwen/Qwen3-Omni-Demo], API [https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/])

Omni is their end-to-end multimodal chat model that unites text, image, and audio—and crucially, it streams audio responses in real time while thinking separately in the background. Architecturally, it's a 30B MoE with around 3B active parameters at inference, which is the secret to why it feels snappy on consumer GPUs. In practice, that means you can talk to Omni, have it see what you see, and get sub-250 ms replies in nine speaker languages while it quietly plans. It claims to understand 119 languages. When I pushed it in multilingual conversational settings it still code-switched unexpectedly (Chinese suddenly appeared mid-flow), and it occasionally suffered the classic "stuck in thought" behavior we've been seeing in agentic voice modes across labs. But the responsiveness is real, and the footprint is exciting for local speech streaming scenarios. I wouldn't replace a top-tier text reasoner with this for hard problems, yet being able to keep speech native is a real UX upgrade.

Qwen Image Edit, Qwen TTS Flash, and Qwen‑Guard

Qwen's image stack got a handy upgrade with multi-image reference editing for more consistent edits across shots—useful for brand assets and style-tight workflows. TTS Flash (API-only for now) is their fast speech synth line, and Qwen‑Guard is a new safety/moderation model from the same team. It's notable because Qwen hasn't really played in the moderation-model space before; historically Meta's Llama Guard led that conversation.

DeepSeek V3.1 Terminus: a surgical bugfix that matters for agents (X [https://x.com/deepseek_ai/status/1968682364055920980], HF [https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus])

The DeepSeek whale resurfaced to push a small 0.1 update to V3.1 that reads like a "quality and stability" release—but those matter if you're building on top. It fixes a code-switching bug (the "sudden Chinese" syndrome you'll also see in some Qwen variants), improves tool-use and browser execution, and—importantly—makes agentic flows less likely to overthink and stall. On the numbers, Humanity's Last Exam jumped from 15 to 21.7, while LiveCodeBench dipped slightly. That's the story here: they traded a few raw points on coding for more stable, less dithery behavior in end-to-end tasks. If you've invested in their tool harness, this may be a net win.

Liquid Nanos: small models that extract like they're big (X [https://x.com/LiquidAI_/status/1971198690707616157], HF [https://huggingface.co/collections/LiquidAI/liquid-nanos-68b98d898414dd94d4d5f99a])

Liquid Foundation Models released "Liquid Nanos," a set of open models from roughly 350M to 2.6B parameters, including "extract" variants that pull structure (JSON/XML/YAML) from messy documents. The pitch is cost-efficiency with surprisingly competitive performance on information extraction tasks versus models 10× their size. If you're doing at-scale doc ingestion on CPUs or small GPUs, these look worth a try.

Tiny IBM OCR model that blew up the charts (HF [https://huggingface.co/ibm-granite/granite-docling-258M])

We also saw a tiny IBM model (about 250M parameters) for image-to-text document parsing trending on Hugging Face.
Run in 8-bit, it squeezes into roughly 250 MB, which means Raspberry Pi and “toaster” deployments suddenly get decent OCR/transcription against scanned docs. It’s the kind of tiny-but-useful release that tends to quietly power entire products. Meta’s 32B Code World Model (CWM) released for agentic code reasoning (X [https://x.com/syhw/status/1838682364055920980], HF [https://huggingface.co/facebook/cwm]) Nisten got really excited about this one, and once he explained it, I understood why. Meta released a 32B code world model that doesn’t just generate code - it understands code the way a compiler does. It’s thinking about state, types, and the actual execution context of your entire codebase. This isn’t just another coding model - it’s a fundamentally different approach that could change how all future coding models are built. Instead of treating code as fancy text completion, it’s actually modeling the program from the ground up. If this works out, expect everyone to copy this approach. Quick note, this one was released with a research license only! Evals & Benchmarks: agents, deception, and code at scale A big theme this week was “move beyond single-turn Q&A and test how these things behave in the wild.” with a bunch of new evals released. I wanted to cover them all in a separate segment. OpenAI’s GDP Eval: “economically valuable tasks” as a bar (X [https://x.com/OpenAI/status/1971249374077518226], Blog [https://openai.com/index/gdpval/]) OpenAI introduced GDP Eval to measure model performance against real-world, economically valuable work. The design is closer to how I think about “AGI as useful work”: 44 occupations across nine sectors, with tasks judged against what an industry professional would produce. Two details stood out. First, OpenAI’s own models didn’t top the chart in their published screenshot—Anthropic’s Claude Opus 4.1 led with roughly a 47.6% win rate against human professionals, while GPT‑5-high clocked in around 38%. Releasing a benchmark where you’re not on top earns respect. Second, the tasks are legit. One example was a manufacturing engineer flow where the output required an overall design with an exploded view of components—the kind of deliverable a human would actually make. What I like here isn’t the precise percent; it’s the direction. If we anchor progress to tasks an economy cares about, we move past “trivia with citations” and toward “did this thing actually help do the work?” GAIA 2 (Meta Super Intelligence Labs + Hugging Face): agents that execute (X [https://x.com/ClementDelangue/status/1970885829552705976], HF [https://huggingface.co/blog/gaia2]) MSL and HF refreshed GAIA, the agent benchmark, with a thousand new human-authored scenarios that test execution, search, ambiguity handling, temporal reasoning, and adaptability—plus a smartphone-like execution environment. GPT‑5-high led across execution and search; Kimi’s K2 was tops among open-weight entries. I like that GAIA 2 bakes in time and budget constraints and forces agents to chain steps, not just spew plans. We need more of these. Scale AI’s “SWE-Bench Pro” for coding in the large (HF [https://huggingface.co/datasets/ScaleAI/SWE-bench_Pro]) Scale dropped a stronger coding benchmark focused on multi-file edits, 100+ line changes, and large dependency graphs. On the public [https://scale.com/leaderboard/swe_bench_pro_public] set, GPT‑5 (not Codex) and Claude Opus 4.1 took the top two slots; on a commercial [https://scale.com/leaderboard/swe_bench_pro_commercial] set, Opus edged ahead. 
The broader takeaway: the action has clearly moved to test-time compute, persistent memory, and program-synthesis outer loops to get through larger codebases with fewer invalid edits. This aligns with what we're seeing across ARC‑AGI and SWE‑bench Verified.

The "Among Us" deception test (X [https://x.com/shreyk0/status/1970160146975445192])

One more that's fun but not frivolous: a group benchmarked models on the social deception game Among Us. OpenAI's latest systems reportedly did the best job both lying convincingly and detecting others' lies. This line of work matters because social inference and adversarial reasoning show up in real agent deployments—security, procurement, negotiations, even internal assistant safety.

Big Companies, Bigger Bets!

Nvidia's $100B pledge to OpenAI for 10GW of compute

Let's say that number again: one hundred billion dollars. Nvidia announced plans to invest up to $100B into OpenAI's infrastructure build-out, targeting roughly 10 gigawatts of compute and power. Jensen called it the biggest infrastructure project in history. Pair that with OpenAI's Stargate-related announcements—five new datacenters with Oracle and SoftBank and a flagship site in Abilene, Texas—and you get to wild territory fast. Internal notes circulating say OpenAI started the year around 230MW and could exit 2025 north of 2GW operational, while aiming at 20GW in the near term and a staggering 250GW by 2033. Even if those numbers shift, the directional picture is clear: the GPU supply and power curves are going vertical.

Two reactions. First, yes, the "infinite money loop" memes wrote themselves—OpenAI spends on Nvidia GPUs, Nvidia invests in OpenAI, the market adds another $100B to Nvidia's cap for good measure. But second, the underlying demand is real. If we need 1–8 GPUs per "full-time agent" and there are 3+ billion working adults, we are orders of magnitude away from compute saturation. The power story is the real constraint—and that's now being tackled in parallel.

OpenAI: ChatGPT Pulse: Proactive AI news cards for your day (X [https://x.com/OpenAI], OpenAI Blog [https://openai.com/index/introducing-chatgpt-pulse/])

In a #BreakingNews segment, we got an update from OpenAI: ChatGPT Pulse, which currently works only for Pro users but will come to everyone soon. It's proactive AI that learns from your chats, email, and calendar, and shows you a new "feed" of interesting things every morning based on your likes and feedback! Pulse marks OpenAI's first step toward an AI assistant that brings the right info before you ask, tuning itself with every thumbs-up, topic request, or app connection. I've tuned mine for today, we'll see what tomorrow brings!

P.S. - Huxe is a free app from the creators of NotebookLM (Raiza was on our podcast!) that does a similar thing, so if you don't have Pro, check out Huxe, they just [https://x.com/gethuxe/status/1970503800885854431] launched!

XAI Grok 4 fast - 2M context, 40% fewer thinking tokens, shockingly cheap (X [https://x.com/xai/status/1969183326389858448], Blog [https://x.ai/news/grok-4-fast])

xAI launched Grok‑4 Fast, and the name fits. Think "top-left" on the speed-to-cost chart: up to 2 million tokens of context, a reported 40% reduction in reasoning token usage, and a price tag that's roughly 1% of some frontier models on common workloads. On LiveCodeBench, Grok‑4 Fast even beat Grok‑4 itself. It's not the most capable brain on earth, but as a high-throughput assistant that can fan out web searches and stitch answers in something close to real time, it's compelling.
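If you want to try the speed-to-cost pitch yourself, xAI exposes an OpenAI-compatible chat endpoint. A minimal sketch follows, assuming the standard https://api.x.ai/v1 base URL and guessing "grok-4-fast" as the model id (check xAI's model list for the exact current name before relying on it):

```python
# Minimal sketch: calling Grok 4 Fast through xAI's OpenAI-compatible API.
# The exact model id ("grok-4-fast") is an assumption for illustration.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.environ["XAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="grok-4-fast",  # assumed id - verify against xAI's docs
    messages=[
        {"role": "system", "content": "You are a fast research assistant. Be concise."},
        {"role": "user", "content": "Summarize this week's biggest open-source LLM releases in five bullets."},
    ],
)
print(resp.choices[0].message.content)
```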
Alibaba Qwen-Max and plans for scaling (X [https://x.com/Alibaba_Qwen/status/1970599097297183035], Blog [https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2777d&from=research.latest-advancements-list], API [https://www.alibabacloud.com/help/en/model-studio/models#c2d5833ae4jmo])

Back in the Alibaba camp, they also released their flagship API model, Qwen3-Max, and showed off their future roadmap. Qwen3-Max is an over-1T-parameter MoE that scores 69.6 on SWE-bench Verified and outperforms GPT-5 on LMArena! And their plan is simple: scale. They're planning to go from 1 million to 100 million token context windows and scale their models into the terabytes of parameters. It culminated in a hilarious moment on the show where we all put on sunglasses to salute a slide from their presentation [https://x.com/tphuang/status/1970886344499990672] that literally said, "Scaling is all you need." AGI is coming, and it looks like Alibaba is one of the labs determined to scale their way there. Their release schedule lately (as documented by Swyx from Latent.space) is insane.

This Week's Buzz: W&B Fully Connected is coming to London and Tokyo & Another hackathon in SF

Weights & Biases (now part of the CoreWeave family) is bringing Fully Connected to London on Nov 4–5, with another event in Tokyo on Oct 31. If you're in Europe or Japan and want two days of dense talks and hands-on conversations with teams actually shipping agents, evals, and production ML, come hang out. Readers got a code on stream; if you need help getting a seat, ping me directly. Links: fullyconnected.com [http://fullyconnected.com]

We are also opening up registrations for our second WeaveHacks hackathon in SF, October 11-12. Yours truly will be there, come hack with us on self-improving agents! Register HERE [http://lu.ma/weavehacks2]

Vision & Video: Wan 2.2 Animate, Kling 2.5, and Wan 4.5 preview

This is the most exciting space in AI week-to-week for me right now. The progress is visible. Literally.

Moondream-3 Preview - Interview with co-founders Vik & Jay

While I already reported on Moondream-3 in last week's newsletter, this week we got the pleasure of hosting Vik Korrapati and Jay Allen, the co-founders of Moondream, to tell us all about it. Tune in for that conversation on the pod starting at 00:33:00.

Wan open sourced Wan 2.2 Animate (aka "Wan Animate"): motion transfer and lip sync

Tongyi's Wan team shipped an open-source release that the community quickly dubbed "Wanimate." It's a character-swap/motion transfer system: provide a single image for a character and a reference video (your own motion), and it maps your movement onto the character with surprisingly strong hair/cloth dynamics and lip sync. If you've used Runway's Act One, you'll recognize the vibe—except this is open, and the fidelity is rising fast.

The practical uses are broader than "make me a deepfake." Think onboarding presenters with perfect backgrounds, branded avatars that reliably say what you need, or precise action blocking without guessing at how an AI will move your subject. You act it; it follows.

Kling 2.5 Turbo: cinematic motion, cheaper and with audio

Kling quietly rolled out a 2.5 Turbo tier that's 30% cheaper and finally brings audio into the loop for more complete clips.
Prompts adhere better, physics look more coherent (acrobatics stop breaking bones across frames), and the cinematic look has moved from "YouTube short" to "film-school final." They seeded access to creators and re-shared the strongest results; the consistency is the headline. (Source X: @StevieMac03 [https://x.com/StevieMac03/status/1970559778804908331])

I chatted with my kiddos today over FaceTime, and they were building Minecraft creepers. I took a screenshot, sent it to Nano Banana to turn their creepers into actual Minecraft ones, and then animated the explosions for them with Kling. They LOVED it! The animations were clear, and while Veo refused to even let me upload their images, Kling didn't care, haha.

Wan 4.5 preview: native multimodality, 1080p 10s, and lip-synced speech

Wan also teased a 4.5 preview that unifies understanding and generation across text, image, video, and audio. The eye-catching bit: generate a 1080p, 10-second clip with synced speech from just a script. Or supply your own audio and have it lip-sync the shot. I ran my usual "interview a polar bear dressed like me" test and got one of the better results I've seen from any model. We're not at "dialogue scene" quality, but "talking character shot" is getting… good. The audio generation (not just text plus lip sync) is among the best I've seen outside of Veo; it's really great to see how quickly this is improving, and it's a shame this one wasn't open sourced! And apparently it supports "draw text to animate" (Source: X [https://x.com/I_Muhammadali44/status/1971085386396147741])

Voice & Audio

Suno V5: we've entered the "I can't tell anymore" era

Suno calls V5 a redefinition of audio quality. I'll be honest, I'm at the edge of my subjective hearing on this. I've caught myself listening to Suno streams instead of Spotify and forgetting anything is synthetic. The vocals feel more human, the mixes cleaner, and the remastering path (including upgrading V4 tracks) is useful. The last 10% to "you fooled a producer" is going to be long, but the distance between V4 and V5 already makes me feel like I should re-cut our ThursdAI opener.

MiMI Audio: a small omni-chat demo that hints at the floor

We tried a MiMI Audio demo live—a 7B-ish model with speech in/out. It was responsive but stumbled on singing and natural prosody. I'm leaving it in here because it's a good reminder that the open floor for "real-time voice" is rising quickly even for small models. And the moment you pipe a stronger text brain behind a capable, native speech front-end, the UX leap is immediate.

Ok, another DENSE week that finishes up Shiptember: tons of open source, Qwen (Tongyi) shines, and video is getting so, so good. This is all converging, folks, and honestly, I'm just happy to be along for the ride! This week was also Rosh Hashanah, the Jewish new year, and I shared on the pod that I found my X post from 3 years ago, using the state-of-the-art AI models of the time. WHAT A DIFFERENCE 3 years make, just take a look, I had to scale down the 4K one from this year just to fit into the pic!
Shana Tova to everyone who's reading this, and we'll see you next week 🫡

ThursdAI - Sep 25, 2025 - TL;DR & Show notes

* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne [https://x.com/altryne])
  * Co-hosts - @yampeleg [https://x.com/yampeleg] @nisten [https://x.com/nisten] @ldjconfirmed [https://x.com/ldjconfirmed] @ryancarson [https://x.com/ryancarson]
  * Guest - Vik Korrapati (@vikhyatk [https://x.com/vikhyatk]) - Moondream
* Open Source AI (LLMs, VLMs, Papers & more)
  * DeepSeek V3.1 Terminus: cleaner bilingual output, stronger agents, cheaper long-context (X [https://x.com/deepseek_ai/status/1968682364055920980], HF [https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus])
  * Meta's 32B Code World Model (CWM) released for agentic code reasoning (X [https://x.com/syhw/status/1838682364055920980], HF [https://huggingface.co/facebook/cwm])
  * Alibaba Tongyi Qwen on a release streak again:

Hey folks, What an absolute packed week this week, which started with yet another crazy model release from OpenAI, but they didn't stop there, they also announced GPT-5 winning the ICPC coding competitions with 12/12 questions answered which is apparently really really hard [https://x.com/bminaiev/status/1968363052329484642]! Meanwhile, Zuck took the Meta Connect 25' stage and announced a new set of Meta glasses with a display! On the open source front, we yet again got multiple tiny models doing DeepResearch and Image understanding better than much larger foundational models. Also, today I interviewed Jeremy Berman, who topped the ArcAGI with a 79.6% score and some crazy Grok 4 prompts, a new image editing experience called Reve, a new world model and a BUNCH more! So let's dive in! As always, all the releases, links and resources at the end of the article. ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. Table of Contents * Codex comes full circle with GPT-5-Codex agentic finetune [https://sub.thursdai.news/i/173985701/codex-comes-full-circle-with-gpt-codex-agentic-finetune-x-openai-blog] * Meta Connect 25 - The new Meta Glasses with Display & a neural control interface [https://sub.thursdai.news/i/173985701/meta-connect-the-new-meta-glasses-with-display-and-a-neural-control-interface] * Jeremy Berman: Beating frontier labs to SOTA score on ARC-AGI [https://sub.thursdai.news/i/173985701/jeremy-berman-beating-frontier-labs-to-sota-score-on-arc-agi] * This Week’s Buzz: Weave inside W&B models—RL just got x-ray vision [https://sub.thursdai.news/i/173985701/this-weeks-buzz-weave-inside-w-and-b-modelsrl-just-got-x-ray-vision] * Open Source [https://sub.thursdai.news/i/173985701/open-source] * Perceptron Isaac 0.1 - 2B model that points better than GPT [https://sub.thursdai.news/i/173985701/perceptron-isaac-b-model-that-points-better-than-gpt-x-hf-blog] * Tongyi DeepResearch: A3B open-source web agent claims parity with OpenAI Deep Research [https://sub.thursdai.news/i/173985701/tongyi-deepresearch-ab-open-source-web-agent-claims-parity-with-openai-deep-research-x-hf] * Reve launches a 4-in-1 AI visual platform taking on Nano 🍌 and Seedream [https://sub.thursdai.news/i/173985701/reve-launches-a-in-ai-visual-platform-taking-on-nano-and-seedream-x-reve-blog] * Ray3: Luma’s “reasoning” video model with native HDR, Draft Mode, and Hi‑Fi mastering [https://sub.thursdai.news/i/173985701/ray-lumas-reasoning-video-model-with-native-hdr-draft-mode-and-hifi-mastering-x-try-it] * World models are getting closer - Worldlabs announced Marble [https://sub.thursdai.news/i/173985701/world-models-are-getting-closer-worldlabs-announced-marble-demo] * Google puts Gemini in Chrome [https://sub.thursdai.news/i/173985701/google-puts-gemini-in-chrome-x-blog] Codex comes full circle with GPT-5-Codex agentic finetune (X [https://x.com/OpenAI/status/1967636903165038708], OpenAI Blog [https://openai.com/index/introducing-upgrades-to-codex/]) My personal highlight of the week was definitely the release of GPT-5-Codex. I feel like we've come full circle here. I remember when OpenAI first launched a separate, fine-tuned model for coding called Codex, way back in the GPT-3 days. Now, they've done it again, taking their flagship GPT-5 model and creating a specialized version for agentic coding, and the results are just staggering. This isn't just a minor improvement. 
During their internal testing, OpenAI saw GPT-5-Codex work independently for more than seven hours at a time on large, complex tasks—iterating on its code, fixing test failures, and ultimately delivering a successful implementation. Seven hours! That's an agent that can take on a significant chunk of work while you're sleeping. It's also incredibly efficient, using 93% fewer tokens than the base GPT-5 on simpler tasks, while thinking for longer on the really difficult problems. The model is now integrated everywhere - the Codex CLI (just npm install -g codex), VS Code extension, web playground, and yes, even your iPhone. At OpenAI, Codex now reviews the vast majority of their PRs, catching hundreds of issues daily before humans even look at them. Talk about eating your own dog food!

Other OpenAI updates from this week

While Codex was the highlight, OpenAI (and Google) also participated in and obliterated one of the world's hardest algorithmic competitions, the ICPC. OpenAI used GPT-5 and an unreleased reasoning model to solve 12/12 questions in under 5 hours. OpenAI and NBER also released an incredible report on how over 700M people use ChatGPT on a weekly basis, with a lot of insights that are summed up in this incredible graph:

Meta Connect 25 - The new Meta Glasses with Display & a neural control interface

Just when we thought the week couldn't get any crazier, Zuck took the stage for their annual Meta Connect conference and dropped a bombshell. They announced a new generation of their Ray-Ban smart glasses that include a built-in, high-resolution display you can't see from the outside. This isn't just an incremental update; this feels like the arrival of a new category of device. We've had the computer, then the mobile phone, and now we have smart glasses with a display.

The way you interact with them is just as futuristic. They come with a "neural band" worn on the wrist that reads myoelectric signals from your muscles, allowing you to control the interface silently just by moving your fingers. Zuck's live demo [https://x.com/altryne/status/1968468021988434054/video/1], where he walked from his trailer onto the stage while taking messages and playing music, was one hell of a way to introduce a product.

This is how Meta plans to bring its superintelligence into the physical world. You'll wear these glasses, talk to the AI, and see the output directly in your field of view. They showed off live translation with subtitles appearing under the person you're talking to and an agentic AI that can perform research tasks and notify you when it's done. It's an absolutely mind-blowing vision for the future, and at $799, shipping in a week, it's going to be accessible to a lot of people. I've already signed up for a demo.

Jeremy Berman: Beating frontier labs to SOTA score on ARC-AGI

We had the privilege of chatting with Jeremy Berman, who just achieved SOTA on the notoriously difficult ARC-AGI benchmark using, checks notes... Grok 4! 🚀 He walked us through his innovative approach, which ditches Python scripts in favor of flexible "natural language programs" and uses a program-synthesis outer loop with test-time adaptation. Incredibly, his method achieved these top scores at 1/25th the cost of previous systems. This is huge because ARC-AGI tests for true general intelligence - solving problems the model has never seen before. The chat with Jeremy is very insightful, available on the podcast starting at 01:11:00, so don't miss it!
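To make the "program-synthesis outer loop with test-time adaptation" idea concrete, here's a toy sketch of the general pattern. This is an illustration of the technique, not Jeremy's actual code: propose_programs() and apply_program() are hypothetical stand-ins for LLM calls (roughly "describe the transformation in plain English" and "apply that description to a new grid").

```python
# Toy sketch of a program-synthesis outer loop for ARC-style tasks.
# Illustrative only: the two helpers below stand in for LLM calls and are
# hypothetical, not part of any released system.

def propose_programs(train_pairs, hint=None, n=8):
    """Hypothetical: ask an LLM for n candidate natural-language transformation
    rules, optionally refining a best-so-far rule passed as `hint`."""
    raise NotImplementedError

def apply_program(program, grid):
    """Hypothetical: ask an LLM to execute a natural-language rule on a grid."""
    raise NotImplementedError

def solve_task(train_pairs, test_input, rounds=3):
    best_program, best_score = None, -1.0
    for _ in range(rounds):
        for program in propose_programs(train_pairs, hint=best_program):
            # Score a candidate by how many training pairs it reproduces exactly.
            hits = sum(apply_program(program, x) == y for x, y in train_pairs)
            score = hits / len(train_pairs)
            if score > best_score:
                best_program, best_score = program, score
        if best_score == 1.0:
            break  # perfect on the training pairs, stop the outer loop early
        # Otherwise loop again: the best-so-far program is fed back as a hint,
        # which is the test-time adaptation part of the pattern.
    return apply_program(best_program, test_input)
```

The point of the outer loop is that a fixed model can keep improving on a single task at test time by proposing, scoring, and refining candidate programs against the task's own training pairs.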
ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. This Week’s Buzz: Weave inside W&B models—RL just got x-ray vision You know how every RL project produces a mountain of rollouts that you end up spelunking through with grep? We just banished that misery. Weave tracing now lives natively inside every W&B Workspace run. Wrap your training-step and rollout functions in @weave.op, call weave.init(), and your traces appear alongside loss curves in real time. I can click a spike, jump straight to the exact conversation that tanked the reward, and diagnose hallucinations without leaving the dashboard. If you’re doing any agentic RL, please go treat yourself. Docs: https://weave-docs.wandb.ai/guides/tools/weave-in-workspaces [https://weave-docs.wandb.ai/guides/tools/weave-in-workspaces] Open Source Open source did NOT disappoint this week as well, we've had multiple tiny models beating the giants at specific tasks! Perceptron Isaac 0.1 - 2B model that points better than GPT ( X [https://x.com/perceptroninc/status/1968365052270150077], HF [https://huggingface.co/PerceptronAI/Isaac-0.1], Blog [https://www.perceptron.inc/blog/introducing-isaac-0-1] ) One of the most impressive demos of the week came from a new lab, Perceptron AI. They released Isaac 0.1, a tiny 2 billion parameter "perceptive-language" model. This model is designed for visual grounding and localization, meaning you can ask it to find things in an image and it will point them out. During the show, we gave it a photo of my kid's Harry Potter alphabet poster and asked it to "find the spell that turns off the light." Not only did it correctly identify "Nox," but it drew a box around it on the poster. This little 2B model is doing things that even huge models like GPT-4o and Claude Opus can't, and it's completely open source. Absolutely wild. Moondream 3 preview - grounded vision reasoning 9B MoE (2B active) (X [https://x.com/vikhyatk/status/1968800178640429496], HF [https://huggingface.co/moondream/moondream3-preview]) Speaking of vision reasoning models, just a bit after the show concluded, our friend Vik released a demo of Moondream 3, a reasoning vision model 9B (A2B) that is also topping the charts! I didn't have tons of time to get into this, but the release thread shows this to be an exceptional open source visual reasoner also beating the giants! Tongyi DeepResearch: A3B open-source web agent claims parity with OpenAI Deep Research ( X [https://x.com/Ali_TongyiLab/status/1967988004179546451], HF [https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B] ) Speaking of smaller models obliterating huge ones, Tongyi released a bunch of papers and a model this week that can do Deep Research on the level of OpenAI, even beating it, with a Qwen Finetune with only 3B active parameters! With insane scores like 32.9 (38.3 in Heavy mode) on Humanity's Last Exam (OAI Deep Research gets 26%) and an insane 98.6% on SimpleQA, this innovative approach uses a lot of RL and synthetic data to train a Qwen model to find what you need. The paper is full of incredible insights into how to build automated RL environments to get to this level. AI Art, Diffusion 3D and Video This category of AI has been blowing up, we've seen SOTA week after week with Nano Banana then Seedream 4 and now a few more insane models. 
Tencent's Hunyuan released SRPO (Semantic Relative Preference Optimization) (X [https://x.com/TencentHunyuan/status/1967853314915315945], HF [https://huggingface.co/tencent/SRPO], Project [https://tencent.github.io/srpo-project-page/], Comparison X [https://x.com/hellorob/status/1967667203593183343/photo/2]), which is a new method to finetune diffusion models quickly without breaking the bank. They also released a very realistic-looking finetune trained with SRPO. Some of the generated results are super realistic, but it's more than just a model; there's a whole new method of finetuning here! Hunyuan also updated their 3D model and announced a full-blown 3D studio [https://x.com/TencentHunyuan/status/1968711532033851657] that does everything from 3D object generation to meshing, texture editing & more.

Reve launches a 4-in-1 AI visual platform taking on Nano 🍌 and Seedream (X [https://x.com/cantrell/status/1967655268642386361], Reve [https://app.reve.com/], Blog [https://blog.reve.com/posts/the-new-reve/])

A newcomer, Reve has launched a comprehensive new AI visual platform bundling image creation, editing, remixing, a creative assistant, and API integration, all aimed at making advanced editing as accessible as possible, all using their own proprietary models. What stood out to me, though, is the image editing UI, which allows you to select on your image exactly what you want to edit, write a specific prompt for that thing (change color, objects, add text etc.), then hit generate, and their model takes all those cues into account! This is way better than just... text prompting the other models!

Ray3: Luma's "reasoning" video model with native HDR, Draft Mode, and Hi‑Fi mastering (X [https://x.com/LumaLabsAI/status/1968684330034606372], Try It [https://dream-machine.lumalabs.ai/ideas])

Luma released the third iteration of their video model, Ray, and this one does... HDR! But it also has Draft Mode (for quick iteration), first/last frame interpolation, and they claim to be "production ready" with extreme prompt adherence. The thing that struck me is the reasoning part: their video model now reasons to let you create more complex scenes, while the model will... evaluate itself and select the best generation for you! This is quite bonkers, can't wait to play with it!

World models are getting closer - Worldlabs announced Marble (Demo [https://x.com/XRarchitect/status/1968356682888823060])

We've covered a whole host of world models: Genie 3, Hunyuan 3D world models, Mirage and a bunch more! Dr. Fei-Fei Li's World Labs was one of the first to tackle the world model concept, and their recent release shows incredible progress (and finally lets us play with it!). Marble takes images and creates Gaussian splats that can be used in 3D environments. So now you can take any AI-generated image and turn it into a walkable 3D world!

Google puts Gemini in Chrome (X [https://x.com/search?q=gemini%20chrome&src=typed_query], Blog [https://blog.google/products/chrome/chrome-reimagined-with-ai/])

This happened after the show today, and while it's not fully rolled out yet, I've told you before, when we covered Comet from Perplexity and Dia from The Browser Company, that Google would not be far behind! So today they announced that Gemini is coming to Chrome, and it will allow users to chat with a bunch of their tabs, summarize across tabs, and soon do agentic tasks like clicking things and shopping for you?
😅 I wonder if this means that Google will offer this for free to the over 1B chrome users or introduce some sort of Gemini tier cross-over? Remains to be seen but very exciting to see AI browsers all over! The best feature could be a hidden one, where the Gemini in Chrome will have knowledge about your surfing history and you'll be able to ask it about that one website you visited a while ago that had sharks! Folks, I can go on and on today, literally there's a new innovative video model from ByteDance, a few more image models, but alas, I have to prioritize and give you only the top important news. So, I'll just remind that I put all the links in the TL;DR below and that you should absolutely check out the video version of our show on YT because a lot of visual things are happening and we're playing with all of them live! Hey, just before you get to the “links”, consider subscribing to help me keep this going? 🙏 See you next week 🫡 Don't forget to subscribe (and if you already subbed, share this with a friend or two?) TL;DR and show notes - September 18, 2025 * Hosts and Guests * Alex Volkov - AI Evangelist & Weights & Biases (@altryne [http://x.com/@altryne]) * Co Hosts - @WolframRvnwlf [http://x.com/@WolframRvnwlf] @ldjconfirmed [http://x.com/ldjconfirmed] @nisten [http://x.com/nisten] * Guest : Jeremy Berman (@jerber888 [https://x.com/jerber888]) - SOTA on ARC- AGI * Open Source * Perceptron AI introduces Isaac 0.1: a 2B param perceptive-language model (X [https://x.com/perceptroninc/status/1968365052270150077], HF [https://huggingface.co/PerceptronAI/Isaac-0.1], Blog [https://www.perceptron.inc/blog/introducing-isaac-0-1]) * Tongyi DeepResearch: A3B open-source web agent claims parity with OpenAI Deep Research (X [https://x.com/Ali_TongyiLab/status/1967988004179546451], HF [https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B]) * Mistral updates Magistral-Small-2509 (HF [https://huggingface.co/mistralai/Magistral-Small-2509]) * Big CO LLMs + APIs * GPT-5-Codex release: Agentic coding upgrade for Codex (X [https://x.com/OpenAI/status/1967636903165038708], OpenAI Blog [https://openai.com/index/introducing-upgrades-to-codex/]) * Meta Connect - New AI glasses with display, new AI mode (X Recap [https://x.com/lukegotbored/status/1968497570008744149]) * NBER & OpenAI - How People Use ChatGPT: Growth, Demographics, and Scale (X [https://twitter.com/rohanpaul_ai/status/1967769809929822659], Blog [https://forklightning.substack.com/p/how-people-use-chatgpt], NBER Paper [https://www.nber.org/papers/w34255]) * ARC-AGI: New SOTA by Jeremy Berman and Eric Pang using Grok-4 (X [https://x.com/arcprize/status/1967998885701538060], Blog [https://jeremyberman.substack.com/p/how-i-got-the-highest-score-on-arc-agi-again]) * OpenAI’s reasoning system aces 2025 ICPC World Finals with a perfect 12/12 (X [https://x.com/MostafaRohani/status/1968360976379703569]) * OpenAI adds thinking budgets to ChatGPT app (X [https://x.com/OpenAI/status/1968395215536042241]) * Gemini in Chrome: AI assistant across tabs + smarter omnibox + safer browsing (X [https://x.com/search?q=gemini%20chrome&src=typed_query], Blog [https://blog.google/products/chrome/chrome-reimagined-with-ai/]) * Anthropic admits Claude bugs - Detailed analysis [https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues] * This weeks Buzz * W&B Models + Weave! 
You can now log your RL runs in W&B Weave 👏 (X [https://x.com/shawnup/status/1968403633764266189], W&B Link [https://weave-docs.wandb.ai/guides/tools/weave-in-workspaces]) * W&B Fully Connected London - tickets are running out! Use FCLNTHURSAI for a free ticket on me! (Register Here [https://wandb.ai/site/resources/events/fully-connected/london/]) * Vision & Video * Moondream 3 (Preview): 9B MoE VLM with 2B active targets frontier-level visual reasoning (X [https://x.com/vikhyatk/status/1968800178640429496], HF [https://huggingface.co/moondream/moondream3-preview]) * Ray3: Luma’s “reasoning” video model with native HDR, Draft Mode, and Hi‑Fi mastering (X [https://x.com/LumaLabsAI/status/1968684330034606372]) * HuMo: human‑centric, multimodal video gen from ByteDance/Tsinghua (X [https://x.com/altryne/status/1968003981604733359], HF [https://huggingface.co/bytedance-research/HuMo]) * Voice & Audio * Reka Speech: high-throughput multilingual ASR and speech translation for batch-scale pipelines (X [https://x.com/RekaAILabs/status/1967989101111722272], Blog [https://reka.ai/news/reka-speech-high-throughput-speech-transcription-and-translation-model-with-timestamps]) * AI Art & Diffusion & 3D * Hunyuan SRPO (Semantic Relative Preference Optimization) supercharges diffusion models (X [https://x.com/TencentHunyuan/status/1967853314915315945], HF [https://huggingface.co/tencent/SRPO], Project [https://tencent.github.io/srpo-project-page/], Comparison X [https://x.com/hellorob/status/1967667203593183343/photo/2]) * Hunyuan 3D 3.0 (X [https://x.com/TencentHunyuan/status/1967873084960260470], Try it [https://3d.hunyuan.tencent.com/]) * FeiFei WorldLabs presents Marble (Demo [https://x.com/XRarchitect/status/1968356682888823060]) * Reve launches 4-in-1 AI visual platform (X [https://x.com/cantrell/status/1967655268642386361], Reve [https://app.reve.com/], Blog [https://blog.reve.com/posts/the-new-reve/]) * Tools * Chrome adds Gemini (Blog [https://blog.google/products/chrome/new-ai-features-for-chrome/]) This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

Hey Everyone, Alex here, thanks for being a subscriber! Let's get you caught up on this week's most important AI news! The main thing you need to know this week is likely the incredible image model that ByteDance released, which overshoots Nano 🍌 (the incredible image model from the last two weeks). ByteDance really outdid themselves on this one! But also: a video model with super-fast generation, an OpenAI rumor that made Larry Ellison the richest man alive, ChatGPT getting MCP powers (under a flag you can enable) and much more! This week we covered a lot of visual stuff, so while the podcast format is good enough, it's really worth tuning in to the video recording to really enjoy the full show.

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

AI Art and Diffusion

It's rare for me to start the newsletter not on Open Source AI news, but hey, at least this way you know that I'm writing it and not some AI, right? 😉

ByteDance SeeDream 4 - 4K SOTA image generation and editing model with up to 6 reference images (Fal [https://fal.ai/models/fal-ai/bytedance/seedream/v4/edit/requests], Replicate [https://replicate.com/bytedance/seedream-4])

The level of detail on ByteDance's new model has really made all the hosts on ThursdAI stop and go... huh, is this AI? ByteDance really outdid themselves with this image model: it not only generates images, it's also a fully functional natural-language image editing model. It's a diffusion transformer, able to generate 2K and 4K images fast (under 5 seconds?) while letting you provide up to 6 reference images for the generation. This is going to be incredible for all kinds of purposes: AI art, marketing, etc. The prompt adherence is quite incredible, and text is also crisp and sharp at those 2/4K resolutions.

We created this image live on the show with it (using a prompt extended by another model). I then provided my black-and-white headshot and the above image and asked it to swap me in as a cartoon character, and it did, super quick, and even got my bomber jacket and the W&B logo on it in there! Notably, nothing else was changed in the image, showing just how incredible this one is for image editing.

If you want enhanced realism, our friend FoFr from Replicate reminded us that using IMG_3984.CR2 in the prompt will make the model show images that are closer to reality, even if they depict some incredibly unrealistic things, like a pack of lions forming his nickname.

Additional uses for this model are just getting discovered, and one user already noted that given this model outputs 4K resolution, it can be used as a creative upscaler [https://x.com/BrentLynch/status/1965922591497134319] for other model outputs. Just shove your photo from another AI into Seedream and ask for an upscale. Just be aware that creative upscalers change some amount of detail in the generated picture.

Decart AI's Lucy 14B Redefines Video Generation Speeds!

If Seedream blew my mind with images, Decart's Lucy 14B absolutely shattered my expectations for video generation speed. We're talking about generating 5-second videos from images in 6.5 seconds. That's almost faster than watching the video itself! This video model is not open source yet (despite them adding 14B to the name) but its smaller 5B brother was open sourced.
The speed to quality ratio is really insane here, and while Lucy will not generate or animate text or faces that well, it does produce some decent imagery, but SUPER fast. This is really great for iteration, as AI Video is like a roulette machine, you have to generate a lot of tries to see a good result. This paired with Seedream (which is also really fast) are a game changer in the AI Art world! So stoked to see what folks will be creating with these! Bonus Round: Decart's Real-Time Minecraft Mod for Oasis 2 (X [https://x.com/DecartAI/status/1963758685995368884]) The same team behind Lucy also dropped Oasis 2.0, a Minecraft mod that generates game environments in real-time using diffusion models. I got to play around with it live, and watching Minecraft transform into different themed worlds as I moved through them was surreal. Want a steampunk village? Just type it in. Futuristic city? Done. The frame rate stayed impressively smooth, and the visual coherence as I moved through the world was remarkable. It's like having an AI art director that can completely reskin your game environment on demand. And while the current quality remains low res, if you consider where Stable Diffusion 1.4 was 3 years ago, and where Seedream 4 is now, and do the same extrapolation to Oasis, in 2-3 years we'll be reskinning whole games on the fly and every pixel will be generated (like Jensen loves to say!) OpenAI adds full MCP to ChatGPT (under a flag) This is huge, folks. I've been waiting for this for a while, and finally, OpenAI quietly added full MCP (Model Context Protocol) support to ChatGPT via a hidden "developer mode." How to Enable MCP in ChatGPT Here's the quick setup I showed during the stream: * Go to ChatGPT settings → Connectors * Scroll down to find "Developer Mode" and enable it * Add MCP servers (I used Rube.ai from Composio) * Use GPT-4o in developer mode to access your connectors During the show, I literally had ChatGPT pull Nisten's last five tweets using the Twitter MCP connector. It worked flawlessly (though Nisten was a bit concerned about what tweets it might surface 😂). The implications are massive - you can now connect ChatGPT to GitHub, databases, your local files, or chain multiple tools together for complex workflows. As Wolfram pointed out though, watch your context usage - each MCP connector eats into that 200K limit. Big Moves: Investments and Infrastructure Speaking of OpenAI, Let's talk money, because the stakes are getting astronomical. OpenAI reportedly has a $300 billion (!) deal with Oracle for compute infrastructure over five years, starting in 2027. That's not a typo - $60 billion per year for compute. Larry Ellison just became the world's richest person, and Oracle's stock shot up 40% on the news in just a few days! This has got to be one of the biggest compute deals the world has ever head of! The scale is hard to comprehend. We're talking about potentially millions of H100 GPUs worth of compute power. When you consider that most AI companies are still figuring out how to profitably deploy thousands of GPUs, this deal represents infrastructure investment at a completely different magnitude. Meanwhile, Mistral just became Europe's newest decacorn, valued at $13.8 billion after receiving $1.3 billion from ASML. For context, ASML makes the lithography machines that TSMC uses to manufacture chips for Nvidia. They're literally at the beginning of the AI chip supply chain, and now they're investing heavily in Europe's answer to OpenAI. 
Wolfram made a great point - we're seeing the emergence of three major AI poles: American companies (OpenAI, Anthropic), Chinese labs (Qwen, Kimi, Ernie), and now European players like Mistral. Each is developing distinct approaches and capabilities, and the competition is driving incredible innovation.

Anthropic's Mea Culpa and Code Interpreter

After weeks of users complaining about Claude's degraded performance, Anthropic finally admitted there were bugs affecting both Claude Opus and Sonnet. Nisten, who tracks these things closely, speculated that the issues might be related to running different quantization schemes on different hardware during peak usage times. We already reported last week that they admitted that "something was affecting intelligence," but this week they said they pinpointed (and fixed) two bugs related to inference!

They also launched a code interpreter feature that lets Claude create and edit files directly. It's essentially their answer to ChatGPT's code interpreter - giving Claude its own computer to work with. The demo showed it creating Excel files, PDFs, and documents with complex calculations. Having watched Claude struggle with file operations for months, this is a welcome addition.

🐝 This Week's Buzz: GLM 4.5 on W&B and We're on Open Router!

Over at Weights & Biases, we've got some exciting updates for you. First, we've added Zhipu AI's GLM 4.5 to W&B Inference [http://wandb.me/inference]! This 300B+ parameter model is an absolute beast for coding and tool use, ranking among the top open models on benchmarks like SWE-bench. We've heard from so many of you, including Nisten, about how great this model is, so we're thrilled to host it. You can try it out now and get $2 in free credits to start. And for all you developers out there, you can use a proxy [https://x.com/olafgeibig/status/1949779562860056763] like LiteLLM to run GLM 4.5 from our inference endpoint inside Anthropic's Claude Code if you're looking for a powerful and cheap alternative!

Second, we're now on Open Router! You can find several of our hosted models, like gpt-oss and DeepSeek Coder, on the platform. If you're already using Open Router to manage your model calls, you can now easily route traffic to our high-performance inference stack.

Open Source Continues to Shine

Open source LLM models took a bit of a break this week, but there were still interesting models! Baidu released ERNIE-4.5, a very efficient 21B parameter "thinking" MoE that only uses 3B active parameters per token. From the UAE, MBZUAI released K2-Think, a finetune of Qwen 2.5 that's showing some seriously impressive math scores. And Moonshot AI updated Kimi K2, doubling its context window to 256K and further improving its already excellent tool use and writing capabilities. Tencent released an update to HunyuanImage 2.1, which is a bit slow, but also generates 2K images and is decent at text.

Qwen drops Qwen3-Next-80B-A3B (X [https://x.com/Alibaba_Qwen/status/1966197643904000262], HF [https://t.co/zHHNBB2l5X])

In breaking news after the show (we were expecting this on the show itself), the Alibaba folks dropped a much more streamlined version of the next Qwen: 80B parameters with only 3B active! They call this an "ultra-sparse MoE," and it beats Qwen3-32B in performance and rivals Qwen3-235B in reasoning & long-context. This is quite unprecedented, as getting models this sparse to work well takes a lot of effort and skill, but the Qwen folks delivered!
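To see why "ultra-sparse" is such a big deal, here's a rough back-of-the-envelope sketch comparing per-token compute for a dense 32B model against an 80B-total / 3B-active MoE, treating decode FLOPs as roughly proportional to active parameters (this ignores attention cost, routing overhead, and memory bandwidth, so it's directional only):

```python
# Rough back-of-the-envelope: why 80B-total / 3B-active is cheap to run per token.
# Assumes decode FLOPs per token ~ 2 * active_params - a simplification that
# ignores attention, routing overhead, and memory-bandwidth limits.

def flops_per_token(active_params: float) -> float:
    return 2 * active_params

dense_32b = flops_per_token(32e9)        # e.g., a fully dense Qwen3-32B
sparse_80b_a3b = flops_per_token(3e9)    # Qwen3-Next: 80B total, ~3B active

print(f"Dense 32B:        {dense_32b:.1e} FLOPs/token")
print(f"80B MoE (3B act): {sparse_80b_a3b:.1e} FLOPs/token")
print(f"Ratio: ~{dense_32b / sparse_80b_a3b:.0f}x less compute per token")
```

The trade-off is memory: all 80B weights still have to sit in (V)RAM, which is why sparse MoEs favor well-provisioned servers even though each token is cheap to compute.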
Tools We wrapped with a quick shoutout to EBSynth, a nifty video editor that lets you draw or add elements to one frame and extrapolates to the rest—like Photoshop for motion. It's browser-based and fun for VFX tweaks; check the demo video on X [https://twitter.com/ebsynth/status/1965448361974362432]. Simple but powerful for quick video hacks. Speaking of Photoshop, it was confirmed that Nano Banana is going to be embedded into Photoshop for image editing! Summary & TL;DR What a week—Seedream and Lucy alone have me rethinking how fast AI can iterate creatively, while MCP in ChatGPT feels like the dawn of truly accessible agents. With open-source keeping pace and big deals fueling the fire, AI's multimodal future is accelerating. Thanks for tuning in, folks; if you missed the live vibes, catch the podcast or hit sub.thursdai.news [sub.thursdai.news] for all the links. See you next Thursday—what blew your mind this week? Drop a comment and share with a friend, it's the best way to support this endeavor! TL;DR of all topics covered: AI Models & APIs: * ChatGPT adds full MCP support - Developer mode unlocks tool connectors for 400M+ users (Setup Guide [https://x.com/altryne/status/1965843358653481450]) * Sea Dream 4.0 - ByteDance's unified image generation/editing model creates 4K images in ~1.8 seconds (X [https://x.com/fofrAI/status/1942899932505035222], Try it [https://fal.ai/models/fal-ai/bytedance/seedream/v4/edit/playground]) * Lucy 14B - Decart's lightning-fast video model generates 5-second clips in 6.5 seconds (Demo [https://fal.ai/models/decart/lucy-14b/image-to-video], Page [https://lucy.decart.ai/]) * Claude bug fixes - Anthropic admits to performance issues and releases code interpreter (Blog [https://www.anthropic.com/news/create-files]) * Sonoma Dusk & Sky - Mystery models on OpenRouter with 2M context, rumored to be Grok (OpenRouter [https://openrouter.ai/openrouter/sonoma-sky-alpha]) This Week's W&B Buzz: * OpenRouter integration - Serving models to broader developer community (Try us [https://openrouter.ai/provider/wandb]) * GLM 4.5 - 350B parameter coding model added to inference (X [https://x.com/weights_biases/status/1965176118413344778], Try It [https://wandb.ai/site/inference/glm-4.5]) * W&B inference in Claude Code with LiteLLM (Olaf's Guide [https://x.com/olafgeibig/status/1949779562860056763]) Open Source Releases: * ERNIE 4.5 - Baidu open-sources 21B parameter thinking model with 3B active parameters (X [https://x.com/Baidu_Inc], HF [https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking]) * K2-think - MBZUAI's Qwen 2.5 fine-tune with strong math performance (X [https://x.com/ericxing/status/1965667372284739977]) * Kimi K2 update - Doubled context to 256K, improved tool use (X [https://x.com/LechMazur/status/1965729450940588459]) * HunyuanImage 2.1 - Tencent's 17B parameter open-source 2K image model (X [https://x.com/TencentHunyuan/status/1965433678261354563], HF [https://huggingface.co/tencent/HunyuanImage-2.1]) * Qwen-next-80B-A3B - Alibaba's next frontier MoE with 3B active param (X [https://x.com/Alibaba_Qwen/status/1966197643904000262], HF [https://t.co/zHHNBB2l5X]) Voice & Audio: * Qwen3-ASR-Flash - 11-language speech recognition with singing support (X [https://x.com/Alibaba_Qwen/status/1965068737297707261]) * Stable Audio 2.5 - Enterprise audio generator creating 3-minute tracks in <2 seconds (X [https://twitter.com/StabilityAI/status/1965784409052995916], Blog 
[https://stability.ai/news/stability-ai-introduces-stable-audio-25-the-first-audio-model-built-for-enterprise-sound-production-at-scale], Try It [https://stableaudio.com/generate]) * ElevenLabs Voice Remixing - Modify cloned voices for age, gender, accent (X [https://x.com/elevenlabsio]) Business & Investment: * OpenAI-Oracle deal - $300B infrastructure agreement over 5 years * Mistral funding - $1.3B investment from ASML at $13.8B valuation (Blog [https://www.cnbc.com/2025/09/09/ai-firm-mistral-valued-at-14-billion-as-asml-takes-major-stake.html]) Tools: * Oasis 2.0 - Real-time Minecraft world generation mod from Decart (Try It [http://oasis2.decart.ai/demo]) * EbSynth - Video editing tool for frame-by-frame manipulation (X [https://x.com/ebsynth/status/1965448361974362432]) Hosts: * Alex Volkov (@altryne [https://x.com/altryne]) * Wolfram RavenWlf (@WolframRvnwlf [https://x.com/WolframRvnwlf]) * Yam Peleg (@yampeleg [https://x.com/yampeleg]) * Nisten (@nisten [https://x.com/nisten]) * LDJ (@ldjconfirmed [https://x.com/ldjconfirmed]) This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

Wohoo, hey y'all, Alex here, I'm back from the desert (pic at the end) and what a great feeling it is to be back in the studio to talk about everything that happened in AI! It's been a pretty full week (or two) in AI, with the coding agent space heating up, Grok entering the ring and taking over free tokens, Codex 10xing usage and Anthropic... well, we'll get to Anthropic. Today on the show we had Roger and Bhavesh from Nous Research cover the awesome Hermes 4 release and the new PokerBots benchmark, then we had a returning favorite, Kwindla Hultman Kramer, to talk about the GA of RealTime voice from OpenAI. Plus we got some massive funding news, some drama with model quality on Claude Code, and some very exciting news right here from CoreWeave acquiring OpenPipe! 👏 So grab your beverage of choice, settle in (or skip to the part that interests you) and let's take a look at the last week (or two) in AI!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Open Source: Soulful Models and Poker-Playing Agents

This week did not disappoint when it comes to open source! Our friends at Nous Research released the 14B version of Hermes 4, after releasing the 405B and 70B versions last week. This company continues to excel at finetuning models for powerful, and sometimes just plain weird (in a good way), use cases.

Nous Hermes 4 (14B, 70B, 405B) and the Quest for a "Model Soul" (X [https://x.com/NousResearch/status/1960416954457710982], HF [https://huggingface.co/NousResearch/Hermes-4-14B])

Roger and Bhavesh from Nous came to announce the release of the smaller (14B) version of Hermes 4, and to cover last week's releases of the larger 70B and 405B brothers. The Hermes series of finetunes has always been on our radar, as unique data mixes turned them into uncensored, valuable and creative models and unlocked a bunch of new use cases. But the wildest part? They told us they intentionally stopped training the model not when reasoning benchmarks plateaued, but when they felt it started to "lose its model soul." They monitor the entropy and chaos in the model's chain-of-thought, and when it became too sterile and predictable, they hit the brakes to preserve that creative spark.

This focus on qualities beyond raw benchmark scores is why Hermes 4 is showing some really interesting generalization, performing exceptionally well on benchmarks like EQBench3, which tests emotional and interpersonal understanding. It's a model that's primed for RL not just in math and code, but in creative writing, role-play, and deeper, more "awaken" conversations. It's a soulful model that's just fun to talk to.

Nous Husky Hold'em Bench: Can Your LLM Win at Poker? (Bench [https://huskybench.com/])

As if a soulful model wasn't enough, the Nous team also dropped one of the most creative new evals I've seen in a while: Husky Hold'em Bench. We had Bhavesh, one of its creators, join the show to explain. This isn't a benchmark where the LLM plays poker directly. Instead, the LLM has to write a Python poker bot from scratch, under time and memory constraints, which then competes against bots written by other LLMs in a high-stakes tournament. Very interesting approach, and we love creative benchmarking here at ThursdAI! This is a brilliant way to test for true strategic reasoning and planning, not just pattern matching. It's an "evergreen" benchmark that gets harder as the models get better.
More Open Source Goodness

The hits just kept on coming this week. Tencent open-sourced Hunyuan-MT-7B, a translation model that swept the WMT2025 competition and rivals GPT-4.1 on some benchmarks. Having a small, powerful, specialized model like this is huge for anyone doing large-scale data translation for training or needing fast on-device capabilities.

From Switzerland, we got Apertus-8B and 70B, a set of fully open (Apache 2.0 license, open data, open training recipes!) multilingual models trained on a massive 15 trillion tokens across 1,800 languages. It's fantastic to see this level of transparency and contribution from European institutions.

And Alibaba's Tongyi Lab released WebWatcher, a powerful multimodal research agent that can plan steps, use a suite of tools (web search, OCR, code interpreter), and is setting new state-of-the-art results on tough visual-language benchmarks, often beating models like GPT-4o and Gemini. All links are in the TL;DR at the end.

BREAKING NEWS: Google Drops Embedding Gemma 308M (X [https://x.com/GoogleDeepMind/status/1963635422698856705], HF [https://huggingface.co/google/embeddinggemma-300m], Try It [https://huggingface.co/spaces/webml-community/semantic-galaxy])

Just as we were live on the show, news broke from our friends at Google. They've released Embedding Gemma, a new family of open-source embedding models. This is a big deal because they are tiny—the smallest is only 300M parameters and takes just 200MB to run—but they are topping the MTEB leaderboard for models under 500M parameters. For anyone building RAG pipelines, especially for on-device or mobile-first applications, having a small, fast, SOTA embedding model like this is a game-changer. It's so optimized for on-device running that it can run fully in your browser on WebGPU, with this great example [https://huggingface.co/spaces/webml-community/semantic-galaxy] from our friend Xenova highlighted on the release blog!
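If you'd rather kick the tires server-side before going the WebGPU route, a minimal retrieval-style sketch with sentence-transformers looks roughly like this. The model ID matches the HF release linked above; the model card also describes task-specific prompts for queries vs. documents, which I'm skipping here for brevity, so treat this as a quick sketch rather than the canonical snippet.

```python
# Minimal sketch: embed a query and a few docs with EmbeddingGemma and rank by
# cosine similarity. Assumes `pip install sentence-transformers` and that
# you've accepted the model's license on Hugging Face. Task-specific prompts
# recommended by the model card are omitted for brevity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

query = "small embedding model for on-device RAG"
docs = [
    "EmbeddingGemma is a ~300M parameter open embedding model from Google.",
    "Qwen Image Edit is a 20B open image editing model from Alibaba.",
    "Hunyuan-MT-7B is a small translation model from Tencent.",
]

# normalize_embeddings=True lets a plain dot product act as cosine similarity.
query_vec = model.encode(query, normalize_embeddings=True)
doc_vecs = model.encode(docs, normalize_embeddings=True)
scores = doc_vecs @ query_vec

for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```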
Big Companies, Big Money, and Big Problems

It was a rollercoaster week for the big labs, with massive fundraising, major product releases, and a bit of a reality check on the reliability of their services.

OpenAI's GPT Real-Time Goes GA and gets an upgraded brain (X [https://x.com/OpenAIDevs/status/1961124915719053589], Docs [https://openai.com/index/introducing-gpt-realtime/#image-input])

We had the perfect guest to break down OpenAI's latest voice offering: Kwindla Kramer, founder of Daily and maintainer of the open-source PipeCat framework. OpenAI has officially taken its Realtime API to General Availability (GA), centered around the new gpt-realtime model. Kwindla explained that this is a true speech-to-speech model, not a pipeline of separate speech-to-text, LLM, and text-to-speech models. This reduces latency and preserves more nuance and prosody. The GA release comes with huge upgrades, including support for remote MCP servers, the ability to process image inputs during a conversation, and—critically for enterprise—native SIP integration for connecting directly to phone systems.

However, Kwindla also gave us a dose of reality. While this is the future, for many high-stakes enterprise use cases, the multi-model pipeline approach is still more reliable. Observability is a major issue with the single-model black box; it's hard to know exactly what the model "heard." And in terms of raw instruction-following and accuracy, a specialized pipeline can still outperform the speech-to-speech model. It's a classic jagged frontier: for the lowest latency and most natural vibe, GPT Real-Time is amazing. For mission-critical reliability, the old way might still be the right way for now.

ChatGPT has branching!

Just as I was about to finish writing this up, ChatGPT announced a new feature, and this one I had to tell you about! Finally you can branch chats in their interface, which is a highly requested feature! Branching seems to be live on the chat interface, and honestly, tiny but important UI changes like these are how OpenAI remains the best chat experience!

The Money Printer Goes Brrrr: Anthropic's $13B Raise

Let's talk about the money. Anthropic announced it has raised an absolutely staggering $13 billion in a Series F round, valuing the company at $183 billion. Their revenue growth is just off the charts, jumping from a run rate of around $1 billion at the start of the year to over $5 billion by August. This growth is heavily driven by enterprise adoption and the massive success of Claude Code. It's clear that the AI gold rush is far from over, and investors are betting big on the major players. In related news, OpenAI is also reportedly raising $10 billion at a valuation of around $500 billion, primarily to allow employees to sell shares—a huge moment for the folks who have been building there for years.

Oops... Did We Nerf Your AI? Anthropic's Apology

While Anthropic was celebrating its fundraise, it was also dealing with a self-inflicted wound. After days of users on X and other forums complaining that Claude Opus felt "dumber," the company finally issued a statement admitting that yes, for about three days, the model's quality was degraded due to a change in their infrastructure stack.

Honestly, this is not okay. We're at a point where hundreds of thousands of developers and businesses rely on these models as critical tools. To have the quality of that tool change under your feet without any warning is a huge problem. It messes with people's ability to do their jobs and trust the platform. While it was likely an honest mistake in pursuit of efficiency, it highlights a fundamental issue with closed, proprietary models. You're at the mercy of the provider. It's a powerful argument for the stability and control that comes with open-source and self-hosted models. These companies need to realize that they are no longer just providing experimental toys; they're providing essential infrastructure, and that comes with a responsibility for stability and transparency.

This Week's Buzz: CoreWeave Acquires OpenPipe! 🎉

Super exciting news from the Weights & Biases and CoreWeave family - we've acquired OpenPipe! Kyle and David Corbitt and their team are joining us to help build out the complete AI infrastructure stack from metal to model. OpenPipe has been doing incredible work on SFT and RL workflows with their open-source ART framework. As Yam showed during the show, they demonstrated you can train a model to SOTA performance on deep research tasks for just $300 in a few hours - and it's all automated! The system can generate synthetic data, apply RLHF, and evaluate against any benchmark you specify. This fits perfectly into our vision at CoreWeave - bare metal infrastructure, training and observability with Weights & Biases, fine-tuning and RL with OpenPipe's tools, evaluation with Weave, and inference to serve it all.
We're building the complete platform, and I couldn't be more excited!

Vision & Speed: Apple's FastVLM (HF [https://huggingface.co/apple/FastVLM-1.5B-int8])

Just before Apple's event next week, they dropped FastVLM - a speed-first vision model that's 85x faster on time-to-first-token than comparable models. They released it in three sizes (7B, 1.5B, and 0.5B), all optimized for on-device use. The demo that blew my mind was real-time video captioning running in WebGPU. HF CEO Clem showed it processing Apple's keynote video with maybe 250ms latency - the captions were describing what was happening almost in real-time. When someone complained it wasn't accurate because it described "an older man in headphones" when showing an F1 car, Clem pointed out that was actually the previous frame showing Tim Cook - the model was just slightly behind!

Tools Showdown: Codex vs Claude Code

To wrap up, we dove into the heated debate between Codex and Claude Code. Sam Altman reported that Codex usage is up 10x in the past two weeks (!) and improvements are coming. Yam gave us a live demo, and while Claude Code failed to even start up during the show (highlighting why people are switching), Codex with GPT-5 was smooth as butter. The key advantages? Codex authenticates with your OpenAI account (no API key juggling), it has MCP support, and perhaps most importantly - it's not just a CLI tool. You can use it for PR reviews on GitHub, as a cloud-based agent, and integrated into Cursor and Windsurf. Though as Yam pointed out, OpenAI really needs to stop calling everything "Codex" - there are like five different products with that name now! 😅 If you tried Codex (the CLI!) when it was released, and gave up, give it a try now, it's significantly upgraded!

Ok, phew, what a great episode we had. If you're only reading, I strongly recommend checking out the live recording or the edited podcast, and of course, if this newsletter is helpful to you, the best thing you can do to support it is to subscribe and share with friends 👏

P.S - Just came back after my first Burning Man. It was a challenging, all-consuming experience, where I truly disconnected for the first time (first ThursdAI in over 2 years that I didn't know what's going on with AI). It was really fun but I'm happy to be back! See you next week!
TL;DR and Show Notes

* Hosts and Guests
* Alex Volkov - AI Evangelist & Weights & Biases (@altryne [http://x.com/@altryne])
* Co Hosts - @WolframRvnwlf [http://x.com/@WolframRvnwlf] @yampeleg [http://x.com/@yampeleg] @nisten [http://x.com/@nisten] @ldjconfirmed [http://x.com/@ldjconfirmed]
* Guests - Roger Jin - @rogershijin [https://x.com/rogershijin] & Bhavesh Kumar @bha_ku21 [https://x.com/bha_ku21]
* Kwindla Kramer - @kwindla [https://x.com/kwindla]
* Open Source LLMs
* Nous Hermes 4 — 14B launches: compact hybrid reasoning model with tool calling for local and cloud use (X [https://twitter.com/NousResearch/status/1963349882837897535], HF [https://huggingface.co/NousResearch/Hermes-4-14B], Tech Report [https://arxiv.org/pdf/2508.18255])
* Tencent open-sources Hunyuan-MT-7B translation model after sweeping WMT2025 (X [https://x.com/TencentHunyuan/status/1962466712378577300], HF [https://huggingface.co/tencent/Hunyuan-MT-7B])
* Nous - Husky Hold'em Bench launches as an open-source pokerbot eval for LLM strategic play (X [https://x.com/NousResearch/status/1963371292318749043], Bench [https://huskybench.com/])
* WebWatcher: Alibaba's Tongyi Lab open-sources a vision-language deep research agent that sets new SOTA (X [https://x.com/rohanpaul_ai/status/1963018720571462029], HF [https://huggingface.co/Alibaba-NLP/WebWatcher-32B])
* Apertus-8B and 70B launch as Switzerland's fully open, multilingual LLMs trained on 15T tokens across 1,800+ languages (X [https://x.com/haeggee/status/1962898537294749960], HF [https://huggingface.co/swiss-ai])
* Google releases Embedding Gemma - 300M param SOTA embeddings model for RAG ([Breaking News])
* Big CO LLMs + APIs
* Mistral adds 20+ MCP-powered connectors and controllable Memories to Le Chat for enterprise workflows (X [https://x.com/MistralAI/status/1962881086440038545], Blog [https://mistral.ai/news/le-chat-mcp-connectors-memories])
* Anthropic raises $13B Series F at a $183B post-money valuation (X [https://x.com/AnthropicAI/status/1962909475594985935], Blog [https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation])
* OpenAI reportedly raising $10B at ~$500B valuation - buyback for employees
* OpenAI ships gpt-realtime and takes Realtime API to GA with remote MCP tools, image input, and SIP phone calling (X [https://x.com/OpenAI])
* OpenAI releases projects for free users with larger file uploads and project-only memory controls
* OpenAI acquires Statsig & Alex for $1.1B+ to strengthen applications team
* Grok Code 1 - now taking 50% of coding traffic on OpenRouter
* Codex usage up 10x in 2 weeks per Sam Altman, with improvements coming
* Anthropic admits to Claude Opus quality degradation for 3 days due to infrastructure changes
* This week's Buzz
* CoreWeave buys OpenPipe! 🎉 (Blog [https://openpipe.ai/blog/openpipe-coreweave])
* Vision & Video
* Apple's FastVLM-7B lands with speed-first vision encoder—85x faster TTFT vs peers (X [https://x.com/_akhaliq/status/1962018549674684890], HF [https://huggingface.co/apple/FastVLM-7B-int4])
* AI Art & Diffusion & 3D
* Nano Banana (Gemini 2.5 Flash Image) continues to dominate as Google's best image model (ai.studio/banana [http://ai.studio/banana])
* Tools
* Codex vs Claude Code discussion → Codex now significantly better with GPT-5 engine, GitHub PR reviews, and cloud agents

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

Hey everyone, Alex here 👋

This week looked quiet… until about 15 hours before we went live. Then the floodgates opened: DeepSeek dropped a hybrid V3.1 that beats their own R1 with fewer thinking tokens, ByteDance quietly shipped a 36B Apache-2.0 long-context family with a "thinking budget" knob, NVIDIA pushed a faster mixed-architecture 9B with open training data, and a stealth image editor dubbed "Nano Banana" started doing mind-bending scene edits that feel like a new tier of 3D-aware control. On the big-co side, a mystery "Sonic" model appeared in Cursor and Cline (spoiler: the function call paths say a lot), and OpenAI introduced Agents.md to stop the config-file explosion in agentic dev tools. We also got a new open desktop-agent RL framework that 4x'd OSWorld SOTA, an IBM + NASA model for solar weather, and Qwen's fully open 20B image editor that's shockingly capable and runnable on your own GPU.

Our show today was one of the shortest yet, as I had to drop early to prepare for Burning Man 🔥🕺 Speaking of which, Wolfram and the team will host the next episode! Ok, let's dive in!

DeepSeek V3.1: a faster hybrid that thinks less, scores more (X [https://x.com/deepseek_ai/status/1958417062008918312], HF [https://huggingface.co/deepseek-ai/DeepSeek-V3.1])

DeepSeek does this thing where they let a base artifact "leak" onto Hugging Face, and the rumor mill goes into overdrive. Then, hours before we went live, the full V3.1 model card and an instruct variant dropped. The headline: it's a hybrid reasoner that combines the strengths of their V3 (fast, non-thinking) and R1 (deep, RL-trained thinking), and on many tasks it hits R1-level scores with fewer thinking tokens. In human terms: you get similar or better quality, faster. A few things I want to call out from the release and early testing:

* Hybrid reasoning mode done right. The model can plan with thinking tokens and then switch to non-thinking execution, so you don't have to orchestrate two separate models. This alone simplifies agent frameworks: plan with thinking on, execute with thinking off (see the sketch right after this section).
* Thinking efficiency is real. DeepSeek shows curves where V3.1 reaches or surpasses R1 with significantly fewer thinking tokens. On AIME'25, for example, R1 clocks 87.5% with ~22k thinking tokens; V3.1 hits ~88.4% with ~15k. On GPQA Diamond, V3.1 basically matches R1 with roughly half the thinking budget.
* Tool-use and search-agent improvements. V3.1 puts tool calls inside the thinking process, instead of doing a monologue and only then calling tools. That's the pattern you want for multi-turn research agents that iteratively query the web or your internal search.
* Long-context training was scaled up hard. DeepSeek says they increased the 32K extension phase to ~630B tokens, and the 128K phase to ~209B tokens. That's a big bet on long-context quality at train time, not just inference-time RoPE tricks. The config shows a max position in the 160K range, with folks consistently running it in the 128K class.
* Benchmarks show the coding and terminal agent work got a big push. TerminalBench jumps from a painful 5.7 (R1) to 31 with V3.1. Codeforces ratings are up. On SweBench Verified (non-thinking), V3.1 posts 66 vs R1's ~44. And you feel it: it's faster to "get to it" without noodling forever.
* API parity you'll actually use. Their API now supports the Anthropic-style interface as well, which means a bunch of editor integrations "just work" with minimal glue. If you're in a Claude-first workflow, you won't have to rewire the world to try V3.1.
* License and availability. This release is MIT-licensed, and you can grab the base model on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base. If you prefer hosted, keep an eye on our inference—we're working to get V3.1 live so you can benchmark without burning your weekend assembling a serving stack.

Quick personal note: I'm seeing a lot of small, pragmatic improvements add up here. If you're building agents, the hybrid mode plus tighter tool integration is a gift. DeepSeek V3.1 is going to be deployed to W&B Inference service soon! Take a look at wandb.me/inference [http://wandb.me/inference] to see when it's ready.
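To make that plan-then-execute pattern concrete, here's a minimal sketch against DeepSeek's OpenAI-compatible API. It assumes the public deepseek-reasoner and deepseek-chat endpoints map to V3.1's thinking and non-thinking modes respectively (that's how DeepSeek described the upgrade), but verify the model names and routing in their current docs before wiring this into anything real.

```python
# Minimal sketch of "plan with thinking on, execute with thinking off" over
# DeepSeek's OpenAI-compatible API. Assumes deepseek-reasoner = thinking mode
# and deepseek-chat = non-thinking mode of V3.1; check DeepSeek's docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

task = "Summarize the three riskiest assumptions in this launch plan: ..."

# Step 1: plan with the thinking variant.
plan = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": f"Write a short step-by-step plan for: {task}"}],
).choices[0].message.content

# Step 2: execute the plan with the faster non-thinking variant.
answer = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "Follow the plan exactly; be concise."},
        {"role": "user", "content": f"Plan:\n{plan}\n\nNow carry it out for: {task}"},
    ],
).choices[0].message.content

print(answer)
```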
ByteDance Seed-OSS 36B: Apache-2.0, 512K context, and a "thinking budget" knob (X [https://x.com/yshan783399/status/1958225093915779256], HF [https://huggingface.co/collections/ByteDance-Seed/seed-oss-68a609f4201e788db05b5dcd], Github [https://github.com/ByteDance-Seed/seed-oss])

I didn't see much chatter about this one, which is a shame because this seems like a serious release. ByteDance's Seed team open-sourced a trio of 36B dense models—two Base variants (with and without synthetic data) and an Instruct model—under Apache-2.0, trained on 12T tokens and built for long-context and agentic use. The context window is a native half-million tokens, and they include a "thinking budget" control you can set in 512-token increments so you can trade depth for speed (a rough sketch of what that looks like is below).

They report strong general performance, long-context RULER scores, and solid code/math numbers for a sub-40B model, with the Instruct variant posting very competitive MMLU/MMLU-Pro and LiveCodeBench results. The architecture is a straightforward dense stack (not MoE), and the models ship with Transformers/vLLM support and 4/8-bit quantization ready to go. If you've been hunting for a commercial-friendly, long-context 30-something-B with an explicit reasoning-control dial, this should be on your shortlist.

A neat detail for the training nerds: two Base releases—one trained with synthetic data, one without—make for a rare apples-to-apples study in how synthetic data shapes base capability. Also worth noting: they previously shipped a Seed-Prover specialized for Lean; it looks like the team is interested in both tight domain models and generalists.
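Here's roughly what exercising that dial looks like with transformers. The repo name and the thinking_budget kwarg are my assumptions from the release materials (the budget appears to be applied through the model's chat template), so check the Seed-OSS model card and GitHub README for the canonical usage before copying this.

```python
# Rough sketch of the Seed-OSS "thinking budget" dial via transformers.
# The repo name and the thinking_budget kwarg are assumptions; the chat
# template is what actually consumes the budget, so verify against the
# official Seed-OSS README. A 36B model also needs serious hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/Seed-OSS-36B-Instruct"  # assumed repo name from the collection
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are there below 100?"}]

inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=512,  # assumed kwarg: budget is set in 512-token increments
).to(model.device)

out = model.generate(inputs, max_new_tokens=1024)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```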
NVIDIA Nemotron Nano 9B V2: mixed architecture, open data, and long-context throughput (X [https://x.com/llm_wizard/status/1957516422520996020], Blog [https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2/], HF [https://huggingface.co/collections/nvidia/nvidia-nemotron-689f6d6e6ead8e77dd641615], Dataset [https://huggingface.co/collections/nvidia/nemotron-pre-training-dataset-689d9de36f84279d83786b35], Try It [https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2])

NVIDIA shipped a fully open release of Nemotron Nano 9B V2—base, base-before-alignment/pruning, and a realigned reasoning model—and, crucially, they published most of the pretraining dataset details (~6.6T tokens across premium web, math, code, and SFT). That level of data transparency is rare and makes this a great base for fine-tuners who want reproducibility.

Under the hood, this is a mixed Mamba+Transformer architecture. NVIDIA is claiming up to 6x higher throughput versus a pure-Transformer peer (they compare to Qwen3-8B) and specifically highlight that they pruned a 12B down to 9B while preserving quality. They also note a single A10 can handle 128K context after compression and distillation passes, which is the kind of practical systems work that matters when you're running fleets.

A couple of caveats. The license is NVIDIA Open Model License—not Apache-2.0—so read it; it includes restrictions around illegal surveillance and safety bypasses and has revocation clauses. Personally, I appreciate the data openness and the long-context engineering; as always, just make sure the license fits your use case. If you're into longer-context math/coding with small models, the numbers (AIME'25, RULER-128K, GPQA) are impressive for 9B. And if you fine-tune: the availability of both pruned and pre-pruned bases plus the dataset recipe is a rare treat.

Cohere's Command-A Reasoning: dense, multilingual, and research-only licensing (X [https://x.com/cohere/status/1958542682890047511], Blog [https://cohere.com/blog/command-a-reasoning], HF [https://huggingface.co/CohereLabs/command-a-reasoning-08-2025?ref=cohere-ai.ghost.io])

Cohere dropped a new reasoning model focused on enterprise deployment patterns. It's a dense 111B model, supports a 256K context, and includes very strong multilingual coverage (23 languages is what they called out). What caught my eye: on the BFCL (Berkeley Function-Calling Leaderboard) they show 70%—above DeepSeek R1's ~63% and GPT-OSS's ~61%—and they plot the now-familiar test-time compute curves where more thinking tokens yield higher scores.

This release uses Cohere's non-commercial research license; if you want commercial usage you'll need to go through them. That said, for teams who need privately deployable, on-prem reasoning and can work under a research license for prototyping, it's a serious option. A meta observation from the show: there's accumulating evidence that more active parameters help multi-hop tool-use chains compared to very sparse MoE at similar effective capacity. This model nudges in that direction.

Desktop agents leap: ComputerRL hits 48% on OSWorld (Paper [https://arxiv.org/abs/2508.14040])

A new framework dubbed ComputerRL, from Z.ai [http://Z.ai] and folks at Tsinghua University, unifies API calls with GUI actions and scales RL across fleets of virtual desktops, posting a 48.1% success rate on OSWorld versus ~12% for earlier open models. The training system spins up thousands of qemu-in-docker VMs via gRPC; the learning loop alternates RL with supervised fine-tuning and uses a clean step-level binary reward to simplify credit assignment. If you care about practical desktop automation across Ubuntu/Windows/macOS, this is a big jump.

IBM + NASA's Surya: open model for solar weather (HF [https://huggingface.co/nasa-ibm-ai4science/Surya-1.0])

Scientists get some love: IBM and NASA open-sourced Surya, a transformer trained on nine years of multi-instrument observations (nearly 200 TB) to forecast solar dynamics and space weather—the stuff that can knock satellites and power grids sideways. It's on Hugging Face, it's actually runnable, and it's a fantastic example of open models delivering real-world scientific utility.

Smaller but notable: InternLM and OpenCUA, plus Intel's quants

Two quick flags from the "worth your time" pile. InternLM shipped S1 Mini [https://x.com/intern_lm/status/1958479430361461008], an 8B vision+language model (ViT on top) that's multimodal and lightweight; if you need on-device omni-ish behavior on a laptop or tablet, give it a look.
And OpenCUA [https://x.com/xywang626/status/1956400403911962757] 32B (Qwen-based) is a specialized computer-usage agent with strong scores; if you're building automations that need native OS control, it's worth benchmarking. Also, if you're running 4-bit: the Intel quantization work is excellent right now. Their 4-bit quants have been extremely high precision in my testing, especially for large coders and reasoners like DeepSeek V3.1. It's an easy win if you're trying to squeeze a 30B+ onto a workstation without hemorrhaging quality.

Big-co updates and platform shifts

Sonic appears in Cursor and Cline

If you open Cursor or fire up Cline, you may see a new "Sonic" model toggle. It's labeled as a reasoning model and, when you poke the function-calling internals, the call paths include "xai/…" strings. Folks report it's fast and solid for coding. No official docs yet, but I'd be surprised if this isn't Grok Code in pre-release clothes.

Agents.md: one file to rule your agents

Agentic dev stacks have multiplied config files like gremlins: Cursor's rules.json, Windsurf's prompts, MCP server manifests, tool schemas, install scripts… and every tool wants a different filename and format. OpenAI's Agents.md is a strong attempt at standardization. It's just Markdown at repo root that says, "here's how to set up, build, test, and run this project," plus any agent-specific caveats. Tools then auto-detect and follow your instructions instead of guessing. It's already supported by OpenAI Codex, Amp, Jules, Cursor, RooCode, and more, with tens of thousands of public repos adopting the pattern. In monorepos, the nearest Agents.md wins, so you can override at the package level. And human chat instructions still override the file's guidance, which is the right default.

GPT-5 context truncation in the web UI (reports [https://x.com/pvncher/status/1958289479283650741])

There's been a wave of reports that GPT-5 in the web UI is truncating long prompts even when you're under the documented context limit. The folks at Repo Prompt reproduced this multiple times and got confirmation from OpenAI that it's a bug (not a deliberate nerf). If you saw GPT-5 suddenly forget the bottom half of your carefully structured system prompt in the web app, this likely explains it. The API doesn't seem affected. Fingers crossed for a quick fix—GPT-5 is still the best model I've used for 300k-token "read the whole repo and propose a plan" tasks.

Image and 3D: Nano Banana and Qwen's open image editor

Nano Banana: 3D-consistent scene editing from thin air

A stealth model nicknamed "Nano Banana" surfaced in a web demo and started doing the kind of edits you'd normally expect from a 3D suite with a modeler at the controls. Take two photos—your living room and a product shot—and it composites the object into the scene with shockingly consistent lighting and geometry. Ask for a 3D mesh "five inches off the skin," and the mesh really does offset. Ask for a new camera angle on a single still, and it renders the scene from above with plausible structure. People have been calling it a game-changer and, for once, it doesn't feel like hyperbole.

There's a strong whiff of 3D world modeling under the hood—some volumetric representation or neural field that enables true view synthesis—and Logan Kilpatrick posted a banana emoji that set speculation on fire. We'll see where it lands and under what license, but for now the demo has been doing the rounds and it is… wow.
If you're wondering where to try it: LMArena is currently the only place to give it a try, but it's supposedly dropping more widely soon!

Qwen Image Edit (20B): fully open and already practical (X [https://twitter.com/Alibaba_Qwen/status/1957500569029079083], HF [https://huggingface.co/Qwen/Qwen-Image-Edit])

Qwen shipped a 20B image-editing model layered on their existing vision stack, and it's properly open (permissive license) with strong bilingual text editing in Chinese and English. It handles high-level semantic edits (pose adjustments, rotations, style/IP creation) and low-level touch-ups (add/remove/insert). You can swap objects, expand aspect ratios, keep character identity consistent across panels, and do clean style transfer. It runs locally if you've got a decent GPU.

What I appreciate here is the precision. Product placement tasks like "put this book in this person's hand at this angle," or "make the shoes match the dress" come out with the kind of control that used to require hand masking and a dozen passes. And yes, the capybara mascot is back in the release materials, which made my day! 👏 If Nano Banana is the closed-world demo of what's "beyond SOTA," Qwen Image Edit is the open tool you can actually ship with today.
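If you want to try the "runs locally" claim, a rough sketch with diffusers looks like this. The pipeline class and argument names follow the release integration as I recall it, so treat them as assumptions and check the Qwen-Image-Edit model card for the canonical snippet; you'll also want a recent diffusers build and a beefy GPU.

```python
# Rough sketch of running Qwen-Image-Edit locally with diffusers. The
# QwenImageEditPipeline class name and call signature are assumptions based on
# the release integration; verify against huggingface.co/Qwen/Qwen-Image-Edit.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

source = Image.open("living_room.jpg").convert("RGB")  # your own photo

edited = pipe(
    image=source,
    prompt="Place the red hardcover book on the coffee table, matching the room's lighting",
    num_inference_steps=50,
).images[0]

edited.save("living_room_edited.png")
```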
This week's buzz from Weights & Biases

Two quick updates from our side. First, we're working to bring DeepSeek V3.1 to our inference as fast as we can so you can run serious benchmarks without fussing over serving stacks. Keep an eye on our channels; we'll shout when it's live and we'll have some credits for early feedback.

Second, our cofounder Chris Van Pelt released Catnip, a containerized multi-agent coding workspace that runs multiple Claude Code sessions (or other agents) in isolated sandboxes, each with its own context and notification stream. If you've been juggling parallel coding agents that step on each other's toes, this is catnip indeed. Catnip GitHub: https://github.com/wandb/catnip

Closing thoughts

A year ago, "thinking tokens" weren't even a curiosity; we only got the first whiff of "reasoning" back in September, and now we're watching hybrid models that do more with less thinking, tool calls woven inside the reasoning loop, and long-context training regimes scaled up by an order of magnitude. The agent stack is maturing fast—desktop RL is finally clearing real tasks; editor ecosystems are converging on a single config file; and even the stealth drops are clearly building toward world-model-aware editing and control.

If you only try two things this week: run DeepSeek V3.1 in both modes (planning with thinking on, execution with thinking off) and throw a complex multi-step tool workflow at it; then take Qwen Image Edit for a spin on a real product-placement or character-consistency task. You'll feel the future in your hands.

I'm off to the desert next week for a bit (no internet where I'm going), but Wolfram and the crew will keep the ship on course. If you're at Burning Man, DM me—would love to say hi out there. As always, thank you for tuning in and nerding out with us every week.

— Alex

TL;DR and show notes

ThursdAI - Aug 21, 2025 - TL;DR

TL;DR of all topics covered:

* Hosts and Guests
* Alex Volkov [http://x.com/@altryne] - AI Evangelist & Weights & Biases
* Co Hosts - @WolframRvnwlf [http://x.com/@WolframRvnwlf], @yampeleg [http://x.com/@yampeleg], @nisten [http://x.com/@nisten], @ldjconfirmed [http://x.com/@ldjconfirmed]
* Open Source LLMs // Papers
* ByteDance Seed-OSS - 36B open-source LLM family (X [https://x.com/gm8xx8/status/1958258474154143923], HF [https://huggingface.co/collections/ByteDance-Seed/seed-oss-68a609f4201e788db05b5dcd], GitHub [https://github.com/ByteDance-Seed/seed-oss])
* DeepSeek V3.1 - Updated Hybrid model (HF [https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base])
* Cohere CMD-a Reasoning - (X [https://x.com/cohere/status/1958542682890047511], Blog [https://cohere.com/blog/command-a-reasoning], HF [https://huggingface.co/CohereLabs/command-a-reasoning-08-2025?ref=cohere-ai.ghost.io])
* Zai/Tsinghua ComputerRL - Framework for desktop agents (X [https://x.com/Zai_org/status/1958175133706891613], Paper [https://arxiv.org/abs/2508.14040], Benchmark [https://os-world.github.io])
* IBM & NASA Surya - Solar weather prediction (HF [https://huggingface.co/nasa-ibm-ai4science/Surya-1.0])
* NVIDIA Nemotron Nano 9B V2 - (X [https://x.com/llm_wizard/status/1957516422520996020], Blog [https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2/], HF [https://huggingface.co/collections/nvidia/nvidia-nemotron-689f6d6e6ead8e77dd641615], Dataset [https://huggingface.co/collections/nvidia/nemotron-pre-training-dataset-689d9de36f84279d83786b35], Try It [https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2])
* Alibaba Quark Med
* Big CO LLMs + APIs
* Sonic Stealth Model - Likely Grok Code
* OpenAI agents.md - Unified agent files (agents.md [https://agents.md])
* GPT-5 web UI context truncation - confirmed as a bug, not a nerf
* AI Art & Diffusion & 3D
* Nano Banana - Image model (rumored Google)
* Qwen-Image-Edit - 20B Image editing (X [https://x.com/Alibaba_Qwen/status/1957500569029079083], HF [https://huggingface.co/Qwen/Qwen-Image-Edit])
* This week's Buzz
* Catnip - Containerized AI agent runner (GitHub [https://github.com/wandb/catnip])

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]
