ThursdAI - The top AI news from the past week

A podcast from Weights & Biases. Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI over the past week.

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists, and prompt spellcasters on Twitter Spaces to discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news

All episodes

110 episodes
📅 ThursdAI - Jun 26 - Gemini CLI, Flux Kontext Dev, Search Live, Anthropic destroys books, Zucks superintelligent team & more AI news

Hey folks, Alex here, writing from... an undisclosed tropical paradise location 🏝️ I'm on vacation, but the AI news doesn't stop, of course, and neither does ThursdAI. So a huge shoutout to Wolfram Ravenwlf for running the show this week, and to Nisten, LDJ and Yam who joined. So... no long blogpost with analysis this week, but I definitely recommend tuning in to the show the folks ran; they had a few guests on, and even got some breaking news (a new Flux Kontext that's open source). Of course, many of you are readers and are here for the links, so I'm including the raw TL;DR + speaker notes as prepared by the folks for the show!

P.S. - our (rescheduled) hackathon, WeaveHacks, is coming up in San Francisco on July 12-13. If you're interested in a chance to win a RoboDog, you're welcome to join us and give it a try. Register HERE [https://lu.ma/weavehacks]

Ok, that's it for this week, please enjoy the show and see you next week!

ThursdAI - June 26th, 2025 - TL;DR

* Hosts and Guests
  * WolframRvnwlf - Host (@WolframRvnwlf [http://x.com/WolframRvnwlf])
  * Co-Hosts - @yampeleg [http://x.com/yampeleg], @nisten [http://x.com/nisten], @ldjconfirmed [http://x.com/ldjconfirmed]
  * Guest - Jason Kneen (@jasonkneen [http://x.com/jasonkneen]) - Discussing MCPs, coding tools, and agents
  * Guest - Hrishioa (@hrishioa [http://x.com/hrishioa]) - Discussing agentic coding and spec-driven development
* Open Source LLMs
  * Mistral Small 3.2 released with improved instruction following, reduced repetition & better function calling (X [https://x.com/MistralAI/status/1936093325116781016])
  * Unsloth AI releases dynamic GGUFs with fixed chat templates (X [https://x.com/UnslothAI/status/1936426567850487925])
  * Kimi-VL-A3B-Thinking-2506 multimodal model updated for better video reasoning and higher resolution (Blog [https://huggingface.co/blog/moonshotai/kimi-vl-a3b-thinking-2506])
  * Chinese Academy of Sciences releases Stream-Omni, a new any-to-any model for unified multimodal input (HF [https://huggingface.co/ICTNLP/stream-omni-8b], Paper [https://huggingface.co/papers/2506.13642])
  * Prime Intellect launches SYNTHETIC-2, an open reasoning dataset and synthetic data generation platform (X [https://x.com/PrimeIntellect/status/1937272174295023951])
* Big CO LLMs + APIs
  * Google
    * Gemini CLI, a new open-source AI agent, brings Gemini 2.5 Pro to your terminal (Blog [https://web.archive.org/web/20250625051706/https://blog.google/technology/developers/introducing-gemini-cli/], GitHub [https://github.com/google-gemini/gemini-cli])
    * Google reduces free-tier API limits for previous-generation Gemini Flash models (X [https://x.com/ai_for_success/status/1937493142279971210])
    * Search Live with voice conversation is now rolling out in AI Mode in the US (Blog [https://blog.google/products/search/search-live-ai-mode/], X [https://x.com/rajanpatel/status/1935484294182608954])
    * Gemini API is now faster for video and PDF processing with improved caching (Docs [https://ai.google.dev/gemini-api/docs/caching])
  * Anthropic
    * Claude introduces an "artifacts" space for building, hosting, and sharing AI-powered apps (X [https://x.com/AnthropicAI/status/1937921801000219041])
    * Federal judge rules Anthropic's use of books for training Claude qualifies as fair use (X [https://x.com/ai_for_success/status/1937515997076029449])
  * xAI
    * Elon Musk announces the successful launch of Tesla's Robotaxi (X [https://x.com/elonmusk/status/1936876178356490546])
  * Microsoft
    * Introduces Mu, a new language model powering the agent in Windows Settings (Blog [https://blogs.windows.com/windowsexperience/2025/06/23/introducing-mu-language-model-and-how-it-enabled-the-agent-in-windows-settings/])
  * Meta
    * Report: Meta pursued acquiring Ilya Sutskever's SSI, now hires co-founders Nat Friedman and Daniel Gross (X [https://x.com/kimmonismus/status/1935954015998624181])
  * OpenAI
    * OpenAI removes mentions of its acquisition of Jony Ive's startup 'io' amid a trademark dispute (X [https://x.com/rowancheung/status/1937414172322439439])
    * OpenAI announces the release of Deep Research in the API + webhook support (X [https://x.com/stevendcoffey/status/1938286946075418784])
* This week's Buzz
  * Alex is on vacation; WolframRvnwlf is attending AI Tinkerers Munich on July 25 (Event [https://munich.aitinkerers.org/p/ai-tinkerers-munich-july-25])
  * Join the W&B hackathon happening in 2 weeks in San Francisco - the grand prize is a RoboDog! (Register for Free [https://lu.ma/weavehacks])
* Vision & Video
  * MeiGen-MultiTalk code and checkpoints for multi-person talking head generation are released (GitHub [https://github.com/MeiGen-AI/MultiTalk], HF [https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk])
  * Google releases VideoPrism for generating adaptable video embeddings for various tasks (HF [https://hf.co/google/videoprism], Paper [https://arxiv.org/abs/2402.13217], GitHub [https://github.com/google-deepmind/videoprism])
* Voice & Audio
  * ElevenLabs launches 11.ai, a voice-first personal assistant with MCP support (Sign Up [http://11.ai/], X [https://x.com/elevenlabsio/status/1937200086515097939])
  * Google Magenta releases Magenta RealTime, an open-weights model for real-time music generation (Colab [https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Demo.ipynb], Blog [https://g.co/magenta/rt])
  * ElevenLabs launches a mobile app for iOS and Android for on-the-go voice generation (X [https://x.com/elevenlabsio/status/1937541389140611367])
* AI Art & Diffusion & 3D
  * Google rolls out Imagen 4 and Imagen 4 Ultra in the Gemini API and Google AI Studio (Blog [https://developers.googleblog.com/en/imagen-4-now-available-in-the-gemini-api-and-google-ai-studio/])
  * OmniGen 2 open-weights model for enhanced image generation and editing is released (Project Page [https://vectorspacelab.github.io/OmniGen2/], Demo [https://huggingface.co/spaces/OmniGen2/OmniGen2], Paper [https://huggingface.co/papers/2506.18871])
* Tools
  * OpenMemory Chrome Extension provides shared memory across ChatGPT, Claude, Gemini and more (X [https://x.com/taranjeetio/status/1937537163270451494])
  * LM Studio adds MCP support to connect local LLMs with your favorite servers (Blog [https://lmstudio.ai/blog/mcp])
  * Cursor is now available as a Slack integration (Dashboard [http://cursor.com/dashboard])
  * All Hands AI releases the OpenHands CLI, a model-agnostic, open-source coding agent (Blog [https://all-hands.dev/blog/the-openhands-cli-ai-powered-development-in-your-terminal], Docs [https://docs.all-hands.dev/usage/how-to/cli-mode#cli])
  * Warp 2.0 launches as an Agentic Development Environment with multi-threading (X [https://x.com/warpdotdev/status/1937525185843752969])
* Studies and Others
  * The /r/LocalLLaMA subreddit is back online after a brief moderation issue (Reddit [https://www.reddit.com/r/LocalLLaMA/comments/1ljlr5b/subreddit_back_in_business/], News [https://x.com/localllamasub])
  * Andrej Karpathy's talk "Software 3.0" discusses the future of programming in the age of AI (YouTube [https://www.youtube.com/watch?v=LCEmiRjPEtQ], Summary [https://www.latent.space/p/s3])

Thank you, see you next week! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

June 26, 2025 - 1 h 39 min
📆 ThursdAI - June 19 - MiniMax M1 beats R1, OpenAI records your meetings, Gemini in GA, W&B uses Coreweave GPUs & more AI news

Hey all, Alex here 👋 This week, while not the busiest week in releases (we can't get a SOTA LLM every week now, can we), was full of interesting open source releases and feature updates, such as the chatGPT meetings recorder (which we live-tested on the show - the limit is 2 hours!).

It was also the day after our annual W&B conference, FullyConnected, so I had a few goodies to share with you, like answering the main question: when will W&B make some use of those GPUs from CoreWeave? The answer is... now! (We launched a brand new preview of an inference service with open source models.)
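If you want to kick the tires on that inference preview from code, here's a minimal sketch of what a call could look like, assuming the endpoint is OpenAI-compatible as the docs describe; the base URL and model id below are my assumptions, so double-check them against the docs [https://weave-docs.wandb.ai/guides/integrations/inference/] before relying on this.

```python
# Minimal sketch: calling W&B Inference through the standard openai client.
# Assumptions: an OpenAI-compatible endpoint at this base URL, and this
# model id being available in the preview - verify both in the W&B docs.
import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed endpoint
    api_key="<your W&B API key>",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",  # example open-source model id
    messages=[{"role": "user", "content": "Say hi to the ThursdAI crowd!"}],
)
print(response.choices[0].message.content)
```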
And finally, we had a great chat with Pankaj Gupta, co-founder and CEO of Yupp, a new service that lets users chat with the top AIs for free, while turning their votes into leaderboards for everyone else to understand which GenAI model is best for which task/topic. It was a great conversation, and he even shared an invite code with all of us (I'll attach it to the TL;DR and show notes, let's dive in!)

Chapters:

00:00 Introduction and Welcome
01:04 Show Overview and Audience Interaction
01:49 Special Guest Announcement and Experiment
03:05 Wolfram's Background and Upcoming Hosting
04:42 TLDR: This Week's Highlights
15:38 Open Source AI Releases
32:34 Big Companies and APIs
32:45 Google's Gemini Updates
42:25 OpenAI's Latest Features
54:30 Exciting Updates from Weights & Biases
56:42 Introduction to Weights & Biases Inference Service
57:41 Exploring the New Inference Playground
58:44 User Questions and Model Recommendations
59:44 Deep Dive into Model Evaluations
01:05:55 Announcing Online Evaluations via Weave
01:09:05 Introducing Pankaj Gupta from YUP.AI [http://YUP.AI]
01:10:23 YUP.AI [http://YUP.AI]: A New Platform for Model Evaluations
01:13:05 Discussion on Crowdsourced Evaluations
01:27:11 New Developments in Video Models
01:36:23 OpenAI's New Transcription Service
01:39:48 Show Wrap-Up and Future Plans

Here's the TL;DR and show notes links.

ThursdAI - June 19th, 2025 - TL;DR

* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne [http://x.com/@altryne])
  * Co-Hosts - @WolframRvnwlf [http://x.com/@WolframRvnwlf], @yampeleg [http://x.com/@yampeleg], @nisten [http://x.com/@nisten], @ldjconfirmed [http://x.com/@ldjconfirmed]
  * Guest - @pankaj [http://x.com/@pankaj] - co-founder of Yupp.ai [https://yupp.ai/join/thursdAI]
* Open Source LLMs
  * Moonshot AI open-sourced Kimi-Dev-72B (Github [https://github.com/MoonshotAI/Kimi-Dev?tab=readme-ov-file], HF [https://huggingface.co/moonshotai/Kimi-Dev-72B])
  * MiniMax-M1 456B (45B active) - reasoning model (Paper [https://arxiv.org/abs/2506.13585], HF [https://huggingface.co/MiniMaxAI/MiniMax-M1-40k], Try It [https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1], Github [https://github.com/MiniMax-AI/MiniMax-M1])
* Big CO LLMs + APIs
  * Google drops Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in preview (Blog [https://blog.google/products/gemini/gemini-2-5-model-family-expands/], Tech report [https://storage.googleapis.com/gemini-technical-report], Tweet [https://x.com/google/status/192905415])
  * Google launches Search Live: Talk, listen and explore in real time with AI Mode (Blog [https://blog.google/products/search/search-live-ai-mode/])
  * OpenAI adds MCP support to Deep Research in chatGPT (X [https://x.com/altryne/status/1934644274227769431], Docs [https://platform.openai.com/docs/mcp])
  * OpenAI launches their meetings recorder in the Mac app (docs [https://help.openai.com/en/articles/11487532-chatgpt-record])
  * Zuck update: Considering bringing Nat Friedman and Daniel Gross to Meta (The Information [https://x.com/amir/status/1935461177045516568])
* This week's Buzz
  * NEW! W&B Inference provides a unified interface to access and run top open-source AI models (inference [https://wandb.ai/inference], docs [https://weave-docs.wandb.ai/guides/integrations/inference/])
  * NEW! W&B Weave Online Evaluations delivers real-time production insights and continuous evaluation for AI agents across any cloud (X [https://x.com/altryne/status/1935412384283107572])
  * The new platform offers "metal-to-token" observability, linking hardware performance directly to application-level metrics.
* Vision & Video
  * ByteDance's new video model beats VEO3 - Seedance 1.0 mini (Site [https://dreamina.capcut.com/ai-tool/video/generate], FAL [https://fal.ai/models/fal-ai/bytedance/seedance/v1/lite/image-to-video])
  * MiniMax Hailuo 02 - 1080p native, SOTA instruction following (X [https://www.minimax.io/news/minimax-hailuo-02], FAL [https://fal.ai/models/fal-ai/minimax/hailuo-02/pro/image-to-video])
  * Midjourney video is also here - great visuals (X [https://x.com/angrypenguinPNG/status/1932931137179176960])
* Voice & Audio
  * Kyutai launches open-source, high-throughput streaming Speech-To-Text models for real-time applications (X [https://x.com/kyutai_labs/thread/1935652243119788111])
* Studies and Others
  * LLMs Flunk Real-World Coding Contests, Exposing a Major Skill Gap (Arxiv [https://arxiv.org/pdf/2506.11928])
  * MIT Study: ChatGPT Use Causes Sharp Cognitive Decline (Arxiv [https://arxiv.org/abs/2506.08872])
  * Andrej Karpathy's "Software 3.0": The Dawn of English as a Programming Language (youtube [https://www.youtube.com/watch?v=LCEmiRjPEtQ], deck [https://drive.google.com/file/d/1HIEMdVlzCxke22ISVzornd2-UpWHngRZ/view?usp=sharing])
* Tools
  * Yupp launches with 500+ AI models, a new leaderboard, and a user-powered feedback economy - use the thursdai link [https://yupp.ai/join/thursdAI]* to get 50% extra credits
  * BrowserBase announces director.ai [http://director.ai] - an agent to run things on the web
  * Universal system prompt for reducing hallucination (from Reddit [https://www.reddit.com/r/PromptEngineering/comments/1kup28y/chatgpt_and_gemini_ai_will_gaslight_you_everyone/])

*Disclosure: while this isn't a paid promotion, I do think Yupp offers great value; I do get a bit more credits on their platform if you click my link, and so do you. You can go to yupp.ai [http://yupp.ai] and register with no affiliation if you wish.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

June 20, 2025 - 1 h 41 min
📆 ThursdAI - June 12 - Meta’s $15B ScaleAI Power Play, OpenAI’s o3-pro & 90% Price Drop!

Hey folks, this is Alex, finally back home! This week was full of crazy AI news, both model-related and also shifts in the AI landscape and big companies, with Zuck going all in on Scale and execu-hiring Alex Wang for a crazy $14B. OpenAI, meanwhile, maybe received a new shipment of GPUs? Otherwise, it's hard to explain how they have dropped the o3 price by 80%, while also shipping o3-pro (in chat and API). Apple was also featured in today's episode, but more so for the lack of AI news, completely delaying the "very personalized private Siri powered by Apple Intelligence" during WWDC25 this week.

We had 2 guests on the show this week, Stefania Druga [https://substack.com/profile/109432335-stefania-druga] and Eric Provencher (who builds RepoPrompt). Stefania helped me cover the AI Engineer conference we all went to last week and shared some cool Science CoPilot stuff she's working on, while Eric is the go-to guy for o3-pro and helped us understand what this model is great for! As always, TL;DR and show notes at the bottom, video for those who prefer watching is attached below, let's dive in!

Big Companies LLMs & APIs

Let's start with big companies, because the landscape has shifted, new top reasoner models dropped, and some huge companies didn't deliver this week!

Zuck goes all in on SuperIntelligence - Meta's $14B stake in ScaleAI and Alex Wang

This may be the most consequential piece of AI news today. Fresh from the disappointing results of Llama 4 and reports of top researchers leaving the Llama team, many have decided to exclude Meta from the AI race. We have a saying at ThursdAI: don't bet against Zuck! Zuck decided to spend a lot of money (nearly 20% of their reported $65B investment in AI infrastructure) to get a 49% stake in Scale AI and bring Alex Wang, its (now former) CEO, to lead the new Superintelligence team at Meta.

For folks who are not familiar with Scale, it's a massive provider of human-annotated data services to all the big AI labs - Google, OpenAI, Microsoft, Anthropic... all of them, really. Alex Wang is the youngest self-made billionaire because of it, and now Zuck not only has access to all their expertise, but also to a very impressive AI persona, who could help revive the excitement about Meta's AI efforts, help recruit the best researchers, and lead the way inside Meta. Wang is also an outspoken China hawk who spends as much time in congressional hearings as in Slack, so the geopolitics here are ... spicy.

Meta just stapled itself to the biggest annotation funnel on Earth, hired away Google's Jack Rae (who was on the pod just last week, shipping for Google!) for brainy model alignment, and started waving seven-to-nine-figure comp packages at every researcher with "Transformer" in their citation list. Whatever disappointment you felt over Llama 4's muted debut, Zuck clearly felt it too, and responded like a founder who still controls every voting share.

OpenAI's Game-Changer: o3 Price Slash & o3-pro launches to top the intelligence leaderboards!

Meanwhile, OpenAI dropped not one but two mind-blowing updates. First, they've slashed the price of o3, their premium reasoning model, by a staggering 80%. We're talking from $40/$10 per million tokens down to just $8/$2. That's right, folks, it's now in the same league as Claude Sonnet cost-wise, making top-tier intelligence dirt cheap. I remember when a price drop of 80% after a year got us excited; now it's 80% in just four months with zero quality loss.
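To make the cut concrete, here's the arithmetic on a hypothetical workload (my example, using the list prices quoted above: input $10 -> $2, output $40 -> $8 per million tokens):

```python
# Cost of an example job at old vs. new o3 list prices (USD per 1M tokens).
# The 2M-input / 0.5M-output workload is made up for illustration.
INPUT_MTOK, OUTPUT_MTOK = 2.0, 0.5

old_cost = INPUT_MTOK * 10 + OUTPUT_MTOK * 40   # $20 + $20 = $40
new_cost = INPUT_MTOK * 2 + OUTPUT_MTOK * 8     # $4 + $4   = $8
print(f"old: ${old_cost:.2f}  new: ${new_cost:.2f}  "
      f"savings: {1 - new_cost / old_cost:.0%}")  # savings: 80%
```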
They've confirmed it's the full o3 model - no distillation or quantization here. How are they pulling this off? I'm guessing someone got a shipment of shiny new H200s from Jensen!

And just when you thought it couldn't get better, OpenAI rolled out o3-pro, their highest intelligence offering yet. Available for Pro and Team accounts, and via API (87% cheaper than o1-pro, by the way), this model, or consortium of models, is a beast. It's topping charts on Artificial Analysis, barely edging out Gemini 2.5 as the new king. Benchmarks are insane: 93% on AIME 2024 (state-of-the-art territory), 84% on GPQA Diamond, and nearing a 3000 ELO score on competition coding. Human preference tests show 64-66% of folks prefer o3-pro for clarity and comprehensiveness across tasks like scientific analysis and personal writing.

I've been playing with it myself, and the way o3-pro handles long context and tough problems is unreal. As my friend Eric Provencher (creator of RepoPrompt) shared on the show, it's surgical: perfect for big refactors and bug diagnosis in coding. It's got all the tools o3 has (web search, image analysis, memory personalization), and you can run it in background mode via API for async tasks. Sure, it's slower due to deep reasoning (no streaming thought tokens), but the consistency and depth? Worth it.

Oh, and funny story: I was prepping a talk [https://youtu.be/KEdoIbBu2Ko] for Hamel Husain's evals course, with a slide saying "don't use large reasoning models if budget's tight." The day before, this price drop hits, and I'm scrambling to update everything. That's AI pace for ya!

Apple WWDC: Where's the Smarter Siri?

Oh Apple. Sweet, sweet Apple. Remember all those Bella Ramsey ads promising a personalized Siri that knows everything about you? Well, Craig Federighi opened WWDC by basically saying "Yeah, about that smart Siri... she's not coming. Don't wait up." Instead, we got:

* AI that can combine emojis (revolutionary! 🙄)
* Live translation (actually cool)
* Direct API access to on-device models (very cool for developers)
* Liquid glass UI (pretty, but... where's the intelligence?)

The kicker? Apple released a paper called "The Illusion of Thinking" right before WWDC, basically arguing that AI reasoning models hit hard complexity ceilings. Some saw this as Apple making excuses for why they can't ship competitive AI. The timing was... interesting. During our recording, Nisten's Siri literally woke up randomly while we were complaining about how dumb it still is. After a decade, it's the same Siri. That moment was pure comedy gold.

This Week's Buzz

Our premium conference Fully Connected is happening June 18-19 in San Francisco! Use promo code WBTHURSAI to register for free [https://fullyconnected.com]. We'll have updates on the CoreWeave acquisition, product announcements, and it's the perfect chance to give feedback directly to the people building the tools you use. Also, my talk on Large Reasoning Models as LLM judges is now up on YouTube. Had to update it live because of the o3 price drop - such is life in AI!

Open Source LLMs: Mistral Goes Reasoning Mode

Mistral Drops Magistral - Their First Reasoning Model

The French champagne of LLMs is back! Mistral released Magistral, their first reasoning model, in two flavors: a 24B-parameter open-source Small version and a closed, API-only Medium version. And honestly? The naming continues to be chef's kiss - Mistral really has the branding game locked down.

Now, here's where it gets spicy. Mistral's benchmarks notably don't include comparisons to Chinese models like Qwen or DeepSeek. Dylan Patel from SemiAnalysis called them out on this, and when he ran the comparisons himself, well... let's just say Magistral Medium barely keeps up with Qwen's tiny 4B-parameter model on math benchmarks. Ouch.

But here's the thing - and Nisten really drove this home during our discussion - benchmarks don't tell the whole story. He's been using Magistral Small for his workflows and swears by it. "It's almost at the point where I don't want to tell people about it," he said, which is the highest praise from someone who runs models locally all day. The 24B Small version apparently hits that sweet spot for local deployment while being genuinely useful for real work.

The model runs on a single RTX 4090 or a 32GB MacBook after quantization, has a 128K context window (though they recommend capping at 40K), and uses a transparent mode that shows its reasoning process. It's Apache 2.0 licensed, multilingual, and available through their Le Chat interface with "Flash Answers" for real-time reasoning.
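If you want to try Magistral Small locally, here's a minimal sketch using llama-cpp-python with a quantized GGUF; the exact repo id, quant filename, and VRAM fit are my assumptions, so pick whatever quant matches your hardware:

```python
# Minimal local-run sketch for Magistral Small (assumptions: this GGUF repo
# exists and a Q4_K_M quant fits a 24GB RTX 4090 - verify on Hugging Face).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mistralai/Magistral-Small-2506_gguf",  # assumed GGUF repo id
    filename="*Q4_K_M.gguf",                        # pick a quant that fits
    n_ctx=40_000,      # Mistral recommends capping context around 40K
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Reason step by step: is 221 prime?"}],
)
print(out["choices"][0]["message"]["content"])
```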
SakanaAI's Text2Lora: The Future is Self-Adapting Models

This one blew my mind. SakanaAI (co-founded by one of the Transformer paper authors) released Text2Lora - a method for adapting LLMs to new tasks using ONLY text descriptions. No training data needed! Think about this: instead of fine-tuning a model with thousands of examples to make it better at math, you just... tell it to be better at math. And it works! On Llama 3.1 8B, Text2Lora reaches 77% average accuracy, outperforming all baseline methods.

What this means is we're approaching a world where models can essentially customize themselves on the fly for whatever task you throw at them. As Nisten put it, "This is revolutionary. The model is actually learning, actually changing its own weights." We're just seeing the first glimpses of this capability, but in 6-12 months?
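For intuition, here's a toy sketch of the idea as I read it (my paraphrase, not SakanaAI's actual code): a hypernetwork maps an embedding of the task description to LoRA matrices (A, B), which then get patched into a frozen base layer. All sizes here are made up for illustration.

```python
# Toy sketch of the Text-to-LoRA idea: task description -> LoRA weights.
# This is a conceptual illustration, not the paper's architecture.
import torch
import torch.nn as nn

d_model, rank, d_task = 512, 8, 384  # illustrative sizes only

class TaskToLoRA(nn.Module):
    """Hypernetwork: task embedding -> (A, B) low-rank adapter matrices."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_task, 1024), nn.ReLU(),
            nn.Linear(1024, 2 * d_model * rank),
        )

    def forward(self, task_emb: torch.Tensor):
        flat = self.net(task_emb)
        A = flat[: d_model * rank].view(rank, d_model)
        B = flat[d_model * rank:].view(d_model, rank)
        return A, B

hyper = TaskToLoRA()
task_emb = torch.randn(d_task)      # stand-in for an embedded task description
A, B = hyper(task_emb)

W = torch.randn(d_model, d_model)   # frozen base weight
x = torch.randn(d_model)
h = W @ x + B @ (A @ x)             # LoRA-patched forward pass: Wx + BAx
print(h.shape)                      # torch.Size([512])
```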
🎥 Multimedia & Tools: Video, Voice, and Browser Breakthroughs

Let's zip through some multimedia and tool updates that caught my eye this week. Google's VEO3-fast is a creator's dream: 2x faster 720p video generation, 80% cheaper, and now with audio support. I've seen clips on social media (like an NBA ad) that are unreal, though Wolfram noted it's not fully rolled out in Europe yet. You can access it via APIs like FAL or Replicate, and I'm itching to make a full movie, if only I had the budget!

Midjourney's gearing up for a video product with their signature style, but they're also facing heat: Disney and Universal are suing them for copyright infringement over Star Wars and Avengers-like outputs. It's Hollywood's first major strike against AI, and while I get the IP concern, it's odd they picked the smaller player when OpenAI and Google are out there too. This lawsuit could drag on, so stay tuned.

OpenAI's new advanced voice mode dropped, aiming for a natural cadence with better multilingual support (Russian and Hebrew sound great now). But honestly? I'm not loving the breathing and laughing they added; it's uncanny valley for me. Some folks on X are raving, though, and LDJ noted it's closing the gap to Sesame's Maya. I just wish they'd let me pick between old and new voices instead of switching under my feet. If OpenAI's listening, transparency please!

On the tools side, Yutori's Scouts got my timeline buzzing: AI agents that monitor the web for any topic (like "next ThursdAI release") and notify you of updates. I saw a demo catching leadership changes at xAI, and it's the future of web interaction. Couldn't log in live on the show (email login woes - give me passwords, folks!), but it's in beta at yutori.com. Also, the Browser Company finally launched DIA, an AI-native browser in beta. Chatting with open tabs, rewriting text, and instant answers? I've been using it to prep for ThursdAI, and it's pretty slick. Try it at diabrowser.com.

Wrapping Up: AI's Breakneck Pace

What a week, folks! From OpenAI democratizing intelligence with o3-pro and price cuts to Meta's bold superintelligence play with ScaleAI, we're witnessing history unfold at lightning speed. Apple's stumble at WWDC stings, but open-source gems and new tools keep the excitement alive. I'm still riding the high from AI Engineer last week - your high-fives and feedback mean the world.

Next week, don't miss Weights & Biases' Fully Connected conference in SF on June 18-19. I won't be there physically, but I'm cheering from afar. Grab your spot at fullyconnected.com with promo code WBTHURSAI for a sweet deal.

Thanks for being part of the ThursdAI crew. Here's the full TL;DR and show notes to catch anything you missed. See you next week!

TL;DR of all topics covered:

* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne [http://x.com/@altryne])
  * Co-Hosts - @WolframRvnwlf [http://x.com/@WolframRvnwlf], @yampeleg [http://x.com/@yampeleg], @nisten [http://x.com/@nisten], @ldjconfirmed [http://x.com/@ldjconfirmed]
  * Guests
    * Stefania Druga - @stefania_druga [https://x.com/Stefania_druga] (Independent, former Research Scientist at Google DeepMind), creator of Scratch Copilot [https://medium.com/bits-and-behavior/supercharge-your-scratch-projects-introducing-cognimates-copilot-an-ai-teammate-for-kids-52e616e4096e] and the AI Engineer education summit [https://ai.engineer/education]
    * Eric Provencher - @pvncher [https://x.com/pvncher] (building RepoPrompt [https://repoprompt.com/])
  * Chit Chat - AI Engineer conference vibes, meeting fans, Jack Rae's move to Meta.
* Open Source LLMs
  * Mistral Magistral - 24B reasoning model (X [https://x.com/MistralAI/status/1932441507262259564], HF [https://huggingface.co/mistralai/Magistral-Small-2506], Blog [https://mistral.ai/news/magistral])
  * HuggingFace ScreenSuite - GUI agent evaluation framework (HF [https://huggingface.co/blog/screensuite])
  * SakanaAI Text2Lora - instant, task-specific LLM adaptation (Github [https://github.com/SakanaAI/Text-to-Lora])
* Big CO LLMs + APIs
  * OpenAI drops o3 price by 80% (Blog [https://t.co/LkObjZtg9s])
  * OpenAI launches o3-pro - highest intelligence model (X [https://x.com/OpenAI/status/1932530409684005048])
  * Meta buys 49% stake in ScaleAI, Alex Wang heads superintelligence team (Blog [https://www.theinformation.com/articles/meta-pay-nearly-15-billion-scale-ai-stake-startups-28-year-old-ceo], Axios [https://www.axios.com/2025/06/10/meta-ai-superintelligence-zuckerberg])
  * Apple WWDC updates - pause on Apple Intelligence in iOS 26, live translation, on-device APIs
  * Apple paper on reasoning as illusion (Paper [https://machinelearning.apple.com/research/illusion-of-thinking], Rebuttal [https://x.com/ParshinShojaee/status/1932528565788238197])
* This Week's Buzz
  * Fully Connected: W&B's 2-day conference, June 18-19 in SF (fullyconnected.com [http://fullyconnected.com]) - promo code WBTHURSAI
  * Alex's talk on LRMs as LLM judges for Hamel's course (YT [https://www.youtube.com/watch?reload=9&v=KEdoIbBu2Ko])
* Vision & Video
  * VEO3-fast - 2x faster 720p generations, 80% cheaper
  * Midjourney to launch video product (X [https://x.com/bilawalsidhu/status/1932942424751366383?s=46])
  * Topaz Astra - creative 4K video upscaler (X [https://x.com/topazlabs/status/1932421641654477275], Site [http://astra.app])
* Voice & Audio
  * OpenAI's new advanced voice mode - mixed responses, better multilingual support
  * Cartesia Ink-Whisper - optimized for real-time chat (Blog [https://cartesia.ai/blog/introducing-ink-speech-to-text])
* AI Art & Diffusion & 3D
  * Disney & Universal sue Midjourney - first Hollywood vs. AI lawsuit (NBC [https://www.nbcnews.com/business/business-news/disney-universal-sue-ai-image-company-midjourney-unlicensed-star-wars-rcna212369])
  * Krea releases KREA-1 - custom image gen model (X [https://x.com/krea_ai/status/1932440476541411670])
* AI Tools
  * Yutori Scouts - AI agents for web monitoring (Blog [http://yutori.com])
  * BrowserCompany DIA - AI-native browser in beta (Link [http://diabrowser.com])

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

June 13, 2025 - 1 h 33 min
📆 ThursdAI - Jun 5, 2025 - Live from AI Engineer with Swyx, new Gemini 2.5 with Logan K and Jack Rae, Self Replicating agents with Morph Labs

Hey folks, this is Alex, coming to you LIVE from the AI Engineer World's Fair! What an incredible episode this week. We recorded live from the 30th floor of the Marriott in SF, while Yam was doing live correspondence from the floor of the AI Engineer event, all while Swyx, the co-host of the Latent Space podcast and the creator of AI Engineer (both the conference and the concept itself), joined us for the whole stream - here's the edited version, please take a look.

We had around 6,500 people tune in, and at some point we got 2 surprise guests, straight from the keynote stage: Logan Kilpatrick (PM for AI Studio and lead cheerleader for Gemini) and Jack Rae (principal scientist working on reasoning) joined us for a great chat about Gemini! Mind was absolutely blown! They have just launched the new Gemini 2.5 Pro, and I thought it would only be fitting to let their new model cover this podcast this week (so below is fully AI-generated ... non-slop, I hope). The show notes and TL;DR are, as always, at the end. Okay, enough preamble… let's dive into the madness! 🤯

Google Day at AI Engineer: New Gemini 2.5 Pro and a Look Inside the Machine's Mind

For the first year of this podcast, a recurring theme was us asking, "Where's Google?" Well, it's safe to say that question has been answered with a firehose of innovation. We were lucky enough to be joined by Google DeepMind's Logan Kilpatrick and Jack Rae, the tech lead for "thinking" within Gemini, literally moments after they left the main stage.

Surprise! A New Gemini 2.5 Pro Drops Live

Logan kicked things off with a bang, officially announcing a brand new, updated Gemini 2.5 Pro model right there during his keynote. He called it "hopefully the final update to 2.5 Pro," and it comes with a bunch of performance increases, closing the gap on feedback from previous versions and hitting SOTA on benchmarks like Aider. It's clear that the organizational shift to bring the research and product teams together under the DeepMind umbrella is paying massive dividends. Logan pointed out that Google has seen a 50x increase in AI inference over the past year. The flywheel is spinning, and it's spinning fast.

How Gemini "Thinks"

Then things got even more interesting. Jack Rae gave us an incredible deep dive into what "thinking" actually means for a language model. This was one of the most insightful parts of the conference for me. For years, the bottleneck for LLMs has been test-time compute. Models were trained to respond immediately, applying a fixed amount of computation to go from a prompt to an answer, no matter how hard the question. The only way to get a "smarter" response was to use a bigger model.

Jack explained that "thinking" shatters this limitation. Mechanically, Gemini now has a "thinking stage" where it can generate its own internal text, hypothesizing, testing, correcting, and reasoning, before committing to a final answer. It's an iterative loop of computation that the model can dynamically control, using more compute for harder problems. It learns how to think using reinforcement learning, getting a simple "correct" or "incorrect" signal and backpropagating that to shape its reasoning strategies. We're already seeing the results of this. Jack showed a clear trend: as models get better at reasoning, they're also using more test-time compute. This paradigm also gives developers a "thinking budget" slider in the API for Gemini 2.5 Flash and Pro, allowing a continuous trade-off between cost and performance.
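In practice, the budget is a request-level knob. Here's a minimal sketch using the google-genai Python SDK as I understand its docs (double-check parameter names against ai.google.dev before copying):

```python
# Minimal sketch: setting a "thinking budget" on a Gemini 2.5 request.
# Parameter names per my reading of the google-genai SDK docs - verify them.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How many prime numbers are there below 100?",
    config=types.GenerateContentConfig(
        # Larger budgets buy deeper reasoning; 0 disables thinking on Flash.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(resp.text)
```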
The future of this is even wilder. They're working on DeepThink, a high-budget mode for extremely hard problems that uses much deeper, parallel chains of thought. On the tough USA Math Olympiad, where the SOTA was negligible in January, 2.5 Pro reached the 50th percentile of human participants. DeepThink pushes that to the 65th percentile. Jack's ultimate vision is inspired by the mathematician Ramanujan, who derived incredible theorems from a single textbook by just thinking deeply. The goal is for models to do the same: contemplate a small set of knowledge so deeply that they can push the frontiers of human understanding. Absolutely mind-bending stuff.

🤖 MorphLabs and the Audacious Quest for Verified Superintelligence

Just when I thought my mind couldn't be bent any further, we were joined by Jesse Han, the founder and CEO of MorphLabs. Fresh off his keynote, he laid out one of the most ambitious visions I've heard: building the infrastructure for the Singularity and developing "verified superintelligence." The big news was that Christian Szegedy is joining MorphLabs as Chief Scientist. For those who don't know, Christian is a legend: he invented batch norm and adversarial examples, co-founded xAI, and led code reasoning for Grok. That's a serious hire.

Jesse's talk was framed around a fascinating question: "What does it mean to have empathy for the machine?" He argues that as AI develops personhood, we need to think about what it wants. And what it wants, according to Morph, is a new kind of cloud infrastructure. This is MorphCloud, built on a new virtualization stack called Infinibranch. Here's the key unlock: it allows agents to instantaneously snapshot, branch, and replicate their entire VM state. Imagine an agent reaching a decision point. Instead of choosing one path, it can branch its entire existence - all its processes, memory, and state - to explore every option in parallel. It can create save states, roll back to previous checkpoints, and even merge its work back together.

This is a monumental step for agentic AI. It moves beyond agents that are just a series of API calls to agents that are truly embodied in complex software environments. It unlocks the potential for recursive self-improvement and large-scale reinforcement learning in a way that's currently impossible. It's a bold, sci-fi vision, but they're building the infrastructure to make it a reality today.
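To make the branching idea concrete, here's a toy illustration of the snapshot-and-branch pattern (this is NOT MorphCloud's SDK - just a stand-in where a deep copy plays the role of a VM snapshot):

```python
# Toy illustration of snapshot/branch agent search, not MorphCloud's API:
# fork the full agent state at a decision point, explore each option in a
# separate branch, then roll forward only the best branch.
import copy

def branch_and_explore(agent_state, options, evaluate):
    branches = []
    for option in options:
        snapshot = copy.deepcopy(agent_state)   # stand-in for a VM snapshot
        snapshot["history"].append(option)      # this branch takes this path
        branches.append((evaluate(snapshot), snapshot))
    return max(branches, key=lambda b: b[0])[1]  # keep the best branch

state = {"history": []}
best = branch_and_explore(
    state,
    options=["patch the bug", "rewrite the module", "add a test first"],
    evaluate=lambda s: len(s["history"][-1]),    # dummy scoring function
)
print(best["history"])  # ['rewrite the module']
```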
🔥 The Agent Conversation: OpenAI, MCP, and Magic Moments

The undeniable buzz on the conference floor was all about agents. You couldn't walk ten feet without hearing someone talking about agents, tools, and MCP. OpenAI is leaning in here too. This week, they made their Codex coding agent available to all ChatGPT Plus users and announced that ChatGPT will soon be able to listen in on your Zoom meetings. This is all part of a broader push to make AI more active and integrated into our workflows.

The MCP (Model Context Protocol) track at the conference was packed, with lines going down the hall. (Alex here: I had a blast talking during that track about MCP observability; you can catch our talk here [https://youtu.be/z4zXicOAF28?t=19573] on the live stream of AI Engineer.) Logan Kilpatrick offered a grounded perspective, suggesting the hype might be a bit overblown but acknowledging the critical need for an open standard for tool use, a void left when OpenAI didn't formalize ChatML.

I have to share my own jaw-dropping MCP moment from this week. I was coding an agent using an IDE that supports MCP. My agent, which was trying to debug itself, used an MCP tool to check its own observability traces on the Weights & Biases platform. While doing so, it discovered a new tool that our team had just added to the MCP server: a support bot. Without any prompting from me, my coding agent formulated a question, "chatted" with the support agent to get the answer, came back, fixed its own code, and then re-checked its work. Agent-to-agent communication, happening automatically to solve a problem. My jaw was on the floor. That's the magic of open standards.

This Week's Buzz from Weights & Biases

Speaking of verification and agents, the buzz from our side is all about it! At our booth here at AI Engineer, we have a Robodog running around, connected to our LLM evaluation platform, W&B Weave. As Jesse from MorphLabs discussed, verifying what these complex agentic systems are doing is critical. Whether it's superintelligence or your production application, you need to be able to evaluate, trace, and understand its behavior. We're building the tools to do just that. And if you're in San Francisco, don't forget our own conference, Fully Connected, is happening on June 18th and 19th! It's going to be another amazing gathering of builders and researchers. Fullyconnected.com [http://Fullyconnected.com] - get in FREE with the promo code WBTHURSAI.

What a show. The energy, the announcements, the sheer brainpower in one place was something to behold. We're at a point where the conversation has shifted from theory to practice, from hype to real, tangible engineering. The tracks on agents and enterprise adoption were overflowing because people are building, right now. It was an honor and a privilege to bring this special episode to you all. Thank you for tuning in. We'll be back to our regular programming next week! (And Alex will be back to writing his own newsletter, not sending direct AI output!)

AI News TL;DR and show notes

* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne [http://x.com/@altryne])
  * Co-Hosts - @swyx [http://x.com/swyx], @yampeleg [http://x.com/@yampeleg], @romechenko [https://twitter.com/romechenko/status/1891007363827593372]
  * Guests - @officialLoganK [https://x.com/OfficialLoganK], @jack_w_rae [https://x.com/jack_w_rae]
* Open Source LLMs
  * ByteDance ContentV-8B (HF [https://huggingface.co/ByteDance/ContentV-8B])
* Big CO LLMs + APIs
  * Gemini 2.5 Pro updated Jun 5th (X [https://x.com/OfficialLoganK/status/1930657743251349854])
    * SOTA on HLE, Aider, and GPQA
    * Now supports thinking budgets
    * Same cost, on the pareto frontier
    * Closes the gap on 03-25 regressions
  * OAI AVM injects ads and stopped singing (X [https://x.com/altryne/status/1929312886448337248])
  * OpenAI Codex is now available to Plus members and has internet access (X [https://github.com/aavetis/ai-pr-watcher/])
    * ~24,000 NEW PRs overnight from Codex after @OpenAI expanded access
  * OpenAI will record meetings and released connectors (X [https://twitter.com/testingcatalog/status/1930366893321523676])
    * Most of these connectors can be used for Deep Research, while Google Drive, SharePoint, Dropbox and Box can be used in all chats.
  * Anthropic cuts Claude access for Windsurf (X [https://x.com/kevinhou22/status/1930401320210706802])
    * Without warning, Anthropic cut off Windsurf from official Claude 3 and 4 APIs
* This week's Buzz
  * Fully Connected: W&B's 2-day conference, June 18-19 in SF (fullyconnected.com [fullyconnected.com]) - promo code WBTHURSAI
* Vision & Video
  * VEO3 is now available via API on FAL (X [https://x.com/FAL/status/1930732632046006718])
  * Captions launches Mirage Studio - talking-avatar competition to HeyGen/Hedra (X [https://x.com/getcaptionsapp/status/1929554635544461727])
* Voice & Audio
  * ElevenLabs model V3 - supports emotion tags and is an "inflection point" (X [https://x.com/venturetwins/status/1930727253815759010])
    * Supports 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers].
* Tools
  * Cursor launched v1 - Bug Bot reviews PRs, iPython notebooks, and one-click MCP
  * ~24,000 NEW PRs overnight from Codex after @OpenAI [https://x.com/OpenAI] expanded access to Plus users (X [https://twitter.com/albfresco/status/1930262263199326256])

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

June 6, 2025 - 1 h 43 min
📆 ThursdAI - May 29 - DeepSeek R1 Resurfaces, VEO3 viral moments, Opus 4 a week after, Flux Kontext image editing & more AI news

Hey everyone, Alex here 👋 Welcome back to another absolutely wild week in AI! I'm coming to you live from the Fontainebleau Hotel in Vegas at the Imagine AI conference, and wow, what a perfect setting to discuss how AI is literally reimagining our world. After last week's absolute explosion of releases (Claude Opus 4, Google I/O madness, OpenAI Codex and the Jony Ive collab), this week gave us a chance to breathe... sort of. Because even in a "quiet" week, we still got a new DeepSeek model that's pushing boundaries, and the entire internet discovered that we might all just be prompts. Yeah, it's been that kind of week!

Before we dive in, a quick shoutout to everyone who joined us live - we had some technical hiccups with the Twitter Spaces audio (sorry about that!), but the YouTube stream was fire. And speaking of fire, we had two incredible guests join us: Charlie Holtz from Chorus (the multi-model chat app that's changing how we interact with AI) and Linus Eckenstam, who's been traveling the AI conference circuit and bringing us insights from the frontlines of the generative AI revolution.

Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers

DeepSeek dropped R1-0528 out of nowhere, an update to their reasoning beast with some serious jumps in performance. We're talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE-bench Verified at 57.6. It's edging closer to heavyweights like o3, and folks on X are already calling it "clearer thinking." There was hype it might've been R2, but the impact didn't quite crash the stock exchange like past releases. Still, it's likely among the best open-weight models out there.

So what's new? Early reports and some of my own poking around suggest this model "thinks clearer now." Nisten mentioned that while previous DeepSeek models sometimes liked to "vibe around" and explore the latent space before settling on an answer, this one feels a bit more direct. And here's the kicker: they also released an 8B distilled version based on Qwen3, runnable on your laptop. Yam called it potentially the best 8B model to date, and you can try it on Ollama right now. No need for a monster rig!
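If you want to give the 8B distill a spin locally, here's a minimal sketch using the ollama Python client; the model tag is my assumption, so check the Ollama library for the exact name of the 0528 distill:

```python
# Quick local test of the R1-0528 8B distill via the ollama Python client.
# Assumes Ollama is running locally and that this model tag exists -
# verify the tag in the Ollama model library first.
import ollama

resp = ollama.chat(
    model="deepseek-r1:8b",  # assumed tag for the Qwen3-based 8B distill
    messages=[{"role": "user", "content": "Briefly, why is the sky blue?"}],
)
print(resp["message"]["content"])
```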
The Mind-Bending "Learning to Reason Without External Rewards" Paper

Okay, this paper broke my brain, and apparently everyone else's too. It shows that models can improve through reinforcement learning using their own intuition of whether or not they're correct. 😮 It's like the placebo effect for AI! The researchers trained models without telling them what was good or bad; instead, they used a new framework called Intuitor, where the reward is based on the model's own "self-certainty." The thing that took my whole timeline by storm is: it works! GRPO (Group Relative Policy Optimization) - the framework DeepSeek gave to the world with R1 - is based on external rewards, and Intuitor seems to be matching or even exceeding some GRPO results when finetuning Qwen2.5 3B. Incredible, incredible stuff.
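Here's a minimal sketch of the self-certainty signal as I read the paper - the average divergence between the model's next-token distribution and a uniform one, so a confidently peaked model scores high. This is my paraphrase for intuition, not the authors' code:

```python
# Sketch of a "self-certainty" score: mean KL(p || uniform) over generated
# tokens, which equals log(V) - H(p) per position. Higher = more confident.
# Intuitor-style training would use this as the reward inside GRPO.
import math
import torch
import torch.nn.functional as F

def self_certainty(logits: torch.Tensor) -> float:
    """logits: (seq_len, vocab_size) over the model's generated tokens."""
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)        # H(p) per position
    vocab = logits.shape[-1]
    return (math.log(vocab) - entropy).mean().item()  # KL(p || uniform)

# A peaked (confident) distribution scores higher than a flat one:
confident = torch.tensor([[10.0, 0.0, 0.0, 0.0]])
unsure = torch.tensor([[1.0, 1.0, 1.0, 1.0]])
print(self_certainty(confident) > self_certainty(unsure))  # True
```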
Big Companies LLMs & APIs

Claude Opus 4: A Week Later – The Dev Darling?

Claude Opus 4, whose launch we celebrated live on the show, has had a week to make its mark. Charlie Holtz, who's building Chorus (more on that amazing app in a bit!), shared that while it's sometimes "astrology" to judge the vibes of a new model, Opus 4 feels like a step change, especially in coding. He mentioned that Claude Code, powered by Opus 4 (and Sonnet 4 for implementation), is now tackling GitHub issues that were too complex just weeks ago. He even had a coworker who "vibe coded three websites in a weekend" with it – that's a tangible productivity boost!

Linus Eckenstam highlighted how Lovable.dev saw their syntax error rates plummet by nearly 50% after integrating Claude 4. That's quantifiable proof of improvement! It's clear Anthropic is leaning heavily into the developer/coding space. Claude Opus is now #1 on the LMArena WebDev arena, further cementing its reputation.

I had my own magical moment with Opus 4 this week. I was working on an MCP observability talk for the AI Engineer conference and trying to integrate Weave (our observability and evals framework at Weights & Biases) into a project. Using Windsurf's Cascade agent (which now lets you bring your own Opus 4 key, by the way – good move, Windsurf!), Opus 4 not only tried to implement Weave into my agent but, when it got stuck, it figured out it had access to the Weights & Biases support bot via our MCP tool. It then formulated a question to the support bot (which is also AI-powered!), got an answer, and used that to fix the implementation. It then went back and checked if the Weave trace appeared in the dashboard! Agents talking to agents to solve a problem, all while I just watched – my jaw was on the floor. Absolutely mind-blowing.

Quick Hits: Voice Updates from OpenAI & Anthropic

OpenAI's Advanced Voice Mode finally sings - yes, I've been waiting for this! It can belt out tunes like Mariah Carey, which is just fun. Anthropic also rolled out voice mode on mobile, keeping up in the conversational race. Both are cool steps, but I'm more hyped for what's next in voice AI - stay tuned below (OpenAI X [https://x.com/nicdunz/status/1927107805032399032], Anthropic X [https://x.com/AnthropicAI/status/1927463559836877214]).

🐝 This Week's Buzz: Weights & Biases Updates!

Alright, time for a quick update from the world of Weights & Biases!

* Fully Connected is coming! Our flagship 2-day conference, Fully Connected, is happening on June 18th and 19th in San Francisco. It's going to be packed with amazing speakers and insights into the world of AI development. You can still grab tickets, and as a ThursdAI listener, use the promo code WBTHURSAI for a 100% off ticket! I hustled to get y'all this discount! (Register here [https://fullyconnected.com])
* AI Engineer World's Fair next week! I'm super excited for the AI Engineer conference in San Francisco next week. Yam Peleg and I will be there, and we're planning another live ThursdAI show from the event! If you want to join the livestream or snag a last-minute ticket, use the coupon code THANKSTHURSDAI for 30% off (Get it HERE [https://ti.to/software-3/ai-engineer-worlds-fair-2025/discount/THANKSTHURSDAI])

Vision & Video: Reality is Optional Now

VEO3 and the Prompt Theory Phenomenon

Google's VEO3 has completely taken over TikTok with the "Prompt Theory" videos. If you haven't seen these yet, stop reading and watch ☝️. The concept is brilliant - AI-generated characters discussing whether they're "made of prompts," creating this meta-commentary on consciousness and reality. The technical achievement here is staggering. We're not just talking about good visuals - VEO3 nails temporal consistency, character emotions, situational awareness (characters look at whoever's speaking), perfect lip sync, and contextually appropriate sound effects.

Linus made a profound point - if not for the audio, VEO3 might not have been as explosive. The combination of visuals AND audio together is what's making people question reality. We're seeing people post actual human videos claiming they're AI-generated, because the uncanny valley has been crossed so thoroughly.

Odyssey's Interactive Worlds: The Holodeck Prototype

Odyssey dropped their interactive video demo, and folks... we're literally walking through AI-generated worlds in real time. This isn't a game engine rendering 3D models - this is a world model generating each frame as you move through it with WASD controls. Yes, it's blurry. Yes, I got stuck in a doorway. But remember Will Smith eating spaghetti from two years ago? The pace of progress is absolutely insane. As Linus pointed out, we're at the "GAN era" of world models. Combine VEO3's quality with Odyssey's interactivity, and we're looking at completely personalized, infinite entertainment experiences.

The implications that Yam laid out still have me shook - imagine Netflix shows completely customized to you, with your context and preferences, generated on the fly. Not just choosing from a catalog, but creating entirely new content just for you. We're not ready for this, but it's coming fast.

Hunyuan's Open Source Avatar Revolution

While the big companies are keeping their video models closed, Tencent dropped two incredible open source releases: HunyuanPortrait and HunyuanAvatar. These are legitimate competitors to Hedra and HeyGen, but completely open source. HunyuanPortrait does high-fidelity portrait animation from a single image plus a driving video. HunyuanAvatar goes further with 1 image + audio: lip sync, body animation, multi-character support, and emotion control. Wolfram tested these extensively and confirmed they're "state of the art for open source." The portrait model is basically perfect for deepfakes (use responsibly, people!), while the avatar model opens up possibilities for AI assistants with a consistent visual presence.

🖼️ AI Art & Diffusion

Black Forest Labs drops Flux Kontext - SOTA image editing!

This came as massive breaking news during the show (though we didn't catch it live!) - Black Forest Labs, creators of Flux, dropped an incredible image editing model called Kontext (really, three models: Pro, Max, and a 12B open-source Dev version in private preview). They do consistent, context-aware text and image editing! Just see the example below.

If you used GPT-image to Ghiblify yourself, or VEO, you know those are not image editing models: your face will look different every generation. These models keep you consistent while adding what you asked for. This character consistency is something many folks really want, and it's great to see Flux innovating and bringing us SOTA again, absolutely crushing GPT-image in instruction following, character preservation, and style reference!

Maybe the most important thing about this model is the incredible speed. While the Ghiblification chatGPT trend took the world by storm, GPT images are SLOW! Check out the speed comparisons on Kontext! You can play around with these models on the new Flux Playground [https://playground.bfl.ai/image/generate], and they're also already integrated into FAL, Freepik, Replicate, Krea, and tons of other services!

🎙️ Voice & Audio: Everyone Gets a Voice

Unmute.sh: Any LLM Can Now Talk

KyutAI (the folks behind Moshi) are back with Unmute.sh - a modular wrapper that adds voice to ANY text LLM. The latency is incredible (under 300ms), and it includes semantic VAD (knowing when you've paused for thought vs. just taking a breath).
What's brilliant about this approach is that it preserves all the capabilities of the underlying text model while adding natural voice interaction. No more choosing between smart models and voice-enabled models - now you can have both! It's going to be open-sourced at some point soon, and while awesome, Unmute did have some instability in how the voice sounds: it answered me with one type of voice and then, during the same conversation, answered with another. You can give it a try yourself at unmute.sh [http://unmute.sh]

Chatterbox: Open Source Voice Agents for Everyone

Resemble AI open-sourced Chatterbox, featuring zero-shot voice cloning from just 5 seconds of audio and unique emotion intensity control. Playing with the demo, where they could dial the emotion from 0.5 up to 2.0 on the same text, was wild - from calm to absolutely unhinged Samuel L. Jackson energy. This being a 0.5B-param model is great. The issue I always have is that with my fairly unique accent, these models sound like a British Alex all the time, and I just don't talk like that! Though the fact that this runs locally and includes safety features (profanity filters, content classifiers, and something called PerTh watermarking) while being completely open source is exactly what the ecosystem needs. We're rapidly approaching a world where anyone can build sophisticated voice agents. 👏
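For the curious, here's a minimal sketch of cloning a voice with Chatterbox, going by the repo's README as I remember it (double-check the API on GitHub; the method names here are my best recollection):

```python
# Minimal Chatterbox sketch: zero-shot cloning from a short reference clip,
# with the emotion/exaggeration knob turned up. Verify the API against
# https://github.com/resemble-ai/chatterbox before relying on this.
import torchaudio

from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

wav = model.generate(
    "Say the line like you mean it!",
    audio_prompt_path="my_5s_reference.wav",  # ~5s clip of the target voice
    exaggeration=1.5,                         # >1.0 = more intense delivery
)
torchaudio.save("cloned.wav", wav, model.sr)
```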
Looking Forward: The Convergence is Real

As we wrapped up the show, I couldn't help but reflect on the massive convergence happening across all these modalities. We have LLMs getting better at reasoning (even with intrinsic rewards!), video models breaking reality, voice models becoming indistinguishable from humans, and it's all happening simultaneously. Charlie's comment that "we are the prompts" might have been said in jest, but it touches on something profound. As these models get better at generating realistic worlds, characters, and voices, the line between generated and real continues to blur. The Prompt Theory videos aren't just entertainment - they're a mirror reflecting our anxieties about AI and consciousness.

But here's what keeps me optimistic: the open source community is keeping pace. DeepSeek, Hunyuan, Resemble AI, and others are ensuring that these capabilities don't remain locked behind corporate walls. The democratization of AI continues, even as the capabilities become almost magical.

Next week, I'll be at AI Engineer World's Fair in San Francisco, finally meeting Yam face-to-face and bringing you all the latest from the biggest AI engineering conference of the year. Until then, keep experimenting, keep building, and remember - in this exponential age, today's breakthrough is tomorrow's baseline. Stay curious, stay building, and I'll see you next ThursdAI!

🚀 Show Notes & TL;DR Links

* Hosts and Guests
  * Alex Volkov - AI Evangelist & Weights & Biases (@altryne [http://x.com/@altryne])
  * Co-Hosts - @WolframRvnwlf [http://x.com/@WolframRvnwlf], @yampeleg [http://x.com/@yampeleg], @nisten [http://x.com/@nisten]
  * Guests - Charlie Holtz (@charliebholtz [https://x.com/charliebholtz]), Linus Eckenstam (@LinusEkenstam [https://twitter.com/LinusEkenstam/status/1899794522969973189])
* Open Source LLMs
  * DeepSeek-R1-0528 - updated reasoning model with AIME 91, LiveCodeBench 73 (Try It [https://x.com/Yuchenj_UW/status/1927828675837513793])
  * Learning to Reason Without External Rewards - paper on intrinsic self-certainty rewards improving models (X [https://x.com/xuandongzhao/status/1927270931874910259])
  * HaizeLabs j1-nano & j1-micro - tiny reward models (600M, 1.7B params), RewardBench 80.7% for micro (Tweet [https://x.com/leonardtang_/status/1927396709870489634], GitHub [https://github.com/haizelabs/j1-micro], HF-micro [https://huggingface.co/haizelabs/j1-micro], HF-nano [https://huggingface.co/haizelabs/j1-nano])
* Big CO LLMs + APIs
  * Claude Opus 4 - #1 on LMArena WebDev, coding step change (X [https://x.com/lmarena_ai/status/1927400454922580339])
  * Mistral Agents API - framework for custom tool-using agents (Blog [https://mistral.ai/news/agents-api], Tweet [https://x.com/MistralAI/status/1927364741162307702])
  * Mistral Embed SOTA - new state-of-the-art embedding API (X [https://x.com/MistralAI/status/1927732682756112398])
  * OpenAI Advanced Voice Mode - now sings with new capabilities (X [https://x.com/nicdunz/status/1927107805032399032])
  * Anthropic Voice Mode - released on mobile for conversational AI (X [https://x.com/AnthropicAI/status/1927463559836877214])
* This Week's Buzz
  * Fully Connected - W&B conference, June 18-19, SF, promo code WBTHURSAI (Register [https://fullyconnected.com])
  * AI Engineer World's Fair - next week in SF, 30% off with THANKSTHURSDAI (Register [https://ti.to/software-3/ai-engineer-worlds-fair-2025/discount/THANKSTHURSDAI])
* AI Art & Diffusion
  * BFL Flux Kontext - SOTA image editing model for identity-consistent edits (Tweet [https://x.com/bfl_ml/status/1928143010811748863], Announcement [https://bfl.ai/announcements/flux-1-kontext])
* Vision & Video
  * VEO3 Prompt Theory - viral AI video trend questioning reality on TikTok (X [https://x.com/fabianstelzer/status/1926372656799977965])
  * Odyssey Interactive Video - real-time AI world exploration at 30 FPS (Blog [https://odyssey.world/introducing-interactive-video], Try It [https://experience.odyssey.world/])
  * HunyuanPortrait - high-fidelity portrait video from one photo (Site [https://kkakkkka.github.io/HunyuanPortrait/], Paper [https://arxiv.org/abs/2503.18860])
  * HunyuanVideo-Avatar - audio-driven full-body avatar animation (Site [https://hunyuanvideo-avatar.github.io/], Tweet [https://x.com/TencentHunyuan/status/1927575170710974560])
* Voice & Audio
  * Unmute.sh - KyutAI's voice wrapper for any LLM, low latency, soon open-source (Try It [http://unmute.sh/], X [https://x.com/kyutai_labs/status/1925840420187025892])
  * Chatterbox - Resemble AI's open-source voice cloning with emotion control (GitHub [https://github.com/resemble-ai/chatterbox], HF [https://huggingface.co/resemble-ai/chatterbox])
* Tools
  * Opera NEON - agent-centric AI browser for autonomous web tasks (Site [https://www.operaneon.com/], Tweet [https://x.com/opera/status/1927645192254861746])

This is a public episode.
If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe [https://sub.thursdai.news/subscribe?utm_medium=podcast&utm_campaign=CTA_2]

May 29, 2025 - 1 h 28 min