AI News Today | Julian Goldie Podcast

Le Chaton Fat Just Broke the Internet...

9 min · 16 June 2026

Description

Why AI Benchmarks Are Fake (And How to Actually Test Models)

A fake French AI model recently went viral for beating the industry's top benchmarks, proving how easy it is to manipulate performance data. This video explains why you should stop chasing hype-filled charts and start evaluating AI based on your own real-world business workflows.

00:00 - Intro: The Le Chaton Fat Joke
01:08 - Why AI Benchmarks Can Lie
02:42 - The Problem with Self-Reported Tests
04:18 - Real Work is the Only Benchmark
05:20 - How to Avoid AI Overwhelm
06:34 - The New Way to Evaluate AI
07:31 - 3 Key Takeaways for AI Testing
08:45 - Testing AI Systems Yourself

All episodes

524 episodes

NEW Grok 4.5 DESTROYS Opus?

Grok 4.5 (V9) Announced: Private Beta, Opus Comparisons, and Why Systems Beat Model Hype

Grok 4.5 has been announced as a new update based on a 1.5 trillion V9 foundation model, with Cursor data included for supplemental training, and is currently in private beta at SpaceX and Tesla. Early evaluations suggest performance close to or possibly exceeding Opus 4.8, and Elon Musk frames V9 as a solid workhorse in the same league as Opus. The episode discusses timelines and release expectations (including a 42% July estimate and past teaser-to-launch patterns that often land within two weeks), while warning against hype and “Opus killer” claims. It highlights the practical advantages of using Grok via OAuth with an existing Twitter subscription versus expensive API usage, showcases builds made with Grok Build, and argues that users should focus on owning robust agent systems (Agent OS, Hermes workflows, benchmarking, memory, and automation) rather than chasing gated or inaccessible frontier models.

00:00 [https://www.youtube.com/watch?v=JeqHECM0PtI] Grok 4.5 Announced
01:07 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=67s] Why It Matters
01:39 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=99s] What We Built
02:03 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=123s] Beta Status and Hype
03:09 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=189s] Release Timeline Clues
04:21 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=261s] Models Are Getting Gated
05:03 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=303s] Focus on Systems
07:32 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=452s] Local Benchmarks and Models
08:00 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=480s] Two Week Release Pattern
08:47 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=527s] Agent OS Demo
09:37 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=577s] Join the Community
11:24 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=684s] Final Takeaways

Yesterday · 12 min

NEW Claude Agentic OS is INSANE!

I Built a Claude Agent Operating System (Mission Control for AI Agents & Workflows)

The video showcases a custom “Claude agent operating system” built as a mission control dashboard that unifies AI agents, CLIs, and one-click workflows for tasks like pulling trending news (Hermes Oracle), generating SEO and social content, creating images and thumbnails, producing AI avatar videos, and even making music. It also includes daily auto-organization and a personal “memory galaxy” that keeps context updated across workflows. The creator argues this is faster, cheaper (via free or cheap APIs and local models), and more customizable than using Claude directly, and highlights rapid iteration by integrating new models quickly and auto-documenting tests on a website with model comparisons. The system also supports orchestrating multiple AI agents (e.g., executive and CTO roles) via Paperclip, and access is offered through the AI Profit Boardroom community with tutorials, updates, coaching calls, and member success stories.

00:00 [https://www.youtube.com/watch?v=1EloYKa_VTc] Agent OS Overview
00:52 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=52s] One Click Workflows
01:36 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=96s] Why Not Use Claude
03:12 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=192s] Build Fast Document Everything
04:17 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=257s] Orchestrating AI Teams
06:04 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=364s] Personalization Memory Galaxy
06:36 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=396s] Community Proof Results
08:29 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=509s] Daily Building Mindset
09:53 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=593s] Join The Profit Boardroom
10:36 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=636s] Tutorials Coaching Wrap Up
11:41 [https://www.youtube.com/watch?v=1EloYKa_VTc&t=701s] Final Goodbye

Yesterday · 11 min

Hermes Agent 2.0 is INSANE!

Hermes Mixture of Agents 2.0: Beat Frontier Models with a Panel of AI Agents (Agent OS Demo + Benchmarks)

Julian demonstrates a new Hermes Mixture of Agents 2.0 setup that stacks multiple models as a panel of agents inside Hermes Agent, sending one prompt to several models and aggregating their answers into a stronger final output. He says he tested 42 builds with the same prompts and shows a leaderboard where systems like Fusion and Hermes Mixture of Agents rank above single frontier models, with side-by-side comparisons on Goldie Bench versus Claude Opus 4.8. He explains how the Agent OS lets you run Mixture of Agents without terminal commands, saves outputs in a workspace, and includes tools like chat, talk mode, voice-controlled Hermes Jarvis, and Hermes Oracle. The key message is “don’t chase the model, build the system,” with the claim that mixing cheaper, free, or local models can outperform top single models, and he points viewers to the AI Profit Boardroom for access, tutorials, and coaching.

00:00 [https://www.youtube.com/watch?v=WY9y529g8Ww] Why This System Wins
01:03 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=63s] How Mixture Works
02:16 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=136s] Leaderboard Benchmarks
02:50 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=170s] Opus Comparison Demo
04:01 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=241s] Setup and Agent OS
04:39 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=279s] Built In Agent Tools
05:23 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=323s] System Over Models
06:50 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=410s] More Builds and Costs
07:58 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=478s] Get Agent OS Access
08:30 [https://www.youtube.com/watch?v=WY9y529g8Ww&t=510s] Community Proof and Wrap

Yesterday · 9 min