Le Chaton Fat Just Broke the Internet...

Description

Why AI Benchmarks Are Fake (And How to Actually Test Models)

A fake French AI model recently went viral for beating the industry's top benchmarks, showing how easy it is to manipulate performance data. This video explains why you should stop chasing hype-filled charts and start evaluating AI based on your own real-world business workflows.

00:00 - Intro: The Le Chaton Fat Joke
01:08 - Why AI Benchmarks Can Lie
02:42 - The Problem with Self-Reported Tests
04:18 - Real Work is the Only Benchmark
05:20 - How to Avoid AI Overwhelm
06:34 - The New Way to Evaluate AI
07:31 - 3 Key Takeaways for AI Testing
08:45 - Testing AI Systems Yourself

NEW Grok 4.5 DESTROYS Opus?

Grok 4.5 (V9) Announced: Private Beta, Opus Comparisons, and Why Systems Beat Model Hype

Grok 4.5 has been announced as a new update based on a 1.5 trillion V9 foundation model, with Cursor data included for supplemental training, and is currently in private beta at SpaceX and Tesla. Early evaluations suggest performance close to or possibly exceeding Opus 4.8, and Elon Musk frames V9 as a solid workhorse in the same league as Opus. The script discusses timelines and release expectations (including a 42% July estimate and past teaser-to-launch patterns that often land within two weeks), while warning against hype and “Opus killer” claims. It highlights practical advantages of using Grok via OAuth with an existing Twitter subscription versus expensive API usage, showcases builds made with Grok Build, and argues that users should focus on owning robust agent systems (Agent OS, Hermes workflows, benchmarking, memory, and automation) rather than chasing gated or inaccessible frontier models.

00:00 [https://www.youtube.com/watch?v=JeqHECM0PtI] Grok 4.5 Announced
01:07 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=67s] Why It Matters
01:39 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=99s] What We Built
02:03 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=123s] Beta Status and Hype
03:09 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=189s] Release Timeline Clues
04:21 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=261s] Models Are Getting Gated
05:03 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=303s] Focus on Systems
07:32 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=452s] Local Benchmarks and Models
08:00 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=480s] Two Week Release Pattern
08:47 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=527s] Agent OS Demo
09:37 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=577s] Join the Community
11:24 [https://www.youtube.com/watch?v=JeqHECM0PtI&t=684s] Final Takeaways

Yesterday, 12 min



