AI News Today | Julian Goldie Podcast
GLM 52 vs Qwen 37 Max vs Claude Opus 48: Real-World Tests vs Benchmarks (No Second Chances)

The episode compares GLM 52 (ZAI), Qwen 37 Max (Alibaba), and Claude Opus 48 (Anthropic) head-to-head on five one-shot tasks, arguing that benchmark rankings didn’t match real usability. In coding-focused tests like a voxel runner game, a liquid-in-a-bowl animation, a business landing page, and an arcade game, GLM 52 produced the most fun, polished, and feature-rich results, while Claude’s outputs were often basic and Qwen’s were sometimes buggy or incomplete; Claude clearly won the solar-system orbit map task. The script also notes Qwen’s strong reported benchmarks and faster replies, GLM’s slower responses in agents but strong CLI coding, and highlights limitations integrating Claude into agent workflows compared to Qwen/GLM in Hermes and the creator’s agent operating system.

00:00 [https://www.youtube.com/watch?v=MnFwz8O3F-U] Head To Head Setup
01:27 [https://www.youtube.com/watch?v=MnFwz8O3F-U&t=87s] Coding Tests Results
04:09 [https://www.youtube.com/watch?v=MnFwz8O3F-U&t=249s] Arcade Game Showdown
04:50 [https://www.youtube.com/watch?v=MnFwz8O3F-U&t=290s] Benchmarks Versus Reality
06:01 [https://www.youtube.com/watch?v=MnFwz8O3F-U&t=361s] Agents Workflow Tradeoffs
07:59 [https://www.youtube.com/watch?v=MnFwz8O3F-U&t=479s] Final Recommendations
496 episodes