AI News Today | Julian Goldie Podcast
Claude Sonnet 5 Review: More Expensive, Worse Than Opus 4.8? (Benchmarks & Agent Tests)

The video reviews Anthropic’s newly released Claude Sonnet 5, described as more agentic and capable of planning and tool use, but argues it underperforms Opus 4.8 on benchmarks (including agentic coding) while costing more. The creator shares Goldy Bench examples Sonnet 5 generated (a ray caster maze, a broken galaxy orbit test, a synthwave background, and a crypt game), noting some outputs look good but others fail. Side-by-side comparisons show mixed results versus GLM 5.2, with GLM succeeding on tasks Sonnet 5 fails, and tweets highlight negative reception focused on poor token efficiency and pricing. The recommendation is to keep using Opus 4.8, expect Fable 5 soon, and focus on building flexible agent systems that can swap models in and out.

00:00 [https://www.youtube.com/watch?v=1Wl-4D6D5rw] Sonnet 5 Launch
00:30 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=30s] Benchmarks vs Opus
01:39 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=99s] Goldy Bench Demos
02:53 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=173s] GLM 5.2 Comparisons
04:00 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=240s] Backlash and Pricing
05:57 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=357s] Fugu Ultra Showdown
07:20 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=440s] Why Release This
08:00 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=480s] Focus on Systems
09:11 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=551s] Agent OS Pitch
09:48 [https://www.youtube.com/watch?v=1Wl-4D6D5rw&t=588s] Final Verdict
535 episodes