Rubber Duck Radio

Where Open Source LLMs Are Actually Ahead

1 h 0 min · May 2, 2026

Description

Open source LLMs just hit a stunning milestone: Kimi K2.6 tied GPT-5.5 on the industry's toughest coding benchmark, and it costs a fraction of the price to run. This episode goes beyond the headlines to unpack where open source models still trail proprietary ones, why the new Temporal API is finally fixing JavaScript's 30-year date nightmare, and a growing concern that AI-driven development and the trend toward closed-source licensing could starve the open source commons that made this innovation possible in the first place. From production AI economics to the future of web framework innovation, Tim and Paul explore what the numbers actually mean for developers building real systems today.

All episodes

15 episodes

GPT-5.5 vs Reality: Do Benchmarks Lie?

Tim and Paul dissect the GPT-5.5 launch, weighing state-of-the-art benchmarks against real-world user vibes and token efficiency to determine whether the upgrade is worth the increased cost for developers running production workloads at scale. They also unpack the HTML-in-Canvas proposal, which promises to bridge the gap between DOM and canvas rendering and unlock new possibilities for accessibility, interactive web graphics, and shader-driven transitions without fragile hacks. Finally, Tim reveals exclusive results from a creative AI benchmark that tests model taste and planning. The results expose surprising winners beyond the standard leaderboards, show that real-world performance often diverges significantly from the spec sheet, and highlight which models have the creative judgment needed for complex multi-step tasks without hand-holding.

Apr 25, 2026 · 1 h 0 min