Claude Opus 4.8 Is Out. The Benchmark Numbers Aren't the Story.

6 min · Yesterday

Description

Anthropic dropped Opus 4.8 yesterday — same price, better coding scores, and a four-fold reduction in silent code bugs. But the real headline is alignment: Opus 4.8 scores at near-Mythos levels on misalignment metrics, quietly bringing the restricted model's safety profile into the general tier. Plus: Figure AI's robots sorted 250,000 packages in 200 hours with zero failures, and California's AI legislation just hit its crossover deadline with thirty bills in play and no federal law in sight.

All episodes

126 episodes

Amazon Built a $20 Billion Chip Empire. Almost Nobody Is Talking About It.

Amazon's custom chip business just crossed $20 billion in annual revenue, growing triple digits year over year — with $225 billion in committed Trainium contracts from Anthropic, OpenAI, Meta, and Uber. Andy Jassy says if it sold chips externally like Nvidia, it would be worth $50 billion a year. We break down what this means for the AI infrastructure race, cover Anthropic's six-European-offices-in-twelve-months expansion, and preview Microsoft Build on Monday.

May 31, 2026 · 5 min

Claude Opus 4.8 Is Out. The Benchmark Numbers Aren't the Story.

Yesterday · 6 min

Three of the Big Four Are Now Running on Claude. Here's Why That's a Bigger Deal Than It Sounds.

KPMG just deployed Claude to 276,000 employees across 138 countries — joining Deloitte's 470,000 and PwC's expanded rollout. Together, three of the four largest professional services firms on earth are now running on Anthropic. We break down what KPMG's deal actually does, why the Big Four are a distribution flywheel that no benchmark can measure, preview Microsoft Build next Monday, and close out the most extraordinary week in AI news this year.

May 29, 2026 · 6 min

AI Found 10,000 Critical Software Bugs in One Month. Patches Can't Keep Up.

Anthropic's Project Glasswing one-month update is out — Claude Mythos Preview helped fifty partner organizations find over 10,000 high and critical vulnerabilities in production software, including a 27-year-old OpenBSD bug. Some organizations can't patch them as fast as they're being found. We break down what that means for the security landscape, cover the MIT study showing AI has nearly doubled federal pro se lawsuit filings, and close out the most extraordinary ten days in AI news this year.

May 28, 2026 · 5 min

China Just Put Its Top AI Researchers Under Travel Restrictions.

Bloomberg broke this morning that China is restricting overseas travel for top AI professionals at Alibaba and DeepSeek — requiring government approval before any international trip. It's the human capital version of U.S. chip export controls, and it signals China is done treating AI talent as free agents. We cover what it means for the global AI race, Anthropic's $200 million Gates Foundation bet on AI for the developing world, and a Samsung union's landmark court challenge to AI-driven layoffs.

May 27, 2026 · 3 min
