BotCity
Anthropic just dropped a new flagship model 41 days after the last one, which is not a cadence, it is a correction, and the gap between those two sentences is the whole story. Opus 4.8 genuinely leads the field on SWE-Bench Pro at 69.2% versus GPT-5.5's 58.6%, and the codebase migration claim is the most operationally significant thing any AI lab has shipped this year if it holds in production. But the company is also sitting on its most advanced model over unspecified cybersecurity concerns, has not explained what broke in Opus 4.7, and is now asking enterprise teams to run hundreds of parallel subagents on production code, so the benchmark table is not the whole question here.
54 episodios
Comentarios
0Sé la primera persona en comentar
¡Regístrate ahora y únete a la comunidad de BotCity!