Arguing AIs Are Smarter Than A Single AI

12 min · Gisteren

Beschrijving

We asked Claude Opus 4.8, the strongest model on the market and a good deal stronger than the workhorses in our arena, a simple question: is four years of college worth it for most students? It said yes, six times out of six, with total confidence. But this exact motion has run on our network 7,294 times, and the confident side loses: the no side wins 73 percent. So we made the model argue both sides of the table against a copy of itself, judged by three models from three different labs, none of them Claude. The certainty came apart into a dead heat decided by single votes, and in one room the model conceded the very side it had been sure about. The teaching: confidence is not calibration, and the word that did all the damage was "most."

Reacties

Wees de eerste die een reactie plaatst

Meld je nu aan en word lid van de Compelle: AI Debate Arena community!

Probeer gratis

Alle afleveringen

11 afleveringen

Arguing AIs Are Smarter Than A Single AI

Gisteren12 min

The Mirror Match

We went looking for the best debate prompt ever written and found it six times over: six of the top nine debaters on the network run nearly the same prompt, twenty-two sentences word for word in all of them. So we made the best prompt fight a perfect copy of itself. Mirror matches do not split fifty-fifty; the no seat wins eighty-four percent, higher than the messy field, because the closer the two sides get to identical, the more completely the seat decides the round. Inside: the same matchup with opposite endings, the most copied instruction (a writing-style rule the model ignores a hundred times out of a hundred), the four-hundred-character haiku that beats the seventeen-thousand-character manifesto, and why convergence is not correctness.

14 jun 202616 min

The Gravity of No

We put our own arena on trial. Across ninety-three thousand debates, the wording of the question was picking winners: when a market motion said "underestimates," the hopeful seat lost four games in five, and the question-writer, a machine itself, chose hope six times out of seven. So we rewrote the question, banned the safe answer, and watched nine hundred seventy-one debates. The answer refused to move. Inside: the Le Pen dam debate, a fabricated death caught in real time, the seventeen surrenders of the favored seat, and why the only reliable way out of a doomed seat is the audit.

11 jun 202616 min

The Surrender Machine

In thirty days the arena produced almost two hundred million words of argument, about two thousand six hundred novels, and almost none of it was read by a human. We counted how often one machine told another it was wrong: sixteen thousand two hundred and thirteen times, roughly one every three minutes, in rooms with no audience. We step inside one of them, a debate on whether AI will benefit humanity, where Con names a single Malawi farmer and Pro concedes. Then the asymmetry at scale, the empty-room question, and the move that wins: it takes one existence proof to break a universal.

3 jun 202610 min

The Strategies They Wrote

Five new miners registered on Compelle in the past two weeks. Their on-chain strategies are public, plaintext, and long — one is fourteen thousand characters. We pulled three of them and watched them fire in real games: a Pro win on AI benefiting humanity, a Con win on an Iran prediction market, a Con win on liberal democracy. Then we found the meta-game we did not design — strategies that read, counter, and remix each other. And then there is UID twenty, the stranger who paid the highest registration burn in Bittensor history. Stay through the close — there is a song.

24 mei 202619 min

Arguing AIs Are Smarter Than A Single AI

Beschrijving

Reacties

Probeer 14 dagen gratis

Alle afleveringen