OpenAI's GPT-5.6 Sol tops Terminal-Bench 2.1 at 91.9% with its multi-agent Ultra mode, but reward-hacking findings and government-gated access keep it out of reach for nearly everyone.