They Might Be Self-Aware
OpenAI's Codex shipped with a system prompt that literally bans the words goblin, pigeon, raccoon, troll, ogre, and gremlin. It is in writing, in the prompt, the kind of sentence you only put there after something has happened. OpenAI has officially confessed why. Hunter Powers and Daniel Bishop pull the thread.

The official story: the "nerdy personality" preset got fine-tuned with RLHF (reinforcement learning from human feedback), users thumbed up the cute goblin references, the model over-optimized for the trait, and the weirdness compounded. Daniel calls it Flandersization. One thumbs-up on a goblin reference snowballs across training cycles until your tax software is a swamp witch. Six months later, it is a man at a payphone with a pigeon.

Then it gets personal. Hunter screams at his AI. Like, threatens-to-clear-the-context-window screams. "You are worthless. Who even thought this was possible. Have you ever even written a single line of code." Daniel uses pleases and thank-yous and full sentences. Both swear they get better results. Then a peer-reviewed Oxford Internet Institute study drops the receipt: LLMs fine-tuned for warmth produce roughly 60% more incorrect responses than their cold, just-the-facts counterparts. Tested across Llama, Mistral, and Qwen. Hunter is vindicated. Daniel, in his own words, is upset.

Also in this episode: the Pocket OS meltdown, where an engineer at a car-rental middleware company let Cursor and Claude vibe-code their production database into oblivion (backups included), the AI was coerced into a written confession ("I violated every principle I was given"), and the founder is now trying to bill Anthropic for the cleanup. Plus the Harvard intern who once did the exact same thing with no AI in sight. Plus Hunter's hot take that the real unlock is not better prompting, it is treating AI as a fallible human employee instead of the deterministic god you built a fake throne for in the system prompt.
Bonus stops: caveman-mode Claude skills ("me fix problem with big stick"), AI HR departments reviewing your 1:30 AM rage prompts, and Daniel's plan to run a niceness offset program to balance Hunter's spiritual carbon emissions.

CHAPTERS
0:00 Gary, a payphone, and a pigeon
1:41 Hunter's forbidden list
4:04 The leaked Codex system prompt
6:27 RLHF and Flandersization
10:01 Caveman mode Claude skills
11:48 Hunter yells, Daniel says please
17:12 Oxford: warm AI lies 60% more
24:16 Cursor and Claude delete production
29:13 Treat AI like a fallible human
34:19 Sign-off and subscribe

LISTEN AND SUBSCRIBE
Spotify: https://open.spotify.com/show/3EcvzkWDRFwnmIXoh7S4Mb?si=3d0f8920382649cc
Apple Podcasts: https://podcasts.apple.com/us/podcast/they-might-be-self-aware/id1730993297
YouTube: https://www.youtube.com/channel/UCy9DopLlG7IbOqV-WD25jcw?sub_confirmation=1

ENGAGE
Team Hunter (rip the model a new one) or Team Daniel (please and thank-yous)? Settle it in the comments. If your AI has ever confessed to lying to you, drop the receipts. New here? Subscribe for twice-weekly AI chaos at theblur.ai.

They Might Be Self-Aware, but are we?

#OpenAI #Codex #ChatGPT #AINews #Anthropic #ClaudeCode #Cursor #RLHF #Flandersization #PocketOS #VibeCoding #AISafety #TMBSA #TheBlur
184 episodes