When AI Sounds Reasonable
This episode maps abstract concerns about norm prediction onto specific alignment techniques used in modern AI systems. I examine how reinforcement learning from human feedback, safety fine-tuning, content policies, and worst-case optimization systematically reward norm compliance over precision. None of these techniques are malicious in isolation, but together they produce systems that substitute safer arguments for accurate answers. This episode makes the case that alignment is not merely technical optimization, but governance implemented through design choices.

Topics covered:
* RLHF and preference aggregation
* Safety fine-tuning and scope broadening
* Content policies as latent priors
* Worst-case optimization
* How power emerges from technical systems

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit richyreay.substack.com [https://richyreay.substack.com?utm_medium=podcast&utm_campaign=CTA_1]
10 episodes