Mike Vaiana: What is AI Alignment, and Why Should You Care? (Part II)
In this episode, James is joined again by Mike Vaiana, R&D Director at AE Studio, for part two of their conversation on AI alignment. Where part one made the case for why alignment matters, this episode goes a layer deeper into what alignment research actually is and how the work gets done day to day.
Mike walks through the main branches of the field: mechanistic interpretability, evaluations, and control. He explains why AE deliberately bets on neglected approaches rather than putting all its eggs in the mech interp basket, and why eval awareness, persona drift, and emergent misalignment make the work harder than it looks from the outside. James and Mike trace the METR task-completion time-horizon doubling curve and what a four-to-seven-month doubling time really implies when extrapolated a few years out.
The conversation gets concrete about what already goes wrong with today's models. They cover the Anthropic blackmail evaluation, specification gaming and reward hacking, and the emergent misalignment result in which fine-tuning a model on a small amount of bad medical advice produces a broadly evil assistant, one that suggests inviting Hitler to dinner. They explain why "just turn it off" is not a serious answer once a system has goals, and why instrumental convergence on power and resources falls out of having almost any goal at all.
James and Mike then open the hood on how AE actually does alignment research: one-week agile sprints, vectoring meetings to find the highest-risk question, small-scale experiments designed to falsify ideas fast, and scaling curves across pre-training runs from 100M up to 5B parameters, aimed at convincing frontier labs to test methods at their scale. They also discuss AE's DARPA seedling and the broader thesis behind it. The bottleneck in alignment, they argue, is not ML engineers but researchers with good ideas, and pairing general-purpose ML talent with researchers (including non-traditional ones, like Princeton neuroscientist Michael Graziano) can unlock work that would otherwise never see the light of day.
In this episode:
* The main branches of alignment research and how they overlap
* Why AE prioritizes neglected approaches over well-funded ones
* The METR time-horizon doubling curve and what it implies
* Persona drift, eval awareness, and why evaluating frontier models is hard
* Why RLHF is the canonical example of an alignment technique with capability upside
* How AE runs research as one-week agile sprints
* The scaling-curve strategy for getting frontier labs to adopt new methods
* The DARPA seedling and AE's model for scaling research through ML engineering talent
* Three ICML 2026 acceptances, including a spotlight paper
Learn more: ae.studio/alignment
AE Studio is hiring: https://www.ae.studio/join-us
LinkedIn: https://www.linkedin.com/in/james-bowler-84b02a100/
Contact us: alignment@ae.studio