High Output: The Future of Engineering

When Craft Meets Non-Determinism

39 min · May 14, 2026

Description

Superhuman built its reputation on a number: 100 milliseconds. Every interaction in the product has to feel instantaneous. Not fast. Instantaneous. That’s the threshold where the human brain stops perceiving lag and starts feeling like the software is an extension of thought. They’ve been engineering to that constraint for years, and it has shaped everything — the architecture, the hiring bar, the way even a billing email gets crafted like a product.

Then they added AI. And for the first time, they were shipping something they couldn’t fully control.

The feeling that built a company

“Every single interaction needs to be below 100 milliseconds, because this is when you feel that things are instantaneous,” Loic says. The number didn’t come from a product spec. It came from game design. Rahul Vohra, Superhuman’s CEO, studied how games create the feeling of flow, and bet that people hate email because of how email works, not because email is email.

The architecture follows from that constraint. Superhuman assumes the network will slow you down, so they build as if the network isn’t there — local-first, syncing in the background, optimistic UI throughout. “You need to build without a backend. How do you do that across multiple devices and make it crazy fast?”

People pay $40 a month for email and feel it’s worth it. Their users — mostly executives and salespeople who average three hours a day in their inboxes — describe the experience the way people describe good tools: the software stops mattering and the work takes over.

How taste becomes infrastructure

Loic joined at the beginning of 2025 as an outsider. “I came in with genuine curiosity. I was blown away.” What surprised him wasn’t the rule but how thoroughly it had been internalized. “Even a backend engineer will think about the latency of their API and how this will reflect in the experience.” In most engineering organizations, backend engineers think about correctness and throughput. At Superhuman, they think about how the user will feel.

It starts in hiring — product sense is a criterion for every role, not just product and design. The finance team applies the same scrutiny to the email a customer gets when they’re being told what they owe as the product team applies to the inbox. The offer letter is a product experience. “The offer is a ceremony. It’s not transactional — it’s already an experience.” Candidates who got that treatment show up acting like it.

Rahul reviews everything going into production. “Within the organization, this is building a muscle in every single engineer, designer, product manager — everyone knows the bar is that high.” You can’t work at Superhuman long without developing an eye for when something feels off — a slightly slow animation, a misaligned pixel, an API call that’s a few milliseconds slower than it ought to be.

Loic calls it sensation transference. Packaging changes how you experience the product inside. They take that idea seriously enough that the bill you get from the finance team is treated like part of the product.

The part they can’t control

For ten years, everything in Superhuman’s stack was deterministic. Same input, same output. That’s what made the 100ms promise keepable: you could engineer to it, measure it, hold it.

AI broke that. “The consistency we were used to is not there anymore,” Loic says. “We all face the surprising change of behavior of a model that is technically not changing its version.” A model API doesn’t update its version number, but its outputs shift. The same query returns different results this week than last week.

For most products, this is annoying. For Superhuman, it’s a more serious problem, because their users aren’t tolerant of inconsistency. “We are similar to Apple in the sense that people expect the best. They pay a bunch, so they always expect the best.”

The specific problem is what happens when AI meets user-generated input. Superhuman can engineer every designed interaction. They cannot engineer how users phrase search queries. “We were controlling every single part of the interaction — feels fast, feels right, feels correct — and all of a sudden, the outcome of the search box is not what I was looking for. Garbage in, garbage out. But how do you control the garbage in?”

There’s no bug to fix and no perf target to chase. The product was built on consistency, and now consistency is the thing they can’t fully promise.

What the numbers don’t say

Superhuman’s AI adoption numbers look good: 90% of engineers using AI daily, 70% of PRs AI-augmented, 90% of those interactions net positive, some engineers claiming 40% velocity gains.

Loic is careful about how he explains this. The numbers work partly because of who their engineers are. “We have a very senior team — over-optimized on seniority. Those people tend to use AI with care. They know the outcome they want, and they just use AI to get faster to that outcome.”

The 40% gains aren’t coming from code generation. They’re coming from everything before the code. “Coming into a new codebase, trying to understand what this library is doing — before, you had to find the entry point, map the dependencies, build your own mental model. Now Claude Code does that so much faster.” The win is in comprehension and orientation, not typing speed.

But the same playbook doesn’t transfer automatically. “If you have a lot of junior engineers, vibe coding’s impact on code quality might be real. It’s not a problem for us — it’s not part of our DNA.”

Taste filters the output. Senior engineers with strong judgment about what “right” looks like can catch what the model gets wrong. Engineers without that judgment can’t. Teams celebrating big AI velocity gains may be doing so because they have enough experienced judgment to catch the mistakes. Teams where most of the engineers are still building that judgment may be accumulating comprehension debt they don’t know about yet.

The acquisition test

The Grammarly acquisition tests the same question at a different scale: can Superhuman’s taste survive contact with mass distribution?

Grammarly has the opposite profile. They’re embedded in Google Docs, Word, email clients, browsers. They have AI capabilities built over years of NLP work. What they’ve optimized for is breadth: supporting every kind of user, every context. Superhuman has been doing the opposite, going deep on one persona and refusing to compromise.

Loic frames the challenge clearly: “How do we make Superhuman not this niche, very fancy application, but something brought to the mass — while keeping our identity?” He reaches for Apple as the reference point. “Learning from Grammarly’s scale and AI capabilities, keeping our culture and taste, and bringing that to the mass — that would be really interesting.”

It’s a genuinely hard problem. Making things simple is hard. Linear built something delightful for small engineering teams, then got successful, then came the bigger companies, the feature requests, the complexity. The focus that made it work is what success makes hardest to maintain.

What this means for you

Superhuman is hitting a wall any product with a quality bar will hit. Three things their experience suggests are worth borrowing.

Make your implicit promises explicit. Superhuman’s was 100ms and determinism — they had ten years of architecture built around it before AI made determinism optional. Most teams have a similar promise they’ve never said out loud: accuracy, consistency, availability, something. Find yours before the model finds it for you, because you can’t defend a contract you haven’t named.

Treat the prompt box as a UX surface, not a backend problem. The moment that surprised Loic wasn’t a model bug — it was the search box. Users phrase queries badly. Prompts are now part of the interface the user sees, and “garbage in, garbage out” is no longer an engineering excuse. Better prompts and evals matter, but if the search box returns the wrong thing, the design team owns that, not the ML team.

Don’t credit the tools for what your senior engineers are doing. Superhuman’s 40% velocity gains work because the people using AI know what right looks like and catch what the model gets wrong. If your team is junior, the same playbook will produce comprehension debt instead of speed. Once you can’t tell the tool’s contribution from the engineer’s, you’re not measuring AI productivity. You’re measuring how much taste you happened to hire.

Loic spent time before tech in contexts where craft standards weren’t optional and the feedback was immediate — a French Navy vessel that had to be back at sea in six weeks, no extensions. The discipline from that kind of constraint is different from the kind you get from a style guide. You learn it because you have no choice, and then it doesn’t really leave.

He thinks that’s what Superhuman has built. He’s been there less than a year. Whether the taste travels at Grammarly scale is the thing he’s actually being paid to find out.

High Output is brought to you by Maestro AI [https://getmaestro.ai].

Loic’s AI numbers look good — 90% daily adoption, 40% velocity gains — but he’s the first to say the metrics don’t explain themselves. They work because his senior engineers have the judgment to catch what the model gets wrong. Most engineering leaders have no way to see that layer. You can see PR counts and cycle time. You can’t see whether your engineers are using AI well or just generating output faster.

Maestro’s daily briefings reveal where your team’s time and energy actually go — not just what shipped, but the quality of the judgment behind it. Visit https://getmaestro.ai to see how we help engineering leaders understand what their AI adoption numbers actually mean.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit maestroai.substack.com [https://maestroai.substack.com?utm_medium=podcast&utm_campaign=CTA_1]
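The episode's point that a model's outputs can shift while its version number stays the same, and that evals matter, can be made concrete with a small regression check. What follows is a minimal Python sketch under stated assumptions, not anything Superhuman has described: `find_drift`, `fake_search`, the message IDs, and the 0.8 overlap threshold are all hypothetical. The idea is to record the results a fixed set of queries returned on a known-good day, re-run them later, and flag any query whose result set has moved.

```python
from typing import Callable, Dict, List


def jaccard(a: List[str], b: List[str]) -> float:
    """Overlap between two result lists, ignoring order (1.0 means identical sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def find_drift(
    baseline: Dict[str, List[str]],
    search: Callable[[str], List[str]],
    min_overlap: float = 0.8,  # assumed threshold, tune per product
) -> Dict[str, float]:
    """Re-run each baseline query; return the ones whose overlap fell below min_overlap."""
    drifted: Dict[str, float] = {}
    for query, expected in baseline.items():
        score = jaccard(expected, search(query))
        if score < min_overlap:
            drifted[query] = score
    return drifted


# Hypothetical baseline captured on a day the results looked right.
baseline = {
    "invoices from acme": ["msg-1", "msg-2", "msg-3"],
    "flight to paris": ["msg-7", "msg-8"],
}


def fake_search(query: str) -> List[str]:
    """Stand-in for the real AI-backed search; one query has shifted."""
    today = {
        "invoices from acme": ["msg-1", "msg-2", "msg-3"],
        "flight to paris": ["msg-8", "msg-9"],
    }
    return today[query]


drifted = find_drift(baseline, fake_search)
print(drifted)
```

A check like this only catches drift on queries you thought to record. It does not address the harder problem the episode raises, which is input the team never anticipated.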


All episodes

16 episodes


Open the Barn Door

Twenty minutes into our conversation, I asked Charity Majors how engineering leaders should be finding good junior engineers right now. “God, I don’t fucking know.” She apologized, then doubled back. “Sorry. Excuse me. You do need them. They’re not hard to find.”

That answer is the whole interview in miniature. How a junior breaks into engineering today, Charity will tell you, is genuinely unresolved. None of the paths that worked for her exist anymore. How an engineering org builds a healthy pipeline, on the other hand, is not particularly hard. The two questions sit next to each other, and she refused to collapse them into a tidy answer.

Charity is the co-founder and CTO of Honeycomb, twenty years into the industry, two O’Reilly books behind her and the second edition of one in progress. Her career has been built on distributed systems — production engineering at Parse, then Linden Lab, then founding an observability company. But most of what she said over the next half hour was about people, and she came back to one idea four or five times: engineering teams are not social systems and they are not technical systems. They’re sociotechnical systems, and the way you reason about one shapes the way you have to reason about the other.

Idaho

Charity grew up in the backwoods of Idaho. No computers, no phone line for most of her childhood. She got to college on a classical piano scholarship and noticed something there. “People who studied music were still hanging out working minimum-wage jobs in their thirties, forties, and fifties. And I was like, I grew up being poor. I am not going to be a poor adult. And so I switched lanes.”

She got into tech in the late nineties. “Any smart kid who is willing to work weird hours and try a lot of stuff could make a go of it.” She doesn’t romanticize that. Tech was a toy then, she said, and now powers nuclear power plants, so the bar going up is correct. But twenty years on, she’s worried about what’s happened to the door behind her. “I think we really risk it becoming the sort of ivory tower where we keep out anyone who has a non-traditional background. You need to think harder about crafting paths into technology to meet the moment.”

I asked how she got into management. “I was a reluctant manager.” She drew a line between management and leadership before I could follow up. These are sociotechnical systems, she said, “they’re not social or technical, or we could just take the great managers from Starbucks and put them in charge of engineering teams.”

The reason she ended up doing the job at all was anger. “I got into people management the same way a lot of people do, which was enraged, because I didn’t like the way it was being done. And I was like, *god damn it, I guess I will do it differently. I will not make any of these mistakes.* So I made different mistakes, of course.” The self-correction is constant in conversation with her. She said something close to it three more times over the next half hour.

The freeze

When I brought up the AI-killing-the-junior-pipeline discourse, she pointed to something specific. She’d just read a piece by Annie Lowrey in the Atlantic that morning. The Job Market Is Hell [https://www.theatlantic.com/ideas/archive/2025/09/job-market-hell/684133/]. Unemployment is around 4.7%, which is historically fine, but nobody is leaving their jobs and nobody is hiring. On both sides of the resume, AI is doing the talking. Recruiters feed inbound applications into screening tools. Candidates feed job listings into chatbots. “The result is there are no people talking to people. Nobody’s figured out how to do this.”

The framing she rejected was the one that treats this as inevitable. “What I don’t like about the way people talk about bringing juniors into tech is they talk about it like it’s some force of nature that we have no control over, which is absolute horseshit. This is a world we create. It’s a world that we reinforce.”

It’s a sequence of decisions made by people in rooms. And the people most responsible for those decisions, she would argue later, aren’t the ones the org chart suggests.

Make friends with the discomfort

Before she got to the operational claims, she walked me through what she thinks her generation of managers got wrong, because the failure mode shapes everything else. “My generation swung the other way and was like very rigorous about, *you should have work-life balance. Nobody should be pinging you after hours.*” The intent was correct. She was managing in reaction to the era of people sleeping under their desks. But she watched it overcorrect. “I see some managers being like, *you’re working more than 40 hours, stop.* And honestly, we live in a very complex, fast-changing world, and if you’re intrinsically motivated to be working, if you’re learning, if you’re having fun, nobody should be stopping you, because that really is the path to success.”

She isn’t arguing for the swing back, either. I brought up 996, the Chinese nine-to-nine, six-days-a-week framing that’s been making the rounds on Hacker News. She had nothing nice to say about the swing-back. “It all swings back. It all swings back, doesn’t it?” Then, more bluntly: “That’s bullshit.” She read the cycle as a generational pattern, and she was harder on her own generation than on either pole. “If anyone had told me that, if I had followed that advice, I would not be where I am.”

The piece of this that connects to junior hiring is the part most management writing skips. “You need to learn to make friends with the discomfort. You need to learn to find joy in the pain.” None of us, she said, evolved to handle data structures and algorithms, and the early years of an engineering career are genuinely agonizing. The juniors who make it through are the ones who learn to like the agony. A lot of senior engineers, looking back, have forgotten that they once lived through it.

It’s a humanistic argument, not just an operational one. She talked for a while about school stamping out the curiosity children are born with. Twelve, twenty, twenty-five years of report cards, conditioning us to associate learning with extrinsic reward. What she loves about adulthood is the chance to rediscover the original instinct. Engineering is one of the few careers that pays you for it.

50 to 1

Her first operational claim was about team composition. “For every staff engineer that you have, let alone principal engineer, you need 50 intermediate engineers.” The number is a gesture. The shape of the argument is specific. Most companies have over-corrected toward senior hiring on the theory that they’ll get more leverage per dollar. The people who actually ship the bulk of features, she said, aren’t seniors. They’re intermediates.

“Some of the most productive engineers that I’ve ever worked with have been intermediate engineers. They can just put on their headphones, beginning of the day, go deep, and just pound out the features and the bug fixes.” Heads down, pattern matching, finishing things. “Nobody who’s been in engineering for seven, ten years wants to do that. They’re sick of that.”

The bored staff engineer is not a leverage win. “When people get bored, you do not get great work out of them. You get the best work out of people when they are working at that place that’s right on the edge of their ability.” And the supply chain only runs one direction. “Nobody stays a junior engineer for long, two years at most. So you’ve gotta keep feeding the system. You’ve gotta keep bringing new blood in.”

Opening the barn door

I asked what she’d recommend to companies that are paranoid about hiring right now. “I would advocate for opening the barn door a bit wider, giving more people a shot. Understanding that it means you will have to fire more of them. You will have to let more of them go. But I feel like it’s worse to never give people a shot.”

The second half is the part she emphasized. A wider door costs you in faster, more honest performance management, and most engineering managers are bad at that part. “Nothing demoralizes a team more than when someone that they work with every day, who’s not pulling their weight, just hangs around forever.” The unsalvageable cases weren’t the ones that escalated. They were the ones that drifted. “Some of the most heartbreaking situations I’ve ever been in as a manager are when a person’s being let go after years of them doing exactly the same thing, and they’re legitimately dumbstruck.”

There’s a side benefit she pointed out that I hadn’t considered. Junior engineers audit your systems in a way nobody else can. “If you’re an engineer joining a team where there is very low turnover, where people never join, where people never leave, that is not likely to be a very high functioning team either.” Old docs. Idiosyncratic mental models locked in three people’s heads. A dev environment that takes a month to set up because nobody’s tried in six. “If you’re used to bringing on junior engineers, oh boy, those kids will audit your systems like no one else.”

That’s the sociotechnical argument in plain language. The team isn’t separable from the systems it owns, and the hiring policy isn’t separable from the operational health of the codebase. Both improve together or neither does.

What she watches for in a junior

The most optimistic moment came when I asked what she watches for in her own juniors. “Some of our junior engineers talk about how they are in conversation with Claude all day long. By the time they bring a question to their senior engineer, which they do very often, they have tried all the low-hanging fruit, they’ve tried a bunch of stuff, they’ve asked a lot of questions. So it is very well worth that senior engineer’s time.”

That’s not the threatened-junior story most engineering leaders are telling right now. The juniors she described are using the model to exhaust the obvious before they ask, and arriving at the senior with the harder version of the question.

I asked what the leading indicator is for a junior who’s going to make it. “Are they asking good questions? Are their questions getting better? Do they have a good sense of how to use their time and how to use their mentor’s time? That is the best leading indicator.” Not output. Not commit volume. Question quality, over time.

She added, almost in passing, that her management chain handles the day-to-day evaluation. “I really trust Emily and all of them.” The broader discipline she described combines two things engineers tend to mistrust: the data, and the conversations. “It’s actually really important that there be data in addition to conversations, because the data and the conversations are bookends. They help you understand each other.” Lean on either alone, she said, and you get either a “people manager” with no technical judgment, or a manager who reads PR counts as a personality assessment. Both fail in different ways.

Consent of the governed

Near the end, I asked who she thought was actually responsible for fixing the junior pipeline. The pattern she described is counterintuitive. “The places that I know of that actually are successfully recruiting, hiring, bringing in junior engineers, and making them successful, it was *not* the engineering managers who pushed for that program. It was the senior engineers. They were the ones who were like: *we know what it takes to have a healthy, high-performing team. It takes a steady influx of new blood, and we feel this conviction so strongly that we’re gonna go make it happen ourselves.*” The senior ICs went to bat. The managers ran the mechanics afterward.

Then the line that anchored the whole conversation: “There is no engineering leadership without the consent of the governed.” Charity has been an executive long enough to watch a lot of decisions get made about engineers, by engineers, with or around engineering management’s input. “If there’s anything that I have learned being in senior management, it’s how much power individual ICs have when they choose to flex it.”

I asked whether she meant it literally. Were the senior ICs really the deciding force? She walked through the pattern again. The companies hiring juniors successfully are the ones where the senior engineers made it their problem. The ones not hiring are the ones where they didn’t.

I came in expecting a programs-and-processes answer. Recruiting funnels, intern conversions, the mechanics of a pipeline. What I got back was about consent. The senior ICs in your org, the ones who don’t have manager in their title but have weight in every staffing conversation, are the people who decide whether the next generation gets in. Without their buy-in, no pipeline exists. With it, almost any pipeline works.

The feedback loop of feedback loops

I asked at the end what she’s working on now. She’s writing the second edition of *Observability Engineering*, and she was honest about how it’s going. “It’s not going super great.” She read the first edition recently and found it embarrassing, which is not how most authors I’ve talked to describe their own work. “But now I think my co-authors and I, we know who we’re writing for and we know what they need to hear.”

Then she connected the book to the show in a way I wasn’t expecting. “It’s a true fact reality that high-performing engineering teams are about fast feedback loops, and observability is the feedback loop of feedback loops. It is the sense-making apparatus of engineering teams.”

That landed for me. A lot of what she’d argued for over the prior half hour started looking like a feedback-loop argument. Open the barn door, but tighten the loop on managing out, so performance information moves fast. Watch question quality, because it’s a faster signal than output. Bring juniors in, because they shorten the loop on every undocumented assumption your team has accumulated. The senior ICs are the deciding force because they’re the only people positioned to keep all those loops short.

A team is a sociotechnical system. The systems that team owns are sociotechnical systems too. The discipline of running both well is the same: short feedback, an honest signal, and the willingness to look at uncomfortable data.

Charity’s question, the one I’ve been sitting with since we hung up: in your engineering org, who is actually deciding whether the next generation gets in?

High Output is brought to you by Maestro AI [https://getmaestro.ai].

The thing Charity said that stuck with me most was about leading indicators. The junior worth investing in isn’t the one shipping the most code. It’s the one whose questions are getting better. The juniors using Claude well at Honeycomb are showing up to their senior engineers having already exhausted the obvious. That’s a different trajectory than the one most dashboards see.

PR counts and cycle time can’t pick that distinction up. The work that builds judgment, or fails to, happens in the back-and-forth between an engineer and an AI agent, before any PR is opened. Your Anthropic bill tells you something is happening. Maestro tells you what.

Maestro plugs into Claude Code and Cursor and looks at the work itself: how engineers scope a problem before they prompt, what they verify, what they accept on faith. Scored against shipped outcomes, not vibes. You can see which engineers are leveling up and which are accumulating comprehension debt.

Visit https://getmaestro.ai to see how we help engineering leaders spot which engineers are developing real AI craft, and which are just generating more output.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit maestroai.substack.com [https://maestroai.substack.com?utm_medium=podcast&utm_campaign=CTA_1]

June 3, 2026 · 33 min
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit maestroai.substack.com [https://maestroai.substack.com?utm_medium=podcast&utm_campaign=CTA_1]

May 14, 2026 · 39 min

Stop writing code. Start reading it.

We recorded this episode with Steve back in October of 2025, before he invented Beads [https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a] and Gastown [https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04]. Several of his predictions have aged well in the months since. Steve Yegge [https://steve-yegge.medium.com/] has been VP or head of engineering at four companies. He keeps stepping down on purpose. Not because things went wrong — his organizations were doing well. He’s the kind of leader whose reputation travels through a company; at Amazon, at Google, engineers lined up to transfer onto his teams. He stepped down each time because he noticed the same thing: the moment he stopped being able to code alongside his engineers, conversations started requiring translation. Once you’re in translation mode, Yegge figured out, you’re not leading anymore. You’re triangulating toward an answer you don’t fully understand. In the AI era, he thinks this problem just got much more expensive. The translation layer When Yegge handed over the engineering org at Sourcegraph — his fourth deliberate step-down in a career that spans Amazon, Google, and Grab — he gave a specific reason. “I was going through a translation layer with my engineers where they’d be like, ‘Well, you see the AI does this, and then I do that, and then the AI does that, and then there’s a gateway’ — and I’m like, what?” It wasn’t that he didn’t trust his engineers. It was that he’d lost the ability to sense-check them. And he’d noticed what happened to leaders who stayed in that position too long: “That’s a technique that non-technical leaders use. People who’ve lost their technical chops, they can still be effective leaders, but they have to be very good at triangulating, almost like a GPS on the right answer by going to different technical people and getting it.” Triangulation is better than nothing. 
But it’s slow, and it requires your engineers to speak in executive-friendly summaries, which means you’re always one abstraction layer removed from what’s actually happening. Yegge’s response has been consistent across his career: hand the org to someone ready to take it, go back to IC, get his hands back in the code. At Sourcegraph that meant 18 months as an individual contributor during the period when AI coding changed the most — which is exactly when he made the predictions that got Anthropic’s attention. His observation about himself is worth sitting with: his most accurate forecasts came during IC phases, not executive phases. Proximity to the work makes the signal cleaner. The “Otherwise” has arrived The case for technical proximity isn’t just philosophical anymore. Yegge has data. Andrew Glover, Director of Productivity at OpenAI, shared findings with Yegge and his co-author Gene Kim: at OpenAI itself, engineers who adopted Codex — their fully agentic CLI coding tool — are producing pull requests that, even accounting for higher rejection rates, “dwarf the contributions of the people who aren’t doing agentic coding by an order of magnitude. Ten times as many commits.” The interesting part isn’t the 10x number. It’s where the 10x is and isn’t happening. “The ones who are successful with agentic coding were the ones living in the microservices world, where there’s lots of small, well-factored bits of software. The ones who are struggling are the folks in ChatGPT Land, which is one of the world’s largest monoliths.” For a decade, engineers warned that monolithic codebases would become a liability — every warning came with an implicit otherwise at the end: refactor now, or else. But the or-else never arrived. You could run with a monolith indefinitely; deployment was easier, QA was simpler, everything just “floated off and got deployed somewhere.” The warning was technically correct but operationally optional. “You didn’t refactor it. 
And so what we’re faced with right now is this rat race where first of all, everyone who’s already in microservices land is just being pigs. They can use all the tokens they want. AI is working for them beautifully. The ones with monoliths — and you just point at any company and they have a monolith — it is time to break them up.” The otherwise, he says, has finally arrived. A 2025 METR study [https://addyo.substack.com/p/the-reality-of-ai-assisted-software] found that experienced developers were 19% slower when using AI tools on large, real-world repositories — the kind of environments where monoliths live. What Bezos actually understood about services Yegge built some of the original infrastructure that justified Amazon’s service-oriented architecture, so he has a view on why Bezos pushed it so hard in the early 2000s that most people don’t know about. It wasn’t primarily an engineering decision. “I heard this later from a colleague at Amazon. Jeff had come from D.E. Shaw on Wall Street, and D.E. Shaw is a company that buys companies and breaks them up and sells the pieces off for a huge profit. He was worried that Amazon was gonna die because of the dot-com bust. And so what he wanted to do, as a last resort, was I’m gonna bust Amazon up and sell the pieces. Which means every one of them has to have a service interface.” An exit strategy for a dying company accidentally created the architecture for a trillion-dollar one. Bezos wasn’t playing chess when everyone else was playing checkers — he was scared. The mandate came from a Wall Street M&A playbook, not a software architecture philosophy. Modular design was a byproduct of an exit strategy. The companies that invested in microservices over the past decade for code organization reasons are now discovering they got AI compatibility for free. The companies that didn’t are discovering the bill is coming due. 
The “Dial”

Yegge has a name for the decision every engineering leader is quietly making right now: the Dial. “Every company has been given a dial that goes from zero to a hundred, and it is the number of engineers that you’re gonna fire in order to pay for the rest of them to have AI.” He’s not being glib. If a subset of your engineers can produce 10x the output with agentic tooling, and those tools require meaningful investment in compute and licensing, the question of headcount allocation is already embedded in your budget decisions. You’re turning the dial whether you’re thinking about it explicitly or not. Most companies aren’t thinking about it explicitly. Yegge thinks that’s a mistake. “Once you finally figure out how coding is done today, with Codex, with Claude Code, with Sourcegraph Amp, you switched into that world. You are playing in the big leagues and everyone else is falling behind.” The dial isn’t just about AI spending. It’s about what you believe your engineers will be doing in 18 months.

Writing code is for agents

Which brings Yegge to his single most concrete piece of advice: stop spending your energy on writing code. Start spending it on reading code. “You’re gonna be generating 10 to 100 times as much code as you ever did before, and you’re gonna need to read it at some point because you need to own it.” Addy Osmani [https://substack.com/profile/11623675-addy-osmani], VP of Engineering at Google Chrome, calls the alternative “comprehension debt [https://addyo.substack.com/p/the-80-problem-in-agentic-coding]”: the accumulation of plausible-looking code you’ve approved without truly understanding, a debt that comes due when something breaks at 2am and you can’t trace why. The shift is real and immediate. Yegge has already made it. He describes his current workflow as watching his agents code (actually sitting there, following the diffs, paying attention to what they produce) rather than writing much himself.
“Turn off permission checks so you don’t have to hit enter all the time and just watch it. Watch it code. Pay attention to the diffs.” The skill of reading code fast and evaluating it accurately — is this correct? Does this make sense architecturally? Would I defend this in a code review? — is what separates a developer who’s a good director of agents from one who’s just vibe coding at scale and hoping for the best. Yegge’s analogy: a musician who practices sight reading every day for 10 minutes compounds that skill faster than someone who only practices composition. The reading muscle and the writing muscle are different. For most developers, the writing muscle is heavily developed and the reading muscle isn’t, because historically writing was the job. That’s the ratio that’s inverting. What this means to you If you’re a leader who has drifted from direct technical work, the cost of that drift just increased. AI coding is changing fast enough that managing by summary will leave you making decisions you don’t understand. You don’t need to write the code — but you need to be able to read the diffs. Ask whether your codebase is AI-ready. Not “are we using AI tools?” but “can an agent work effectively in our codebase?” The answer is mostly a function of modularity. If your engineers are struggling to adopt agentic coding, the problem is probably architectural, not motivational. Have an explicit conversation with your leadership team about how AI changes the headcount math. Not as a cost-cutting exercise, but as a forcing function for getting clarity on what you believe your engineering team will look like in two years. Leaving this implicit means it gets decided by budget pressure instead. And if you’re an engineer: watch your agent work. Follow the diffs. Treat it like sight reading practice. The engineers who can evaluate agent output quickly — who own what the agent ships — will be the ones who remain indispensable as the generation overhead approaches zero. 
High Output is brought to you by Maestro AI [https://getmaestro.ai/]. Steve Yegge talked about the “translation layer” that forms when leaders drift from the code, but there’s a deeper version of that problem right now. Every engineering leader knows AI adoption is happening. What they can’t see is whether it’s working. Token counts and PR velocity tell you who’s generating more. They don’t tell you who’s actually using AI well. Maestro analyzes the AI sessions themselves, scoring how effectively each engineer is working with their tools, so you can see who’s genuinely leveling up and who’s just generating noise. Visit https://getmaestro.ai to see how we help engineering leaders measure AI effectiveness, not just AI activity. How are you thinking about the difference between AI adoption and AI effectiveness on your team? We’d love to hear your story. Schedule a chat with our team: https://cal.com/team/maestro-ai/chat-with-maestro

April 29, 2026 · 45 min

Principles Over Process with Gaurav Gargate

Most engineering leaders spend enormous energy on process. Which agile framework. Which sprint cadence. Which AI coding tool to adopt. How to standardize workflows across teams. The assumption is that the right process produces the right outcomes. Gaurav Gargate [https://substack.com/profile/10954060-gaurav-gargate] has come to believe the opposite. Get the principles right, and the process can flex. Gaurav is VP of Engineering at Confluent, where he runs their Security Products and Cloud Platform powering their cloud-native data streaming ecosystem. He joined when the business was sub-$100 million; today it’s $1.1 billion. Before Confluent, he spent seven years at Box and six years at Microsoft. And before any of that, he started his career at a 15-person startup in India — “Didn’t know what we were doing, but it was fun.” Across all of those environments—from a scrappy team of 15 to a billion-dollar enterprise—one pattern has held: the organizations that thrive are rigid about their principles and flexible about everything else. The ones that struggle have it backwards. The Agile Dogma Aha Moment Gaurav has a specific story about when this clicked. Early in his career, he was a believer in classical agile—sprints, scrums, the full playbook. He thought it was the way to run engineering projects. Then he hired a leader who was completely aligned on the principles: execution pays the bills, work needs visibility and traceability, quality gates matter. But the process? Different. “Look, I don’t necessarily care about the book process, whether you call it agile or you call it scrum or something else. I would love to have the agency to ensure I manage and track my work. My engineers feel like they’re actually doing the best work of their life and there is quality gate and accountability.” Gaurav calls this a strong aha moment. “I realized I was being unnecessarily dogmatic in my approach. 
And actually this additional way of doing it opened up so many gates.” The lesson wasn’t that agile is bad. It was that confusing a specific process with the underlying principle is a trap. The principle—visible, accountable, high-quality execution—can be achieved multiple ways. Insisting on one process locks out people who could deliver the same outcomes through a different path. It closes doors you didn’t know existed. The constraint is real, though. “You don’t wanna have 30 teams have 30 different innovative ways.” There’s a phase where letting a thousand flowers bloom is the right move, and there’s a point where you need to converge on five or six archetypes. The art is knowing when you’re in which phase. Culture Add Over Culture Fit The same logic applies to hiring. Early in his career, Gaurav screened for culture fit—people who matched the team’s existing style. Over time, he realized this was the same mistake as the agile dogma, applied to people instead of methodology. “It’s actually a bad idea to have a very closed door—only follow this culture and nothing else.” When you hire exclusively for fit, you get a team that reinforces its own assumptions. The same instincts. The same blind spots. The culture calcifies instead of evolving. His alternative: hire for culture add. Find people who share your principles and values, but bring their own approaches and experiences. “New people join in, people grow in their roles, people from different companies and backgrounds and experiences come together—the beauty is that an evolving culture being held strong on the principles of the company actually makes it a success story.” The distinction is subtle but important: principles are fixed, culture is not. Values are the foundation. Everything built on top should be allowed to shift. 
Share the Why, Trust the How Gaurav applies the same framework to day-to-day management, and he sums it up bluntly: “The fundamental principle is to treat people like adults and they will behave like adults.” In practice, that means sharing context aggressively—where the business is going, how decisions get made, what the company needs right now—and then stepping back. “Enable them, let them have that agency to make those micro decisions as much as possible.” He’s not flexible about everything. Collaboration, one-team attitude, flat hierarchy, open communication—these are non-negotiable. “There are certain principles which I’m actually not ready to compromise on.” But beyond those fixed points, he lets leaders find their own style. “Ultimately what every strong individual or leader wants is to be held accountable for the outcomes and the results they deliver. And nobody likes to be micromanaged on how they get there.” Rigid on values. Flexible on methods. The same pattern, applied to management instead of hiring or methodology. The SDLC Tree Where this gets most interesting is how Gaurav applies the framework to AI adoption. His approach is different from the typical “push coding copilots” playbook—and the principle underneath it is the same one driving everything else. The principle: engineers should spend their time on high-value, creative work. The process for achieving that? That’s what changes. Gaurav looks at the entire software development lifecycle as a tree of workflows and targets the branches no engineer enjoys. “Especially as a cloud infrastructure company, there is a ton of work in operating, managing, keeping your infrastructure secure, scaling the business. There are a lot of things that AI can generally do well.” Confluent handles security patches and vulnerability management across three clouds and roughly a hundred regions. Infrastructure gets set up, tested, and torn down constantly. 
These are the branches AI is taking over completely—with engineers administering and managing rather than doing the work by hand. “Engineers actually love to do the innovation. They love to do the new problem solving. They love to have that ability to write new code in a way they feel is appropriate.” His conclusion follows directly: “I would love my engineers to actually have that mental space to invest their time in that high value work and let all the undifferentiated work be taken over completely by AI.” This is a fundamentally different framing from “AI makes engineers faster.” It’s not about speed. It’s about expanding what engineering teams can accomplish. “The pie is getting bigger. We gotta look at AI as a way to expand the pie of work that an engineer can do, not necessarily just what they were doing last year.” He invokes Jevons’ paradox—the idea that when something becomes more efficient, total consumption increases rather than decreases. Because it’s easier to build, more will get built. More demand, more opportunity, more roles. And his take on whether AI threatens engineering jobs is unequivocal: “Every role, every job category is going to change because of AI.” But change isn’t elimination. It’s the same transition the industry went through when cloud replaced data center ops. The people who understood first principles learned the new layer and kept going. The Fundamentals Don’t Change This is the thread that ties everything together. Principles endure. Process shifts. When Gaurav joined Microsoft, people questioned whether he was a real engineer because he didn’t write device drivers. “The previous generation did something at a lot lower level, and then the next generation is doing something at a different layer. That’s always been happening for decades.” But through those decades of transformation, the fundamentals haven’t changed. 
Understanding operating systems, databases, memory management: “the fundamental understanding of these core principles is what allows a great engineer to learn and pick up new things.” His advice to new graduates is the same advice he’d have given five years ago: focus on the fundamentals. “Learning new things has become easier. Building and experimenting has become a lot easier than before. If people can really spend time understanding the core fundamental building blocks of computer science, applying them to learn and build new things is actually gonna be easier going ahead.” The career lesson mirrors the organizational one. The engineers who thrive across generational shifts are the ones grounded in principles, not attached to any particular layer or tool. The organizations that scale from startup to $1.1 billion are the ones that hold their values tight and let everything else evolve. The leaders who get the most from AI are the ones who know which work matters and which work is just process. Same pattern. Every level.

What This Means for You

First, separate your principles from your processes. Gaurav’s agile aha moment came when he realized he was treating a specific methodology as a principle. Identify which of your team’s practices are genuinely non-negotiable values and which are just comfortable habits dressed up as requirements.

Second, audit your hiring for culture fit vs. culture add. Are you screening for people who share your principles, or people who share your habits? The first builds a team that evolves. The second builds one that calcifies.

Third, when deploying AI, map your SDLC and target the work nobody wants. Instead of asking “how do we code faster,” ask “which branches of our workflow tree drain engineers without engaging them?” Security patches, infrastructure provisioning, and repetitive operations are the high-ROI AI targets that also free engineers to do the work that drew them to the field.

Fourth, give context instead of instructions.
If you want people to make good micro-decisions without being micromanaged, they need the same information you have. Share the why and how you measure the what, then trust them to figure out the how. The question worth asking your team: Are the things you’re rigid about actually principles, or are they processes you’ve held onto so long they just feel like principles? High Output is brought to you by Maestro AI [https://getmaestro.ai]. Gaurav talked about giving teams the room to deliver their own way. But when you stop prescribing process, you lose the visibility that process used to provide. You’re no longer watching how the work happens, so you need a way to see whether the work is landing. That’s what Maestro does. Maestro is engineering intelligence for AI-first teams: AI-powered analysis that measures the true impact of your team’s work, from code changes to review quality to team health. Stop flying blind. Start leading with signal. Visit https://getmaestro.ai to learn more. Building a team where autonomy and accountability coexist? We’d love to hear how. Schedule a chat with our team: https://getmaestro.ai/book

February 11, 2026 · 31 min

Why AI Productivity Gains Are Context-Dependent | With Raju Matta

Some engineering teams are seeing real, measurable AI productivity gains. Cursor is transforming how frontend developers build React apps. AI-assisted code review is catching bugs before deployment. Prototypes that took weeks now take days. But not everyone’s seeing the same results. Raju Matta [https://www.linkedin.com/in/raju-matta-4067a7/] runs engineering for Cambridge Mobile Telematics [https://www.linkedin.com/company/cambridge-mobile-telematics/]—200+ engineers, three countries, petabytes of real-time sensor data processing driver safety. Six months ago, he formed a tiger team to systematically track AI tool adoption. Status reports every two weeks. Multiple tools tested: Copilot, Cursor, PR review bots. His finding? “I’ve not seen the measurable velocity increase that people are saying out in the market—but that doesn’t mean I have totally written off LLMs yet.” This isn’t skepticism. It’s measured evaluation. And the pattern Raju’s seeing reveals something important about when AI tools deliver and when they don’t. Where AI Tools Excel As part of their evaluation, CMT ran an internal hackathon to see what AI tools could do in practice. The results told a clear story. Eighteen projects, all using AI. Teams built fully working web apps—complete with datasets—in 2-4 hours. “For that purpose, it’s great. It’s not bad at all,” he says. The pattern: AI coding tools work brilliantly for rapid prototyping with established patterns, web development using well-documented frameworks, mechanical coding tasks like boilerplate and test generation, and quick experiments to validate product ideas. These are real productivity gains. The people claiming 2x-3x aren’t exaggerating—they’re working in contexts where AI capabilities align perfectly with task requirements. When your bottleneck is writing React components or generating CRUD endpoints, AI tools deliver measurable acceleration. But CMT’s production systems are different. 
The Complexity Multiplier They’re processing petabytes of data from gyroscopes, accelerometers, GPS sensors, video streams. They’re distinguishing potholes from crashes, sharp corners from reckless driving. They’ve been using AI and machine learning for this work for 13 years—long before LLMs became everyone’s productivity obsession. The engineering challenge isn’t writing code. It’s architecting systems that handle sensor fusion at scale, debugging why clusters fail under load, ensuring accuracy when lives depend on your classifications, and managing tech debt across distributed teams in six countries. “You can outsource your engineering and coding with AI tools, but not your thinking,” Raju explains. In complex production systems, the thinking is where the time goes. Code generation helps, but it’s not the bottleneck. The productivity multiplier drops from 3x to “incrementally helpful” because the constraint isn’t in the typing—it’s in the architectural decisions, the system design, the understanding of how everything fits together. This doesn’t make AI tools useless. They still catch bugs in PRs. They still help prototype solutions. They still accelerate certain tasks. But the overall velocity gain is modest because code generation often isn’t the long pole. The Tiger Team Approach Here’s what makes Raju’s perspective valuable: he’s not guessing. Six months ago, CMT’s CTO gathered the engineering leaders. “How are you guys thinking of AI?” The response: treat it like a first-class citizen. They formed a dedicated tiger team. Three people producing status reports every two weeks on tool adoption, usage patterns, and measurable impact. “We have about three or four tools that we are using all the way from PR review tools to tools like Copilot, Cursor.” This is systematic evaluation, not anecdotal impressions. 
And the data shows results that differ from the market narrative: “My general experience is that it’s good, it’s doing its job, but I haven’t seen the measurable velocity increase as much as what people are saying out in the market.” His peer conversations confirm the pattern isn’t unique to CMT: “Even other leaders and my peers that I speak with, who are working at big tech companies, have said similar things. So it’s not uncommon.” But Raju’s not dismissing the technology. “The tools are progressing at a very fast pace. I wouldn’t be surprised if it’s another six months or a year where we get to exhaust more pieces of the tool and get more done.” That “yet” matters. He’s still tracking, still evaluating, still expecting improvement. When Mistakes Have Consequences When Raju says “we have to save people’s lives,” he’s not being dramatic. CMT’s technology directly impacts driver safety. Their telematics platform processes sensor data to detect dangerous driving, assess risk, and potentially prevent accidents. This creates a different bar for “move fast and break things.” “We are a little bit more diligent because at the end of the day, we have to save people’s lives. So for us, we’d rather spend the time beforehand than reactively trying to address it.” The stakes are high—both financially and ethically. When your technology directly impacts human safety, you can’t afford to ship fast and fix later. The constraint isn’t just technical complexity—it’s consequence of failure. “AI tools can take you north, but with the same speed, they can take you south.” In safety-critical systems, the review time, the testing time, the verification time doesn’t compress even if code generation does. You can’t ship and iterate rapidly when mistakes could harm people. The overall productivity gain shrinks accordingly because the non-coding portions of the development cycle remain unchanged. This applies beyond telematics. Financial systems. Healthcare platforms. 
Infrastructure control. Any domain where errors have serious consequences faces the same limitation: AI can accelerate code generation, but it can’t compress the necessary validation and testing cycles. Where AI Struggles AI’s limitations show up in unexpected places. CMT uses AI to filter thousands of resumes for each job opening. The results? “50% makes sense. And 50% don’t make sense.” This split illustrates a broader pattern. AI works brilliantly for well-defined, repeatable tasks. It struggles with judgment calls, context-dependent decisions, and situations requiring nuanced understanding. The tool saves time on mechanical filtering. But the judgment about who’s actually right for the role? Still human. And critically, the humans can immediately spot when AI recommendations miss the mark—they don’t trust it blindly. This mirrors the coding experience. AI generates boilerplate quickly. But understanding whether the generated code fits the broader system architecture, handles edge cases properly, and follows team conventions? That requires human judgment that doesn’t compress. Where This Leaves Engineering Leaders The mistake isn’t believing AI tools work—they demonstrably do in many contexts. The mistake is assuming your context will see the same gains as someone in a completely different situation. Raju’s systematic evaluation reveals the variables that matter: Your problem domain determines gains. Web apps and prototypes with established patterns can see significant productivity improvements. Complex distributed systems with unique requirements tend to see incremental improvements. The difference isn’t the tool quality—it’s how much of your bottleneck typically sits in code generation versus system design. Your constraint defines the impact. If implementing features is your rate-limiting step, AI delivers massive value. If architectural decisions and system design are your constraint, AI helps less. 
Most production systems fall into the second category after the initial prototyping phase. Your risk tolerance changes the math. If you can ship and iterate rapidly, AI accelerates that cycle. If mistakes have serious consequences, the review and testing time doesn’t compress proportionally. The overall velocity gain depends heavily on how much of your process can safely be accelerated. Your system complexity matters. Greenfield projects with established patterns see huge gains. Legacy systems with unique constraints and interconnected dependencies see modest gains. The complexity of your codebase directly impacts how useful AI-generated code becomes. The Honest Assessment Raju isn’t claiming AI tools are overhyped. He’s providing the nuanced reality: they work extremely well for specific contexts and deliver modest improvements in others. His 6-month tiger team experiment with dedicated tracking hasn’t found a productivity revolution. They’ve found incremental gains with clear constraints. That’s the honest number engineering leaders need for planning. “LLMs can help us experiment and prototype features faster. They can help developers catch mistakes in our pull requests. They can help us find answers faster, and we are constantly evaluating,” he explains. “But I’ve not seen the impact that people are saying out there.” This doesn’t mean ignore AI tools. It means understand your context, measure systematically, and set realistic expectations. For rapid prototyping and web development? The 2-3x gains are real. For complex production systems with safety requirements? The gains exist but are much more modest. Both can be true simultaneously—the difference is context. What This Means for You First, measure systematically rather than relying on anecdotes. Set up dedicated tracking like Raju’s tiger team—assign ownership, establish regular reporting, and gather actual usage data. 
The hype cycle around AI tools means everyone has an opinion, but data reveals what actually works in your specific context. Second, understand where your bottleneck actually sits. If architectural decisions and system design consume most of your time, AI tools will help less than if code generation is your constraint. Be honest about what’s actually slowing you down before expecting AI to solve it. Third, adjust expectations based on risk profile. If your domain allows rapid iteration and tolerable failure rates, AI tools can deliver significant acceleration. If mistakes have serious consequences, the non-compressible validation cycles will limit overall gains regardless of how fast code gets generated. Fourth, keep evaluating as tools improve. Raju expects capabilities to expand significantly over the next 6-12 months. Today’s limitations may not be tomorrow’s. But base your current planning on current capabilities, not projected future states. The question every engineering leader should ask: What’s actually constraining my team’s velocity—code generation or everything else? Because if it’s everything else, AI coding tools will help incrementally, not transformationally. And that’s okay—incremental gains compound over time. Raju’s measured approach provides the reality check the market needs. AI tools deliver real value, but the magnitude depends entirely on your specific context. Understanding that context is how you set realistic expectations and make smart adoption decisions. High Output is brought to you by Maestro AI [https://getmaestro.ai]. Raju talked about forming a tiger team to systematically track AI tool adoption with biweekly status reports—but that measurement challenge extends beyond just AI tools. When your 200+ person engineering team is distributed across four countries and multiple tools, it becomes impossible to see what’s actually happening without systematic tracking. 
Maestro cuts through that complexity with automated reporting and metrics that show where your team’s time and energy actually go, so you can spot patterns and make data-driven decisions about everything from AI adoption to resource allocation. Visit https://getmaestro.ai to see how we help engineering leaders get actually useful insights into their teams. Running systematic evaluations of new tools and processes? We’d love to hear your approach. Schedule a chat with our team → https://getmaestro.ai/book This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit maestroai.substack.com [https://maestroai.substack.com?utm_medium=podcast&utm_campaign=CTA_1]

Dec 11, 2025 · 36 min