Ep.010 — Build vs Buy vs Rent: The AI Infrastructure Decision Tree for Startups

Description

Every AI startup hits the same wall eventually. The product is working, users are growing, and then the infrastructure bill arrives and nothing makes sense anymore. The question is not which model to use or which framework to build on. The question is where your AI actually lives, who owns it, and what happens to your margins as you scale.

There are three positions available to you. You can build on hosted inference APIs, paying OpenAI or Anthropic or one of the cheaper alternatives per token. You can rent GPU compute by the hour from neocloud providers like Lambda, CoreWeave, or Crusoe. Or you can buy hardware and operate it yourself. Build, rent, buy. Three positions, very different economics at different scales.

Intelligent Founder AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Most founders treat this as a one-time decision. It is not. It is a continuous optimisation problem, and the right answer changes as your traffic grows, as GPU prices shift, and as your model strategy matures. H100 cloud rental rates have fallen from $8 an hour in early 2023 to around $1.80 to $3.50 an hour in mid-2026. API prices have fallen too, but unevenly: there is now a 640-times gap between the cheapest viable LLM API and the most expensive frontier option. That spread is enormous, and most founders are not actively managing it.

The decision is driven by three variables: your utilization rate, your workload predictability, and your engineering capacity. High sustained utilization, predictable traffic, and a team that can operate infrastructure: when all three are true, owning your compute makes economic sense. When any one is missing, you want flexibility.

Here is the number that changes how you think about this. Eighty percent of AI GPU spend is now inference, not training. That means your infrastructure choice is being made primarily for production workloads, not for training runs. And for regulated sectors like aerospace, transport, healthcare, and financial services, where your data goes is not a preference. It is a legal requirement.

This series runs eight episodes. We cover inference API economics, GPU rental markets, the on-premises case, open source versus proprietary models, AI FinOps, sovereign AI and compliance, edge inference, and the Nvidia compute wars story.

Listen to the full episode here, in the Substack app, or on Apple, Spotify, or YouTube. I’ll add a companion cost calculator and spreadsheet at intelligentfounder.ai [http://intelligentfounder.ai] soon.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.intelligentfounder.ai/subscribe [https://www.intelligentfounder.ai/subscribe?utm_medium=podcast&utm_campaign=CTA_2]
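
The three-variable rule in this episode description can be sketched as a small function, alongside the arithmetic that makes utilization matter so much for rented GPUs. This is a rough sketch under stated assumptions, not the promised cost calculator: the names `Workload`, `recommend_position`, and `rented_cost_per_million_tokens`, the 0.6 utilization cut-off, the "rent" fallback branch, and the 1,000 tokens per second throughput are all illustrative. Only the $1.80 to $3.50 hourly range and the "all three true means owning can make sense" rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    utilization: float       # fraction of GPU time doing useful work (0.0-1.0)
    predictable: bool        # traffic steady enough to plan capacity ahead
    can_operate_infra: bool  # team has the engineering capacity to run infrastructure


# Hypothetical cut-off: the post says "high sustained utilization" without a number.
HIGH_UTILIZATION = 0.6


def recommend_position(w: Workload) -> str:
    """Map the three variables to 'build' (hosted API), 'rent', or 'buy'."""
    high_util = w.utilization >= HIGH_UTILIZATION
    if high_util and w.predictable and w.can_operate_infra:
        return "buy"   # all three hold: owning compute can make economic sense
    if high_util and w.can_operate_infra:
        return "rent"  # assumption: busy but unpredictable traffic suits hourly GPUs
    return "build"     # something is missing: stay flexible on hosted APIs


def rented_cost_per_million_tokens(hourly_rate_usd: float,
                                   tokens_per_second: float,
                                   utilization: float) -> float:
    """Effective dollars per 1M tokens on a GPU that is billed by the hour."""
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("throughput must be positive and utilization in (0, 1]")
    useful_tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_rate_usd / useful_tokens_per_hour * 1_000_000


if __name__ == "__main__":
    print(recommend_position(Workload(0.8, True, True)))
    # $2.50/hour sits inside the post's $1.80-$3.50 range; 1,000 tokens/s is made up.
    for util in (0.1, 0.5, 0.9):
        print(util, round(rented_cost_per_million_tokens(2.50, 1000, util), 2))
```

The second function shows why an idle rented GPU is expensive: the hourly bill is fixed, so the effective price per token rises as utilization falls. Comparing that number against a per-token API price is one way to revisit the decision as traffic changes.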

Is Your Salary About to Come in AI Tokens? Ep.009 - The Token Economy

Jensen Huang had a week. Well, I had a week too: I was traveling, so I missed the entire last week here. Sorry, but I am back now. So let’s go back to Jensen, who comparatively had more fun. Definitely more interesting.

The NVIDIA CEO spent seven days dropping statements at his own GPU Technology Conference, the Morgan Stanley TMT Conference, and finally on the All-In Podcast, and by Thursday the internet had a new topic it couldn’t stop arguing about. The clip that went viral? Huang said, and I quote:

Jensen Huang: “We’re trying to.”
“Let me give you the thought experiment: Let’s say you have a software engineer or AI researcher and you pay them $500,000 a year. We do that all the time.”
“That $500,000 engineer, at the end of the year, I’m going to ask them, how much did you spend in tokens?”
“If that person said, ‘$5,000,’ I will go ape… something else.”
“If that $500,000 engineer did not consume at least $250,000 worth of tokens, I’m going to be deeply alarmed.”

He confirmed NVIDIA is trying to spend $2 billion annually on tokens for its engineering team. He compared engineers who don’t use AI to chip designers still insisting on paper and pencil instead of CAD software.

The take was everywhere by Friday. But most of what was written about it, including the Reddit thread with 937 upvotes and 362 comments, only captured half the story. Here’s the full picture.

🎙️ In this podcast episode, we go deeper on everything above, including:
* 🔁 The Jevons Paradox explained from scratch: why cheaper tokens always mean more spending, not less
* 🤖 What an AI agent actually is, and why it consumes 1,000x more tokens than a simple question
* ⚠️ Goodhart’s Law in practice: how token burn rate becomes a metric engineers will game
* 💼 The 4th pillar of compensation unpacked: what tokens as pay actually means for your financial security
* 🌍 The offshoring disruption nobody’s talking about: why flat token costs globally are reshaping hiring maths
* 🏢 What SK Telecom did right, and why their model is the one worth copying

Prefer to read? Here’s the breakdown from a 360-degree perspective.

What’s a Token, and Why Does It Cost Money?

A token is the unit of measurement for AI processing. Every word you type into an AI, and every word it writes back, is broken down into fragments called tokens. A sentence is roughly 20 tokens. A full document might be several thousand. Every time you run an AI model, tokens are consumed, and tokens cost money.

For a simple ChatGPT query: roughly 1,000 tokens. For a research pipeline: 5,000–50,000 tokens. For an AI agent that runs autonomously (searching, coding, testing, iterating, without you pressing a single button), we’re talking hundreds of thousands of tokens per run. A fleet of agents running continuously? Billions of tokens per day.

This distinction between “I asked the AI a question” and “the AI is working for me around the clock” is the entire foundation of Huang’s argument. He’s not imagining engineers typing prompts. He’s imagining engineers deploying autonomous AI workforces.

The Actual Thesis: Tokens as the 4th Pillar of Compensation

Before Huang went viral, VC Tomasz Tunguz of Theory Ventures had already been quietly building this framework. His argument?
AI inference is becoming the fourth component of engineering compensation, alongside salary, bonus, and equity. His numbers? A $375K engineer with a $100K token budget has a $475K total package. That token budget doesn’t vest or appreciate, but it enables leverage that no previous tool budget could match.

Huang scaled this up to a mandate: a $500K engineer should be consuming at least $250K in tokens. Across NVIDIA’s engineering workforce, that’s a $2B annual token spend, which the company confirmed it’s actively pursuing.

The framing is deliberately recruiting-adjacent. “Engineers are now asking ‘what’s my token budget?’ when evaluating offers,” Huang said at GTC. Whether or not this is universally true yet, it’s becoming true fast.

[ In this newsletter you get sharp, unfiltered short essays; for full‑length, deep‑dive analysis on AI, subscribe to our companion publication, Intelligent Founder AI. ]

The Conflict of Interest Is Real (But Incomplete)

The Reddit critique was blunt and structurally correct: NVIDIA sells the GPUs that generate the tokens. Every dollar your engineers spend on tokens flows back, eventually, to GPU demand. Mandating token consumption at scale is demand creation by the person selling the supply.

The HP printer analogy made the rounds: “HP would be deeply annoyed if its $200 printer didn’t use $600 of ink.” The Oreo CEO comparison: “Oreo cookies are as important as oxygen.” These are crude but fair. But they’re only half the story.

The Jevons Paradox, an economic principle from the 19th century, explains what’s actually happening. When coal-burning technology improved and coal became cheaper, total coal consumption exploded, because efficiency unlocked entirely new applications. The same dynamic is at work with AI tokens: costs have dropped 150x since 2021, yet enterprise inference spending grew 320% in the same period. Cheaper tokens unlock agentic use cases that weren’t viable at higher prices.
Agentic use cases consume tokens at orders of magnitude greater scale than simple queries. Total demand surges even as unit cost falls.

This is the engine behind NVIDIA’s $1 trillion infrastructure forecast through 2027, and its $215.9B in FY2026 revenue, up 65% year on year. Huang is selling his product and accurately describing a structural shift. Both things are true.

The Goodhart’s Law Problem

When a measure becomes a target, it stops being a good measure. If you tell your engineers to hit $250K in token spend, some will ask “how do I produce the most value?” and some will ask “how do I hit the number?” The second group will run unnecessarily complex pipelines, use expensive frontier models where a cheaper fine-tuned model would do, leave agents running on idle tasks, and avoid caching that would make them more efficient.

The technically correct objective is the inverse of what Huang is incentivizing: token minimization per outcome. Good AI-native engineering means squeezing maximum value out of minimum compute through smart model routing, prompt compression, caching, and batching. Measuring raw token volume actively penalizes these skills.

The metric that actually matters is the Token ROI Ratio: value created per dollar of inference consumed. A 10:1 ratio ($10 of revenue per $1 of tokens) is the kind of benchmark forward-looking engineering teams are building toward. That’s the measure worth adopting.

The Headcount Question Nobody Is Saying Out Loud

Here’s what the viral debate mostly avoided. If a token budget approaching a salary starts to become standard, a CFO might eventually ask: at what token-to-headcount ratio does the compute do enough work that we need fewer humans?

And they’re already answering that question. Microsoft cut 15,000 jobs last year while committing $80B to AI infrastructure. Crypto.com laid off 12% of staff in March while revenue was growing, citing AI handling high-volume work. Block cut nearly half its workforce.
Around 55,000 US tech layoffs in 2025 were directly attributed to AI-driven restructuring. Huang’s own roadmap puts NVIDIA at 75,000 employees working alongside 7.5 million AI agents, a 100:1 ratio.

The “token budget as perk” framing is the friendly version of this story. The CFO version is considerably less friendly.

What Smart Founders Should Do With This

* Track tokens against outcomes, not as a standalone KPI. Build the denominator: what did $100 of tokens produce? A feature, a resolved ticket, a market analysis? The ratio is the signal. The volume is noise.
* Treat token budgets in comp negotiations the same way you’d treat unusual equity terms. Does it vest? What happens if you leave? What’s the cash equivalent? A large non-compounding asset can obscure what you’re actually being paid.
* The Jevons Paradox is your tailwind if you’re building on inference infrastructure. Costs will keep falling. Agentic deployment will keep expanding. Products that reduce token waste per outcome, or amplify team output per token consumed, are in a structurally strong position for the next three to five years.
* The token economy is real. Huang is both selling chips and describing a genuine transition. The job is to understand which is which, and build accordingly.
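
The arithmetic running through this post (the total package, the Token ROI Ratio, and the Jevons effect) can be written down in a few lines. This is a minimal sketch, not a method from Tunguz or Huang: the function names are made up for illustration, and reading "grew 320%" as 4.2 times the earlier spend is an assumption about what the figure means.

```python
def total_package(salary_usd: float, token_budget_usd: float) -> float:
    """Cash salary plus token budget, as in the $375K + $100K example."""
    return salary_usd + token_budget_usd


def token_roi_ratio(value_created_usd: float, token_spend_usd: float) -> float:
    """Value created per dollar of inference consumed."""
    if token_spend_usd <= 0:
        raise ValueError("token spend must be positive")
    return value_created_usd / token_spend_usd


def implied_volume_multiple(unit_cost_drop: float, spend_growth_pct: float) -> float:
    """How much token volume must have grown if unit cost fell while spend rose.

    Assumes "grew 320%" means spend is now 4.2x the old level, and that
    spend = unit cost * volume, so volume = spend multiple * cost drop.
    """
    spend_multiple = 1 + spend_growth_pct / 100
    return spend_multiple * unit_cost_drop


if __name__ == "__main__":
    print(total_package(375_000, 100_000))           # the $475K package from the post
    print(token_roi_ratio(1_000, 100))               # a 10:1 ratio
    print(round(implied_volume_multiple(150, 320)))  # implied growth in token volume
```

Under those assumptions, a 150x drop in unit cost combined with a 320% rise in spending implies token volume grew by roughly 630x, which is the Jevons Paradox expressed as a single multiplication. Tracking `token_roi_ratio` per team or per feature, instead of raw spend, is one way to avoid the Goodhart trap described above.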

22 Mar 2026 · 15 min

Q1 2026 AI Reality Check: What’s Actually Working and What’s Not - Ep.007

Thursday’s deep dive took longer than expected. The agent war moved fast this week, and I’m still debating whether to approach it from the OpenClaw architecture angle or the OpenAI acquisition angle, which arguably makes Anthropic the biggest loser in recent AI history. We’ll see how that plays out soon.

The community is, as usual, divided. There’s real backlash over open-source ownership now that Steinberger is inside OpenAI, which is valid. So nothing new in open source, honestly, but what’s interesting is that the backlash has increased activity in the space. Forks like ZeroClaw and PicoClaw are already gaining traction, and if nothing else, the acquisition seems to have lit a fire under independent developers to build harder and faster. Mac Minis are selling like hotcakes because Andrej Karpathy said he bought one. Well, you can absolutely set up an OpenClaw agent for half that cost, but Macs are cool, so I say go for it.

On the enterprise and rivals front, not much to report this past week beyond the usual scrambling and fumbling coverage. So instead of chasing that noise, let’s do something more useful. I’ll come back to this either in the next post or a bit later.

Now, I was going through my usual LinkedIn feed [https://www.mckinsey.com/featured-insights/week-in-charts/agentic-ai-advances?hsid=dcd98a76-8dbe-4844-add5-b2a437e3b1a3] this morning and saw McKinsey’s latest post, with basically three rehashed themes from its State of AI ’25 report:

* scaling is harder than experimenting,
* governance matters,
* agentic AI is next,

plus a few sharper points like sub-millisecond multi-agent orchestration and process context layers for agents. But again, these are known issues. So I thought, let me pull down the Q1 reports and see if anyone is actually addressing the gaps or starting new conversations.
And that's exactly what we’re covering in this podcast.

📊 Reports analyzed: McKinsey State of AI ’25 | Deloitte Enterprise AI ’26 | Stanford AI Index ’25 | Gartner Predicts ’26 | NVIDIA Telecom AI ’26
Plus: Orgvue Workforce Survey ’25 and TechCrunch Enterprise VC Survey ’26

The Headline Numbers First

88% of organizations now use AI. 62% are experimenting with agents. But only 23% are scaling. And only 6% see real EBIT impact. That’s the funnel. 88 goes in, 6 comes out.

Deloitte surveyed 3,235 leaders and found the same wall, but from a different angle:

* Only 25% have moved 40% or more of pilots into production.
* 74% plan agentic AI within two years. But only 21% have governance ready. And talent readiness? Just 20%.

Gartner’s counter-narrative is actually brutal: more than 40% of agentic AI projects will be scrapped by the end of 2027. Only about 130 of thousands of “agentic” vendors are genuine. The rest is agent washing.

Stanford confirmed the tech barrier is collapsing: inference costs dropped 280x in two years. But the organizational barrier? That’s the one that remains.

What People Are Actually Saying

On Reddit, practitioners called Gartner’s 40% “generous”. One commenter put it quite bluntly: “This would mean lower failure rate than implementing a new CRM.”

On LinkedIn, someone reframed McKinsey’s data as: “We spent $47M on AI. Nothing’s different.”

The Deloitte governance gap (74% planning agentic vs 21% ready) got called “a collision course, not a strategy.”

Honestly, I agree. The reports are measuring adoption when they should be measuring operational readiness. Those are two very different things.

Fair counterpoint though: one Reddit commenter noted Gartner has its own reason for pessimism. Their business is being disrupted by AI too. Even the analysts have skin in the game. Quite right, actually.

What’s Actually Failing vs. What’s Actually Working?
So the failures have names now:

* Klarna replaced 700 jobs with AI, then rehired humans after quality dropped 22%
* McDonald’s killed their AI drive-through after 3 years; the system rang up 260 McNuggets and added bacon to ice cream
* Air Canada was held legally liable for a chatbot that invented a fake refund policy
* 55% of companies that replaced workers with AI now say it was a mistake

Every failure shares one trait: AI bolted on without workflow redesign.

But the wins are real where scope is narrow:

* Insurance claims: 245% ROI on structured, well-defined tasks
* Revenue leakage detection: $5.7M retained, cost less than one senior hire
* Sales forecasting: accuracy jumped from 63% to 85%, deal slippage down 28%
* Customer support (done right): 55% of tickets resolved autonomously, costs down 32%

The pattern? Tightly scoped. Domain-specific. Clean data. Human escalation built in. Boring? Yes. Profitable? Absolutely.

The Spend Paradox

Global AI spending is projected to hit $2 trillion in 2026. But VCs predict vendor consolidation: more money, but through fewer vendors. “Budgets will increase for a narrow set of AI products that clearly deliver results and will decline sharply for everything else.”

But Here’s the Gap Nobody’s Naming

Here’s what none of these reports is actually addressing: the infrastructure visibility problem. Agents are being deployed on top of systems where operators can’t see the majority of what’s actually happening. The reports talk about adoption. Practitioners talk about failure. But almost nobody talks about the plumbing between “it works in a notebook” and “it works in a live production environment.”

The 6% who are winning aren’t winning because they picked the right model. They’re winning because they built the operational backbone: orchestration, governance, and infrastructure that lets agents actually run in production, not just in a demo.

Practical Implementation That Reports Aren’t Covering

Adoption isn’t the problem.
Almost everyone has adopted, a little or a lot (88% according to McKinsey, of course). But I wanted to look at some examples of AI deployment at scale, because the report data looked too theoretical otherwise. So here is what I found, and these are not part of any of the reports we are talking about.

Won: Morgan Stanley (AI advisor assistant), Accelirate/UiPath (insurance claims), Anysphere/Cursor (AI coding), SK Telecom + Samsung (AI-RAN), Telecom sector broadly (autonomous networks).

Lost: Klarna (fired 700, rehired humans), S&P Global’s 42% graveyard (enterprises scrapping initiatives), MIT’s 95% (zero P&L impact across $44B in investment).

THE PATTERN

Every winning example shares the same DNA:

* Narrow, well-defined task (not “enhance productivity”)
* Workflow redesigned around AI (not AI added to step 7)
* Clean, structured data or proprietary data advantage
* Measurable financial outcome tied to the deployment
* Human-in-the-loop where judgment matters

Every failure shares the opposite:

* Vague goal (“improve efficiency”)
* AI bolted onto broken processes
* No governance before scaling
* No KPI tracking
* Fired people before understanding impact

The pattern is the same every time: the winners redesigned the workflow before deploying, and the losers bolted AI onto what was already broken. In simple language, the winners deployed AI into production, embedded it into core workflows, and got measurable business outcomes (revenue, cost savings, ROI). The losers adopted AI, ran pilots, and never made it past the proof-of-concept stage, or deployed it recklessly and had to reverse course.

The Bottom Line

AI adoption is universal. AI value capture is not. The technology has arrived. The organizations haven’t.

2026 won’t be the year AI transforms everything. It’ll be the year the shakeout begins: vendor consolidation, governance debt coming due, and pilot graveyards getting cleaned out. The next frontier isn’t a better model.
It’s physical AI, sovereign infrastructure, and agentic orchestration at the edge. The winners won’t be those with the best algorithms. They’ll be those with the best plumbing.

The full podcast digs deeper into all five reports and the gaps between them. Listen to the full breakdown on the Intelligent Founder podcast. Subscribe so you don't miss what comes next.
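
The "88 goes in, 6 comes out" funnel from the headline numbers can be turned into stage-by-stage conversion rates. This is a rough sketch: the names `FUNNEL`, `stage_conversions`, and `end_to_end` are illustrative, and treating the four percentages as one funnel follows the post's framing, even though the underlying survey questions may not share a single base.

```python
# Percent of organizations at each stage, as quoted in the post.
FUNNEL = [
    ("use AI", 88),
    ("experimenting with agents", 62),
    ("scaling", 23),
    ("real EBIT impact", 6),
]


def stage_conversions(funnel):
    """Return (from_stage, to_stage, ratio) for each adjacent pair of stages."""
    return [
        (prev_name, name, round(pct / prev_pct, 3))
        for (prev_name, prev_pct), (name, pct) in zip(funnel, funnel[1:])
    ]


def end_to_end(funnel) -> float:
    """Share of the first stage that reaches the last stage."""
    return funnel[-1][1] / funnel[0][1]


if __name__ == "__main__":
    for src, dst, ratio in stage_conversions(FUNNEL):
        print(f"{src} -> {dst}: {ratio:.1%}")
    print(f"end to end: {end_to_end(FUNNEL):.1%}")
```

Read this way, the steepest drops sit between experimenting and scaling and between scaling and EBIT impact, which lines up with the post's argument that the barrier is operational readiness, not adoption.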

22 Feb 2026 · 18 min
