The Stateless Founder

Build a Minimal LLM Evaluation Loop That Catches Regressions While You Sleep

14 min · May 18, 2026
Description

BUILD A MINIMAL LLM EVALUATION LOOP THAT CATCHES REGRESSIONS WHILE YOU SLEEP

THE PROBLEM: SILENT AI FAILURES

When your website goes down, you get an alert. When Stripe breaks, payments fail immediately. But when your LLM starts producing worse outputs (slightly less accurate summaries, off-tone emails, JSON fields that are almost right), nobody tells you. The model doesn't throw an error. It just gets worse.

For nomad founders managing AI workflows across time zones, this silent failure mode is especially dangerous. You're asleep, on a 12-hour bus in Peru, or doing a visa run in Bangkok while your content repurposing tool ships summaries that drop key facts.

THE SOLUTION: A THREE-PIECE EVALUATION SYSTEM

1. GOLDEN TEST SETS (15-20 CASES PER OUTPUT TYPE)

* Real production data only: Synthetic test cases test synthetic problems
* JSONL format: One line per case, input paired with known-good output
* Tagged for slicing: Formal tone, has PII, Spanish language, etc.
* Three common types: Email rewrites, JSON extraction, content summaries

2. AI JUDGE PROMPTS (G-EVAL PATTERN)

* Rubric-guided scoring: Analysis first, then scores per dimension
* Cross-family judges: Generate with OpenAI, judge with Anthropic (or vice versa)
* Blind randomized order: Prevents position bias
* Four dimensions for email rewrites: Instruction-following, tone fit, clarity, PII leak check

3. PAIRWISE A/B TESTING

* Compare prompt A vs prompt B: Not just absolute scoring
* Randomized presentation: Judge sees outputs in random order
* Tie-breaking: Borderline cases escalate to human review

RELIABILITY MITIGATIONS

JUDGE BIAS PROBLEMS

* Self-preference bias: Judges favor their own model family's outputs
* Position bias: Judges prefer whichever output they see first
* Verbosity bias: Longer outputs score higher regardless of quality

SOLUTIONS

* Cross-family separation: Never use the same provider for generation and judging
* Human sampling: 10-20% of live production jobs reviewed weekly
* Focus sampling: Pull cases where the judge was least confident
* 95% agreement target: If judge-human disagreement exceeds 5% for two weeks, recalibrate

THE MONDAY SCORECARD (30 MINUTES WEEKLY)

SIX KEY NUMBERS

1. Pass rate per output type: Email rewrites (90% threshold), summarization (88%)
2. Win rate from pairwise A/Bs: New prompt vs baseline
3. P95 latency: 95th percentile response time
4. Cost per 100 jobs: Token usage × per-token price
5. Judge agreement: Percentage alignment with human sample
6. Incidents: Anything that broke during the week

DECISION FRAMEWORK

* Roll forward: Pass rates stable, costs in line
* Hold and investigate: Something dipped
* Roll back: Model deprecation broke judge or generator

IMPLEMENTATION TOOLS

CI REGRESSION GATE

* Promptfoo: Open source CLI with YAML config
* GitHub Actions: Automated eval runs on every PR
* Pass-rate thresholds: Build fails if quality regresses
* Non-zero exit code: Blocks deployment automatically

COST TRACKING

* OpenAI/Anthropic APIs: Return token usage on every call
* Real example: 4¢ per generation + 1.2¢ per judge call = 5.2¢ per job, or $5.20 per 100 jobs
* Alert thresholds: Catch cost spikes before monthly review

MODEL DEPRECATION MONITORING

* Pin model versions: Keep last two working versions in environment variables
* Watch deprecation pages: OpenAI and Anthropic maintain lifecycle schedules
* One-line rollback: Pinned configs enable instant reversion

WEEKLY RHYTHM

* Friday: Add 3-5 fresh cases from production traces
* Sunday: Open PR with prompt/model changes, let CI run
* Monday: Fill scorecard, make decision, assign one action item
* Daily: Alerts on latency/cost thresholds catch spikes

MONTHLY MAINTENANCE

* Refresh golden sets: Replace stale cases with fresh production examples
* Close stale failures: Archive resolved issues
* Recalibrate judge: If agreement drops below the 95% target

START SMALL: THE ONE-OUTPUT-TYPE VERSION

Don't try to build all three output types at once. Pick your highest-volume type, build 15 golden cases, wire up one judge prompt, run for two weeks. You'll catch things you didn't know were breaking.

The full three-type system is the mature version. One type is the version that fits in an afternoon and still saves you from Monday morning client complaints.
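The pass-rate gate described under CI REGRESSION GATE can be sketched without committing to a particular tool. This is a minimal illustration, assuming each golden case has already been judged and written as one JSON line; the field names (`output_type`, `passed`), the output-type keys, and the function names are hypothetical, and only the 90% and 88% thresholds come from the notes above.

```python
import json
from collections import defaultdict

# Thresholds come from the scorecard notes; the output-type keys are hypothetical names.
THRESHOLDS = {"email_rewrite": 0.90, "summarization": 0.88}


def pass_rates(jsonl_text):
    """Return {output_type: pass_rate} from JSONL judge results, one record per line."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        totals[record["output_type"]] += 1
        if record["passed"]:
            passes[record["output_type"]] += 1
    return {kind: passes[kind] / totals[kind] for kind in totals}


def gate(jsonl_text, thresholds=THRESHOLDS):
    """Return 0 if every tracked output type meets its threshold, else 1."""
    rates = pass_rates(jsonl_text)
    failing = sorted(k for k, t in thresholds.items() if k in rates and rates[k] < t)
    for kind in failing:
        print(f"REGRESSION {kind}: {rates[kind]:.0%} < {thresholds[kind]:.0%}")
    return 1 if failing else 0
```

In a CI job, passing the return value of `gate(...)` to `sys.exit` would produce the non-zero exit code that blocks a deploy; the results file path is whatever your eval run writes.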
RESOURCES

* Starter Kit: JSONL templates, G-Eval judge prompts, Promptfoo CI config
* Monday Scorecard: Notion template with all six metrics
* Deprecations Checklist: Model lifecycle monitoring guide
* Human Sampling Guide: 10-20% review protocols

----------------------------------------

The vibes-based evaluation method works until it doesn't. When it doesn't, you find out from your customers. This system ensures you know before they do.
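The pairwise A/B step from the notes above (randomized presentation, ties escalated to a human) can be sketched as follows. This is an illustration under assumptions, not the episode's implementation: `judge` is a hypothetical callable wrapping whatever model you use, and its return contract of "first", "second", or "tie" is invented for the example.

```python
import random


def pairwise_verdict(judge, task, output_a, output_b, rng):
    """Show the two outputs in random order and map the judge's pick back to A or B."""
    swapped = rng.random() < 0.5
    first, second = (output_b, output_a) if swapped else (output_a, output_b)
    choice = judge(task, first, second)  # hypothetical contract: "first", "second", or "tie"
    if choice == "tie":
        return "human_review"  # borderline cases go to a person
    picked_first = choice == "first"
    # If the order was swapped, "first" means B; otherwise "first" means A.
    return "B" if picked_first == swapped else "A"


def win_rate(verdicts, candidate="B"):
    """Share of decided comparisons won by the candidate prompt."""
    decided = [v for v in verdicts if v in ("A", "B")]
    return decided.count(candidate) / len(decided) if decided else 0.0
```

Passing a seeded `random.Random` as `rng` keeps a run reproducible, which helps when you compare win rates week over week.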

All episodes

26 episodes

Attach a Migration + License Addendum to Your Next SOW

ATTACH A MIGRATION + LICENSE ADDENDUM TO YOUR NEXT SOW

THE PROBLEM: EVERYTHING BECOMES "DELIVERABLES"

Most nomad builders treat everything they deliver as one blob: "deliverables." Client pays, client owns deliverables. Done. But that includes:

* Your prompt library that took a year to build
* Connector templates you use across every client
* Scoring models trained on your own data
* Monitoring scripts and error-handling patterns

All lumped together with the custom dashboard built specifically for their use case.

THE SOLUTION: BACKGROUND IP VS FOREGROUND IP

Background IP: Everything you brought to the engagement (pre-existing tools, libraries, templates, models)

Foreground IP: Stuff created specifically for this client under this SOW

The addendum says: client owns the foreground, you keep the background. But you license the background to the client so they can actually use what you built them.

THE THREE-PART SOW ADDENDUM

1. BACKGROUND IP SCHEDULE

A literal table listing every reusable component:

* Component name, type, version, owner
* License scope: "internal use only," "seat-based," "usage-based"
* Takes ~20 minutes if you know your stack

2. LICENSE GRANT WITH THREE PRICING PATHS

Seat-Based: Simple predictability

* 5 users × $10/seat/month = $50/month
* Right fit when access is tied to named humans
* Agent-assist tools, back-office dashboards

Usage-Based with Caps: Value alignment without bill shock

* Base fee + per-unit rate above threshold + monthly ceiling
* Real-time usage meters so clients see exactly where they stand
* Hybrid model accelerating in AI-powered features

Revenue-Share: For outcome-tied modules

* Percentage of attributable revenue + monthly minimum
* Requires attribution rules in contract (last-touch, split, uplift)
* Upsell engines, lead-gen tools, pricing optimizers

3. GUARANTEED DATA HANDBACK CLAUSE

* Export client data (not your tools) in machine-readable format
* 30-60 day window, deletion certificate provided
* GDPR Article 28 already requires this for personal data
* Changes negotiation: "your data is yours, my tools are mine"

MIGRATION CHECKLIST: MAKING LICENSES CREDIBLE

PHASE 1: PREP

* Name owners, shared channel setup
* Mirror environment with masked data
* Schema diff: Source vs target, field by field
* Rate-limit planning: Bulk endpoints, client-side throttling, exponential backoff

PHASE 2: TEST AND CUT

* Dry run on 1-5% of data, reconcile counts
* Freeze period, final sync, switch DNS/keys/webhooks
* Rollback triggers: Record mismatch threshold, sustained 500s, critical test failures
* No heroics from hammocks in Gili Air

PHASE 3: POST-CUTOVER

* Reconciliation report signed by both sides
* Observability on, legacy credentials cleaned up

SWITCHING COSTS: VALUE, NOT HOSTAGE DYNAMICS

The Calculator Inputs (from academic research):

* Rebuild hours × blended rate
* Integration rework time
* Training hours by role
* PM overhead
* Opportunity cost per day of freeze
* Contractual fees

Key Principle: Share the math transparently. Walk clients through inputs, let them adjust numbers. Transparency separates value-based switching costs from hostage situations.

REGULATORY CONTEXT

* EU Data Act (2024): Pushing seamless switching between providers
* GDPR Article 28: Requires data return/deletion at service end
* Market trends: Hybrid pricing models rising, seat-only declining
* Gartner research: Value enhancement drives loyalty, not switching costs

RESOURCES

Migration + License Addendum Playbook includes:

1. SOW addendum with all three pricing options
2. Background IP schedule template
3. Migration runbook (schema diffs, rate limits, rollback)
4. Switch-cost calculator with formulas

KEY SOURCES

* Terms.Law IP + Work Product Addendum Generator
* AWS Prescriptive Guidance on migration cutovers
* Maxio 2025 SaaS Pricing Trends Report
* SEG 2026 Annual SaaS Report
* Burnham, Frels, Mahajan switching cost typology

----------------------------------------

Next episode: Wednesday
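The three pricing paths in the license grant reduce to simple arithmetic, sketched below. Only the seat example (5 users × $10) comes from the notes; the function names and every number passed to the usage-based and revenue-share functions are hypothetical, and the cap is assumed to apply to the whole monthly bill.

```python
def seat_bill(seats, price_per_seat):
    """Seat-based: a flat price per named user per month."""
    return seats * price_per_seat


def usage_bill(units, base_fee, included_units, unit_rate, monthly_cap):
    """Usage-based with cap: base fee, plus a per-unit rate above the threshold,
    never exceeding the monthly ceiling (assumed here to cover the whole bill)."""
    overage_units = max(0, units - included_units)
    return min(base_fee + overage_units * unit_rate, monthly_cap)


def revenue_share_bill(attributable_revenue, share, monthly_minimum):
    """Revenue-share: a percentage of attributable revenue with a monthly floor."""
    return max(attributable_revenue * share, monthly_minimum)
```

For example, `seat_bill(5, 10)` reproduces the $50/month figure above, while `usage_bill(5000, 200, 1000, 0.5, 400)` shows the ceiling at work: the uncapped total would be $2,200, but the client pays $400.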

May 29, 2026 · 16 min

The 14-Day Partner Sprint: Feed-Drops, Mini-Templates, and the 15-Minute SLA

THE 14-DAY PARTNER SPRINT: FEED-DROPS, MINI-TEMPLATES, AND THE 15-MINUTE SLA

THE QUESTION THAT STARTED IT ALL

Someone in Kira's Slack community asked: "I've done three collabs this year. A podcast swap, a newsletter mention, a joint webinar. Each one spiked traffic for like two days and then nothing. How do I make partnerships actually compound instead of just being one-off favors?"

The answer: Stop treating partnerships like networking events. Start treating them like a systematic distribution channel.

THE THREE MISSING PIECES

Most partnership marketing fails because it's missing:

1. A shared asset that lives beyond the collab - not a moment, but something that keeps working
2. Tracking that tells you which partner actually moved the needle - so you can prove ROI and repeat what works
3. A response system - when someone shows up from a partner's audience, you answer in 15 minutes, not 15 hours

THE 14-DAY PARTNER SPRINT SYSTEM

PARTNER SELECTION: THE ADJACENCY TEST

Use these five criteria to filter potential partners:

* Does their audience overlap with yours (same job title, same problem)?
* Do they cover topics within your top three themes?
* Can you ship the collab async?
* Is their engagement real (actual clicks and listens, not vanity followers)?
* Is there a clear contact you can reach?

Pass rate needed: 4 out of 5. If they only pass 3, the fit is too loose.

Partner types to target:

* Podcasters
* Community admins
* Tool companies
* Agencies
* Educators (newsletter writers, course creators)

Target: 4 prospects in each category = 20 total on your shortlist

Expected yes rate: 20-30% (plan for 70-80% rejection)

THE ASSETS THAT ACTUALLY COMPOUND

Feed-drops: A full episode from your podcast publishes directly in another podcast's RSS feed. Key requirements:

* Host-voiced intro (20-30 seconds)
* Talent reads outperform generic announcer reads by 3 points on purchase intent
* Realistic conversion: ~0.67% device conversion (Chartable SmartPromos data)

Mini-templates: One-page, co-branded assets that solve a specific problem for the partner's audience

* Takes ~3 hours to produce
* Gate with email for 7 days, then open up
* Personalized assets drive 4x more demo requests than generic content (ON24 benchmarks)

THE MEASUREMENT LAYER

Wire three tracking systems from day one:

1. UTMs on every link
   * Source = partner name
   * Medium = channel type
   * Campaign = sprint month
   * Track in GA4: template view, template claim, demo intent
2. SmartPromos through Chartable
   * For podcast-to-podcast attribution
   * Tracks device conversion: did someone who heard the promo subsequently download your show?
3. Self-reported attribution
   * "How did you first hear about us?" dropdown on template gates and demo forms
   * Partner names in the options
   * Cross-reference against UTM data - when they disagree, trust the human

THE 15-MINUTE SLA

The setup:

* Slack channel for any form submission with a partner UTM or the word "referred"
* Make or Zapier automation (10 minutes to build)
* Coverage blocks that overlap with your biggest partner's audience

The target: 15 minutes to first reply (not to close)

The message: "Hey, thanks for coming via [partner]. Here's a 15-minute fit check - pick a time."

Why it matters: Harvard Business Review study shows responding within an hour makes you nearly 7x more likely to qualify a lead. Most nomads respond the next morning because they were asleep in a different time zone.
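The UTM convention in the measurement layer (source = partner name, medium = channel type, campaign = sprint month) can be applied consistently with a small helper. A minimal sketch using only the Python standard library; the function name `add_utm`, the example URL, and the example values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url, partner, channel, sprint_month):
    """Return the URL with utm_source, utm_medium, and utm_campaign set,
    keeping any query parameters that were already present."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": partner,         # source = partner name
        "utm_medium": channel,         # medium = channel type
        "utm_campaign": sprint_month,  # campaign = sprint month
    })
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling `add_utm("https://example.com/template", "acme-podcast", "feed-drop", "2026-05")` yields one tagged link per partner and channel, so the per-partner attribution described above falls out of the analytics data instead of a manual tally.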
THE SPRINT TIMELINE

* Day 1: Build the list and wire the tracking
* Day 3: Send 20 outreach messages
* Days 4-6: Negotiate and produce assets
* Days 8-12: Feed-drops and templates go live
* Day 13: Pull numbers and send partners a 5-line recap with their stats
* Day 14: Debrief, duplicate the board, load 5 new prospects for next sprint

THE COMPOUNDING FLYWHEEL

After the first sprint:

* You have a proven partner and co-created asset
* The partner knows you deliver
* The asset has a landing page and tracking
* Next sprint: skip prospecting for that partner, go straight to "what do we ship next?"
* Add 2 new partners to the rotation

Sprint progression:

* Sprint 1: 2 partners
* Sprint 2: 4 partners
* Sprint 3: 6 partners

Each tracked asset keeps collecting emails between sprints.

WHY THIS BEATS COLD OUTREACH FOR NOMADS

* Paid ads: Require budget and constant optimization
* SEO: Takes months for results
* Partnership marketing: Done this way, gives you signal in 14 days
* Location independence: Every asset ships async, no Zoom calls required

RESOURCES

Get the complete 14-Day Partner Sprint Kit with outreach scripts, negotiation checklist, Notion calendar, UTM spreadsheet, and SLA routing setup at https://statelessfounder.com/resources

----------------------------------------

Your one move this week: Build the 20-name shortlist. Run the adjacency test. If 4 pass, you're ready to sprint.

May 27, 2026 · 13 min

Build a Three-Layer QA Wall for AI Outputs in 48 Hours

BUILD A THREE-LAYER QA WALL FOR AI OUTPUTS IN 48 HOURS

Every AI deliverable you ship without quality checks is a bet against model drift, prompt degradation, and silent failures. This episode builds a three-layer QA wall that catches problems before clients do.

THE COST OF NOT CHECKING

* Human evaluation: $50 per case, 10 minutes
* LLM judge evaluation: $0.02 per case, 16 seconds
* At 1,000 cases/week: $50,000 vs $20 in evaluation costs

LAYER 1: RUBRIC-SCORED LLM JUDGE

Deploy an LLM judge against a weighted rubric before every deliverable ships:

FIVE-CRITERIA RUBRIC

* Task fulfillment (30%): Did it follow instructions?
* Factual accuracy (25%): Are claims verifiable?
* Clarity and structure (15%): Is it well-organized?
* Style and brand fit (10%): Matches client voice?
* Citations (10%): Proper attribution?
* Safety flags (negative weight): PII leakage, hallucinations

SCORING THRESHOLDS

* Green (ships automatically): 0.8+ total, no critical flags, top two criteria 4+
* Amber (human edit queue): 0.7-0.8 total, or any criterion ≤2
* Red (blocked/escalated): <0.7 total or any critical flag

RESEARCH BACKING

* ICLR 2026 AutoMetrics: +33.4% correlation with humans vs direct LLM-as-judge
* AAAI 2026 Think-J: Rubric-anchored judges more robust to noisy training data

LAYER 2: GOLDEN-SET REPLAY AND DRIFT DETECTION

Build a golden set of 40-60 items per output type, scored by humans with agreed-upon labels and rationales.

WEEKLY CALIBRATION PROCESS

1. Replay golden set through your judge
2. Measure agreement using Cohen's kappa or Kendall's tau
3. Kappa >0.61 = substantial agreement
4. Track week-over-week trends
5. When agreement drops → pause auto-shipping and investigate

DRIFT DETECTION

* PLOS One 2026 study: Weekly Bradley-Terry recalibration achieved τ=0.59-0.68 vs humans
* Detected three drift patterns: stable, improving, degrading
* Without weekly replay, you're "shipping and hoping"

GUARDRAILS AGAINST BRITTLENESS

* Randomize position: Run both A-B and B-A orders (Chatbot Arena method)
* Separate concerns: Rubric is workhorse, pairwise is tiebreaker
* Never self-judge: Don't let GPT-4o judge GPT-4o outputs

LAYER 3: HUMAN SAMPLING WITH RED/AMBER/GREEN THRESHOLDS

Strategic 5-10% human sampling focused on risk and borderlines:

SAMPLE COMPOSITION

* 50%: Amber decisions (borderlines judge wasn't sure about)
* 30%: High-risk greens (long outputs, safety-sensitive, new client styles)
* 20%: Random greens (keep judge honest)

DASHBOARD THRESHOLDS

* Green: Judge precision ≥95%, human disagreement <10%, no critical flags
* Amber: One metric slipped → raise cutline by 0.02, bump sampling to 15%
* Red: Critical safety event, 2+ major misses in 50-item sample, or kappa <0.5

CLIENT VALUE PROPOSITION

"Every output gets scored by a calibrated judge against a six-criterion rubric. Top performers ship automatically. Borderlines get human edit. Weekly 5-10% human sample with dashboard that updates every Monday."

THE MONDAY DASHBOARD

Five widgets for 30-minute weekly review:

1. Volume and mix: Items processed, percentage green/amber/red
2. Judge health: Agreement vs golden set with 4-week trend
3. Human QA metrics: Precision, disagreement rate, sample size
4. Risk flags: By type and resolution speed
5. Cost per eval: Track efficiency gains

COST ANALYSIS: VISA RUN REVENUE MATH

* Judge costs: $20/week for 1,000 items
* Human sample: 50-100 items at $15-20/hour
* Total QA cost: ~$350/week
* vs Full human review: $50,000/week
* ROI: If $350 prevents one client churn, pays for itself quarterly

IMPLEMENTATION CHECKLIST

THIS WEEK

1. Build golden set: 40 items from real output (good, borderline, bad)
2. Score manually: Create foundation for everything else
3. Schedule Monday review: 30 minutes on calendar

NEXT WEEK

1. Deploy rubric-scored judge on new outputs
2. Set up weekly golden-set replay
3. Implement human sampling workflow

RESOURCES

The QA Wall Kit includes:

* Rubric template with acceptance thresholds
* Judge prompt pack (rubric + pairwise modes)
* Human sampling SOP with R/A/G dashboard
* Monday review checklist

RESEARCH SOURCES

* ICLR 2026 AutoMetrics: Rubric-style evaluators improve correlation by 33.4%
* PLOS One 2026: Bias-calibrated LLM judges with weekly recalibration
* AAAI 2026 Think-J: Generative judges outperform classifier-style approaches
* UW Health Clinical Study: Cost/latency comparison of human vs LLM evaluation
* TREC AutoJudge 2026: Live benchmark studying judge vulnerabilities and guardrails

----------------------------------------

Next episode: Judge fine-tuning vs off-the-shelf models for domain-specific QA
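The rubric thresholds and the weekly kappa check from this episode can be sketched in a few lines. Several things here are assumptions rather than details from the notes: criteria are scored 1-5, the total is normalized to 0-1, the listed weights (which sum to 90%) are rescaled by their sum, and an output scoring 0.8+ without 4+ on the top two criteria falls back to amber. All names are hypothetical.

```python
# Weights as listed in the rubric; keys are hypothetical identifiers.
WEIGHTS = {
    "task_fulfillment": 0.30,
    "factual_accuracy": 0.25,
    "clarity_structure": 0.15,
    "style_brand_fit": 0.10,
    "citations": 0.10,
}
TOP_TWO = ("task_fulfillment", "factual_accuracy")


def total_score(scores):
    """Weighted 0-1 score from 1-5 criterion scores (assumed scale),
    rescaled because the listed weights sum to 0.90 rather than 1."""
    weight_sum = sum(WEIGHTS.values())
    return sum(WEIGHTS[c] * scores[c] / 5 for c in WEIGHTS) / weight_sum


def triage(scores, critical_flag=False):
    """Map an output to red, amber, or green using the thresholds in the notes."""
    total = total_score(scores)
    if critical_flag or total < 0.7:
        return "red"
    if any(scores[c] <= 2 for c in WEIGHTS):
        return "amber"
    if total >= 0.8 and all(scores[c] >= 4 for c in TOP_TWO):
        return "green"
    return "amber"  # assumed fallback for 0.7-0.8 or weak top-two scores


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label lists (e.g. judge vs human)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Running `cohen_kappa` on the judge's labels and the human labels for the golden set each week gives the number to compare against the 0.61 and 0.5 cutoffs mentioned above.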

May 25, 2026 · 12 min

Build a B2B Affiliate Program in 14 Days

BUILD A B2B AFFILIATE PROGRAM IN 14 DAYS

Most founders think the next hire they need is a salesperson. They're wrong. The next hire isn't a person at all: it's five partners who already have your buyer's attention and will send them your way for a cut of the revenue.

IN THIS EPISODE

Santi and Kira walk you through building a complete B2B affiliate program from scratch in just 14 days. You'll get the one-pager template, commission structures, UTM tracking setup, outreach email sequences, cross-border payout procedures, and compliance guidelines.

KEY TOPICS COVERED

* Referral vs Affiliate Partners: Why the distinction matters for your terms and enablement
* Partner Tiers: Creator, Solutions, and Community tiers with different commission structures
* The One-Pager: Six essential elements every partner needs to see
* Commission Math: Recurring vs lifetime models with real examples from Webflow and Fathom
* UTM Tracking: Simple Google Sheets setup for attribution without expensive tools
* Compliance Basics: FTC 2023 updates, ASA requirements, and disclosure copy that works
* Cross-Border Payouts: W-9/W-8 collection and PayPal/Wise batch payment setup
* The 14-Day Sprint: Exact timeline from partner list to first demos

KEY TAKEAWAYS

1. Start Small and Selective: Five hand-picked partners beat hundreds of random recruits; GoToMeeting got 725% more paid accounts by cutting partners, not adding them
2. Structure Recurring Commissions: Pay 30% for 12 months or 25% lifetime so you only pay on revenue you've collected, eliminating upfront risk
3. Bake in Compliance: Include disclosure copy directly in partner assets to meet 2023 FTC requirements that hold advertisers responsible for affiliate compliance

REAL EXAMPLES

* Webflow: 50% commission on first year subscription revenue through 500+ partners on PartnerStack
* Fathom Analytics: 25% lifetime recurring commissions with simple PayPal payouts
* GoToMeeting: 725% increase in paid accounts through focused partner recruitment and enablement

THE 14-DAY SPRINT TIMELINE

* Days 1-3: Build prospect list (15 potential partners → 5), draft one-pager, pick commission model, create UTM sheet
* Days 4-6: Outreach sequence (4 emails over 12 days), track replies, send preview materials
* Day 7: Asset drop with unique URLs, disclosure copy, and creative kit
* Days 8-14: Activation, placement confirmation, first demo tracking, and payout queue setup

RESOURCES

* Referral Partner Kit: Complete template bundle with one-pager, terms, UTM tracker, outreach emails, payout SOP, and dashboard
* FTC Endorsement Guides (2023): Updated disclosure requirements
* IRS Publication 515: Cross-border withholding rules for affiliate payments

COMPLIANCE NOTE

We are not tax or legal advisors. This is operational guidance. Confirm everything with your accountant and legal counsel, especially for cross-border payments and disclosure requirements.

----------------------------------------

Ready to build your partner program? Download the complete Referral Partner Kit and start your 14-day sprint.
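The two commission structures named in the takeaways (30% for 12 months versus 25% lifetime) can be compared with a short calculation. A rough sketch, assuming a flat monthly subscription price and commission paid only on months the customer actually pays; the function names and the $100/month figure in the usage note are hypothetical.

```python
def commission_paid(monthly_revenue, months_retained, rate, max_months=None):
    """Total commission on one referred customer.

    max_months=None models a lifetime commission; a number caps the payout window.
    """
    payable_months = months_retained if max_months is None else min(months_retained, max_months)
    return monthly_revenue * payable_months * rate


def breakeven_months(capped_rate, capped_months, lifetime_rate):
    """Customer lifetime (in months) at which both models pay out the same total."""
    return capped_rate * capped_months / lifetime_rate
```

For a hypothetical $100/month customer retained for 24 months, the capped model pays about $360 and the lifetime model about $600. `breakeven_months(0.30, 12, 0.25)` comes to roughly 14.4 months, so under these assumptions the lifetime model costs more for any customer who stays longer than that.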

May 25, 2026 · 14 min

YouTube SEO for B2B: Build a Search-Led Video Engine That Books Demos

YOUTUBE SEO FOR B2B: BUILD A SEARCH-LED VIDEO ENGINE THAT BOOKS DEMOS

THE ROMA NORTE DEMO STORY

Kira's sitting in a Mexico City café when her phone buzzes - demo booked. The source? A 6-minute screen share video with 240 views titled "Make.com client onboarding automation, email plus Slack, free template." Not creative, but it answered the exact query someone typed when they had a broken onboarding flow.

WHY SEARCH BEATS RECOMMENDED FEED FOR B2B

YouTube's Search & Discovery team optimizes for viewer satisfaction and intent matching, not just clicks. When someone searches "Webflow to HubSpot auto-create MQL with UTM capture," they have a job to do today. They're not browsing - they're buying.

The timing advantage: Google's 2025 ranking adjustments surface more video content across search results and AI summaries. Your YouTube videos now compound across surfaces you didn't even publish to.

THE TEMPLATE CTA PATTERN

Three B2B companies have perfected the conversion mechanism:

MAKE.COM

* Template library with "Get this template" buttons
* One click clones entire automation scenarios
* YouTube descriptions link directly to template pages
* Template click = conversion event + account activation

WEBFLOW UNIVERSITY

* "Clone in Webflow" duplicates entire projects
* Paired with tutorial streams
* Stream teaches, cloneable converts

AIRTABLE

* "Use template" → "Add base" flow
* Tutorial to template pipeline
* Working base in your workspace instantly

The key insight: Template CTAs provide zero-friction activation. Viewer gets value immediately vs. "book a demo" which requires timezone math and scheduling friction.
BUILDING YOUR SYSTEM: THE 4-TIER INTENT MAP

Tier A - "Do the job now" (highest intent)

* "Airtable CRM score inbound leads and route to AE in ten minutes"
* Person has pipeline problem today

Tier B - Integration unblocking

* Tools that unblock adoption of your solution

Tier C - Evaluation

* "Make versus Zapier for multi-step client onboarding"

Tier D - Post-purchase fixes

* Support and troubleshooting content

30-MINUTE TOPIC MAP PROCESS

1. List your 3 core jobs-to-be-done
2. Pick 1-2 tools your buyers already use per job
3. Generate 1 Tier A + 1 Tier B query per combination
4. Add 2 wildcards from C or D
5. Assign each to a week = 12-week map

PRIORITIZATION CRITERIA (NOT SEARCH VOLUME)

* Does a working template exist you can link to?
* Can you screen-share the build in under 10 minutes?
* Is it a known adoption pain point?

If all three = yes, that's week one.

THE WEEKLY CADENCE (5 HOURS TOTAL)

Monday-Tuesday: Production (2.5 hours)

* Pick buyer query from map
* Confirm template link works
* Record single-take screen share
* Cut dead air, burn in captions

Wednesday: Publish

* Description template: benefit first line, template link second line
* 5-8 chapters with timestamps
* Pin comment with template link + common gotchas
* End screen to specific next video

Thursday: Repurpose (30 minutes)

* Cut 2 Shorts (awareness only - links not clickable)
* Write 1 LinkedIn post with video + template links
* Use LinkedIn-specific UTMs

Friday: Measurement (20 minutes)

* Update tracker with UTM data
* Compute demos per 1,000 views
* Decide one thing to keep, one to change

TARGET METRICS

* CTR: 4%+ (YouTube's documented range is 2-10%)
* Retention: 35% average view duration (internal target for 6-10 minute tutorials)
* Conversion: Demos per 1,000 views (the one number that matters)

THE DISCOVERY OBJECTION

Objection: "You're leaving reach on the table by only targeting search."

Response: Layer discovery on after building your search foundation. Use Shorts and discovery content to widen top of funnel, but long-form search videos carry the clickable template links and UTMs. Build the net before you drive the fish.

MEASUREMENT THAT MATTERS

Every template link gets UTM-tagged:

* Source: YouTube
* Medium: video
* Campaign: date + query slug
* Content: link placement (description, pinned comment, end screen)

GA4 captures automatically. Mark template installs and demos as conversion events. Now you can see: this video drove 4 installs and 1 demo, that video drove 12 installs and 0 demos.

The insight: A video with 80 views and 2 demos outperforms a video with 800 views and 0 demos.

YOUR NEXT ACTION

Pick your first buyer query. Not the most creative one - the most boring, specific, "someone is typing this into YouTube right now because they have this problem today" query you can find. Record 6 minutes. Link the template. Publish.

RESOURCES

Get the complete /t/youtube-seo-engine kit on the Resources page:

* Topic map with 4 intent tiers
* Script generator prompts
* Description templates with chaptering
* Repurposing SOP to Shorts and LinkedIn
* UTM tracker wired to GA4 conventions

The exact system we just walked through. Duplicate it and start your 12 weeks.

----------------------------------------

The Stateless Founder teaches digital nomads how to build location-independent businesses powered by AI and automation. New episodes Monday, Wednesday, Friday at 7 AM PT.
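The Friday measurement step (demos per 1,000 views) is a one-line formula, sketched here with the two videos from the insight above. The function names and the dictionary fields are hypothetical; the view and demo counts are the ones quoted in the notes.

```python
def demos_per_1000_views(demos, views):
    """Conversion metric from the weekly tracker; returns 0.0 for a video with no views."""
    return 0.0 if views == 0 else demos * 1000 / views


def rank_videos(videos):
    """Sort videos by demos per 1,000 views, best first."""
    return sorted(
        videos,
        key=lambda v: demos_per_1000_views(v["demos"], v["views"]),
        reverse=True,
    )


videos = [
    {"title": "800-view video", "views": 800, "demos": 0},
    {"title": "80-view video", "views": 80, "demos": 2},
]
best = rank_videos(videos)[0]
```

Here `best` is the 80-view video: it converts at 25 demos per 1,000 views against 0 for the larger one, which is the comparison the notes make in prose.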

May 25, 2026 · 15 min