The metric Stitch Fix says every experimenter should chase

Description

Summary

In this episode of The Experimentation Edge, GrowthBook CMO Ashley Stirrup sits down with Nick Beyler, data science manager at Stitch Fix, where he leads the decision and insights team and owns the company's internal experimentation platform. Nick shares why the metric he most wants is the one he can't measure yet, a North Star that predicts a client's long-term value from their earliest behaviors, and why the most impactful experiment learnings tend to come from adoption friction rather than product bugs. He makes the case that if you're only testing winners you're not taking enough risks, explains how guardrails make that risk safe, and looks ahead to a new in-house platform and the promise of agentic AI. It's a practical, statistician's-eye view of experimentation for product managers, data scientists, and engineers building serious testing programs.

Chapters

00:00 Cold open and welcome to the show
01:45 What Stitch Fix actually does
04:15 Balancing AI with the human stylist
05:15 From public policy to the A/B testing adrenaline rush
07:15 Inside the weekly experimentation review group
08:45 The AI style assistant and listening to qualitative feedback
10:45 Why adoption friction beats product bugs
13:45 Testing for losers and building guardrails
15:45 Keep rate, successful fixes, and the holy grail metric
18:15 The new platform and the promise of agentic AI

Takeaways

* The most impactful experiment learnings usually come from adoption friction, not product bugs. By the time a big feature reaches A/B testing, it's often already a winner, so the open question is how and where to introduce it.
* A losing test is a finding, not a failure. If every experiment wins, you're not taking enough risk to learn anything new.
* Guardrails and stopping criteria are what make risk-taking safe, especially when the experience is as personal as shopping.
* The most valuable North Star metric is the one you can't measure yet, long-term client value, and causal-inference modeling helps predict it from short-term behavior.
* Quantitative results are only half the story. Direct, qualitative client feedback inside an experiment often reshapes the rollout more than the numbers do.

Connect with the Guest

LinkedIn: https://www.linkedin.com/in/nick-beyler-381864119/
Website: https://www.stitchfix.com

Sponsor

GrowthBook is the warehouse-native platform for experimentation, feature flags, and product analytics trusted by AI-native product teams at 3,000+ companies worldwide. Go to http://growthbook.io

How Fin went from weeks to hours of analysis using AI

Summary

In this episode of The Experimentation Edge, host Ashley Stirrup sits down with Raunak Kumar, senior manager of GTM analytics at Fin (formerly Intercom), to unpack how experimentation actually works when the data is messy and the traffic is thin. Drawing on nearly 12 years in marketing analytics across Atlassian, Stripe, and Fin, Raunak explains how AI tools like Claude Code have collapsed analysis from weeks to hours and freed his team to clear its experiment backlog, why declining organic search traffic and a 5x jump in untagged ChatGPT referrals are forcing teams to rethink attribution, and how the most valuable experiments are often the ones that "lose." From a Jira Service Desk bundling test that won on trials but had to be rolled back, to a Stripe contact form that was quietly blocking real buyers, this conversation is a practical guide for product managers, engineers, data scientists, and growth marketers who want to learn more from every test they run.

Chapters

0:45 Welcome and what the show is about
1:45 Raunak's role and 12 years in marketing analytics
2:45 How AI and Claude Code changed the analyst's day
4:15 LLMs, declining organic traffic, and the 5x ChatGPT jump
5:15 Two kinds of experiments at Fin: on page and off page
7:15 The Jira Service Desk bundling experiment
10:45 Why the trial winner became a rollback
11:45 Contextual onboarding turns the loser into a winner
14:45 Reading an experiment that loses
18:45 What's next: incrementality, connected TV, and testing creative

Takeaways

* AI has collapsed marketing analysis from weeks to hours, and the real payoff is a cleared experiment backlog plus analysts who compete on the questions they ask, not the speed they query.
* Organic search traffic is declining as ChatGPT, Gemini's AI mode, and Claude answer buyers in place; Fin saw a 5x rise in ChatGPT referrals, but LLMs don't tag that traffic, so attribution has to be proven through experiments.
* A guardrail metric saved Atlassian from a costly mistake: bundling Jira Service Desk lifted trials more than 50 percent but tanked activation and paid conversion, forcing a rollback.
* A failed test can hold the real winner; contextual onboarding matched to user intent roughly doubled activation and became the default variant after the bundling experiment was rolled back.
* In low-volume B2B, read losing experiments for sub-segment signal; a "failed" Stripe form simplification revealed the form was blocking legitimate small-business buyers using Gmail.

Connect with the Guest

LinkedIn: http://linkedin.com/in/raunakkumar1991
Website: https://fin.ai

Sponsor

GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts. GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse. With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction. See a demo at https://www.growthbook.io/

30 June 2026, 23 min

