The Experimentation Edge

The metric Stitch Fix says every experimenter should chase

20 min · 2 Jul 2026

Description

Summary
In this episode of The Experimentation Edge, GrowthBook CMO Ashley Stirrup sits down with Nick Beyler, data science manager at Stitch Fix, where he leads the decision and insights team and owns the company's internal experimentation platform. Nick shares why the metric he most wants is the one he can't measure yet, a North Star that predicts a client's long-term value from their earliest behaviors, and why the most impactful experiment learnings tend to come from adoption friction rather than product bugs. He makes the case that if you're only testing winners you're not taking enough risks, explains how guardrails make that risk safe, and looks ahead to a new in-house platform and the promise of agentic AI. It's a practical, statistician's-eye view of experimentation for product managers, data scientists, and engineers building serious testing programs.

Chapters
00:00 Cold open and welcome to the show
01:45 What Stitch Fix actually does
04:15 Balancing AI with the human stylist
05:15 From public policy to the A/B testing adrenaline rush
07:15 Inside the weekly experimentation review group
08:45 The AI style assistant and listening to qualitative feedback
10:45 Why adoption friction beats product bugs
13:45 Testing for losers and building guardrails
15:45 Keep rate, successful fixes, and the holy grail metric
18:15 The new platform and the promise of agentic AI

Takeaways
* The most impactful experiment learnings usually come from adoption friction, not product bugs. By the time a big feature reaches A/B testing, it's often already a winner, so the open question is how and where to introduce it.
* A losing test is a finding, not a failure. If every experiment wins, you're not taking enough risk to learn anything new.
* Guardrails and stopping criteria are what make risk-taking safe, especially when the experience is as personal as shopping.
* The most valuable North Star metric is the one you can't measure yet, long-term client value, and causal-inference modeling helps predict it from short-term behavior.
* Quantitative results are only half the story. Direct, qualitative client feedback inside an experiment often reshapes the rollout more than the numbers do.

Connect with the Guest
LinkedIn: https://www.linkedin.com/in/nick-beyler-381864119/
Website: https://www.stitchfix.com

Sponsor
GrowthBook is the warehouse-native platform for experimentation, feature flags, and product analytics trusted by AI-native product teams at 3,000+ companies worldwide. Go to http://growthbook.io

All episodes

24 episodes

The metric Stitch Fix says every experimenter should chase

2 Jul 2026 · 20 min

What the Expedia Group cannot measure, it cannot ship

Summary
Amir Moghaddam, Director of Software Engineering at Expedia Group, joins host Ashley Stirrup on The Experimentation Edge to make the case that measurement is not a reporting step but a gate: what you cannot measure, you cannot ship. Drawing on nearly four years at DoorDash and his current work leading Expedia's air booking platform, Amir explains why he refuses to label experiments winners or losers, how a "failed" pricing test pushed his team toward full personalization, and why a three-sided marketplace forces hard trade-offs between competing metrics. The conversation closes on how the same experimentation discipline now applies to shipping and measuring AI. Built for product managers, engineers, data scientists, and growth leaders who care about rigor over opinion.

Chapters
00:00 Cold open
00:50 Meet Amir and the air booking platform at Expedia
03:10 DoorDash, growth, and a 70-experiment year
04:20 Three kinds of experimentation at Expedia
06:30 AI velocity and the new frontier model pace
08:30 What you cannot measure, you cannot ship
10:45 The DoorDash carousel and the price experiment
12:45 The three-sided marketplace and competing metrics
16:55 There are no losing experiments
20:45 Predictability, LLMs, and Expedia's road ahead

Takeaways
* "What you cannot measure, you cannot ship": if you can't measure an outcome, you can't decide whether it's better, so you're just debating opinions.
* Measurement spans three live dimensions: spend (more with less), speed (sprints instead of quarters), and quality, with guardrail "do no harm" metrics on top.
* There are no losing experiments. A flat result is a signal to either refine the hypothesis or step back and look from a completely different angle.
* DoorDash's price experiment proved price by itself doesn't predict orders. Different customers want different things at different times, which pushed the team toward personalization.
* A three-sided marketplace (buyers, merchants, Dashers) makes metrics compete. Running the test is easy; deciding what to optimize when goals conflict is the real work.

Connect with the Guest
LinkedIn: https://www.linkedin.com/in/amirmoghaddam
Website: https://www.expediagroup.com

Sponsor
GrowthBook is the warehouse-native platform for experimentation, feature flags, and product analytics trusted by AI-native product teams at 3,000+ companies worldwide. Go to http://growthbook.io

1 Jul 2026 · 28 min

How Fin went from weeks to hours of analysis using AI

Summary
In this episode of The Experimentation Edge, host Ashley Stirrup sits down with Raunak Kumar, senior manager of GTM analytics at Fin (formerly Intercom), to unpack how experimentation actually works when the data is messy and the traffic is thin. Drawing on nearly 12 years in marketing analytics across Atlassian, Stripe, and Fin, Raunak explains how AI tools like Claude Code have collapsed analysis from weeks to hours and freed his team to clear its experiment backlog, why declining organic search traffic and a 5x jump in untagged ChatGPT referrals are forcing teams to rethink attribution, and how the most valuable experiments are often the ones that "lose." From a Jira Service Desk bundling test that won on trials but had to be rolled back, to a Stripe contact form that was quietly blocking real buyers, this conversation is a practical guide for product managers, engineers, data scientists, and growth marketers who want to learn more from every test they run.

Chapters
0:45 Welcome and what the show is about
1:45 Raunak's role and 12 years in marketing analytics
2:45 How AI and Claude Code changed the analyst's day
4:15 LLMs, declining organic traffic, and the 5x ChatGPT jump
5:15 Two kinds of experiments at Fin: on page and off page
7:15 The Jira Service Desk bundling experiment
10:45 Why the trial winner became a rollback
11:45 Contextual onboarding turns the loser into a winner
14:45 Reading an experiment that loses
18:45 What's next: incrementality, connected TV, and testing creative

Takeaways
* AI has collapsed marketing analysis from weeks to hours, and the real payoff is a cleared experiment backlog plus analysts who compete on the questions they ask, not the speed at which they query.
* Organic search traffic is declining as ChatGPT, Gemini's AI mode, and Claude answer buyers in place; Fin saw a 5x rise in ChatGPT referrals, but LLMs don't tag that traffic, so attribution has to be proven through experiments.
* A guardrail metric saved Atlassian from a costly mistake: bundling Jira Service Desk lifted trials more than 50 percent but tanked activation and paid conversion, forcing a rollback.
* A failed test can hold the real winner; contextual onboarding matched to user intent roughly doubled activation and became the default variant after the bundling experiment was rolled back.
* In low-volume B2B, read losing experiments for sub-segment signal; a "failed" Stripe form simplification revealed the form was blocking legitimate small-business buyers using Gmail.

Connect with the Guest
LinkedIn: http://linkedin.com/in/raunakkumar1991
Website: https://fin.ai

Sponsor
GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts. GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse. With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction. See a demo at https://www.growthbook.io/

30 Jun 2026 · 23 min

Inside The Home Depot's experimentation at a $25B scale

Summary
What does experimentation look like inside a $150 billion retailer? In this episode of The Experimentation Edge, host Ashley Stirrup talks with Kim Ting Li, Senior Manager of Experimentation at The Home Depot, where one centralized team tests every major change to a $25 billion online business. Kim explains how 40 people serve 40–50 business teams, why executives join test readouts and ping analysts directly, how every result since 2020 lives in a searchable library, and why scaling beyond hundreds of experiments per year depends on server-side testing capabilities more than AI. For product, data, and engineering leaders building or scaling experimentation programs.

Chapters
00:00 Intro
00:45 From neuroscience research to Home Depot
01:45 A $150B enterprise, a $25B online business
02:45 The centralized experimentation model
03:45 Inside the 40-person team
04:30 Readouts, blast emails, and the experiment library
05:40 Executive visibility and the golden rule
06:15 "If you won't act on a bad result, don't run the test"
11:15 Learning from losing tests
12:30 Scaling up: AI, server-side testing, and what's next

Takeaways
* One centralized team of about 40 people tests every major change to Home Depot's $25B online business, serving 40–50 business teams with consistent hypothesis and analysis standards.
* Executive engagement is real at Home Depot: leaders join 30-minute readouts, search the experiment library, and ping analysts directly because they treat A/B testing as the golden rule for measuring incrementality.
* Institutional memory is infrastructure: every test result since 2020 lives in a centralized, searchable archive so no one re-runs a question the company already answered.
* Kim's stakeholder filter: if you wouldn't do anything differently after a bad result, don't run the test.
* Scaling past the low hundreds of experiments per year is a capabilities problem before it's an AI problem: Home Depot is moving from client-side to server-side testing so winners release quickly, end to end.

Connect with the Guest
LinkedIn: https://www.linkedin.com/in/kimtingli
Website: https://www.homedepot.com

Sponsor
GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts. GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse. With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction. See a demo at https://www.growthbook.io/

29 Jun 2026 · 11 min

How Disney picks which experiments to run

Summary
What does it look like to kill a multimillion-dollar feature before anyone builds it? In this episode of The Experimentation Edge, host Ashley Stirrup talks with Crystal Ammari, a digital product optimization and experimentation strategy leader whose career spans Nike and The Walt Disney Company. Crystal shares the "dry test" that used a single fake button to measure demand for video chat (4 million users, 106 clicks), why she reframes experimentation as savings and gains rather than wins and losses, how a misconfigured tool, not bad methodology, made tests take six months, and how a stuck Disney team went from "we don't know where to start" to 110 scored and prioritized test ideas. For product, data, and engineering leaders building or scaling experimentation programs.

Chapters
00:00 Intro
00:45 The mindset shift from shipping to results
02:00 Why testing took six months, a tooling problem
03:15 The dev team that laughed, and the vendor who agreed
04:50 An executive demand for video chat
05:35 Dry testing with a fake button
06:30 106 clicks and a multimillion-dollar save
07:30 Savings and gains, not wins and losses
08:45 The Disney team that didn't know where to start
10:30 From low engagement to 110 prioritized ideas
12:45 Just get something live, and where AI fits next

Takeaways
* A "dry test" (a fake "Click here to video chat" button that grayed out on click) measured real demand without building the feature. Of roughly 4 million users, only 106 clicked, killing a multimillion-dollar build.
* Reframe experiment outcomes as savings and gains, not wins and losses. A "losing" test saves you from a costly mistake, which keeps teams focused on learning instead of fearing failure.
* Slow experimentation is often a tooling problem, not a methodology problem. One program's six-month test cycle came from rebuilding every page instead of overlaying changes the way the tool intended.
* Getting a stuck team unstuck starts with data and a workshop. A Disney team went from "we don't know where to start" to 110 scored, prioritized test ideas, using Contentsquare heatmaps to diagnose low engagement first.
* The biggest thing that gets a team testing is to just do it. Stop designing the perfect experiment and get something simple live to take away the mystery.

Connect with the Guest
LinkedIn: https://www.linkedin.com/in/crystal-ammari/
Website: https://thewaltdisneycompany.com

Sponsor
GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts. GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse. With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction. See a demo at https://www.growthbook.io/

29 Jun 2026 · 32 min