The Experimentation Edge

Inside The Home Depot's experimentation at a $25B scale

11 min · June 29, 2026

Description

Summary

What does experimentation look like inside a $150 billion retailer? In this episode of The Experimentation Edge, host Ashley Stirrup talks with Kim Ting Li, Senior Manager of Experimentation at The Home Depot, where one centralized team tests every major change to a $25 billion online business. Kim explains how 40 people serve 40–50 business teams, why executives join test readouts and ping analysts directly, how every result since 2020 lives in a searchable library, and why scaling beyond hundreds of experiments per year depends on server-side testing capabilities more than AI. For product, data, and engineering leaders building or scaling experimentation programs.

Chapters

00:00 Intro
00:45 From neuroscience research to Home Depot
01:45 A $150B enterprise, a $25B online business
02:45 The centralized experimentation model
03:45 Inside the 40-person team
04:30 Readouts, blast emails, and the experiment library
05:40 Executive visibility and the golden rule
06:15 "If you won't act on a bad result, don't run the test"
08:15 The $2.99 shipping fee that tanked conversion
11:15 Learning from losing tests
12:30 Scaling up: AI, server-side testing, and what's next

Takeaways

* One centralized team of about 40 people tests every major change to Home Depot's $25B online business, serving 40–50 business teams with consistent hypothesis and analysis standards.
* Executive engagement is real at Home Depot: leaders join 30-minute readouts, search the experiment library, and ping analysts directly because they treat A/B testing as the golden rule for measuring incrementality.
* Institutional memory is infrastructure: every test result since 2020 lives in a centralized, searchable archive so no one re-runs a question the company already answered.
* Kim's stakeholder filter: if you wouldn't do anything differently after a bad result, don't run the test.
* Scaling past the low hundreds of experiments per year is a capabilities problem before it's an AI problem. Home Depot is moving from client-side to server-side testing so winners release quickly, end to end.

Connect with the Guest

LinkedIn: https://www.linkedin.com/in/kimtingli
Website: https://www.homedepot.com

Sponsor

GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts. GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse. With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction. See a demo at https://www.growthbook.io/


All episodes

21 episodes

Inside The Home Depot's experimentation at a $25B scale

June 29, 2026 · 11 min

How Disney picks which experiments to run

Summary

What does it look like to kill a multimillion-dollar feature before anyone builds it? In this episode of The Experimentation Edge, host Ashley Stirrup talks with Crystal Ammari, a digital product optimization and experimentation strategy leader whose career spans Nike and The Walt Disney Company. Crystal shares the "dry test" that used a single fake button to measure demand for video chat (4 million users, 106 clicks), why she reframes experimentation as savings and gains rather than wins and losses, how a misconfigured tool, not bad methodology, made tests take six months, and how a stuck Disney team went from "we don't know where to start" to 110 scored and prioritized test ideas. For product, data, and engineering leaders building or scaling experimentation programs.

Chapters

00:00 Intro
00:45 The mindset shift from shipping to results
02:00 Why testing took six months: a tooling problem
03:15 The dev team that laughed, and the vendor who agreed
04:50 An executive demand for video chat
05:35 Dry testing with a fake button
06:30 106 clicks and a multimillion-dollar save
07:30 Savings and gains, not wins and losses
08:45 The Disney team that didn't know where to start
10:30 From low engagement to 110 prioritized ideas
12:45 Just get something live, and where AI fits next

Takeaways

* A "dry test" (a fake "Click here to video chat" button that grayed out on click) measured real demand without building the feature. Of roughly 4 million users, only 106 clicked, killing a multimillion-dollar build.
* Reframe experiment outcomes as savings and gains, not wins and losses. A "losing" test saves you from a costly mistake, which keeps teams focused on learning instead of fearing failure.
* Slow experimentation is often a tooling problem, not a methodology problem. One program's six-month test cycle came from rebuilding every page instead of overlaying changes the way the tool intended.
* Getting a stuck team unstuck starts with data and a workshop. A Disney team went from "we don't know where to start" to 110 scored, prioritized test ideas, using Contentsquare heatmaps to diagnose low engagement first.
* The biggest thing that gets a team testing is to just do it. Stop designing the perfect experiment and get something simple live to take away the mystery.

Connect with the Guest

LinkedIn: https://www.linkedin.com/in/crystal-ammari/
Website: https://thewaltdisneycompany.com
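
The dry-test result in the takeaways above comes down to simple arithmetic. As a rough sketch in Python, the following computes the click rate and a 95% Wilson score interval around it. It treats the "roughly 4 million" in the episode notes as exactly 4,000,000 exposed users, which is an assumption; the helper name `wilson_interval` and the choice of interval are illustrative and are not described in the episode.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Figures quoted in the episode notes. 4,000,000 is an assumed exact value
# standing in for "roughly 4 million users".
clicks, exposed = 106, 4_000_000

rate = clicks / exposed
low, high = wilson_interval(clicks, exposed)

print(f"click rate: {rate * 100_000:.2f} per 100,000 users")
print(f"95% interval: {low * 100_000:.2f} to {high * 100_000:.2f} per 100,000 users")
```

Even the upper end of the interval stays at a handful of clicks per 100,000 users, which is why a single fake button was enough to settle the question.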

June 29, 2026 · 32 min

Ship faster, measure better: experimentation in the age of AI

Summary

How do you know if the thing you just shipped actually worked? On this episode of The Experimentation Edge, host Ashley Stirrup, CMO of GrowthBook, sits down with Kevin Yang, Executive Director and Head of Experimentation at JPMorgan Chase, who has spent six years building experimentation across Chase's digital platforms. Kevin shares how his team turned experimentation into more than a billion dollars of estimated value, why the losing experiments matter more than the winners, and the simple chart exercise he uses to prove that a million-dollar change is invisible without a control group. He and Ashley also dig into measuring engagement without chasing vanity metrics, planning for failure to defeat confirmation bias, and why AI is pushing experimentation into a golden era. It's a practical look for product managers, data scientists, and engineers at how a bank operating at massive scale makes better decisions.

Chapters

00:00 Welcome to The Experimentation Edge
01:45 Kevin's role leading experimentation at Chase
04:15 Why Chase invested in experimentation
06:45 A billion dollars and the value of losers
12:45 Plan for failure to beat confirmation bias
14:30 The million-dollar change you can't see
18:45 Sharing learnings and experimentation wrapped
20:45 Engagement without vanity metrics
22:00 Experimentation's golden era with AI
23:30 Why AI needs more experimentation, not less

Takeaways

* Chase estimates over a billion dollars of value from experimentation, and most of the lasting learning comes from the losing tests, not the winners.
* A control group is non-negotiable: at scale, a change worth millions is invisible under noise and seasonality, and no one can spot it by eye.
* Treat engagement carefully. For a bank, more time in the app isn't a win; trust, fast task completion, and healthy repeat engagement are.
* Plan for failure before you run a test. A pre-built playbook for a loss prevents confirmation bias and keeps teams from gaming the metrics.
* AI is ushering in a golden era for experimentation, because shipping faster only compounds mistakes unless you measure what you ship.

Connect with the Guest

LinkedIn: https://www.linkedin.com/in/kevintyang
Website: https://www.jpmorganchase.com

June 25, 2026 · 26 min

False negatives are killing your best product ideas

Summary

How do you make a high-stakes product decision when the safe choice is to never test it at all? In this episode of The Experimentation Edge, host Ashley Stirrup talks with Arun Bodapati, director of data science at Twitch, about the discipline behind trustworthy experimentation. Drawing on his experience at Schwab, Uber, and Twitch, Arun explains why false negatives are the most dangerous result a team can produce, what hygiene to nail before you push play, and how Twitch used geo-fenced experiments and causal inference to finally settle a pricing question it had avoided for years. It's a practical conversation for product managers, engineers, data scientists, and growth leaders who want experiments that hold up and earn executive trust.

Chapters

00:00 Welcome and introduction
01:15 Arun's background and marketing experimentation at Schwab
04:15 Uber's mature, experiment-driven culture
06:30 Coming to Twitch: from Python notebooks to a shared standard
08:30 The pricing problem Twitch had long avoided
10:30 Geo-fenced experiments, matched markets, and elasticity
13:15 The gifted-subs surprise and testing promotions
16:15 The discipline that matters before you push play
18:15 Why false negatives are worse than false positives
20:05 Enrollment triggers and broad explore experiments
22:45 AI, the Kiro tool, and what's next for experimentation

Takeaways

* False negatives are more dangerous than false positives: they get institutionalized as "we tried that, it didn't work" and quietly kill good ideas for years.
* The most valuable experiment work happens before you push play: clear enrollment logic, a plain-English hypothesis, and no optimizing ahead of the test.
* If an intervention sounds weak when you write it out in plain English, don't run the experiment; you're just wasting time.
* Run a broad explore experiment first; small, over-narrowed populations lack power and raise the odds of a false negative. Find the responsive segment with heterogeneous treatment effects afterward.
* Twitch used geo-fenced experiments with matched markets and causal inference to measure true price elasticity, turning a feared pricing decision into a measured, accretive one.

Connect with the Guest

LinkedIn: https://www.linkedin.com/in/abodapati/
Website: https://www.twitch.tv
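
The link between population size and false negatives in the takeaways above can be made concrete with a power calculation. The sketch below uses the standard normal approximation for a two-sided, two-proportion z-test. The 5% baseline conversion rate, the lift to 5.5%, and both sample sizes are hypothetical values chosen for illustration, not figures from the episode, and `power_two_proportions` is an illustrative helper rather than anything Twitch is described as using.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control: float, p_treatment: float,
                          n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the direction opposite to the true effect.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    std_err = sqrt(p_control * (1 - p_control) / n_per_arm
                   + p_treatment * (1 - p_treatment) / n_per_arm)
    effect = abs(p_treatment - p_control)
    return 1 - std_normal.cdf(z_crit - effect / std_err)


# Hypothetical scenario: the same true lift, measured on a narrow
# population and on a broad one.
for label, n in [("narrow segment", 2_000), ("broad population", 50_000)]:
    power = power_two_proportions(0.05, 0.055, n)
    print(f"{label}: n per arm = {n}, power = {power:.2f}, "
          f"false-negative risk = {1 - power:.2f}")
```

Under these assumed numbers the narrow segment misses a real effect far more often than it detects it, while the broad population detects it most of the time.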

June 24, 2026 · 28 min

Squarespace killed its blank template and built something better

Summary

What do you do when your big launch increases engagement and tanks conversion? On this episode of The Experimentation Edge, host Ashley Stirrup talks with Lina Blackman, Director of Product Analytics at Squarespace, about the blank template launch that flopped, and how its learnings became Blueprint, Squarespace's AI-guided website builder. Lina explains how her embedded analyst team runs 150–200 experiments a year for 3 million customers, the two questions she asks every time a test loses, why teams only need one or two big wins a quarter, how Squarespace calibrates statistical certainty to business stakes, and where AI belongs (and doesn't) in the A/B testing workflow. For product managers, data scientists, and experimentation leaders who want to extract more learning from every test.

Chapters

00:00 Introduction: Lina Blackman, Director of Product Analytics at Squarespace
01:45 Squarespace's business and 3 million website customers
02:30 Decentralized analysts, centralized experimentation program
04:15 150–200 experiments a year: onboarding, mobile, checkout, pricing
04:55 The blank template disaster that became Blueprint AI
07:45 Two questions for every losing test
09:30 Moving ship-first teams up the experimentation maturity curve
12:30 A/B test logs and insights rituals
13:30 North Star metrics and the KPI tree
16:35 AI in the A/B testing workflow, and what stays manual

Takeaways

* Stated preference lies: users asked for a blank canvas, but behavior demanded guided design, and only the experiment could referee.
* Close every losing test with two questions: did it work for a granular segment, and is the idea worth further investment?
* One or two big wins a quarter is a healthy hit rate when you run 150–200 experiments a year.
* Calibrate certainty to stakes: tight bounds on revenue and pricing tests, wider bounds on engagement tests, so teams don't spin on noise.
* Hand AI the mundane parts of the workflow (tracking, assignment setup), but if AI runs the brief and the analysis, ask why you're running the test at all.

Connect with the Guest

LinkedIn: https://www.linkedin.com/in/linanguyen
Website: https://www.squarespace.com

June 23, 2026 · 22 min