Inference & Intelligence Lab

Podcast by Lin Jia

English

Technology & science

About Inference & Intelligence Lab

Inference & Intelligence Lab is a podcast on statistical inference, causal inference, machine learning, and GenAI evaluation, focused on making decisions that hold up in real-world data science. The show features two series, Causal Inference from the Ground Up and Inference in the Wild, covering both first principles and practical pitfalls.

All episodes

10 episodes

The Causality Gap: Measuring the True Impact of Voluntary Adoption in Digital Marketplaces

Across the tech industry, many of the most valuable features rely on voluntary adoption. A traveler chooses whether to join a loyalty program; a marketplace seller decides whether to opt into a smart-pricing tool. Because you cannot force users to adopt a feature, a standard A/B test measures the effect of offering it, and when only a fraction of users opt in, that topline result is diluted and often looks flat. Genuinely great features get killed prematurely because the bottleneck was adoption, not product quality. In this special episode, we break down The Causality Gap. We expose the structural flaws that make standard observational methods (like propensity score matching (PSM) or regression adjustment) fail in opt-in scenarios, and show how combining Randomized Encouragement Design (RED) with Double Machine Learning (DoubleML) provides a diagnostic map to save your highest-potential features.
In this episode, we discuss:
* The "Opt-In" Trilemma: Why voluntary adoption, extreme user heterogeneity, and finite samples break traditional product feedback loops.
* The Collider Bias Trap: Why matching or conditioning on post-treatment adoption creates a spurious correlation that breaks your counterfactuals by design.
* Randomized Encouragement Design (RED): Using randomized nudges as instrumental variables, so the causal chain runs cleanly from nudge to adoption to outcome.
* Denoise First, Estimate Second: How DoubleML strips out heavy marketplace noise, using Neyman-orthogonal scores and cross-fitting to guard against regularization and overfitting bias.
* ATT vs. ITT: How separating rollout-level impact (ITT) from adopter-level impact (ATT) tells you whether to iterate on the feature or optimize the adoption funnel.
📖 Read the deep dive on the booking.ai Medium blog (with illustrations and takeaways): [Link]
About the Host
Lin Jia is a Senior Data Scientist and Craft Lead at Booking.com with over 9 years of experience. Operating at the intersection of statistical inference, causal machine learning, and GenAI evaluation, she specializes in building the frameworks that enable trustworthy, decision-ready insights under real-world constraints. A recognized expert in the field, Lin has authored research on sensitivity analysis presented at KDD 2024 and leads the development of organization-wide standards for experimentation and causal inference.
🤝 Connect with me on LinkedIn: https://www.linkedin.com/in/linjia/
🚀 Support the Craft
If you found this episode valuable, please consider:
* Following the Podcast: Tap the "+" or "Follow" button on Spotify to stay at the cutting edge of measurement strategy.
* Sharing the Episode: Know a Data Scientist or Product Manager struggling to measure opt-in platform features? Send this their way.
* Joining the Conversation: Share your thoughts on today’s topic on LinkedIn; let’s raise the standard of the DS craft together.
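The ITT vs. ATT decomposition can be sketched with a small simulation. The setup is hypothetical and not taken from the episode: a randomized nudge `z`, an opt-in flag `d`, and an unobserved `engagement` trait that drives both adoption and the outcome. It assumes one-sided noncompliance (only nudged users can adopt), under which the Wald ratio ITT / first stage estimates the effect on adopters (ATT); with two-sided noncompliance the same ratio would instead estimate the complier effect (LATE).

```python
import random
from statistics import fmean

def simulate_red(n=20_000, seed=7):
    """Hypothetical opt-in feature evaluated with Randomized Encouragement Design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.gauss(0, 1)      # unobserved: drives adoption AND outcome
        z = rng.random() < 0.5            # randomized nudge = the instrument
        effect = 2.0 + engagement         # heterogeneous individual effect
        # One-sided noncompliance: only nudged users can opt in,
        # and more engaged users are more likely to do so.
        d = z and (engagement + rng.gauss(0, 1) > 0)
        y = 10 + 3 * engagement + (effect if d else 0.0) + rng.gauss(0, 1)
        rows.append((z, d, y, effect))
    return rows

def red_estimates(rows):
    Z, D, Y, EFFECT = 0, 1, 2, 3          # column positions in each row

    def mean_of(col, keep):
        return fmean(r[col] for r in rows if keep(r))

    def nudged(r):
        return r[Z]

    def not_nudged(r):
        return not r[Z]

    itt = mean_of(Y, nudged) - mean_of(Y, not_nudged)          # rollout-level impact
    first_stage = mean_of(D, nudged) - mean_of(D, not_nudged)  # adoption lift from the nudge
    return {
        "itt": itt,
        "first_stage": first_stage,
        "wald_att": itt / first_stage,                         # adopter-level impact
        # Adopters vs. everyone else: biased by self-selection on engagement.
        "naive": mean_of(Y, lambda r: r[D]) - mean_of(Y, lambda r: not r[D]),
        "true_att": mean_of(EFFECT, lambda r: r[D]),           # knowable only in a simulation
    }

est = red_estimates(simulate_red())
for name, value in est.items():
    print(f"{name:>12}: {value:.2f}")
```

The naive adopter comparison mixes the feature's effect with the fact that engaged users self-select into it; the Wald ratio relies only on the randomized nudge, so it lands near the simulated ATT.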

Yesterday - 20 min

Build the Camera — How Measurement Design Guides Statistical Testing | EP2: Inference in the Wild

EP2: Build the Camera — Why Measurement Design Trumps Statistical Testing
Running a statistical test is simply pressing the shutter. But designing the measurement system? That is building the camera. In this episode, we challenge the industry’s obsession with "which test to run" and shift the focus to what actually matters: whether your metric captures meaningful change in user behavior. We explore why even a successful feature can "fail" a t-test (p = 0.34), not because the feature failed, but because the raw metric amplified noise from outliers and suppressed the pattern that mattered.
In this episode, we discuss:
* The Shutter vs. The Camera: Why statistical tests are secondary to how you define your metrics and reduce their noise.
* The Estimand Trade-off: How common transformations (like log transforms) don't just change the distribution; they change the business question you are answering.
* Leverage through Design: Why the most successful Data Science teams focus on what to measure rather than just how to test.
* Case Study: Rank Transformation: Using ranks as a deliberate design choice to neutralize outliers while preserving the directional "truth" of your data.
Before choosing a statistical test, every practitioner should ask:
1. The Business Question: What do stakeholders actually need to know (e.g., "by how many minutes" or simply "is it better")?
2. Metric Topology: What does the distribution really look like, and where is the noise coming from?
3. The Noise Reduction Strategy: Which approach preserves the "truth" while eliminating the interference of outliers?
4. The Reliability Proof: Does simulation verify that this method achieves 80% power without inflating the False Positive Rate for this specific metric?
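Checklist items 3 and 4 can be combined into one small simulation harness. Everything here is hypothetical: the metric model (a lognormal bulk plus rare Pareto outliers), the 20% lift, the sample sizes, and the helper names are assumptions. The sketch compares a Welch-style test on the raw metric with the same test on pooled ranks (a rank-transformation approach closely related to Mann-Whitney), using a normal approximation for the p-value, which is reasonable at 500 users per arm.

```python
import random
from math import fsum
from statistics import NormalDist, fmean

def pooled_ranks(a, b):
    """Rank a and b together (average ranks for ties), then split back by arm."""
    values = a + b
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1      # 1-based average rank of the tie block
        i = j + 1
    return ranks[:len(a)], ranks[len(a):]

def sample_variance(v):
    m = fmean(v)
    return fsum((x - m) ** 2 for x in v) / (len(v) - 1)

def p_value(a, b):
    """Two-sided p-value of a Welch statistic, normal approximation (large n)."""
    se = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
    z = (fmean(b) - fmean(a)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def draw_metric(rng, n, lift):
    """Hypothetical heavy-tailed metric: lognormal bulk plus rare, huge outliers."""
    out = []
    for _ in range(n):
        x = rng.lognormvariate(0, 1) * (1 + lift)    # treatment lifts the typical user
        if rng.random() < 0.01:
            x += 100 * rng.paretovariate(1.1)         # outliers unrelated to treatment
        out.append(x)
    return out

def rejection_rate(use_ranks, lift, n=500, sims=300, alpha=0.05, seed=1):
    """Share of simulated experiments that reject H0: the FPR if lift == 0, else power."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a, b = draw_metric(rng, n, 0.0), draw_metric(rng, n, lift)
        if use_ranks:
            a, b = pooled_ranks(a, b)
        hits += p_value(a, b) < alpha
    return hits / sims

results = {
    (name, lift): rejection_rate(use_ranks, lift)
    for name, use_ranks in (("raw", False), ("rank", True))
    for lift in (0.0, 0.2)
}
for (name, lift), rate in results.items():
    label = "FPR" if lift == 0 else "power"
    print(f"{name:>4} {label:>5}: {rate:.2f}")
```

Under this assumed metric, the rank-based test should keep its false positive rate near the nominal 5% while recovering far more power than the raw-mean test. Note that ranks answer "do treated users tend to score higher?" rather than "by how many minutes?", which ties back to checklist item 1.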
📖 Read the companion deep dive (with illustrations and takeaways): https://inferenceintel.substack.com/p/build-the-camera-how-measurement
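The estimand point about log transforms can be made concrete with a toy example. The numbers are hypothetical: three light users and one heavy user, where the feature doubles the light users and leaves the heavy user unchanged. A raw-scale analysis targets the ratio of arithmetic means, while a log-scale analysis (difference in mean logs, exponentiated) targets the ratio of geometric means, which is a different business question.

```python
import math
from statistics import fmean

def ratio_of_means(control, treated):
    """Raw-scale estimand: how much did the average (total per user) change?"""
    return fmean(treated) / fmean(control)

def ratio_of_geometric_means(control, treated):
    """Log-scale estimand: exp(mean log treated - mean log control)."""
    diff = fmean(map(math.log, treated)) - fmean(map(math.log, control))
    return math.exp(diff)

control = [1.0, 1.0, 1.0, 100.0]   # hypothetical: three light users, one heavy user
treated = [2.0, 2.0, 2.0, 100.0]   # feature doubles light users; heavy user unchanged

print(f"{ratio_of_means(control, treated):.3f}")            # 1.029 -> "+2.9% per user"
print(f"{ratio_of_geometric_means(control, treated):.3f}")  # 1.682 -> "+68% for the typical user"
```

Neither number is wrong; they answer different questions, which is why the transformation belongs in the design conversation rather than being treated as a testing detail.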

10 Apr 2026 - 8 min

No Interference, No Ambiguity: The SUTVA Assumption | EP7: Causal Inference from the Ground Up

No Interference, No Ambiguity: The SUTVA Assumption
Your randomized experiment is clean. The groups are balanced and comparable. The p-value is significant. But behind the scenes, the treatment is leaking: User A shared their referral link with User B in the control group, and suddenly your "independent" comparison is contaminated. Welcome to the most common, and most ignored, failure point in experimentation: SUTVA (the Stable Unit Treatment Value Assumption). As Fisher famously warned, consulting a statistician after an experiment is finished is often merely asking for a "post-mortem". If SUTVA breaks, you get confident numbers that mean absolutely nothing.
In this episode, we discuss:
* The Two Pillars of SUTVA: Why valid experiments require both "No Interference" (no spillovers) and "Consistency" (a well-defined treatment).
* The Three Mechanisms of Interference: From direct network effects to indirect marketplace competition and systemic behavioral redirection.
* The Consistency Trap: Why "the feature" can't mean different things for different users, and how to avoid "hidden versions" of your intervention.
* The Experimental Fix: When to move beyond individual randomization toward Cluster Randomization, Switchbacks, or Synthetic Controls.
The Interference Diagnostic (Key Takeaways):
* Contagion Risks? Randomize at the city or region level so that social interactions stay within a single randomization unit.
* Marketplace Bottlenecks? Use switchback designs to handle units competing for finite supply.
* Spatial Shifts? Use buffer zones or synthetic controls to ensure you aren't just claiming credit for demand that simply moved elsewhere.
Stop running experiments that leak. Start designing for stability.
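A minimal simulation of the "Contagion Risks" fix, under assumed parameters: each user gets a direct effect if treated, plus a referral spillover proportional to the share of treated users in their city. `DIRECT`, `SPILLOVER`, the city sizes, and the salt string are hypothetical. Individual randomization recovers roughly the direct effect only, because control users receive spillover too; deterministic city-level assignment (a salted hash of the city ID) compares fully treated cities with untreated ones and recovers what a full rollout would do.

```python
import hashlib
import random
from statistics import fmean

DIRECT, SPILLOVER = 1.0, 2.0   # hypothetical effect sizes

def city_is_treated(city, salt="referral-exp-v1"):
    """Deterministic cluster assignment: every user in a city shares one arm."""
    return hashlib.sha256(f"{salt}:{city}".encode()).digest()[0] % 2 == 1

def outcomes(flags):
    """Referral interference: DIRECT if treated, plus SPILLOVER times the
    share of treated users in the same city (shared links, word of mouth)."""
    share = fmean(flags)
    return [DIRECT * t + SPILLOVER * share for t in flags]

def estimate(design, n_cities=200, users_per_city=50, seed=3):
    rng = random.Random(seed)
    treated_y, control_y = [], []
    for c in range(n_cities):
        if design == "individual":
            flags = [rng.random() < 0.5 for _ in range(users_per_city)]
        else:  # "cluster": the whole city inherits one arm
            flags = [city_is_treated(f"city-{c}")] * users_per_city
        for t, y in zip(flags, outcomes(flags)):
            (treated_y if t else control_y).append(y)
    return fmean(treated_y) - fmean(control_y)

print(f"individual: {estimate('individual'):.2f}")  # close to DIRECT only
print(f"cluster:    {estimate('cluster'):.2f}")     # DIRECT + SPILLOVER
```

In a real cluster-randomized analysis, standard errors must also be computed at the city level, since users within a city are correlated.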
📖 Read the companion deep dive (with illustrations and takeaways): https://inferenceintel.substack.com/p/no-interference-no-ambiguity-the
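For the "Marketplace Bottlenecks" case, here is a toy simulation with finite inventory per time window. All parameters are assumptions: 100 users per window compete for 30 units, and the treatment only makes users request a unit more often (60% vs. 40%). With user-level randomization, treated users win inventory away from control users, so the comparison shows a lift even though total bookings barely move; a switchback randomizes whole windows, so treated and control never compete for the same supply.

```python
import random
from statistics import fmean

USERS_PER_WINDOW, SUPPLY = 100, 30     # hypothetical: finite inventory per time window
P_REQUEST = {True: 0.6, False: 0.4}    # treated users request a unit more often

def run_window(flags, rng):
    """All requesters in a window compete for SUPPLY units; returns 1.0/0.0 bookings."""
    requesters = [i for i, t in enumerate(flags) if rng.random() < P_REQUEST[t]]
    rng.shuffle(requesters)
    booked = set(requesters[:SUPPLY])
    return [1.0 if i in booked else 0.0 for i in range(len(flags))]

def user_level_estimate(n_windows=400, seed=5):
    """Users randomized individually: both arms share each window's inventory."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_windows):
        flags = [rng.random() < 0.5 for _ in range(USERS_PER_WINDOW)]
        for t, y in zip(flags, run_window(flags, rng)):
            (treated if t else control).append(y)
    return fmean(treated) - fmean(control)

def switchback_estimate(n_windows=400, seed=5):
    """Whole windows randomized: compare per-user booking rates across windows."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_windows):
        arm = rng.random() < 0.5
        rate = fmean(run_window([arm] * USERS_PER_WINDOW, rng))
        (treated if arm else control).append(rate)
    return fmean(treated) - fmean(control)

print(f"user-level: {user_level_estimate():+.3f}")   # inflated by cannibalization
print(f"switchback: {switchback_estimate():+.3f}")   # near zero: supply is the cap
```

Real switchbacks also need to handle carryover between consecutive windows (for example, by discarding a burn-in period after each switch), which this sketch ignores.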

3 Apr 2026 - 12 min

No Overlap, No Answer: The Positivity Assumption | EP6: Causal Inference from the Ground Up

No Overlap, No Answer: The Positivity Assumption
A causal effect can only be estimated where a comparison is actually possible. Imagine evaluating a loyalty program where every enterprise customer is already enrolled, leaving you with no unenrolled counterparts to compare against. This is a violation of Positivity. While exchangeability requires that groups are comparable, positivity requires that the comparison actually exists.
In this episode, we discuss:
* Structural vs. Random Violations: Why business-logic "zeros" cannot be fixed with more data.
* The Propensity Score Plot: How to visually verify whether your treated and untreated groups cover the same territory.
* The Trimming Trade-off: Why discarding extreme observations to force overlap changes the population your results apply to.
The Positivity Audit (Key Takeaways):
* Verify Overlap: Use propensity scores to check that groups share common support.
* Identify Structural Zeros: Recognize when policy or logic makes receiving a treatment impossible for certain subgroups.
* Watch External Validity: Always report dropped observations to clarify the narrowed scope of your findings.
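The overlap check and the trimming trade-off can be sketched without any modeling library, assuming propensity scores P(treated | covariates) have already been estimated (for example with a logistic regression, not shown). The scores below are made up for illustration, the 0.1/0.9 trimming bounds are a common rule of thumb rather than a universal standard, and `overlap_report` / `text_histogram` are hypothetical helpers.

```python
def text_histogram(scores, treated, bins=10):
    """Poor man's propensity score plot: counts per score bin for each group."""
    lines = []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins

        def in_bin(s):
            return lo <= s < hi or (b == bins - 1 and s == hi)

        n_t = sum(1 for s, d in zip(scores, treated) if d and in_bin(s))
        n_c = sum(1 for s, d in zip(scores, treated) if not d and in_bin(s))
        lines.append(f"{lo:.1f}-{hi:.1f} | treated {'#' * n_t:<6} | control {'#' * n_c}")
    return "\n".join(lines)

def overlap_report(scores, treated, trim=(0.1, 0.9)):
    """Keep units inside both the common support and the trimming bounds,
    and report how many units each group loses (the narrowed scope)."""
    t_scores = [s for s, d in zip(scores, treated) if d]
    c_scores = [s for s, d in zip(scores, treated) if not d]
    lo = max(min(t_scores), min(c_scores), trim[0])
    hi = min(max(t_scores), max(c_scores), trim[1])
    keep = [lo <= s <= hi for s in scores]
    return {
        "kept_range": (lo, hi),
        "keep": keep,
        "dropped_treated": sum(1 for k, d in zip(keep, treated) if d and not k),
        "dropped_control": sum(1 for k, d in zip(keep, treated) if not d and not k),
    }

scores  = [0.02, 0.05, 0.20, 0.30, 0.50, 0.60, 0.70, 0.95, 0.97, 0.99]  # made up
treated = [0,    0,    0,    1,    0,    1,    1,    1,    1,    1]
print(text_histogram(scores, treated))
report = overlap_report(scores, treated)
print(report["kept_range"], report["dropped_treated"], report["dropped_control"])
```

Here only 2 of 10 units survive (5 treated and 3 control units are dropped), which is exactly the narrowed scope the episode says to report: any estimate now describes units with scores between 0.3 and 0.5, not the full population.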
📖 Read the companion deep dive (with illustrations and takeaways): https://open.substack.com/pub/inferenceintel/p/no-overlap-no-answer-the-positivity?r=7bs4uy&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
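A companion check for structural zeros, sketched under assumptions: rows are (subgroup, treated) pairs, and a minimum cell size (`min_n`, here 30) serves as a rough heuristic to separate cells that are probably empty by policy from cells that may simply be small. Data alone cannot prove that a zero is structural; the flag is a prompt to check the business rules. The data and function name are hypothetical, echoing the episode's loyalty-program example.

```python
from collections import Counter

def positivity_audit(rows, min_n=30):
    """Flag subgroups where everyone or no one is treated (no comparison exists)."""
    n_total, n_treated = Counter(), Counter()
    for group, is_treated in rows:
        n_total[group] += 1
        n_treated[group] += int(bool(is_treated))
    flags = {}
    for group in sorted(n_total):
        n, n_t = n_total[group], n_treated[group]
        if 0 < n_t < n:
            continue                          # both arms present: comparison possible
        side = "all treated" if n_t == n else "none treated"
        kind = "likely structural" if n >= min_n else "possibly random (small cell)"
        flags[group] = f"{kind}: {side} (n={n})"
    return flags

rows = ([("enterprise", 1)] * 200                 # every enterprise customer enrolled
        + [("smb", 1)] * 120 + [("smb", 0)] * 380
        + [("startup", 0)] * 4)
for group, flag in positivity_audit(rows).items():
    print(f"{group}: {flag}")
# enterprise: likely structural: all treated (n=200)
# startup: possibly random (small cell): none treated (n=4)
```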

27 Mar 2026 - 13 min

Comparing Apples to Apples: The Exchangeability Assumption | EP5: Causal Inference from the Ground Up

Comparing Apples to Apples: The Exchangeability Assumption
Your dashboard flags a troubling trend: users who contacted customer support have a 40% higher churn rate than those who didn’t. The immediate takeaway seems obvious: support is failing. But is it? Or did those customers contact support because something had already gone wrong? In this episode, we tackle the heart of the "Bad Comparison" problem. We dive into Exchangeability, the fundamental assumption that allows us to treat observational data as if it were a randomized experiment. If your groups aren't exchangeable, your model isn't measuring an effect; it's measuring a pre-existing difference.
In this episode, we discuss:
* The Support Trap: Why correlation often hides the "underlying fire" and leads to backward business decisions.
* The "Swap" Test: A simple mental framework to determine whether your treatment and control groups are truly comparable.
* Bias Under the Null: How a model can show a massive "effect" even when the treatment does absolutely nothing.
* Forcing Exchangeability: The role of conditioning and why choosing the right covariates is the most critical decision a Data Scientist makes.
Stop settling for bad comparisons. Start ensuring your data is "exchangeable" before you trust the result.
📖 Read the companion deep dive (with illustrations and takeaways): https://inferenceintel.substack.com/p/comparing-apples-to-apples-the-exchangeability
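Bias under the null is easy to reproduce in a simulation. In the hypothetical setup below, a pre-existing problem (`had_issue`, the "underlying fire") makes users both more likely to contact support and more likely to churn, while contacting support has no effect at all. The naive comparison still shows a large gap; comparing contacted vs. not contacted within each issue stratum, then weighting strata by size, removes it. The probabilities are assumptions, and the fix works here only because the single confounder is measured, which is exactly why covariate choice matters.

```python
import random
from statistics import fmean

def simulate(n=50_000, seed=42):
    """Hypothetical users: a prior problem drives both support contact and churn."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        had_issue = rng.random() < 0.2                         # the "underlying fire"
        contacted = rng.random() < (0.7 if had_issue else 0.1)
        churned = rng.random() < (0.4 if had_issue else 0.1)   # support has NO effect
        rows.append((had_issue, contacted, churned))
    return rows

def churn_gap(rows):
    """Churn rate of users who contacted support minus those who did not."""
    return (fmean(ch for _, co, ch in rows if co)
            - fmean(ch for _, co, ch in rows if not co))

def stratified_gap(rows):
    """Same comparison within each issue stratum, weighted by stratum size."""
    total = 0.0
    for stratum in (True, False):
        sub = [r for r in rows if r[0] == stratum]
        total += churn_gap(sub) * len(sub) / len(rows)
    return total

rows = simulate()
print(f"naive gap:      {churn_gap(rows):+.3f}")       # large, despite a true effect of 0
print(f"stratified gap: {stratified_gap(rows):+.3f}")  # close to 0
```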

15 Mar 2026 - 20 min
