
AI Evals and Analytics Podcast

Podcast by Stella and Amy

English

Technology and science



Build trustworthy AI products through evaluation-driven development. Each episode covers practical evaluation strategies, industry trends, and best practices for building safe, reliable AI systems. From dataset generation and eval metric design to cross-functional collaboration and post-launch analytics, we talk about how to build trustworthy, lasting AI products on a solid AI evals and analytics framework. Subscribe for practical techniques, industry insights, and guest interviews on AI evaluation and analytics.

More about AI Evals and Analytics: https://ai-evals.org/

We (Stella & Amy) created the AI Evaluation & Analytics Playbook, a practical framework that helps teams ship production-ready, trustworthy AI systems.

Powered by Firstory Hosting

All episodes

3 episodes


From AI Evals to Business Impact

Why do most AI teams only ask "is this actually working for the business?" after it's too late? When should you start connecting evals to business impact, and how do you actually do it? Using the same medical insurance chatbot from the last episode, we show how to bridge the gap between model metrics and the outcomes your leadership actually cares about. We introduce the Eval-to-Impact Stack: a three-layer framework that connects eval metrics, product metrics, and business metrics.

  • More details are available in our Substack post, From AI Evals to Business Impact: https://datasciencexai.substack.com/p/from-ai-evals-to-business-impact
  • Interested in the AI Evals and Analytics Playbook course? Here is an exclusive discount for our listeners: https://maven.com/ai-evals-and-analytics/ai-evals-analytics-playbook?promoCode=EVALPOD

00:00 – Introduction & Recap of Episode 2
00:53 – Why Teams Ask the Business Impact Question Too Late
01:38 – The Stat: 95% of Enterprise AI Pilots Fail
01:58 – The Translation Problem: Model Metrics vs. Business Metrics
02:38 – Why Evals Get Labeled as Overhead (And How to Fix It)
03:16 – The Eval-to-Impact Stack: Three Layers Explained
05:00 – Applying the Framework: Insurance Chatbot Walkthrough
07:13 – Work Backwards from Business Goals, Not Forward from Metrics
08:05 – The Cross-Functional Superpower: Speaking Both Languages
08:25 – Closing: "Build the Product Right" vs. "Build the Right Product"

Stella Liu: https://www.linkedin.com/in/wenxingl/
Amy Chen: https://www.linkedin.com/in/amy17519/

March 10, 2026 - 9 min

Build AI Evals from Scratch: When and How?

What is evaluation-driven development? When should you start building evals for your product? How do you build them from scratch? Using a real-world example of a customer chatbot for a medical insurance company, we walk through the process of setting up evals from scratch: translating product requirements into quantifiable metrics, curating quality test datasets (hint: you need fewer examples than you think), and making go/no-go decisions based on eval scores. You'll learn why accuracy and safety require different approaches, how to avoid the trap of AI-generated test data, and why 94% vs 95% accuracy matters less than you'd expect, while safety guardrails are non-negotiable. This is the practical blueprint for anyone building AI products who wants to catch problems before users do.

00:00 – Introduction: Why We Need to Talk About Evals Now
00:39 – When to Start AI Evals?
03:20 – Example Setup: Medical Insurance Customer Chatbot
04:30 – Defining Evals in Product Requirements
07:19 – What Is Evaluation-Driven Development?
08:27 – Breaking Down "Accuracy": What Does It Really Mean?
09:42 – Dataset Curation: Quality Over Quantity
11:24 – How Big Should Your Test Set Be?
12:25 – Safety Guardrails: Knowledge Boundary and PII Leakage
15:29 – Making Release Decisions with Eval Metrics
17:33 – Start with What's Critical to Your Use Case

February 7, 2026 - 17 min

AI Evals Skills: Why Data Scientists Have a Natural Advantage

What skills do AI evals require? Why do data scientists have a natural advantage in AI evals?

Evaluating AI isn't just about "vibe coding" with an AI assistant. It requires a solid foundation in statistics for picking sample sizes, and in coding for building your own testing frameworks. Data scientists have a huge head start here because they are already pros at designing metrics and communicating risks.

In the inaugural episode, we also explain why Evals (pre-launch testing) and Analytics (post-launch user feedback) are two sides of the same coin: one makes sure the AI works, and the other makes sure people actually love using it.

00:00 – Introduction to AI Evals & Analytics
01:31 – Why Data Scientists Have a Natural Advantage
01:59 – Technical Pillar: Statistics
02:48 – Technical Pillar: Coding & Prompt Engineering
05:03 – Technical Pillar: Dataset Generation
08:35 – Soft Skills & Stakeholder Collaboration
11:17 – Domain Expertise in Regulated Industries
15:50 – New Skills for the GenAI Era
19:25 – Why Evals and Analytics Must Come Together

January 26, 2026 - 22 min