Snacks Weekly on Data Science

Building Taxonomies with Large Language Models [Microsoft]

8 min · May 25, 2026

Description

In this episode, we look at how companies deal with large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists from Microsoft: a bottom-up pipeline that builds structure from the data using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and refines them recursively into a hierarchy. For more details, see their published tech blog: https://medium.com/data-science-at-microsoft/from-chaos-to-clarity-building-taxonomies-from-unstructured-text-using-large-language-models-c1303db3adb1

All episodes

140 episodes


Fraud Detection with Multi-Agent AI Architecture [Razorpay]

In this episode, we discuss a classic scaling problem in fraud and risk operations: too much manual review, inconsistent judgments, and growing complexity. We explore the team’s solution, Bumblebee, a multi-agent AI architecture that separates planning, evidence gathering, and analysis into specialized roles, making the system more robust and scalable. For more details, see their published tech blog: https://engineering.razorpay.com/meet-bumblebee-the-multi-agent-ai-architecture-that-changed-fraud-detection-at-razorpay-c2b6d5704f51

May 18, 2026 · 7 min

Localization-Led Generative AI Product [Udemy]

In this episode, we explore how Udemy built a multilingual AI platform to bring its generative AI features to learners around the world. The team approached localization at three levels: a translation-first approach for broad and fast coverage, a fully native multilingual system for markets where fluency and cultural precision are essential, and a hybrid solution in between that routes between the two depending on the situation. For more details, see their published tech blog: https://medium.com/udemy-engineering/from-zero-to-hero-localization-led-generative-ai-at-udemy-a422e4f968d4

May 4, 2026 · 8 min