Building Taxonomies with Large Language Models [Microsoft]

8 min · May 25, 2026

Description

In this episode, we look at how companies deal with large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists from Microsoft: a bottom-up pipeline that builds structure from data using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and refines them recursively into a hierarchy. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/from-chaos-to-clarity-building-taxonomies-from-unstructured-text-using-large-language-models-c1303db3adb1
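
To make the bottom-up idea more concrete, here is a minimal Python sketch under loudly stated assumptions. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the clustering step is plain k-means with a deterministic farthest-first initialization, and every function name is hypothetical rather than taken from the Microsoft post. A full pipeline would typically go on to ask an LLM to name each cluster; that step is omitted here.

```python
import math
from collections import Counter


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy stand-in (assumption) for an embedding model: L2-normalized bag-of-words counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bottom_up_clusters(texts: list[str], k: int) -> dict[int, list[str]]:
    """Embed every text, cluster the vectors, and group the texts by cluster id."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = [embed(t, vocab) for t in texts]
    clusters: dict[int, list[str]] = {}
    for text, label in zip(texts, kmeans(vectors, k)):
        clusters.setdefault(label, []).append(text)
    return clusters


if __name__ == "__main__":
    tickets = [
        "refund not received for order",
        "order refund delayed",
        "app crashes on login",
        "login screen crashes app",
    ]
    for label, members in bottom_up_clusters(tickets, k=2).items():
        print(label, members)
```

With these four toy support tickets, the two refund-related texts and the two login-crash texts land in separate clusters. In a real setting, the embedding model and the number of clusters k are the main choices that shape the resulting taxonomy.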

All episodes

140 episodes

Building Taxonomies with Large Language Models [Microsoft]

May 25, 2026 · 8 min

Fraud Detection with Multi-Agent AI Architecture [Razorpay]

In this episode, we discuss a classic scaling problem in fraud and risk operations: too much manual review, inconsistent judgments, and growing complexity. We explore the team’s solution, Bumblebee, a multi-agent AI architecture that separates planning, evidence gathering, and analysis into specialized roles, enabling a robust and scalable system to solve the problem. For more details, you can refer to their published tech blog, linked here for your reference: https://engineering.razorpay.com/meet-bumblebee-the-multi-agent-ai-architecture-that-changed-fraud-detection-at-razorpay-c2b6d5704f51

May 18, 2026 · 7 min

Hybrid Search for Improved Content Discovery [OLX]

In this episode, we explore how OLX improved discovery by combining keyword search and vector search instead of forcing a choice between the two. Keyword systems remain excellent for precision, while vector systems add semantic understanding. Together, they create a smarter and more user-friendly marketplace experience. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.olx.com/hybrid-search-where-keywords-meet-vectors-enabling-classifieds-discovery-b7c383fe4fc4

May 11, 2026 · 7 min
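
The description above does not say how OLX merges the keyword and vector result lists, so the Python sketch below shows only one common option, Reciprocal Rank Fusion (RRF), and should not be read as OLX's actual method. The function name and the listing ids are hypothetical; k = 60 is the constant conventionally used with RRF. The point it illustrates is that two rankings can be combined by rank position alone, without calibrating a keyword relevance score against a vector similarity score.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1), so items ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["ad_1", "ad_2", "ad_3"]  # hypothetical results from a keyword index
    vector_hits = ["ad_3", "ad_1", "ad_4"]   # hypothetical results from a vector index
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    # prints ['ad_1', 'ad_3', 'ad_2', 'ad_4']
```

Listings found by both retrievers (`ad_1`, `ad_3`) move ahead of those found by only one, which captures the spirit of letting keyword precision and semantic recall reinforce each other.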

Localization-Led Generative AI Product [Udemy]

In this episode, we explore how Udemy built a multilingual AI platform to bring its generative AI features to learners around the world. The team approached localization across three levels: a translation-first approach for broad and fast coverage, a fully native multilingual system for markets where fluency and cultural precision are essential, and a hybrid solution in between that intelligently routes between the two depending on the situation. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/udemy-engineering/from-zero-to-hero-localization-led-generative-ai-at-udemy-a422e4f968d4

May 4, 2026 · 8 min

Ladder of Evidence to Understand Product Effectiveness [Meta]

In this episode, we explore how Meta uses the “Ladder of Evidence” framework to evaluate the effectiveness of new product features. Instead of relying on a single analytical method, this framework helps teams choose the right type of evidence based on real-world constraints, leading to better and more informed product decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/ladder-of-evidence-in-understanding-effectiveness-of-new-products-part-i-ad8dee70906c

Apr 27, 2026 · 9 min