Snacks Weekly on Data Science
In this episode, we look at how companies handle large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists at Microsoft: a bottom-up pipeline that derives structure from the data itself using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and recursively refines them into a hierarchy. For more details, see their published tech blog post: https://medium.com/data-science-at-microsoft/from-chaos-to-clarity-building-taxonomies-from-unstructured-text-using-large-language-models-c1303db3adb1
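To make the bottom-up idea concrete, here is a minimal sketch. It is not the Microsoft pipeline itself. It assumes each text has already been converted into an embedding vector (the tiny 2-D vectors below are made-up stand-ins for real embeddings) and uses plain k-means as the clustering step. The helpers `kmeans` and `group_by_cluster` are hypothetical names chosen for illustration. In the approach described, an LLM would then read each cluster and give it a category name; that labeling step is left out here.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: assign each vector to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    # Initialize centroids by picking k distinct input vectors at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def group_by_cluster(texts, labels):
    """Collect the original texts under their cluster label."""
    clusters = {}
    for text, label in zip(texts, labels):
        clusters.setdefault(label, []).append(text)
    return clusters


# Hypothetical example: four short support tickets with made-up 2-D "embeddings".
texts = ["refund not received", "charged twice", "app crashes on login", "cannot sign in"]
embeddings = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.95]]

labels = kmeans(embeddings, k=2)
clusters = group_by_cluster(texts, labels)
for label, members in clusters.items():
    # In the described pipeline, an LLM would name each cluster at this point.
    print(label, members)
```

Swapping in real embeddings or a different clustering algorithm changes only the inputs and the `kmeans` call; the grouping step stays the same.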