Snacks Weekly on Data Science
In this episode, we look at how companies deal with large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists from Microsoft: a bottom-up pipeline that builds structure from the data using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and refines them recursively into a hierarchy. For more details, see their tech blog post: https://medium.com/data-science-at-microsoft/from-chaos-to-clarity-building-taxonomies-from-unstructured-text-using-large-language-models-c1303db3adb1
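To make the bottom-up idea concrete, here is a minimal sketch of its general shape: embed each text, group similar texts, then name each group. It is not the Microsoft authors' pipeline. Everything in it is an assumption for illustration: `embed` uses bag-of-words counts in place of a real embedding model, `cluster` is a simple greedy threshold clustering, and `label` picks the most common word where a real pipeline would ask an LLM for a category name.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    """Greedy clustering: each text joins the first cluster whose seed
    vector is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, member_texts)
    for text in texts:
        vec = embed(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


def label(members):
    """Hypothetical stand-in for the LLM naming step: most common word."""
    counts = Counter(w for m in members for w in m.lower().split())
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    feedback = [
        "login page fails",
        "login fails again",
        "billing invoice wrong",
        "invoice billing error",
    ]
    for group in cluster(feedback):
        print(label(group), group)
```

The top-down approach reverses the order: an LLM proposes the categories first, texts are assigned to them, and each category is split again in the same way to build the hierarchy.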
140 episodes