That's Girl Code
In this solo episode of That’s Girl Code, Ellen breaks down Databricks: why it exists, what problem it solved, and how data actually flows through it. She explains how companies got stuck between messy data lakes and expensive data warehouses, and how Databricks’ lakehouse approach keeps data in object storage like S3 while adding reliability and performance through Delta Lake (Parquet files plus a transaction log), powered by Spark for scale and Unity Catalog for governance and lineage. Then she maps real-world ingestion into three patterns—files landing in storage (Auto Loader), SaaS and apps pulling via connectors/APIs, and real-time streams like Kafka/Kinesis (Structured Streaming)—all landing first as Bronze Delta tables, before being cleaned and joined into Silver tables and shaped into Gold tables for dashboards or machine learning. She closes with two concrete examples: a near-real-time retail dashboard combining foot-traffic sensors, transactions, and store metadata, and a customer-segmentation ML workflow using feature engineering, Feature Store, MLflow model tracking, and even vector search to turn clusters into meaningful segment descriptions for marketing.
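To make the Bronze, Silver, and Gold flow concrete, here is a minimal sketch in plain Python, loosely modeled on the retail dashboard example. It is not Databricks, Spark, or Delta Lake code: ordinary lists and dicts stand in for tables, and the sample records, field names, and the helpers `to_silver` and `to_gold` are all hypothetical, invented for illustration only.

```python
from collections import defaultdict

# Bronze: raw records kept as they arrived (hypothetical sample data).
bronze_foot_traffic = [
    {"store_id": "s1", "visitors": "40"},
    {"store_id": "s1", "visitors": "60"},
    {"store_id": "s2", "visitors": "50"},
    {"store_id": "s2", "visitors": None},  # malformed sensor reading
]
bronze_transactions = [
    {"store_id": "s1", "amount": "20.0"},
    {"store_id": "s1", "amount": "30.0"},
    {"store_id": "s2", "amount": "10.0"},
    {"store_id": None, "amount": "99.0"},  # missing store key
]
# Store metadata used to enrich both feeds (hypothetical).
store_metadata = {"s1": "Downtown", "s2": "Airport"}


def to_silver(rows, field, cast):
    """Silver: drop rows with unknown stores or missing values,
    cast strings to proper types, and join in the store name."""
    silver = []
    for row in rows:
        if row["store_id"] not in store_metadata or row[field] is None:
            continue
        silver.append({
            "store_id": row["store_id"],
            "store_name": store_metadata[row["store_id"]],
            field: cast(row[field]),
        })
    return silver


def to_gold(silver_traffic, silver_sales):
    """Gold: aggregate per store into a dashboard-ready summary."""
    gold = defaultdict(lambda: {"visitors": 0, "sales": 0.0, "orders": 0})
    for row in silver_traffic:
        gold[row["store_name"]]["visitors"] += row["visitors"]
    for row in silver_sales:
        gold[row["store_name"]]["sales"] += row["amount"]
        gold[row["store_name"]]["orders"] += 1
    return dict(gold)


silver_traffic = to_silver(bronze_foot_traffic, "visitors", int)
silver_sales = to_silver(bronze_transactions, "amount", float)
gold = to_gold(silver_traffic, silver_sales)
```

The point of the layering is that the raw Bronze records are never modified: the malformed rows stay available for debugging, while the Silver and Gold layers can be rebuilt from them whenever the cleaning rules change.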
25 episodes