The Databricks Data Engineer
You built the table right. Well-partitioned, documented, fast enough that the row count came back before you finished reading your own Slack. Six months later it takes four minutes to return that same count, and nobody on your team ever decided to make it that way. There was no meeting, no design doc, no ticket titled "let's make this unqueryable by Q3." A swamp is not a decision. It's the sum of a few dozen reasonable shortcuts that compound into something nobody would have signed off on if you'd proposed it all at once. Which is why telling people to "be more careful" never fixes it. They were already careful.

In this episode:
- Why your slowest Delta table isn't slow because the data is big, and what it's actually choking on
- The storage-bill surprise that's invisible in every query until the invoice lands
- How the most generous thing you do for a blocked teammate quietly destroys whether anyone can trust the table
- Why nobody can clean up a swamp where nobody knows what's load-bearing, and the cheapest fix in the whole estate
- When you should ignore all of this advice, because over-governing a throwaway table is just a different swamp

This episode is for Databricks data engineers staring at the one table everyone groans about, the one that actually matters, wondering how it got like this. Whether you run batch, streaming, or DLT, you'll walk away able to name exactly which kind of rot is filling your worst table and the specific senior counter-move that reverses it.

---

Helping 18,000+ Databricks data engineers become seniors: interview like seniors, execute like seniors, think like seniors. Follow The Databricks Data Engineer for new episodes every Monday, Wednesday, and Friday.

LinkedIn: linkedin.com/in/jrlasak
Newsletter: dataengineer.wiki

#DataEngineering #Databricks #DataEngineer #CareerGrowth #ApacheSpark #DeltaLake
9 episodes