Acima Development

At Acima, we have a large software development team. We wanted to be able to share with the community things we have learned about the development process. We'll share some tech specifics (we do Ruby, Kotlin, JavaScript, and Haskell), but also talk a lot about mentoring, communication, hiring, planning, and the other things that make up a lot of the software development process but don't always get talked about enough.

Episode 95: What Do Data Engineers Do?

This episode explores the role of a data engineering team within a company and how it differs from traditional application development. While app developers focus on performance and real-time systems, the data team is responsible for collecting, syncing, and organizing data from many sources into a central warehouse (like Snowflake). Using tools such as Fivetran, data is continuously pulled from dozens of systems and stitched together into a unified view that business users, analysts, and dashboards can actually use. A major challenge discussed is how microservices (great for engineering) create fragmented data that must be carefully reconstructed to tell a complete story, such as the lifecycle of a customer or lease. A large portion of the conversation focuses on “data transformation,” which is the process of turning raw, scattered data into meaningful insights. This involves complex pipelines of queries and scripts that combine, clean, and interpret data across systems. The speakers emphasize that this work is far from simple—it requires deep understanding of both the data and the business context. Done well, it enables decision-making (like tracking revenue trends or customer behavior), but done poorly, it can lead to incorrect conclusions that impact the entire company. They compare transformation to cooking or even building a rocket: the output is fundamentally different from the raw inputs, and small mistakes upstream can cascade into major issues downstream. The group also discusses practical challenges in data modeling, system design, and collaboration between teams. Topics include the tradeoffs of normalization, handling schemas across evolving systems, and frustrations like poorly defined enums or lack of communication when engineers change databases without notifying the data team. Security is another key theme, especially around controlling access to sensitive data (PII) and preventing misuse. 
Ultimately, the episode highlights that data work sits at the center of the organization: it depends on upstream engineering decisions and directly influences downstream business outcomes, making clear communication, documentation, and thoughtful design essential as systems scale.

Transcript:

DAVE: Hello and welcome to the Acima Developers Podcast. We've got a fun group today. I've got Eddy. We've got Kyle. We've got Thomas. We've got Mike and Justin. We've got Bill, and we've got Zach. Now, Bill and Zach are infrequent. Bill's our DBA, and Zach is the...what are you? The head of the data team?
ZACH: Technically, my title is Senior Manager, Data Architecture and Governance. But that's a fancy way of saying that I am heading up a data engineering team. Yep.
DAVE: They made you widen the column size to fit that job title in.
ZACH: Yeah, pretty much.
DAVE: Yeah. Yeah. So, for people that don't know, I've been at Acima for almost five years, six years. I don't keep track of numbers. I worked in engineering for a couple of years, then I went over to work with Zach on the data team for a year. And then he got rid of me and sent me back to engineering. And I've been back over here for, like, a year and a half now. And I think it's really, really fascinating the different ways the teams work. Like, app dev focuses on latency, and we love to do everything with compute, and we're very scarce with storage. And the data team is kind of the other way around. You've got the great big warehouse. Storage is free. Compute is crucially expensive. It's like, you've got a table that has all the integers in it, and you look them up by ID because you can't calculate anything. That's a joke. But people don't believe me when I tell them you have a days table that is literally every day from 1970 forward. We don't want you to calculate the name of the day of the week. Just look it up in the table. We don't want you calculating the first letter of the day of the week.
That's a separate column in that table, right?
ZACH: Yeah. I don't think that that table was originally built for that reason specifically. I think a lot of people used it for that reason. There's a lot of really good days logic built into, like, Snowflake, Redshift, and all of the warehouses. However, when Acima first started, warehousing was a little bit newer, and so maybe a lot of those functionalities didn't exist. Now it's more like, what's a holiday [laughs]? And that's the main reason we're using that table is, what is a holiday? And that table is not always the most accurate on what a holiday is, either. But it's way more accurate than if we didn't use it [laughs]. And it's a data source that my predecessor exported from somewhere a decade ago and runs all the way through, like, 2060. So, I'll probably never adjust it, you know. It’s just --
DAVE: That was going to be my question, so when do we even run out of days?
ZACH: It doesn't matter to me. It'll be long after I've, you know --
EDDY: Is that only taking into account local holidays, or now that you're considering, like, international growth, like, does the table also consider international holidays, or is it only local?
ZACH: It's not been updated to consider international holidays. We don't have to do a ton with holidays on the data team. Really, that's going to be on our production systems, right? Like, we are consumers of data. We are not...Well, I mean, we generate data, too, but we're mostly consumers of data. If you look at the flow in, it's mostly data coming in. So, it's really important for, like, LMS to understand what a holiday is in every single country that they're in. Not as important for the data team because the events that should not happen on holidays, there should be no data for because they didn't happen, right? But no, I've not expanded that table for, like, Mexico or Canada or any other country. It's just U.S. And even then, like I said, it's not fully accurate.
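As a rough illustration of the days table described above, here is a minimal Python sketch that uses an in-memory SQLite database in place of a warehouse. The column names, the three-entry holiday list, and the `build_days_table` and `lookup_day` helpers are assumptions made for the example; the episode does not describe the real table's schema or where its holiday data came from.

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical, deliberately incomplete U.S. holiday list for the example.
US_HOLIDAYS = {date(2024, 1, 1), date(2024, 7, 4), date(2024, 12, 25)}


def build_days_table(conn, start, end):
    """Create a small date dimension with one row per day from start to end."""
    conn.execute(
        "CREATE TABLE days ("
        "day TEXT PRIMARY KEY, day_name TEXT, day_initial TEXT, is_holiday INTEGER)"
    )
    rows = []
    current = start
    while current <= end:
        name = DAY_NAMES[current.weekday()]  # weekday(): Monday is 0
        rows.append((current.isoformat(), name, name[0], int(current in US_HOLIDAYS)))
        current += timedelta(days=1)
    conn.executemany("INSERT INTO days VALUES (?, ?, ?, ?)", rows)


def lookup_day(conn, iso_day):
    """Read precomputed attributes instead of calculating them per query."""
    return conn.execute(
        "SELECT day_name, day_initial, is_holiday FROM days WHERE day = ?",
        (iso_day,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
build_days_table(conn, date(2024, 1, 1), date(2024, 12, 31))
print(lookup_day(conn, "2024-07-04"))  # ('Thursday', 'T', 1)
```

The point of the design is that the day name, its first letter, and the holiday flag are stored once and joined in, so queries spend no compute deriving them, and the holiday definition lives in one place that can be corrected.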
DAVE: I remember when I started here, we had no plans to go outside. We were just U.S. company, and so don't worry about it. And businesses pivot and grow. Zach, I got a question for you. I jumped straight into some detail, but I don't think a lot of people know what a data team does. We were talking about this in the pre-call. Like, the DBA does the architecture, but you guys...you said CrossFit. I work on Merchant Portal. My job is to help keep the merchants happy so that they can give leases to customers and get the product out the door. That's an application database written in Postgres. Where does my data go after, you know, like, every night, what happens to my data? What do you do with it, and who do you give it to, and what do they do with it?
ZACH: Yeah, so that's a loaded question. Every 15 minutes, it syncs to the warehouse. We use tooling for that. That tooling is Fivetran. They're a great company. They have a bunch of people like me and smarter than me focusing on just, how do we sync data from data source to Snowflake or Redshift or a data destination, basically? So, it's the best way, in my opinion, to sync it. We used to have an in-house solution. It would miss data. We didn’t focus on it a lot because we have a bunch of other stuff. So, now it syncs into the warehouse. And especially in a system of microservices, which I know are great for software engineers, they're terrible for data engineers because the next piece of the puzzle is I have to stitch all that data together. A lease record, for instance, or really any record, is not going to be wholly in one service. So, now I need to create transformation tables so that our business users, our end users, our BI analysts, and the people viewing their dashboards can see the holistic view of the lease. Because, as you know, there's a certain point where Merchant Portal just doesn't care about it anymore, and it moves on to LMS.
And then LMS doesn't necessarily care about all the nitty-gritty of what's happening behind the scenes in all the other microservices for, like, payments or anything like that. So, we really become the place where we're stitching that together. In the last count I had, I think there's 68 Postgres databases syncing into the warehouse today.
DAVE: Wow.
ZACH: We do not care about all of them [chuckles], to be frank. We do care about around 30 of them, and we use them for transformations. And then there's a bunch of just, like, batching, right? Like, I don't want, and you guys don't want, nobody wants the production customer-facing services spinning up jobs in the middle of the night to grab thousands or hundreds of thousands of records to throw them in a CSV and shoot them off to, like, a company that needs that information, right? Like a third-party company, maybe that we integrate with. And so, the last time I recorded, there was something like 50 third-party integrations that we're also handling. That data will go into those companies; data's coming out of those companies. Maybe the data goes into those companies in real-time events through the production consumer-facing services, but I am siphoning them into the warehouse so we can start to see, like, is this third-party company worth using? What is the effects that we are having here? Or maybe those companies are enriching our data, and then we look at that on the back end, and we let that adjust business decisions. And so, all that's got to come together in a singular place. And it's a lot. Like, the last time I checked, it’s...I keep saying, “Last time I checked,” I don't watch this like a hawk. But we had, like, 13 and a half thousand tables in the warehouse. So...
EDDY: So, Zach, you mentioned something interesting, and I kind of want to elaborate a little bit. So, you said you have about 60-plus tables that have data, but you only care about half of them. What's the point of us --
ZACH: 68 schemas.
So, like, Merchant Portal is a schema. Merchant Portal has, like, 218 tables. I care about those 218 tables, right, or however number it is.
EDDY: What’s the point of, like, writing into a warehouse if you don't care about that data? Like, what's the benefit of even though you don't care about it, it’s still valuable to receive?
ZACH: Yeah, so there's a couple of things. Like, when I say I don't care about it, I'm not running transformations on it. It's not being used for business.
DAVE: So, you want the data, but you don't have to mess with it.
ZACH: Yeah, I'm a data engineer at heart, which makes me a data hoarder. I want all the data [laughter]. I want every last scrap of the data. However, a huge use case that we did not have until moving to Snowflake is now we have a place where the software engineers can go in and look at the data in a 15-minute lag and start debugging, right? Like, think of console access to production. It's insanely limited, and it should be, and most people shouldn't have it. But now you can get a user inside of Snowflake, and I will let you see the production data in a 15-minute lag for debugging purposes. And that's massively huge, even for all those schemas that I'm not transforming on and the business doesn't want to see.
JUSTIN: So, I just want to give my two cents on this from a security point of view. I have a colleague whose name is Dan Hamilton. He said, “Data is the most...” well, let me rephrase that. PII data is the most toxic data that you can have in a system. So, anytime that you're, like, propagating that, whether it's to Snowflake or to any of those other systems, it's something that you got to think about in terms of who has access and how long they have access, and is it auditable, and everything else like that. So, it's an interesting point of view because data is awesome, but data is also, you know, it's what makes a company valuable. And if that data gets exfiltrated, that's something you've got to be concerned about.
Unfortunately, I've got to drop. But something that's, like, bread and butter for me every day is just like, hey, who's playing around with data? Who has access? And are there ways that it could be exfiltrated? And so, you've just got to keep an eye on that, so...
ZACH: Thanks, Justin.
DAVE: Very cool.
JUSTIN: Thanks, guys.
DAVE: Thanks, man. Take care.
ZACH: To expand on that real fast before we move on, that's an argument that I have a lot here, and that's why the structure is the way that it is for the teams here that are used to it. Mike, I ran this past you, right? Like, the way for security for data is limitation, right? And everybody wants access to more.
MIKE: Yes.
ZACH: And you have to draw a line somewhere. You can't just give everybody access to everything. And so, we have those lines drawn here, and we stick to those lines. Not everybody likes it, but it's what you have to do to try to keep your data safe, so...
MIKE: Well, that's an interesting point. I have access to some of the raw, untransformed data, but not necessarily other transformed data. And sometimes people from the BI team will say, "Oh yeah, go look at this table." Like, well, no, I don't have that one. But we can usually work things out. I, about a week ago, was helping debug something and was pulling in data from three different databases, you know, from different systems, logging from mobile app, and stuff from Merchant Portal, and over from our contract funding, and tying it all together in this amalgamous stuff, which ended up being crazy helpful, and the mobile team needed that. So, I had enough. But, you know, I think it's the right choice. Keeping the privileges limited, sure, it's a pain. But you know what's even more painful? Is giving somebody privilege they really shouldn't have it and having them abuse it.
ZACH: Exactly.
EDDY: It’s actually --
DAVE: We base this not on the value of getting it right, but on the price of getting it wrong, right?
EDDY: I was going to say...I'm so sorry.
DAVE: It’s all right.
EDDY: I was going to say it's actually made my life a little easier because I used to have access to even tables from other teams, right, from where I worked on. And so, when that got presented and said, “You're only going to be given access to the immediate team that you're working on, and that's it,” it was kind of bittersweet. I'm like, well, that sucks. Like, I want to be able to look at other data, and it makes my job easier. What actually made it easier was me saying, "I don't have access to that data. Give it to me, [laughs]" and then we'll figure it out later. And so, it ended up being, like, a blessing in disguise in a sense, where I'm just like, well, now that I don't have access to the data that you're asking for, I could just punt and say, "Hey, ask this person. Once that person gives it to me, then I'll answer your question." But --
ZACH: And you can do it that way. The other thing is, like, there's a certain level and above that has this elevated access that Mike's talking about. And that was a lot of pushback that I think we got. "Well, there's going to be a bottleneck." Well, I haven't seen that be the case, actually, right? There are people on your team before you get to Mike that can do those cross queries. You just happen to not be one of them.
BILL: Zach mentioned earlier that he has to stitch together data from a number of systems just to be able to compose a whole picture of certain entities, like a customer. We were talking about that the other day, how one of the guiding principles I teach in my modeling classes is that duplication is evil. Try to avoid it at all costs unless you absolutely have to. And, unfortunately, microservices encourage duplication a lot. And there are times when I really miss monolithic systems. If you needed to debug something, it was all in one place. You could stitch together. You didn't have to wait for data to sync. It was just there.
But, obviously, there’s some benefits to some microservices as well. You mentioned CrossFit earlier. I'm thinking data engineers are more like craftsmen, plumbers, and chefs.
ZACH: We had a member on the team that wanted to change our team name to The Data Plumbers because he thought about, like, the pipelines that you're putting together. Some of the team wanted to be Data Wranglers, and that was outvoted from Data Plumbers [chuckles]. I'd say CrossFit with data because that was a popular thing when I started becoming a data engineer. And it makes sense, right? We pick up data here. We put it down over here. The thing I didn't get into with a lot of people, especially the non-technical people, is all the transforming and the difficulty that comes behind that, right? Like, you're working inside of a software application, and you're working with row-level data. You just have to know that you're working with this customer maybe, and this item, and that's what matters there. You get into, like, data engineering, well, I might be writing a query that affects millions of people, millions of items. And I need it to be extremely performant because I can't be running 18-hour queries against the warehouse. There are people that do that [laughs]. And so, then I have to also work with them on how to not do that. But yeah, so, it really becomes, like, an idea of understanding the compute, how the memory on that compute works, how to narrow down your scope as much as possible. And when you do narrow it down, you know, there's window functions. There's a bunch of compute options on data that could slow you down. And how do you effectively do that? And then just understanding because, like, when we talk about warehouses or compute, right, it's actually a cluster of machines, and they all have their own different tasks. So, like, having an understanding of that and how your data flows through those is extremely helpful, too, not entirely necessary.
You can do a lot of damage on a warehouse without knowing that and still be just fine, but it helps to understand how all those flows happen.
DAVE: That's actually a good difference between app and dev, or engineering and data, is that on the application side, the thing we never want to see is a query go out without a limit. Like, we don't want you to say, you know, "Select first name from applicants semicolon" like, that's going to burn the whole freaking table from top to bottom, like, all the way down. And then I got to the data team and, like, Casey, who worked...is he still over there? He [inaudible 16:54] the whole team.
ZACH: Yeah. So, he left quite a while ago. He's back again working with Rob. So...
DAVE: Awesome. Very, very sharp guy. But I remember him sitting us down and saying, "Please don't ever do select star from table, even limit one, because every column is in a different server, and you just spun up the entire data center to get one row of data."
ZACH: Yeah, it's not really a different server. Like, think of a disc, right? And I know we're on SSDs now, and those are awesome. But things are still stored in different places on them, and you have to go find them, right? But think of a spinning disc. And if you think of a spinning disc and you think of, like, a Postgres system, or a MySQL system, or these row-level systems, one file on that disc is that entire row. So, when you do "select star from table where ID equals 10," it only has to go one place on that disc. But if you do that to, like, one of my transformation tables that has 250 records, it has to go find all 250 files, count down x amount of numbers so that they match across those 250 files, and then stitch that back together because it's all columnar instead of row-level, right? And that's why it can be really fast when you do summarizations because you go to one place, find that file, and then sum it, right?
Or even when you limit it a little bit, you go find three different files, figure out the line numbers you care about, pull them out from the other two files, and then summarize that, do your group-bys or whatever. So, those operations are really fast, where those same operations on, like, a row-level system are really slow because now you’re the opposite. You've got to go find all these row-level files, and then pull the right column out of it, right? And that's why warehouses are incredible for, like, analytics, but you wouldn't want to point any of your applications at the warehouse, at least not unless you're paying for Snowflake's...they've got this new thing; it's pretty cool. They'll store all the data in the table, right, and you can point your application to it, and it's row-level data. I read something about it. I don't know where it's at, but it's kind of a cool little idea.
DAVE: I think I checked will it fit in RAM a couple of weeks ago, and I think they're up to, like, 128 terabytes now will fit in RAM. It's not cheap, but we could make it go.
BILL: How many of you are aware that Snowflake doesn't even have indexes, well, not the ones that we're used to?
DAVE: I just figured it was magic.
BILL: [chuckles] It looks like it.
DAVE: So, when I was on the data team, what I discovered is, you can do, like, a 75-table join, and it will come back in, like, two and a half seconds. And you can say, "select first name from an applicant, limit one," and it takes two and a half seconds because it's got to go through all the military-grade, weapons-grade query planning. How do I distribute? Oh, just one. And then once it's done all that selection, then, oh yeah, here's your data. That was [inaudible 19:52] to bring one piece of data, one teaspoon over.
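The row-versus-columnar contrast Zach draws can be sketched with plain Python data structures. This is a toy model only: the sample records and the `sum_amount_columnar` and `fetch_row_columnar` helpers are made up for illustration, and a real warehouse adds compression, partitioning, and parallelism that the sketch ignores. Each function returns its result together with the number of columns it had to visit.

```python
# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "first_name": "Ada", "amount": 120},
    {"id": 2, "first_name": "Grace", "amount": 80},
    {"id": 3, "first_name": "Edsger", "amount": 200},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def sum_amount_columnar(cols):
    """An aggregate over one column reads only that column."""
    return sum(cols["amount"]), 1


def fetch_row_columnar(cols, record_id):
    """Rebuilding one full record means visiting every column."""
    position = cols["id"].index(record_id)  # find the record's position first
    record = {name: values[position] for name, values in cols.items()}
    return record, len(cols)


print(sum_amount_columnar(columns))    # (400, 1)
print(fetch_row_columnar(columns, 2))  # ({'id': 2, 'first_name': 'Grace', 'amount': 80}, 3)
```

With three columns the difference is trivial, but the count of columns visited grows with the width of the table for a full-record fetch and stays at one for the aggregate, which is the trade-off described in the conversation.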
But when you say it's not indexed, is that because the data's organized, like, almost, like...I’m going to say physically, but you know what I mean, like the spinning disc, like, partitioned out differently to be pre-indexed?
BILL: That was the teaser. I was hoping Zach was going to expound on that.
DAVE: Oh, dang it.
ZACH: Sorry, what was I expounding on? I was looking up and fact-checking myself, trying to find [laughter] the row-level thing that I had mentioned, and I can't find it. So, maybe I dreamed that, but --
BILL: Yeah, I teased the audience with the --
MIKE: He mentioned that Snowflake wasn’t indexed. Yeah, go ahead.
BILL: I was teasing the audience with the factoid that, in Snowflake, you don't have to worry about designing indexes for your tables.
ZACH: Yeah, no, I was on a call with them one time, and they said they probably do it better automatically than we will. At Redshift, you had to do compound indexes, sort keys. Actually, they weren't really indexes; they were sort keys, right? You can put indexes, like, you can do it if you need to. And we've found a couple of tables that probably make sense for us to figure out what we would rather have it sorted by. And they're not necessarily considered, like, it's not like, "create index" inside of a warehouse. It's like, "sort it by this," because then when you query it by that, so you sort it by date, and you have, like, thousands of dates in there, and you're just looking for these six months, then they're all going to be in the same area of the file. And it gets an idea of where that's going to be. So, they're more like sort keys, and you can do it in Snowflake. It's just that we don't at all.
BILL: In Oracle and Postgres, that same sort of thing is called a cluster, where the data is ordered and clustered really close together.
ZACH: Yeah. And the other thing, Bill, that I, while I wasn't paying a whole lot of attention, I thought you were mentioning is, like, primary indexes, right?
Like, how in a Postgres system you do a primary key, and it's, like, an incrementing number, and you can't duplicate that. Snowflake does not support that either. I could do that, and it could increment. But let’s say I add 1, 2, 3, well, I could go enter 2 back in there, and it doesn't care. It does not enforce those.
BILL: [inaudible 22:03] integrity and primary key integrity and --
EDDY: I'm so glad you guys are the ones that have to deal with data and not me [laughs].
ZACH: And if you go and look through a lot of our tables, our primary keys are actually multiple columns, right? A lot of times, our primary keys are not just one column, like an ID column. Our primary key will be, like, lease number, date, and then something else that makes that table unique. And we enforce that through code.
EDDY: So, Zach, I've actually wanted to ask you something really interesting. What are some of your biggest pet peeves that we engineers do that really pisses you off that you wish you could change, but we're so fine-tuned doing our own thing, you know, that it's kind of fighting an uphill battle? You basically are, like, throwing the table and being like, "I'll just work around whatever you guys are doing."
ZACH: I think the biggest one for me is Ruby on Rails has an enum system, right? And this doesn't get used a lot anymore because I fought [laughs] these battles with software engineers. But it just puts numbers in the database, and then the references to what it actually is is only in the code. I'm not a Ruby engineer, and I don't want to go look through 68 different repos to figure out what all these numbers mean. And I don't want to manage a table that maps that for me because when a new number comes along, and I'm not told about it, I don't know what it is. And so, that would be, like, my biggest pet peeve. And it's not just Ruby on Rails that does it. It's every single ORM has some sort of functionality like that. But, like, Django and Python would do it, too.
But you could specify, like, string, string for your enum instead of, like, it being a number, and then the string is only relevant in the application itself. I would say that's, by far, my biggest one that frustrates me when I'm in the warehouse.
DAVE: Yeah. Well, and, to be clear, like, the BI team, the business guys, come over to you, and they say, "Give me all the leases that have this type." So, they're actually asking you to actionably query on those numbers, right? If those enums were just in that database, you wouldn't care; it wouldn't matter. But you're actually being asked to make intelligent decisions off of those enums, and we'd much rather have an enum table with a foreign key at that point, right?
ZACH: Yeah. Correct, yeah. Like, if you're going to go that route, then in the source system, have a foreign key to an enum table, and I'm fine with that. But since I don't end up with that data at all, because it's just in the codebase, then it creates a need for us to create these transformation tables so that people downstream from me, which there is a lot, right, the whole business is downstream from me. I'm downstream from all the software engineers and all of our third parties, and then there's more downstream from me that is actioning on this data. And so, it causes us to have to do a lot of, like, transformation tables just to make the data legible.
DAVE: We had two tables that had enums that they were effectively the same enum, but one of them started at one, and one of them started at zero. And it was the same three fields: 1, 2, 3 and 0, 1, 2. And there was some parking lot therapy [laughter] where we cornered an engineer, and we explained some things.
MIKE: One thing that...you keep on talking about transformation. And I want to call out we don't want to undersell, "Oh, you're just transforming data. What's the big deal?" I was thinking, if you want to make a rocket engine, well, you just start with some rocks and transform them, right?
And you get a rocket engine. That shouldn't be that big a deal, right? You just start with your ore, melt it down, go through some processing. You can build a rocket engine. Well, that's just transformation [chuckles].
ZACH: Yeah, that's a good call out, right? Because, like, I feel like, and maybe if there's any other data engineers listening, or data analysts, or data people, right, like, “Oh, it's just pulling data,” and it’s like, it’s not. It's understanding the requirements of what you want because the hardest part about data is you could have all the right data and make all the wrong decisions if you don't understand it, right? Or if you put it together wrong. And I was just talking to an analyst today, and he was like, "Yeah, well, people don't understand. It's like, 90% of the job is just making sure it's right and that you've got the right metrics so that the company actions correctly." And it's the same thing with, like, these transformations, right? Something goes wrong in the transformation upstream where we are, everything downstream is broken. The decisions made are no longer good. Or maybe a happy accident happens, and they're great [laughs]. It could go either way, I guess. But you're right, like, transforming the data, it's not a simple thing. It just sounds simple because we go high-level when we talk about it.
EDDY: So, what do you mean by transforming data? Like, I understand. For someone who's listening in to this and doesn't have a concept of transforming data, what do you mean by that?
ZACH: Yeah. So, we have multiple sources that a customer can get into our system, right? We have partners. We have a mobile app. We have a website. We have emails that get sent out. We have all these different things. I don't know if you guys are aware of this, but our consumer-facing systems are very bad at telling me where a customer's coming from.
And so, one of the transformations I do is this massive statement where I'm checking across six to seven different systems just trying to figure out where did we get this lease from, right? And that would be, like, a transformation. And those are hard, not only because of the logic that's involved, right, which any programmer is going to understand that logic can be hard. But, like, you have to have a serious understanding of that data, right? So, you can't just say, "Oh, well, we're just going to plug this big case statement in," or "We're going to do this summarization here." You have to understand what that data is, or else we would be telling everybody the wrong origination. Another good example of that is there's a very complicated functionality that we have. I won't go into a lot of detail over it, but it essentially has to check every record for every single day that it's open and, like, go in a very specific order because things are changing, and it has to recalculate it, right? And not only does it take a long time, it's one of those ones that needs fixing, but it's extremely complicated and uses a ton of window functions. So, you have to realize that, like, when you're selecting this, you're actually talking about the row behind it, or the row in front of it, or we're summarizing up until this point, or, you know, there's some complication into that as well.
DAVE: That's awesome. So, related to transformations, I remember we have a bunch of tables in the warehouse that start with MP, and that's the Merchant Portal side, the data that came from there. We also have an f leases table, right, that's, like, is that aggregated? I know it's got way more stuff on it than we have over in Merchant Portal. Is that just a combination, or a transformation, or both?
ZACH: Both.
So, that table is the way that we can allow our data scientists and our business intelligence people to see what a lease looks like across all of our systems that are important to a lease, right? And so, it's also got that functionality that I was talking about, like, where did this lease originate from, right? So, there’s those transformations in there. And then there's a lot of like, well, okay, Merchant Portal knows until this point, and LMS knows after this point, and, you know, these other systems over here know a couple of other things. Let's put them all in one place so that we can look at this new, transformed leases table, and say, oh, this is everything we know about this lease. To an extent, right, there are some tables that that joins to that helps fill in some gaps. But, yeah, it's really just the merging of all the microservices, which is why in the beginning of this, I said microservices are great for software engineers, but they suck for data. Luckily, here we have a really good global identification system. I've seen places that don't, and then it gets even harder to get this data together. So, it's easier here than it might be in some other places.
DAVE: It gets fun when you've got a record that has a proxy key that's just your integer primary key auto-increment, right, and a GUID, and a public-facing one because we don't want a customer writing down a 64-byte, you know, token thing, and then something else for, like...we've got a table that's got, like, four different IDs, and it's not stupid. Like, there's a different role for each of those IDs.
MIKE: You’re talking --
ZACH: Yeah, there ain’t much more to comment on that one [laughs], so I got --
DAVE: Okay. [inaudible 31:18] Is that a question?
MIKE: Eddy, you were talking about transformations, like, what are they? I was thinking about cooking. When you're cooking, you combine the ingredients. You can look at the recipe and say, "Oh, well, I'm just combining these things."
But what comes out the other end is fundamentally different in character than what went in. Like, sometimes you combine things, and you get something. Well, you say, "It's just made of these things." And chemically, that's true, right? It's just made of those parts. The outcome, you know, some eggs and flour or whatever, you know, having a cake come out, a cake is a different thing than just a pile of eggs and flour. The combination actually matters. And I think that when you're thinking about that data, the putting things together and maybe performing some operations on them, mathematical things, you know, some summing, some averaging, you're going to get something out the other end that is fundamentally different in character than what you started with. Zach keeps talking about making decisions. I can look at a list of records. I can't make a decision with that. There's no way I can look at a bunch of tables of records, you know, think about them as just a bunch of spreadsheets, then say, "Oh yeah, I've got lists of customers, and I've got a list of leases." I can't make any business decisions off that. That tells me nothing. But if you do the right processing out of there, you can see, "Oh, our revenue is going up, or our revenue is going down, and it's because of this thing over here that changed." And that is fundamentally different, even though it starts from the same place, right? You're starting with those ingredients. What comes out the other end really is a fundamentally different thing. And I think that it's important to recognize that. You think, “Well, yeah, I mean, I'm just changing it a little bit. I'm just combining stuff. Does that really make a big difference?" Well, yeah. If you're thinking about that cooking, you know, a cake really is different than what went into it. 
Likewise, here where you're doing even more steps, being able to make a key business decision based on some limited numbers is fundamentally different and a critical business function that's completely impossible with what you started with. And it's not a simple step between those. You probably have 50 steps between those in some cases. ZACH: Yeah, I was going to say, and to follow that up is like, we're not talking about, like, oh yeah, a script runs, and there are some transformations, and now you have f leases, the table that David was talking about. What ends up happening is you have 15 to 20 scripts run, and then you get f leases, and then I need 15 to 20 scripts after that to make more actionable. And they all build off of each other, and there’s all dependencies on these tables, right? So, it is, it’s a pipeline. You have to think of it as a pipeline, and each step in this pipeline is a script or SQL that's building the next thing that might come into this next table or give us more insights, right? So, -- BILL: So, I really like the chef comparison earlier. Because, like you were saying...I know you said CrossFit, and I think that's a great one as well. But, for me, I think almost like of culinary arts, right? The structured alignment of these different resources coming together, kind of like what you're saying, Mike. But then also it's an art, right? Because it's presentable. It's got to be presentable to a person that might not understand the basics of data or something, you know. They're able to pull it, access it, and still be able to analyze and acknowledge what that data houses, you know, just kind of, like, in layman's terms, I guess. DAVE: And if you're getting data from me, when I was on the data team, it was omakase. It was a surprise, and you got what the chef gave you [laughter]. 
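Zach's description of the pipeline above (15 to 20 scripts feeding f leases, then 15 to 20 more building on it) can be sketched as a dependency graph. This is only a minimal illustration, not Acima's actual pipeline: every table name below is hypothetical, and the standard-library `graphlib` module stands in for whatever orchestration the data team really uses. It shows two things the conversation touches on: the steps have to run in dependency order, and a mistake in one upstream table reaches everything built on top of it.

```python
from graphlib import TopologicalSorter

# Hypothetical transformation steps. Each key is a table, and each value is
# the set of tables that must be built before it. All names are illustrative.
DEPENDENCIES = {
    "stg_merchant_portal_leases": set(),
    "stg_lms_leases": set(),
    "f_leases": {"stg_merchant_portal_leases", "stg_lms_leases"},
    "lease_origination": {"f_leases"},
    "revenue_trend": {"f_leases", "lease_origination"},
}


def build_order(dependencies):
    """Return one valid order in which to run the steps (parents first)."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(dependencies, table):
    """Return every table that directly or indirectly depends on `table`."""
    affected = set()
    frontier = [table]
    while frontier:
        current = frontier.pop()
        for name, parents in dependencies.items():
            if current in parents and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(build_order(DEPENDENCIES))
    print(sorted(downstream_of(DEPENDENCIES, "stg_lms_leases")))
```

Asking `downstream_of` about a staging table is a rough way to see how many reports a single upstream change could affect.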
EDDY: You know, one of the things that kind of rings the bell when I was asking you what's the biggest gripes that a software engineer does that really, like, rubs you the wrong way, and I sort of answered my own question, but I kind of paused because I wanted to see what your biggest gripe was. But I want to challenge that a little bit, and I want to ask you if this is something that maybe infuriates you even more than dealing with enums in a database. You ready? Having software engineers, right, treating a database as an application detail and not as a shared contract, right? So, let's say, for example, we go in there and manipulate our own schema, our column names, right? We drop tables, and we just don't tell you about it [laughs]. We just don’t tell you about it. Like, suddenly, right, I'm assuming, right, that that has some detrimental side effects on your team, right, because we didn't delegate any of those ones. ZACH: That is accurate. That's also something that we've worked on here since I've taken over the data team. I've worked on getting closer with Mike and the other engineering directors and working top-down like, "This is our new process." Everybody here, the GitHub auto-assigner puts Bill and either Ricky or Kim as approvers, right? That’s our way past that. So, like, if we went back to that world, Eddy, where I woke up, and nothing that I wanted to run, and DevOps was reaching out to me saying, "Hey, you're taking down Merchant Portal," yeah, that is my biggest gripe. But we are multi-years removed from that at this point, so it's not my biggest gripe anymore. It's pretty well solved. We've had a couple of issues recently; we put in some more stuff to get past that. And really, that is a lack of communication, right, and is what that boils down to. So, we've bridged that gap very well here at Acima. So... DAVE: If I recall, Casey, or maybe it was Casey, somebody early on...this blew my mind when I came. 
Because I'm like, yeah, that was my question too, Eddy, when I went over to the data team. I'm like, I don't see you guys doing anything with our migrations, and I know we're migrating the database every single day. And Casey was like, "Eh, it's just a Tuesday for us." And in the list of reports that run every night, one of them is "Go deal with all the schema migrations and just update the warehouse,” and down the road you go. ZACH: Yeah, we've come a long way. The other thing that helps us out with those a lot is Fivetran. Fivetran is non-destructive. Our homegrown solution that we had, back when David came to join the team and help us move to Snowflake, it would break it. You dropped a column, it would break it. You updated an entire table with, like, a backfill, I'd take your system down on accident without even meaning to [chuckles]. And then we moved to Fivetran. DAVE: Sometimes you meant to. ZACH: [laughs] Nope. You won't get me to admit to that, ever. DAVE: You never meant to, but sometimes you didn't feel too bad [laughs]. ZACH: But Fivetran is very...it’s not destructive. You drop a column. I don't drop a column, which can be hurtful in another way, right? If you guys were to drop a column or stop writing to a column, and I didn't know we stopped writing to the column, and I was transforming off of that column, well, now you could have just made 37 tables have a null feature for no reason and break some reporting. And then I have to hear about that from the business, and it's my fault, you know [laughs], and so... And it's never, as everybody here probably knows, it's never a good feeling when somebody off of your team comes to tell you about issues on your team. DAVE: Yeah, I remember one of the cool things about having worked in data and then going back is, we had a thing where we had some tables where it's like, oh, we just need a phone number, just stick it on, right? This is how databases go straight to second normal form, right? 
Oh, now we need a work phone; now we need a cell phone. And we let it get out of hand, right? And so, we had, like, 11 tables that had phone numbers on them and three different kinds. All right, we need a phone numbers table. And that came through, and I was looking at this, and I'm like, okay, we can build a table. We'll export it. And this is going to take a while to get everything off. So, we're going to do triggers that go back and forth, Rails triggers after, you know, after hooks on the code. If you update this one, we update the master. You update this one; we update the outward record. Okay, great. And then I put a note in the ticket: go talk to the data team because they have reports that go off of this table, and if we stop writing to this, they're going to be very upset. And I remember talking with Casey, him tapping me on the shoulder and saying, “The dev team are changing the encryption keys, and we need to be able to decrypt this information.” And I said, “Okay, how soon do we need this?” And then I said, “Wait, let me guess: they've already changed it, and we can't decrypt data and give it to the call center.” And Casey said, “Yup.” And I'm like, yeah, so I got to go back to engineering and yell at Adam and say, “Okay, what happened?” And he thought he had communicated it, and it just...yeah. So... ZACH: Yeah, I remember that because there was a lot of late nights and tagging another software engineer who was very smart with encryption. Because it's not just that, like, we changed the encryption algorithm, right? We have to convert what Rails is doing to Python. DAVE: To Python. ZACH: And understand what it's doing under the hood so that we can recreate it. And we've had a lot of problems with that in the past, that being one of them, and from one of the systems that's the most important. 
But going back to your second normal form, I found a table one time, and I got a lot of pushback about changing it, but it was essentially...and, Bill, you might have been working here at that point, but it was tokenization, right? And it was, like, a company name token, company name tokenization at. And then there was a column like that for every single one of the companies that we've ever used for tokenization, and we were adding another one. And so, there’s this, like, 10 columns, and I'm like, what are we doing? This is horrible data architecture. And we wouldn't even be needing to make this migration at all if we would have just set it up properly, right? Like, just get a tokenization table that links back to this other record and then make it very dynamic. And so, that was, Eddy, to your question, too, another frustrating experience because I was completely ignored on that one and two more columns got added to the table, and who knows how many since then, so... DAVE: Now I want to go look [laughs]. EDDY: Well, I don't have the access to, unless it was [inaudible 41:45] DAVE: Not fair [laughs]. EDDY: [laughs]. You were talking a little bit about, like, phone numbers’ table, Dave, and it got me thinking. I guess it's really easy to kind of just think, oh man, if multiple tables can have a name column, why not just create polymorphic tables, you know, with ownerships, you know, and then just shove everything that can be polymorphic be polymorphic? So, where do you draw that line, right? So, for example, phone numbers, you can have a phone numbers table; email, you can have an emails table, right? Address, you can have an address table, et cetera. But, like, I'm assuming you don't want that for name maybe, right? Or do you want that for date of birth, for example, et cetera? Like, is the default always...if multiple tables can share the same data, does it just make sense to always make it polymorphic? 
Where do you draw that line, you know, even if you are repeating yourself in multiple tables? ZACH: I’ll do the simple answer, and then let Bill come in with the more complicated [chuckles] answer if he wants to correct what I say. Phone numbers make sense. You, Eddy, can have multiple phone numbers. That is a one-to-many relationship. But you, Eddy, are one person, so, like, you have a date of birth. You have all these facts about you that sit on your customer record, but you could have multiple phone numbers. And so, you put that into a secondary table, and you just match back. And you can have multiple emails. You can have multiple bank accounts. You can have multiples. So, when you could have multiple things, that's when I would do that because when you start finding yourself doing things, like I was saying, or underscore one, underscore two, underscore three, that needs to go somewhere. And Bill's going to have probably a better explanation than that, but that's where my idea was at, yeah. DAVE: Like, how far to go down, right? Like, the extreme case to be like, should we have a first names table and select, you know, like, Bob belongs to these three applicants and you just, you know, first name...Is that the logical conclusion, Eddy? BILL: That’s it. DAVE: Of, like, way too far? How much is too much? This has become a form joke of, like [crosstalk 44:05] ID. BILL: After you've done it enough, you just get a feel for it. The example you just offered, that would be one of those times where you're like, this is just ridiculous. This is, like, fourth, fifth normal form. No [laughs], going too far. If you have a repeating attribute, like Zach was talking about, like multiple types of emails, multiple types of phones for a given person, that's pretty simple; you normally stick that in a child table. 
But you were talking about polymorphism, a single record being able to represent multiple types of things, which you frequently find in, like, event tables and whatnot, where different sorts of things can be stored in that same table. That's usually about the only place I use polymorphism. It is a case-by-case basis. It's mostly art and less science. I actually don't have a really good answer for that, when to use polymorphism. I almost never use it. I'm actually surprised at how often we use it here. DAVE: I might have a good follow-up, then. So, the way to know the right answer is experience, and what is it? Good judgment is how you get experience, or the other way around: experience comes from good judgment. Judgment comes from...you know the quote, right? Experience comes from bad judgment; that’s what I was saying. What does it feel like when you burn your hand on the stove, when you have over-polymorphized or over-normalized your form? BILL: Nobody likes to work with your schema. Developers hate it. Now, in general -- DAVE: Mike, I think I may have over-normalized my form. BILL: [laughs] DAVE: [inaudible 45:31] of my data. BILL: I have found that developers have a...you asked earlier one thing that is a pet peeve of ours. Mine is that developers have an unnatural fear of joins. If the data model is well-modeled and solid and doesn't go beyond third normal form, a relational database loves that. And I've had tables with billions of rows, and joining them is not a big deal, sub-second response time. So, that’s something. I wish developers would not fear joins. That's somehow related to what we are talking about, and I have since lost my train of thought. DAVE: It's all good. I think -- MIKE: I had a thought about the normalization. A phone number has a defined structure. It's an entity with clearly defined structure where that internal structure matters, right? Like, you could conceivably have a phone number type in the database even, right? 
And I'm sure some databases probably implement that. There are probably some telecom [laughs] companies that very much do have a phone number type in their database. Likewise with an email address, right? It's an entity with a clearly defined type. Whereas a first name, it's just a string. There is no internal structure. There's no expected internal structure. In fact, it varies across cultures. It varies in language. You really, really don't want to impose structure on it because that would be a really bad idea. It's important that you recognize that as just a string. Also, the number of them is unbounded. You can have arbitrary strings there, right? I mean, you might truncate it at the end if you have something ridiculous but, you know, it's just arbitrary data. I feel like that's fundamentally different in character than the other things we've talked about. An address is something, you know, it's its own...it's got its own little schema, right? An address is a thing that has a clear definition that represents a concept. Now, a first name does represent a concept, right? But it’s not in and of itself anything other than just a string, right? It is just a blob of text, no different than any other paragraph, right? And somebody probably has done something ridiculous by putting a whole paragraph as their first name. And [chuckles] that's perfectly legitimate for that, which is different than the kind of thing we're talking about with an address. There's a meaning to the address in the way that there's not on that first name. Not that first names aren't important, not that they don't have meaning within, you know, cultural meaning, but they don't have a meaning in terms of the data in that respect, other than it’s just a string that’s an identifier. And -- BILL: A little [inaudible 48:06] of a thought that I have to add to that. MIKE: Please. 
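Zach's rule of thumb from a few minutes earlier (single facts like a date of birth stay on the customer record, while anything a person can have several of moves to a child table) can be sketched roughly as follows. This is an illustration under assumptions, not Acima's schema: the table and column names are made up, the data is placeholder data, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a parent table and a child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(
        """
        CREATE TABLE customers (
            id            INTEGER PRIMARY KEY,
            first_name    TEXT NOT NULL,   -- one per customer, stays on the record
            date_of_birth TEXT NOT NULL    -- one per customer, stays on the record
        );
        CREATE TABLE phone_numbers (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            kind        TEXT NOT NULL,     -- e.g. 'cell' or 'work'
            number      TEXT NOT NULL
        );
        """
    )
    conn.execute(
        "INSERT INTO customers (id, first_name, date_of_birth) "
        "VALUES (1, 'Example', '2000-01-01')"
    )
    conn.executemany(
        "INSERT INTO phone_numbers (customer_id, kind, number) VALUES (?, ?, ?)",
        [(1, "cell", "555-0100"), (1, "work", "555-0101")],
    )
    return conn


def phones_for(conn, customer_id):
    """Join back from the child table to the customer it belongs to."""
    return conn.execute(
        """
        SELECT p.kind, p.number
        FROM customers AS c
        JOIN phone_numbers AS p ON p.customer_id = c.id
        WHERE c.id = ?
        ORDER BY p.kind
        """,
        (customer_id,),
    ).fetchall()


if __name__ == "__main__":
    print(phones_for(build_demo_db(), 1))
```

Adding a third kind of phone is then a new row rather than a new `phone_3` column, which is the pattern Zach warns about.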
BILL: Sometimes the decision about how far to go in normalization and going crazy with your data modeling depends on the business context. My first eight years of my career was spent at telecommunications companies. And there, a phone number had to be split out. So, you had separate fields for the international code, the area code, the exchange, and then the line number. But at most companies, you don't need that. There’s no reason. So, sometimes that’s the answer. What is the business –- MIKE: And that makes the phone number...And you just answered my question, like, yes, it does exist, right [chuckles]? It does matter on the business context. And now that you mention it, I bet that if you were working for a company that was doing, like, genealogical work, like ancestral stuff, then maybe there are some last names, for example, where you might actually care a lot, and you might care about normalizing those. Like, you might want to represent some of those as special, if there are some high-frequency ones. I haven't really thought about this. I’m just talking [crosstalk 49:10] BILL: There were some hard lessons I had to learn when I worked for MYFaith [SP] for 11 years because they operate in 281 countries. And I bet this is found in the link that Dave just shared there in the chat. But there were some things I did not know about names in certain parts of the world. Like, some countries, you have a single name. It's not a surname. It’s not a first name; it's just your name. And we had modeled our data to be very Western-centric. It expected you to have a first and a last. There's all sorts of fun stuff that you can run into when you’re modeling. DAVE: For those listening at home, you can Google "Falsehoods Programmers Believe About Names," and it's a list of, like, shocking things that you believe: oh, they'll fit within 30 characters. Oh, they'll fit within 50 characters. Oh, they'll fit in ASCII. Oh, they'll fit in Unicode. 
BILL: [laughs] DAVE: People have names at birth. People have names within a year of birth. People have names within five years of birth. That is not always true. Like, again, you're getting into a pretty esoteric data set at that point. But yeah [crosstalk 50:12] people have names. ZACH: Yeah [laughs]. I was going to say, that's good. DAVE: The author got challenged on that. He said, "Oh, come on, show me an example where people have names, where it's a large data set." And he said, "Cataloging mass graves." And I'm like, ooh. Yep. BILL: And that's one of the things I love the most about data modeling is using experience and knowledge like this to anticipate problems and avoid them in the initial stages of design. ZACH: Yeah. And the cool thing about it is you want to avoid them all. So, you're always learning [laughs], and there's always going to be something that you didn't expect, some user input. It's like a video on LinkedIn, right, where it says, like, "Programmer watching QA," and it's, like, one of those boxes with the different shapes. And they're like, "Where does the square go?" "Yeah, in the square hole." She’s like, “Yeah.” And then it’s like, "Where does the circle go? That’s right, in the square hole." They’re like, "No [laughs]." Especially if you work for a company like this with a lot of user-inputted data, like, you have to be careful with that. DAVE: SQLite. Let me finish on this real quick, Eddy. SQLite, I discovered this this week: everything is a square hole in SQLite. SQLite uses a variant type underneath the hood, and it uses data affinity to determine what type it is. And it literally does not care in the schema what column type you declare. I literally tested this; you can try this at home. Create table test, open parenthesis, ID as banana or ID [inaudible 51:48]...ID banana comma name banana phone number banana, and then insert into it a number and a string and some other, you know, whatever you want. 
And when you select it, it will come back in that type. My faith as a programmer is broken. Nothing makes sense anymore [laughter]. EDDY: Well, like, and mobile apps use SQLite? So, I can only imagine on, like, how detrimental [laughs] that can really be. So -- DAVE: Sorry, I cut you off a minute ago, Eddy. EDDY: Oh no, I wanted to say something, but I also didn't want to cut anyone else off. I wanted to kind of expand a little bit because I'm actually really curious. When I first started and I started to really understand, you know, like, data modeling and data types, you know, and, like, non-nullables, and, you know, and constraints and all this stuff, right, my default thought at one point, and I know the answer to this, but I want to ask it just because I want to see you guys’ reaction. I was just going to say, like, why don't we just store everything as varchar, right, just to be safe, you know? And that way, you don't have to worry about what data they send you, you know, and you can just not worry about schemas. And why is that bad, I guess? DAVE: SQLite's saying, "Preach it, brother [laughter]." ZACH: Yeah, yeah, Matt, you do, and I give you crap about it all the time. And your response is always, "Oh, this was just for me.” And I don't care [laughs]. I don't care if it's just for you; do it right [laughter]. So, a really large reason is data quality, right? What if you're expecting a number and you get a string and everything is just a varchar? Or, like, what if we're expecting, and I know we do this a lot here, and I do it a lot, too, where, like, Postgres doesn't use up all the space. Like, MySQL, if you said, like, varchar(250), it's using 250 bytes, right? If you do it for Postgres and you put two bytes worth of data in there, it's using two bytes. And it's the same with Snowflake. Now, Redshift works the opposite way, where I had to be careful about the sizing of varchars. But, like, let's say state code, for instance, right? 
If you're operating only in the U.S. and you're doing state code, you want that to be two characters, and if it's something more than two characters, you want that to break. You don't want that to go in, and then you want to catch it in the application. Because the best place to do data quality checks, especially when you have humans giving you the data, is the application. And you want your data types to match that for data quality. And that's a huge [inaudible 54:31] about data quality, and that's not the only reason. Previously, it was faster, and it probably still is to some degree, but computers have grown a lot since then. But why a lot of, you know, you got a lot of relational tables, and you would do enums that go out to another table, like numbers are faster to look up. That's less the case now, but I'd still argue varchar-ing everything is a terrible idea, even for look-up speeds [laughs]. BILL: When you use the right data type, and if the business rules require it, constrain that column to a certain length; you get built-in data integrity checks for free. DAVE: I did a lot of geolocation at a previous job where we were, like, trying to find, you know, pins on a map. And k-space indexing, like, two-dimensional geospace indexing, if you're just throwing JSON strings in there, good luck. You're just going to have to scan the whole database if you want to find anything in the U.S. But if you index it based on, you know, geolocation, by having that in a special format, you can index a lot better. ZACH: Your geoms. I've never done, like, mapping things out until I came here. So, like, geoms, and, like, all the functionality inside of warehouses, they'll let you, like, plot locations on a map. There's some that will take into account the curvature of the earth, and some that’s just like as the crow flies. And they have their own data types of geoms, which is very foreign and very fascinating. 
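Dave's SQLite experiment and Zach's state code point can both be tried with Python's built-in `sqlite3` module. What follows is a small sketch under assumptions rather than anything from the speakers' systems: the `addresses` table is hypothetical, and because SQLite does not enforce declared column lengths, a `CHECK` constraint stands in for the two-character column Zach describes. A column declared as `banana` still accepts integers, text, and floats, and `typeof()` reports what was actually stored.

```python
import sqlite3


def sqlite_stores_any_type():
    """Recreate the 'banana' table and report the stored type of each value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id banana, name banana, phone_number banana)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?, ?)",
        [(1, "Ada", "555-0100"), ("two", 3.5, 42)],
    )
    rows = conn.execute(
        "SELECT typeof(id), typeof(name), typeof(phone_number) "
        "FROM test ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


def state_code_is_enforced():
    """Show a constraint rejecting a state code that is not two characters."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE addresses ("
        "state_code TEXT NOT NULL CHECK (length(state_code) = 2))"
    )
    conn.execute("INSERT INTO addresses VALUES ('UT')")
    try:
        conn.execute("INSERT INTO addresses VALUES ('Utah')")
    except sqlite3.IntegrityError:
        rejected = True  # the database refused the bad value
    else:
        rejected = False
    count = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
    conn.close()
    return rejected, count


if __name__ == "__main__":
    print(sqlite_stores_any_type())
    print(state_code_is_enforced())
```

The second function is the "you want that to break" behavior: the bad row never lands, so the application has to handle the error instead of the warehouse inheriting the bad value.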
EDDY: I'm just throwing out a bunch of data because I'm taking advantage of the fact that I have data people here who can just answer my questions. DAVE: I love it. I love it. EDDY: And so, [inaudible 56:20] right? So, this is a genuine question, right? Why would you ever want to use ints for IDs, right? ZACH: Yeah, you don't. You never want to. You want to use BigInts, because if you just use ints, you run into a problem that we've seen, where you run out of numbers. And also –- BILL: It’s happening at LMS right now. ZACH: Yeah. And so, like, if you ever sit there and think, oh, I’m making an ID; let's just do an int, that'll get you a way, sure. Why not? But then you're the reason we all have to struggle and figure out a creative solution to turn that to a BigInt. So, [laughs] Eddy, BigInt IDs always. EDDY: Is that just fair to say that's the default? Like, even if you don't fully expect that table to grow exponentially. ZACH: Yes. EDDY: Is there a cost for associating a BigInt versus an int? ZACH: No, not in Postgres, which is what you're working in because Postgres is only using the amount of bytes that it’s stored. The cost can happen in systems. Like, if I remember correctly, and it's been a while since I worked in it, MySQL, you fill that space with, basically, think of it as bases, right? Like, you say 64, or is it 126? I can't remember which, but for BigInt, right? I think it's 64. And so, like, if you have 2 numbers in there, it will fill...think of it as filling the other 62 with spaces, and it uses that in storage, but Postgres does not. MATT: MySQL allocates it. ZACH: Yeah, there is no cost [inaudible 58:03]. So, I would argue that even with the cost, it's worth it [laughs] to not deal with running out of numbers. DAVE: And again, like, on the data side, storage is free and compute is expensive, right? We're on app. It's the other way around. So, we're like, oh, conserve space, conserve space. That is awesome. 
When I worked on the data team, I used to tell people that we were in charge of the numbers, and last Tuesday, we almost ran out of sevens [laughter]. Yeah, so...Oh my gosh. Anybody have anything to wrap on? This has been fantastic, Zach, Bill. I hope we can have you guys come back. This has been fantastic. ZACH: I think the idea was floated around where we get, like, my whole team, and I think we should. We should. BILL: This has sparked a number of ideas for me as well. MIKE: Oh, nice. DAVE: Fantastic, Fantastic. BILL: I'd really like to start talking about partitioning. DAVE: Oh yeah. BILL: Because we have a number of systems with tables in the billions, and, normally, you start partitioning when you hit about 100 million. DAVE: That is fun. EDDY: What I think, Bill, you started to introduce, at least at Acima level, is, I think you take for granted, you know, that you're working under your schema for so long that you just understand it. But when you're coming in fresh, and you're expected to understand what all that data means, right, and we don't document, right? Because you had a big push, like, guys, add comments to everything so that I know what you mean on what you're storing here, right? I think you really opened up, like, a fresh perspective on, like, guys, we don't all work in your table and in your schema, right? Like, please be nice and tell me what that is, right? And -- BILL: Those comments are meant to trickle all the way down to analysts, scientists, and users. Yes, it's definitely not just me. And this is just the base bedrock layer that's needed. On top of that, there's...I don't know if you can see in your articles on LinkedIn lately, but on top of that, is the semantic model, the ontology, the decision tree. There's so much context that goes around a company's data, and just the basic definition of it is the most bare-bones thing I could request right now. But there's a whole lot more to it. 
And once we have that kind of meaning, we can turn AI loose on our data and do amazing things. ZACH: That's what, previously, right, this initiative, but previously, you had...and I'm saying previously, six years ago, right around the time that I started, up until, like, four years ago, probably, we had less microservices; we had less data. We had a person named Casey that you could go ask what things meant, and he was so entrenched in the data that he would be able to tell you. We've far outgrown that. I don't know everything here. And Casey no longer knows everything here, and he's still here. There's just still a lot of unknowns because we've grown too much, and that's, you know, the comments in the databases. All the stuff Bill's talking about, those are part of growing pains. You've got to make sure that people understand what the data is. And I get it asked all the time in some data channels of, like, “How do I do this?” And I go, “I don't know. Maybe you should go ask Merchant Portal.” And then that question gets put into Merchant Portal for the data owners to actually answer, right? Because you work on a microservice. You are the owners of that data, where I'm the consumer of the data. So, I'm not going to make speculation on what it is. But once all this documentation is done, they can go look. They can see what it says. And if they have questions at that point, go ask, and then you know you have to update your documentation because it's not good enough [laughs] -- DAVE: We're going to get to a point where instead of asking what the lease is doing, we can ask how the lease is doing. ZACH: Yeah. DAVE: I would love that. This is probably a good place to wrap. I would love to have you guys back, even just for a SQL show, SQL, pun unintended [laughter]. But this is a great spot. Thank you, guys, so much for coming. Let's wrap here, and we can move into an after-call. This has been the Acima Developer podcast. 
And thank you for coming, and hope you'll listen to us next week.

April 1, 2026 - 1 h 2 min

Episode 94: Staying Cool During Production Issues

Mike opens by framing “production incidents” with a vivid non-software story. As a teenager he smashed bathroom tile with a dead-blow hammer, drove his pinky knuckle into a jagged shard, and had to manage both the injury and the panic of his little brother who got sick from seeing it. He uses that as the metaphor for on-call life. Bad things happen, reactions vary, and what you do in the first moments matters, especially staying calm, reassuring others, and focusing on the most urgent next step. The group riffs on modern incident response, starting with humor about “just ask the LLM,” but landing on a real point. AI can be excellent at sifting noisy logs, even if you should not blindly trust it mid-emergency. Dave pivots to the idea that the best loyalty, from customers and coworkers, is earned when something goes wrong and support is excellent. He describes jumping into a long outage call ready to tear apart his own recent work with zero ego, because people remember who shows up with “two tow trucks” when everything’s on fire. Mike and Justin emphasize composure and delegation. If you are overwhelmed, hand off to someone with a cool head. Prioritize restoring service, “stop the bleeding,” before deep root-cause analysis. Invest ahead of time in rollback plans, feature flags, staged rollouts, and observability. From there, they broaden into practical triage and long-term resilience. Verify the issue, look at metrics and dashboards to identify symptoms like CPU, disk, network, traffic spikes, and database issues, and narrow the delta between last-known-good and broken. They discuss how constraints differ in mobile, including App Store review delays, crash loops, and reliance on the user’s device and network. They also cover security incidents, where you need monitoring to detect attacks, plus coordinated mitigation like blocking traffic and working with vendors. 
They stress the importance of having an incident quarterback, a playbook, and a contact list for after-hours escalation. The close focuses on what comes after the band-aid. Do postmortems and cleanup so temporary fixes do not become permanent donuts. Balance realistic risk planning with business needs. Emphasize strong observability and the ability to recover quickly, alongside prevention, echoing practices like Chaos Monkey and the idea that monitoring prevents historical events from re-happening.

Transcript:

MIKE: Hello, and welcome to another episode of the Acima Development Podcast. I'm Mike, and I'm hosting again today. We've got a good crew here today, and I'm excited about this one. We've got Kyle Archer, Eddy Lopez. We've got Dave Brady. Hello, Justin Ellis, Thomas Wilcox. We've got Ramses Bateman, and Will Archer. So, I think we've all been here before multiple times [chuckles]. We've got a familiar crew to talk about an important topic that's always fresh because [chuckles] there's a constant need. I was racking my brain what story to tell for this, and I ended up going back to...I don't even remember exactly when it was, but it was somewhere in my late teens, early twenties, in that era. So, admission, that's quite a long time ago [laughs]. That's more than halfway back [laughs]. And I was helping out at my parents' house with some remodeling they were doing. They were tearing out the...they were redoing the bathroom. And so, they were tearing out...they had a wall that had some tile on it, and they were tearing out the tile. And they were going to put some new...I don't even remember. They shifted things around, but they were tearing out the tile. That's the important part. And I had my little brother with me nearby. He was too young to really help. He was, like, six. And, you know, he was just hanging out and chatting with me, and I was taking a...they call it a dead blow hammer. It's a hammer with sand in it, so when you hit, it just stops.
So, it's a weighted hammer, but it has a soft landing, so it doesn't have a...it doesn't bounce back, right? It just kind of stops, rather than having a strong bounce. It's good for situations where you want to do that, right, where you don't...you really don't want it bouncing back and hitting you in the face. And I was breaking up the tile wall. Context, there I am with, like, a six-year-old breaking up a tile wall. And there was some wire mesh behind it, and I was gradually peeling back. As I broke it, I was peeling back this wire mesh that was embedded in some sort of mortar. And I was pulling out [inaudible 02:26] the cement behind the tile. And so, as I'm banging, I pull back a piece, you know, pull it back because I'm making some progress, and I swing in. And because that broken tile is now hanging out and mounted on that wire behind, with the, you know, the cement that's holding it together, when I swing with that hammer at full force, right after peeling, you know, an extra layer back, I sunk my knuckle of my pinky finger right into a piece of broken tile. And I go, oh, and I look down. And I look down into my knuckle, maybe five eighths of an inch, a couple of centimeters more than you should be looking down into a knuckle [laughs]. Oh [laughs], that moment, that's not good. And then the blood starts, right? A rather remarkable amount of blood, I'll say [laughs], was coming out of the finger. Remember, there's a six-year-old here in the room with me. And he yells, "Mom, dad, come help Mike. He's really hurt bad." And, of course, they're thinking the worst. I'm like, "No, no, no, no, no, it's okay [laughs]," yelling. But, you know, there's the moment of panic there. And so, I had some choices in that moment, right? What do I do? Luckily, I think I handled it pretty well. I comforted the people around me to let them know this isn't a disaster. I'm going to need to do something, but you don't need to, you know, call 911. 
Unfortunately...so, we got everything up, went to one of those urgent care places. They stitched me up. I could tell some other weird stories about it there. A few weeks later, I noticed a little white mark on my finger, and I started pulling, and it was a piece of the thread from the gauze that had somehow got stuck in my finger. And I pulled out, like, a foot [laughs] of this string out of my finger, and then it snapped down near the bottom, and some of it zipped back in. I've never seen it again, like, oooh [vocalization] [laughs]. And I still, when I touch my knuckle, I feel weird sensations all the way down the rest of my finger. It's a [inaudible 04:21] impact of that one. But my poor little brother [chuckles], he got sick from seeing it, and he was throwing up and just not okay. And I felt bad, and I had to comfort him, "This is really okay. I get some stitches, and it'll be fine [chuckles]. It will be fine." And [chuckles] I felt really bad because I was not really even thinking about it. I didn't realize that he was not okay. So, when I discovered before I left, like, 10 minutes later, he wasn't okay, you know, I gave him a hug, you know, tried to help him feel like things were okay, get a ride over to the urgent care facility. They stitched me up, and I'm fine. Today, we're going to talk about dealing with production incidents. And I bring up this example because it's outside of software, but it's a production incident, right? You've got the bad things happen, and what do you do? What do you do now? And I think that there's some aspects to that story we can riff on as well as others. But it helps set the stage for a lot of what happens when we have these production incidents and what we do in that moment because it matters a lot. And how some of the reactions, you know, there's a variety of reactions to this moment among the various parties in place that had some better, some worse, you know, impact. So, servers are down, you know, how do you keep cool? 
Things are on fire. And that's our topic today. And I've got definitely some thoughts on this. I've written down some notes, but, as usual, I don't want to...I've told the story, right? I've laid out the context. So, I am really hoping some of you all will have some initial thoughts to lead out with. EDDY: Sorry, is the answer not ask AI to see what's wrong with your server [inaudible 06:02]? MIKE: [laughs] DAVE: How do you think the server went down? EDDY: I was thinking, is that not the go-to answer now? I'm sorry, podcast over. Ask the LLM. [laughter]. WILL: Not not the answer. DAVE: The AI is going to say, "You are absolutely right to be upset that the server is down." JUSTIN: So, related to that -- WILL: I mean, I'm just saying that's not not the answer. Like, AI is great at reading a log. Like, it took me -- DAVE: Yeah, actually. WILL: Years, if not decades, to get, like, pretty decent at reading log vomit, you know what I mean, like, filtering through the chicken innards that [laughter], you know, a log will, like, throw up all over you and just be like, "Oh yeah, that's actually it." AI is actually super duper at that. I don't trust it, especially in an emergency but, like, do that. Sure. Yes. Do it. EDDY: I was literally pairing with someone, and we were looking at a Grafana log, right? And I'm like, "Oh, it's because of this." And they're like, "Where? Where is that?" And I'm like, "Oh, I read it somewhere here. Hold on, let me find it again." And, like, you get so good at ignoring all the clutter, you know, and just filtering everything. But, oh my God, dude, like, AI can sift through, like, raw JSON, like candy. DAVE: I have a thought to throw out. I have a bunch. I always do. But one of the things that...and this is not really a production thing, well, maybe it is: loyalty. The thing that makes somebody loyal, a customer, in particular, is you get this graph of, like, did they have a good time, or did they have a bad time? 
And then did they receive good support, or did they receive bad support? And the most vehement haters of any product are the people who had a bad time and got bad support, right? Just got told, "You go away, not our problem." We've all had examples of this. The most loyal customers, this is interesting, are not the ones who had a good experience with good support. They're the ones who had a bad experience and had fantastic support. These are the rabidly loyal fans. Imagine you've got a car, and you blow a tire on the road, okay? And you call AAA, and they're like, "We're busy. Go away." You're like, "I'm canceling my AAA membership immediately," right? You buy new tires at Big O. You drive along. You're great. You never have a problem with it. Okay, they're tires. They're supposed to be tires. I expect them to be tires. Now you're driving down the road. You blow a tire, and by the time you've hung up the phone, two tow trucks have arrived, one of them with a spare tire and a change and a mechanic, and the other one's ready to tow your car if the tire change won't work. They take care of your tire. They replace it. They get you back on the road in 5 minutes, plus a $10 coupon to, you know, to Chili's or whatever, for, you know, "We apologize for the impact on your time." Would you ever buy another brand of tire? I wouldn't, not in a minute. So, what does this have to do with production incidents? This is the story I tell myself in my head of I want to be that guy when my code breaks. I want to be the guy that absolutely had no ego about, you know, how the server went down. I'll talk story on myself here a little bit. We had an outage about a month ago. I'm very, very proud of the fact that I had gone...I've been here for five years. I've never taken out prod. I'm a very cautious engineer, and I'm kind of proud of that. 
And prod went down about a month ago, and, man, then there was, like, a five-hour incident call because stuff was going on and things were...oh my gosh. What are we going to do? And I joined in the call. And I'm spearheading. I'm like, "Well, it could be this. It could be..." and I'm, like, reaching, well, I might have screwed this up. It could be this other...oh, man, I didn't consider this thing. Let me go test that. And I basically was Johnny on the spot. With any resource you needed, I will tear apart my own pull request and anything in it. I don't care. I'm not here to be proud to be the best engineer. I know the server's down. I care that the server is back up, and I want everyone in the room to know that Dave was the guy who showed up with two tow trucks, a change of tires, and a $10 gift card to Chili's. And then when it turned out that the server went down three minutes before my deploy, and everyone went, "It can't be Dave's deploy," it went from, "Wow, Dave is really carrying this," to, "Holy crap, Dave is carrying this, and he didn't have to." And Andy gave me a pat on the head at architecture for really showing up and driving the ball on that, and that's how you turn an absolute crisis into a huge opportunity. What people remember is what you were like when things went bad. How you behave when things are good is a terrible predictor of how you will behave when things go bad. And how you behave when things are bad is the best predictor of long-term relationship success. And can I trust you, and do I want you around forever? So, that's my inspiring speech about that. I'm not trying to blow my own horn, because, I mean, obviously, my...I deployed something, and things went down and could have been me. But it's who you are when it goes bad that people remember. MIKE: You know, you talked about how you respond. In my initial story, I mentioned, you know, a few parties here. You got the little kids. 
JUSTIN: Mike, are we just going to let David, like, drink out of a beaker here [laughter]? DAVE: It's not a beaker. It's an Erlenmeyer flask [laughter]. I do do mad science. JUSTIN: What kind of a, you know, show you got going on there [laughter]? DAVE: For those of you listening at home, which I guess is everybody because we don't actually publish the videos, I have a magnetic stirrer. You got to [inaudible 11:31] tell the story. I'll tell everybody. I have a magnetic stirrer. I bought it for resin and, you know, paint and stuff like that. And every once in a while, I thought, you know, I could mix, you know, my Kool-Aid, or I could mix, you know, my Liquid I.V., or my LMNT. I could mix that in it. But if you put it in a regular cup, it splashes it everywhere. And I'm like, I might as well just buy the stupid lab equipment that goes with the stupid stirrer [laughter]. And so, yes, I do have this. Now [laughs], this does absolutely nothing to excuse the fact that this is root beer with hot sauce in it. I'm not kidding. I am a monster. I have a reputation to live up to. So, there you go. WILL: Don't drink out of the resin beaker, man [laughter]. DAVE: You're not my real dad. WILL: Do you want microplastics? That's how you get microplastics [laughter]. You get macroplastics [laughter]. DAVE: Exactly. These are culinary only. WILL: [inaudible 12:22] army man. DAVE: Yeah, these are culinary only. These are my portable flasks [laughter]. JUSTIN: [inaudible 12:27] you keep the labels correctly on those [laughter]. DAVE: Oh, jeez. I'll switch to this one. EDDY: I mean, how many of us actually drink from a plastic water bottle, you know what I mean? You'll [inaudible 12:39] way. It's inevitable. MIKE: Honestly, I drink out of, like, a mason jar a lot. It's glass. It's not going to give you the microplastics. It looks funny [laughs], yep. But -- JUSTIN: Mike, back to you. I was just very -- [laughter] MIKE: So, aside completed, segueing back...the responses. 
So, the response of somebody who was overwhelmed by the situation and just went and started vomiting. He couldn't control that, right? Like, that was a reaction that was completely outside of his...out of his voluntary control, and that's fine. You should, you know, you're in a situation where millions of dollars are on the line. You're not okay, bow out. And I think that that's the responsible thing to do. If you find yourself in that situation, delegate to somebody who's got a cool head and do that because that's, like, the first note that I wrote down. If you can't maintain focus and be like, okay, that's okay because you can't help it, like, there's not shame in that, but there is shame in not admitting it, right? You know, pretending that you're okay. Because, under stress, sometimes we have unexpected reactions. Usually, you're not the only one, right? You're part of a team. Bring the team in. Give it to somebody else. But having that cool head, I think, matters tremendously because you've got some important decisions to make, and the order you make those decisions in matters a lot. I would argue that, you know, the next...you probably got three things you've got to do. You can always...I wrote down five, but the first thing that you do matters a lot because, a lot of times, people say, "Oh, wow, things are broken. What went wrong?" And then they'll spend the next six hours trying to figure out what went wrong when the servers are down and your business is losing money [laughs]. DAVE: Yeah, we don't care what's wrong. We care about the servers. Yeah, give me cash flow. MIKE: Exactly. DAVE: Stop the bleeding then take the bullet out. Yes. MIKE: Bingo. And I was thinking, literally, that's what made me think of my incident [chuckles] back in my youth because, literally, I had to stop the bleeding. Nothing else really mattered, right? I put direct pressure on that. I went, and I got the stitches. And they asked me. 
I remember that, like, "Do you have feeling in your finger? Do you think it severed a nerve?" I didn't actually realize that I had at the time [laughs], but that didn't matter as much as, you know, let's get rid of this gaping hole in this guy's hand. That matters a lot. Stopping the bleeding should go first. Go ahead. JUSTIN: Yeah, and when you talk stopping the bleeding, I think a lot of this is, like, in the prep work that you do. And 9 times out of 10, for production releases, for me, if you do a production release and something goes bad, you've got to have that back-out plan ready to go. And whatever that is, hopefully, you're doing installs multiple times a day, and your back-out plan is just hitting a button, you know, just getting back to normal, which was, you know, whatever it was before you did that deploy. And, you know, if you have that up and running, that's a sign, I think, of a really mature business. It's like, "Hey, I can go into prod, and if something breaks, I can back out of prod within 30 seconds," and life goes on. And then you, like you said, then you could figure out what...dig out the bullet. WILL: Right. Well, yeah, I mean, but it's always, you know, I don't know. I mean, I'm always hesitant to, like, hop in the Wayback Machine, right? Because, like, if we're going to be like, all right, step one is go back in time and make sure that you can claw back that deploy [laughter], no. Step one is, like, don't write the bug in the first place. I mean, you know -- DAVE: I actually call this the time machine problem. WILL: If I'm [inaudible 16:19] I'll fix it all the way [laughs]. DAVE: Because everyone's solution is, well, don't do that again. Well, don't do that. I'm like, well [laughter], where were you an hour ago? MIKE: Well, it's also tricky if you're deploying an app. So, Will, you're working with mobile apps, right? -- WILL: Oh yeah, oh yeah. Like -- MIKE: You don't get to go to the App Store and say, "No, I didn't mean that.
You downloaded that to somebody's phone. Please bring it back." That's not on your list of options. WILL: You can get done wrong. You can get done real, real nasty if you bungle a mobile app. I think it's only happened to me maybe one time in my career, where you get the dreaded crash loop, where your state in the app is corrupted, and it's not fixed with a hard reboot, right? Where, like, your state has gotten corrupted. And it didn't happen to everybody, but there was an edge case where we had some people crash looping, and, like, that app's got to get smoked, like, you got to pull it off your phone. You can get burned super bad, to a degree, that is. EDDY: What's the rollback strategy in a mobile environment, right? Like, because you have to follow certain standards, you know, in the marketplace, right? Whether that's Play Store or the App Store, right? Like, if I remember correctly, they have, like, certain criteria and waves that you can release updates to your application, and they've got to approve that every single time, right? So, if something leaks, right, in that deploy, like, do they have, like, a fallback where you can be like, oh, crap, it's not working; let me just deploy the previous version on the application? Like, how -- WILL: Well, it depends, you know, there's rules for some people, and there's rules for other people. So, I started out as a very, very small fish in the App Store pond, a minnow. And you don't get nothing, like, they'll review it when they review it, you know what I mean? And you can beg, and you can grovel, and maybe they'll get to it, maybe in a day or two, or whatever. But, like, there's just a lot of minnows in the store, and, you know, the dog is always eating their homework, so you know what I mean? Like, you just...they'll get you when they get you, right? Android turned things around, has historically turned things around pretty quick, because I don't think they have a lot of, like, human beings looking at it. 
Android, you know what I mean, you can really usually get it down same day. But, like, you know, App Store, it could be days, you know what I mean? We're talking, like, you know, three to five business days. But, you know, I got into a bigger fish, you know, maybe, like, a trout, you know what I mean? And I had a number. You don't call that number very often, you know what I mean? But you can call the number, and there's a person, you know, at Apple Corporate, and you could grovel. You could grovel to a person, versus, like, just, like, groveling to this email where it's just like, I don't, you know. And now, you know, and now I work for some pretty big dogs and people you know. And, like, I can grovel internally to the VP who could talk to, you know, another VP, and they can make things happen, you know. And all my lickings happen, like, you know, in-house. And it'll just be like, "Hello. I'm the SVP of technology. And let's talk about how you shit the bed, Will [laughs]." You know, which is, you know, I mean, like, I don't know. I mean, like, if it has to be that way, it has to be that way. But things have evolved, right? Like, I'm not just some sort of, like, cowboy. And when you're working with, like, sort of big money and big engineering staffs, everything you do is feature flagged, right? So, like, you have a, you know, a live dynamic CMS, and anything I put out, anything I put out ever, you know, I've got an off switch. You just have to have that. That's, you know what I mean, like, at this scale, you've got to have a panic button. And there was also, like, you know, the app deployment infrastructure has evolved rather significantly since I've been doing mobile apps, in that, like, you're not blasting it out to 100% of your customer base. That's crazy. Like, that's psychopath work. You roll it out to, like, 1%. Let's see how it does. Let's let it simmer for a little while, right? So, it's good and bad, right? 
But, you know, there are best practices which, you know, to a web development shop might seem, you know, kind of primitive and anxiety-panic-inducing, which there are, right? I mean, because you've got to remember, like, if you're on a mobile app, you're running on somebody else's server, right? Like, it's their hardware. It's their machine. They could do anything. Anything. DAVE: Including nothing. Including nothing when it goes down. WILL: Anything. Yeah. You're out of hard disk, baby. Sorry, no more hard disk for you. Oh, you got a little greedy with the RAM. We're pulling your card. MIKE: [laughs] WILL: Sorry, no no, you know. Like, hey, like, oh, you had the network. You had the network, huh? That's cool. That's cool. But I'm going in a tunnel now [laughter], you know. Like, there are levels to the game. And, like, when, you know, like, your app, you know, your distributed application, you know, is in no way a guaranteed stable internet connection, no, no, no, no, no. No. Nobody's even pretending that that's the case. And things can get really difficult, and getting accurate telemetry can be very, very difficult, you know. Because there are certain crashes where you're just done. You're done now. You're finished. The operating system is stepping in. Daddy's home, and everybody's going to their room right now. So, those can get more difficult. But again, you know what I mean, because, like, you know, there are bigger dogs. You know, there are a lot of really delightful, you know, third-party mobile app telemetry gathering solutions. They'll give you screenshots now. It was great. It's so cool. I could be like, "Oh, it crashed," and I could just be like, "Oh, what are the, like, last, you know, few things that they have done in the app?" And I'm just like, oh. You know, where have you been all my life? MIKE: [laughs] WILL: Sorry. Thank you for coming to my TED Talk. DAVE: No, all good. I have season tickets. 
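As a rough illustration of the two safeguards Will describes (an off switch on everything, and releasing to about 1% of users first), here is a minimal Python sketch. The `FeatureFlag` class, the `bucket_for` and `is_enabled` helpers, and the `new_checkout` flag name are all hypothetical and are not taken from the episode or from any particular flag service. A real setup would read the flag from a remote configuration system so it can be flipped without shipping a new build.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    """Hypothetical flag record; a real system would fetch this from remote config."""
    name: str
    enabled: bool = True       # the "panic button": set to False to turn the feature off for everyone
    rollout_percent: int = 1   # share of users who get the feature, from 0 to 100


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 so the same user always gets the same answer."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: FeatureFlag, user_id: str) -> bool:
    """Return True if this user should see the feature."""
    if not flag.enabled:
        return False  # kill switch wins over any rollout percentage
    return bucket_for(flag.name, user_id) < flag.rollout_percent


if __name__ == "__main__":
    flag = FeatureFlag(name="new_checkout", rollout_percent=1)
    users = [f"user-{i}" for i in range(10_000)]

    exposed = sum(is_enabled(flag, u) for u in users)
    print(f"{exposed} of {len(users)} users see the feature at {flag.rollout_percent}%")

    flag.enabled = False  # something went wrong: hit the off switch
    exposed = sum(is_enabled(flag, u) for u in users)
    print(f"{exposed} of {len(users)} users see the feature after the kill switch")
```

Hashing the flag name together with the user ID keeps each user's bucket stable across sessions, so raising `rollout_percent` only adds users and never reshuffles who already has the feature.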
MIKE: You did talk about several things, though, that goes back to what we talked about a minute ago, or this ongoing conversation. What you do ahead of time matters a great deal. You say you don't push out changes that go live. What? Are you mad? You say, you know, push out changes that are behind a feature flag, and then the rollout is independent. A rollout of the feature is independent of rollout of the app, right? So, you've changed the cycle so that you actually do control the rollout. Or, as was said, when you actually have a web app, you have the ability to roll back. You press the button, "Oh, wait, yeah, now it's back." Problem solved. That prep work ahead of time goes a long way to making things right. Now, let's say things have gone wrong anyway, right? You've got unexpected traffic that's 10x your normal level, and now you've got a database query that's unhappy. There's no rollback, right [chuckles]? You've got live traffic, and you probably want to be doing something with that 10x traffic, right? You probably want to be making some money. What do you do? JUSTIN: That's where prep work comes in again, horizontal scaling. Well, unless it's hitting the only copy of your database, then you've got to do more. EDDY: It should probably stem from writing an ORM query versus just a raw query. Just saying, there's a lot of magic that happens when you write ORMs under the hood. MIKE: Oh, and it's always the database. It's always the database [laughter]. There is maybe sometimes it isn't, but, yeah, it always is [laughs]. It's something you've done with the database. You're missing an index. You've done something that you could undo with the database, but now you're in a bad spot, right? You're in the bad spot. We talked about stopping the bleeding. You get in the call, a bunch of people upset. You've got three or four business stakeholders who are in the call asking you for a status update.
You don't even know what's wrong yet, but you know the app is down, and it's all on you. Step one, what do you do? EDDY: Roll back, unless it's a database. MIKE: There's been no deploy. Things are down. What do you do? WILL: What changed? Something changed. DAVE: You just answered some first questions. WILL: We were happy, right? And then we became unhappy, right? So, what is the delta? What is the delta between happy and not happy, right? Like, could be just a lot of traffic, right? That's okay. Like, I went from happy to very happy to very unhappy, right? It could be a deployment, right? Dave was talking about the deployments, like, "Okay, I changed this thing," right? Okay, that's an issue, right? I mean, and so, like, identifying the last time that you saw the sunlight, that you felt human joy, you know, okay, well, there we go. And then you just sort of, like, narrow that delta down to, like, "Okay, it was here, and then it was here." All right, now you've got a stew going. JUSTIN: So, you're talking a lot about, you know, identifying this stuff. It goes back to, again, planning and making sure you have appropriate monitors in place such that you can go look at those logs and you can have that dig-in ability, and something other than just, "Oh, prod is down." It's like, where are my alerts? You know, I should be able to go into the logs and say, "Oh, the traffic is hitting the firewall here, and it's hitting the VPC, and then it's hitting, you know, the application, and then it's hitting the database." You know, is that traffic consistent all the way down the thing? And can I see all that in the logs? DAVE: How is the system down, right? Are you CPU-bound? Are you disk-bound? Are you network-bound? Are you hung? Yeah. MIKE: Notice that we're talking about going and looking at our metrics to see what's wrong, not going and doing a deep, like, root cause analysis necessarily, like, what's hurting here? DAVE: Right. This is symptoms and triage at this point. 
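Will's question, "what is the delta between happy and not happy," can be sketched as a small triage helper: find the last health check that passed and the first that failed, then list only the changes that landed inside that window. This is a minimal sketch under assumptions. The `HealthSample` and `ChangeEvent` types, the function names, and all of the sample data are hypothetical, and in practice the inputs would come from whatever monitoring and deploy history a team already has.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class HealthSample:
    at: datetime
    healthy: bool


@dataclass
class ChangeEvent:
    at: datetime
    description: str


def find_delta_window(samples: List[HealthSample]) -> Optional[Tuple[datetime, datetime]]:
    """Return (last_known_good, first_bad) for the most recent healthy-to-unhealthy flip, or None."""
    ordered = sorted(samples, key=lambda s: s.at)
    window = None
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.healthy and not cur.healthy:
            window = (prev.at, cur.at)  # keep overwriting so the latest flip wins
    return window


def suspects(samples: List[HealthSample], changes: List[ChangeEvent]) -> List[ChangeEvent]:
    """Return the changes that happened between last-known-good and first-bad."""
    window = find_delta_window(samples)
    if window is None:
        return []
    last_good, first_bad = window
    return [c for c in changes if last_good <= c.at <= first_bad]


def at(minute: int) -> datetime:
    """Helper for building made-up timestamps on one hypothetical day."""
    return datetime(2026, 1, 1, 12, minute)


if __name__ == "__main__":
    samples = [
        HealthSample(at(0), True),
        HealthSample(at(5), True),
        HealthSample(at(10), False),
        HealthSample(at(15), False),
    ]
    changes = [
        ChangeEvent(at(2), "config change (before last-known-good)"),
        ChangeEvent(at(7), "traffic spike to 10x normal"),
        ChangeEvent(at(13), "deploy (after the outage already started)"),
    ]
    for change in suspects(samples, changes):
        print(change.at.time(), change.description)
```

With this made-up data, only the traffic spike falls inside the window. The deploy that landed after the first failing check is excluded, which mirrors Dave's story of the server going down three minutes before his deploy.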
MIKE: Yeah, exactly. DAVE: Don't prescribe until you've diagnosed. MIKE: And that's the triage, exactly. And as mentioned repeatedly, you go to your data; you pull up your dashboards, right? Whatever you've got that you have to go get some visibility into that. Whatever you've done to observe, that's the first place you look, like an instinct [chuckles]. JUSTIN: Actually, the first thing I usually do is I go hit it myself on the browser if it's down [laughter]. DAVE: For real. For real. MIKE: Verify. DAVE: Works on my machine is a valid bit of data. I mean, it's a terrible excuse, but, like, it is actually up from here. Okay. Are you on the VPN? Are you? Yeah. MIKE: Absolutely. JUSTIN: That's really what I do first is [laughs], like, "Oh, I can [inaudible 27:58] [laughs]." DAVE: "Confirm the bug." EDDY: "Wait, it's broken? Hold on. I don't believe you. Let me go to the website and see if I can replicate your problem [laughs]." DAVE: I had a support call. I worked for [SP] Joston's Learning. They were, like, an e-learning thing back in the '90s. And so, we would go in, and we would string Ethernet like radio, like RF cable, a 10BASE-T cable, if you remember that, like, coax off the back of these things. And the students would...for, like, middle schools, they would kick the plugs. They would kick the routers. And some of the students figured out that if they kicked the plug, they didn't have to study that day. So, they started getting...and the teachers got real good about going in and reconnecting the plug and saying, "Do your darn lessons," right? And we had one server that just...they came in on a Monday and nothing. Like, it just came up to, like, an "operating system not found" message. And I'm like, oh my, and so I did everything over the phone that I could possibly think of. I finally had to dispatch an engineer to the site.
Engineer walked in, looked at the server, reached down, and ejected the floppy disk that somebody had plugged into the computer so that they could play Doom on the LAN over the weekend, and forgot to pop the disk out. And I got a lambasting from the engineer of, "Check the A drive next time that the computer won't boot, if it's booting to the wrong operating, you know, to the wrong disk." But everybody else's system was working, so it wasn't...I knew it wasn't on our side. But yeah, this turned out it was just the one server. No other servers in the building were affected because that was the one that Jose had decided was going to be the Doom server. EDDY: Would it be valid to say, "Grow callus, and then you won't feel it anymore," as a valid response to being cool during a fire? I don't necessarily quantify that as a valid...I don't want you to grow callous on the fact that you've broken it so many times that you don't feel it anymore. DAVE: Right. You're not wrong, though. EDDY: Yeah, exactly. It's sort of like [inaudible 29:55] under the pressure after you've done it so many times kind of grows numb a little bit, right? Like -- DAVE: I had a manager teach us how to get calluses instantly. It was fantastic. Servers were down. We were losing money. And the president of our unit walked in. And we were running around like chickens with their heads cut off, right? And he walks in, and he goes, "All right, we knew this was going to happen." And we went, "Hey, you're right. You're right. We knew this could happen, okay." And all he did was just normalize it. It's not the end of the world. This is a thing that can happen. Let's take this back into the catastrophic level. There's a thing that they tell 747 pilots. "In an emergency, wind your watch." If you're at 30,000 feet and you blow all 4 engines, they just stop for no reason, and you don't know why, you've got 20 minutes before you die. And in that 20 minutes, you have to find the right solution. 
I mean, you have to find the right solution. But there's a million things that it could be. Now you've got checklists that you can work. But they basically say the first thing you need to do is stay calm. Machines break. So, when you're at 30,000 feet, and all 4 engines stop for no reason, it's not for no reason. It's because it's a machine, and something has gone wrong. We knew this could happen. This is normal. It's not great; it's not ideal, but it's not supernatural. It's not lightning bolts from the sky. And that gets you into a resourceful mindset so that when the answer goes right by out of the side of your vision, you're not tunnel visioned on, my next attempt at the...oh, oh, oh, oh. It's that, it's that -- WILL: Yeah. You know, I would add on to that, like, does anybody in this call know of anybody who shipped a prod bug, screwed something up, and they lost their job? Can you think of somebody that that has happened to? We have decades of experience here, right? DAVE: One time. WILL: Because, for me, nobody, nobody. I can't think of a single one. DAVE: One time. And it'll be real clear that it wasn't the prod bug. It was...we have a thing, when we ship code here at Acima, you have to have reviewers review your code. And I introduced it at architecture, a couple of weeks back, that, you know, at CoverMyMeds, we called this "sticking your head in the noose with the developer." And you had to have a review from an associate, you know, a coworker, and you had to have a review from an engineering manager. And the engineering manager rubber-stamped a review. I'm going to say it was his own code, rubber-stamped it, shipped it 4:00 o'clock on a Friday, took out the fax machines, and he went home and didn't come back and check. And we were down all weekend. This was 10, 15 years ago. We didn't have any observability. We didn't know the fax machines were down, but it was his job to know that it was down. So, he did not get fired for taking out prod. 
He could have taken out the whole fax bank if he had just checked his work, or if somebody else had reviewed it, or if he had just turned around and fixed it. He got fired for criminal neglect, you know what I mean? Gross neglect, gross negligence. My definition of gross negligence is: if we fired you and replaced you with nobody, we'd be better off. That's gross negligence. That's what he did. He didn't get fired for taking out prod. WILL: I mean, so it's just something to, you know, if you happen to be, like, sort of like a [inaudible 33:23] developer, right? DAVE: I see your point. You're not going to get fired. Yeah. WILL: We've got a literal lifetime of, you know, dev experience. And if I'm wrong, just, you know, open your mouth and say, like, no, you're not here, but this is, like, a lifetime of experience. We don't know anybody who got fired for taking out prod. And I don't know if there's anybody on this call, you know, at a senior level who hasn't shipped a prod bug before. EDDY: Okay. Can you define the parameters on what you mean by taking down prod? Our gateway for API traffic is completely haywire, kind of thing? Or are you talking about, like, oh, our hosted AWS server -- MIKE: I'll tell you my first one. I had been there a few months, and I was asked to restart the service. I ran the wrong script and turned off the server. This is back when your server was in a physical data center, and the only way to get that thing back on was to drive to the data center and turn that server back on. And I turned it off. So, when I say down, I mean it was off [laughs]. And my manager said, "What did you do [chuckles]?" And then we figured it out, and we fixed it, and nobody was fired [chuckles]. DAVE: I don't have a black and white definition for taking out prod, Eddy. But as a sliding gray scale, the more money the company is not making, the more your taking out prod was. 
And related to the nobody getting fired, I once heard a CEO say to someone, this was, like, 20 years ago, somebody wiped out the system and came in, resignation letter written, hat in hand, hangdog expression. And the CEO said, "I just paid $12 million to train you. Why the heck would I fire you?" And I tell you what, he was the most diligent engineer after that. He'd gone through a $12 million training. WILL: That isn't to say, like, you know, like, YOLO, send it, right [laughter]? But just like -- DAVE: Yeah, that's the guy that got fired, yeah. If you're gambling with...if you lose $12 million, you're not going to get fired. You're going to get fired for gambling with $12 million of not your money. KYLE: I've always looked at whether or not prod is down as whether or not you're affecting your five nines. If it's something that you can report on for your SLA, then you've successfully taken prod down. WILL: Yeah, yeah. This week, and I'm still a little bit salty about it, and it'll be, you know what I mean, it'll be fine. But I had a thing where there's some analytics telemetry stuff in the code review process. I had to refactor it, like, three times for no reason at all. People wanted it, oh, what would it look like if the couch was over there? What would it look like if the couch [chuckles] was on the ceiling? What would it look like if the couch was on the front porch? And I'm like, okay, man, all right, you know, whatever. And so, I moved it three times. In the course of that, I missed some telemetry. There's some telemetry on, like, campaign reporting that isn't going to get out until the next release. And, I don't know, in my mind, that's a prod bug, you know, because, like, they're not going to know which campaign for, like, you know, two weeks. I'm really grumpy about that. I'll probably be over it by Monday. DAVE: You've heard the rule "Fail early, fail loud," right? It's just observability from the other end of it. 
It's like, if something's down, I want to know. I've had two times in my career when the CEO found the bug before anybody in QA or engineering or anyone. And it's awful when that happens. EDDY: I do want to backpedal to what Will said. You probably had that in mind when you first started, but you probably did it, like, three different times, three different iterations. You were so far in with refactoring that you probably forgot by the end, right? And I think that was more of a symptom, you know, of the work of the refactor. DAVE: And if I was two levels up, I would want to know who made you change it three times and why because those aren't free. They're clearly not free. I'm not a machine. WILL: It was my fault. It was my fault. It was my fault, like, I did it. DAVE: And you fixed it, right? WILL: Yeah, I did it. I fixed it. It was, like, a 10-second thing. It was just, I don't know. Anyway. DAVE: So, as a CEO, I'd be like, who made Will change this three times? Because if you make him roll enough times, he's going to roll a one eventually. WILL: Yeah. Anyway, anyway, you know, it's fine. It's fine. Like, somebody else took down the dev server for, like, a 24-hour period, like, the very next day. So, if anybody is looking for somebody to, like, grump at -- DAVE: Yeah, you don't have to outrun the bear. WILL: It was me only very briefly [laughs]. JUSTIN: So, you guys chatted about, like, you know, moving on to what you do. You fix it immediately, right, and then you dig out the bullet. You know, digging out the bullet is kind of like the postmortem, and kind of mature organizations have a postmortem process. And that's always interesting. That's where you truly find out where your policies and your processes are lacking because, you know, you shouldn't have shipped the bug. Something caused that bug. When I brought down prod, I was lucky because it was after hours. Otherwise, somebody may have been fired. 
But the postmortem was painful, but nobody felt terrible about it because it was like, it was my fault. It was like a string change, and the string was...I had changed it to...the string was supposed to be "production," and I had changed it to "pro," so "pro" versus "production." And we found out the reason why I changed it to "pro" was because in all of our other environments, it was "uat" or "dev." And I was like, oh, that's convention. We just use the three-letter word, but no, "production" was the whole word spelled out. And this was when I was working at Fidelity. We'd done the install. It went out. We got calls almost right away. And, luckily, we'd done the install at, like, 4:00 o'clock in the afternoon, so the trading day was over. But it was, you know, the conversation with my boss the next day was just, like, sweating bullets and everything. But it was just like, you know, like you guys said, it was like, oh, as long as you learn from this and don't assume. That was the extent of the postmortem. In other places -- EDDY: Also, like, I think that speaks volumes to, like, the brittleness of their [chuckles] system, right? Like, if you can change something... JUSTIN: Oh, it was...You'd be amazed at what our financial system is running on. It's, like, duct tape and very, very brittle -- EDDY: I'm not surprised [inaudible 40:27] tell us [laughs]. WILL: I would not. I would not. I want to kind of digress, and I'm very curious about this. Like, we talked about, like, sort of, like, you know, like the developer, like, blowing things up thing, right? But, like, Justin, you're working in security. What about security breaches? How do we deal with, like, a security breach? How do you even know there's a security breach? How do you, you know what I mean, do a postmortem for, like, a security thing? Like, oh, we had a compromised system. What do I do about that? The server's up happily spilling its guts to anybody. 
[laughter] JUSTIN: Happily divulging all the secrets [laughter]. So, again, it goes back to monitoring because you got to be able to know when you are being attacked. Because if you don't know that you are being attacked in some way, you just think it's normal traffic. You got to have monitors on what you are interested in because if you don't have the monitors on, they'll just take all your secrets. They'll take all your money and everything. A good example: when I worked at Coinme, which is a cryptocurrency company...Is it still okay? Yeah, they were bought out by somebody else. Okay, I can talk about this. When I worked there, it seemed like at least once a month our servers were under attack, either denial-of-service or password, you know, people were attacking, trying to steal passwords, or that sort of thing. And cryptocurrency is probably like the wild west, the most wild west financial industry there is right now. But we had to go in, and we had to...on the denial-of-service attacks, we were on the call with Cloudflare and trying to figure out, oh, what could we block to, you know, stop this denial-of-service attack, whether it's whole swaths of the earth. You know, we're going to block all of Russia. We're going to block all of Eastern Europe. Or if we decide that, you know, oh, we can block a certain type of browser tags or, you know, all those sorts of things were considered. And sometimes we actually had to do a live install to add custom tags to our traffic so that we know what was good from us, and that would block these bots that were under attack. And so, it was nuts. Like, there were several times when we were, like, all night long fighting this sort of thing. But you basically just had to figure out, okay, what's their avenue of attack? And then, you know, figure out ways to block that traffic that was coming in. And sometimes we had whole swaths of our customers who got locked out because they were under password attack. 
So, it is a wild west, depending on, you know, what could happen. And then, you know, the next week, because it usually happens on a weekend, the next week we'd have a postmortem about, you know, what could we do to defend against that kind of attack? And sometimes that postmortem was, you know, done with our security company, or with the companies that we contracted with to help us block that sort of thing. So, it was interesting, and it was very, very detailed and kind of a crazy thing that we had to deal with in those cases. MIKE: What you're saying there is interesting, and you're hitting on something that I was wanting to bring up, because it's kind of a gap in our conversation. We said, oh yeah, you stop the bleeding, and then, you know, you figure things out. Well, sometimes stopping the bleeding is not an instant process. You talked about, you know, part of the triage: okay, I know they're bleeding. You know, you've looked at the metrics. You see, okay, I know something's going wrong. There's internal bleeding here. Or, you know, obviously, you know, we're getting a denial-of-service attack. What next? Because there's usually different options, and they have different value. There's a difference in what you do. You got the database issue. Do you add an index? Do you rewrite your query? What do you do? There are different options, and those different options have different costs -- JUSTIN: I actually want to bring up one point that you have here. You're investigating what the cause is. You got to have the contact information for all the people that you might need to contact on a Saturday night in order to solve the problem because you can't be an expert at everything, right? So, make sure that you have the contact information of these people and that you treat them nicely, and you [laughs] reward them because you are intruding upon their time that perhaps they were not on call. MIKE: Ramses is on the call. He hasn't said anything. 
I'm always glad when he's on the call because he knows everything [laughs], which may not be quite literally true, but it's close. RAMSES: It's far off. MIKE: [laughs] You know, having the right people in the room matters a lot. That's a really good point. And you better have a process for calling those people. DAVE: He doesn't know where all the bodies are, but he knows where the memorial services are held. MIKE: Making those choices matters. And it's really easy to get rabbit-holed on something because you're like, okay, we need to come up with a solution. How do we make this work? And you don't want to explore every option. That takes too long. So, there's a delicate balancing act that you're performing during that time, whether it's all night with a security issue or your database is down. Every minute's costing you a million dollars or whatever it is, right? You better be making a choice quickly. We've talked a lot about having presence of mind. Well, it matters a lot. And I think it's really important that you give yourself the mental space to explore that and find the right option. And that can go really wrong really easily. It's very common when you have that incident call, you have a lot of people who join in, and maybe you do have several business stakeholders who are coming in who are asking questions repeatedly. And they want to know, and rightfully so. But they should not be in that incident call when you're resolving the problem. You jump in with somebody else to have the discussion. I think it's critical that, whatever it is, and, you know, there are business stakeholders who actually can be really good, and they'll back up when they need to. But you need to get the people who can solve that problem, those people you mentioned, into a place where they can legitimately think and make a good decision. They can evaluate those options and pursue the best option. 
A few months ago, I was involved in a production incident, and I saw a lot of noise and people getting focused on something, or not knowing what to focus on. You know, there was lots of bouncing around, and helping people make a choice, "We're going to go this way," went a great deal toward getting that solved in a much shorter amount of time versus hours, days, right? You need to get that. That's a big deal. Have you all seen that same dynamic? WILL: Honestly, like, most of the time that I've been in these calls, people have weighed, you know, I haven't seen anybody sort of freaking out. I think I've been pretty lucky in that, you know, the people from, you know, like, the higher upper-level managers are just sort of like, what's going on? I mean, I don't know. I mean, maybe a piece of it is just me, you know what I mean, in that, like, I will tell you exactly what I know in clear and concise ways. This is what I know. This is what happened. This is what I'm going to do, you know what I mean? And then I just sort of, like, and now I'm going to go do it. And they're just like, it's everything I needed, and I'm going to leave, you know? So, I mean, I think, you know, kudos to them. And, like, ICD 2, you, as, like, sort of, like, a first responder, let's say, need to be aware of, you know, what their ask is, right? And, I mean, you're going to talk to their boss. And they're going to talk to their boss, and then they're going to talk to potentially, you know, the big boss. And everybody needs to know what's going on, what's being done, you know what I mean? Because the CEO, like, in a big company at least, right? CEO's hands are tied. They can't do anything. They couldn't fix that server if they wanted to nor, you know, in most instances, could your boss, or at least your boss's boss. If you don't have dirt under your fingers, you're useless, you know. And so, your job is to communicate. No offense. 
I mean, it's not like they don't do anything at work, but when the server room is on fire, like, if you're not helpful, just, yeah, let me get the fire out, and then we can do manager stuff, you know, later [laughs]. JUSTIN: Yeah. And there's generally a playbook for incidents. If you guys have an on-call rotation, which I believe you guys do, we have one. You have a playbook that clearly designates, you know, oh, somebody is on call, and they have the power to declare an incident. They are in there. They're the incident quarterback. That's what we call 'em here. And they have access to all the people they need to call. And they are also responsible for communicating up and handling, you know, the managers that may come in and, like, throw their arms around, or whatever. And the incident quarterback, I think, is really key to maintaining a calm, you know, demeanor during this incident. And it's key that anybody who has the potential to be that has that right training so that they know how to use that playbook, what they need to do. And, you know, it's really nice if they do know how to do that. If they don't know how to do that, then you're doing on-the-call training if you happen to be on that call. WILL: I'll take it for granted that there is, like, a binder that one could open up and handle the incident, you know, or God help you, training [laughs]. Your training is, this is the spreadsheet for your days, and maybe there's an email or something. Yeah, yeah, I don't know. I've seen training. I have seen it. I've witnessed it where, like, they're just like, "Okay, this is the training stuff." Like, I know it can happen. However, however [laughter], however, like, as often as not, it is just a Slack message, like, "Hey, server's down. Can you get on this call [laughter]?" And I'm like, "Yeah, yeah, I can." DAVE: I pushed really hard to get, at CoverMyMeds, to get...we called it the 3:00 AM playbook, which is just a checklist, right? 
You know, do this, do this; do this; do this; look at this. If it's this, do that. And we literally had to write it for somebody with no context, no knowledge of the system, other than, you know, generic familiarity with the tools. And it's 3:00 o'clock in the morning. You're sleepy. And all you want is to go back to bed. And literally, the outcome of the 3:00 AM playbook is to stop the bleeding. It's not even pull out the bullet. It's literally get the server back up, watch it for a few minutes. If the server looks like it's going up, it's still up, go back to bed. We'll dig the bullet out in the morning at 8:00 o'clock. MIKE: So, you stop the bleeding. What next? That's the key thing first, right? It's very easy. Well, maybe even have some partial fix in place. It's very easy for people to say, "Oh yeah, problem solved," and walk away. And then, two weeks later, it's still, you know, your Band-Aid's in place, and the Band-Aid falls off [laughs]. DAVE: We've all worked on systems that's got the little donut spare tire that's been there for seven years because it works. MIKE: Yeah [chuckles]. How do you deal with this in the long term, [inaudible 52:13] as soon as it's happened? How do you end up stronger going out of it than you went in? WILL: It depends on how fast you got to drive down the highway, man. Like, there have been plenty of sort of, like, robust failover systems that had, like, a kind of a slow, you know, peptic ulcer memory leak, where they just cook for, you know, a couple of weeks or a month or so. And, eventually, you'd get to a point where it's just like, yeah, that one's got to go. And you just, you know, you vote somebody out of the pool. You keep on going. There's no, you know, it could be bad, like, you'd just be like, ah, it'll be fine, you know? There's no one-size-fits-all there, you know. Some stuff's like, we're working all weekend, baby, and other stuff is just like, nah, it'll be fine. 
DAVE: We did a couple of systems where we needed to know that, like, we called it Meteor Strike Level Readiness. So, we literally had our entire cluster, like, 700 servers running in a data center in Atlanta and another cluster in Chicago. They were not synced. Like, the databases weren't slaved to each other. They weren't, you know, synchronizing. We ran off of the one, and it just sent backups to the other one. And, every six months, we would fail over to the other data center and use the other one as the backup. And, in three years of doing that every six months, by the time I parted ways with the company, it was still an all-night. And at 4:00 in the morning, we were all writing down the stuff that didn't work that needed to be fixed over the next six months. And it was awful because we would take down prod to do the fail...I mean, we were simulating, like, literally, a meteor has hit Chicago, and we've got to switch over to Atlanta now. How fast can we go? And we still had long lists of things to do, but we got very good at triaging: what's the most important thing? And the most important thing was, how fast can we get Atlanta up and running and then figure out how much is left in Chicago, and what can we do with it? So, that's a lot of money. So, that's another element, right? You slap the donut on that spare tire. And, all of a sudden, the CFO is like, "Why are we spending money on this? I'm still making money." Well, you're not going to be CFO for long. WILL: I mean, I don't know. There haven't been a lot of meteor strikes, you know, in the past 20 or so years. Like, you know, like, Atlanta, both Atlanta and Chicago have been, like, remarkably durable. We haven't burned them to the ground in 100 years, 150 [laughter]. DAVE: And, honestly, it's historically had the same amount of likelihood that they both get hit at the same time, honestly. So, I'm not even sure what we're doing, so... WILL: Yeah, yeah [laughs]. Who would nuke just one? 
DAVE: Right [laughter]? It's like Lay's potato chips. You can't nuke just one [laughs]. WILL: Nobody who would do it is going to be short. DAVE: That's right. That's right. JUSTIN: Yeah. And I think that has to do with, like, a realistic evaluation of what could happen. Because you could sit there and prepare so much for any sort of thing that could happen, but there's a cutoff point. And I think a reasonable level of risk is acceptable to the business because the business has to survive and be profitable. And, you know, if you're spending all your time, like, thinking of the worst-case scenarios, one, you got to get a life, and two, you're going to spend way too much time, and your engineers' time trying to solve hypotheticals. DAVE: To be fair, the reason...so it wasn't hypothetical. The reason we came up with the meteor strike scenario is...I'll have to dig it up. There was a data center in Houston that had a transformer, like, the main power transformer inside the building shorted out. And it heats...it superheated the cooling oil, and it detonated. It didn't kill anybody because it happened in the middle of the night. JUSTIN: Wow. DAVE: But it was in the center of the building, took out all the servers around it in, like, a 20-foot thing, and then punched a hole through the ceiling. And the servers in there literally fell in the hole. And I can't remember who was on it. I was web admining for Schlock Mercenary. My best friend was doing a web comic. And all of Keenspace and Keenspot, like entire companies, like, their whole data center was just gone. I can't remember the name of it, but it's a name that you might recognize, especially if you're in networking. You would go, "Oh, I know them." WILL: I've got some [inaudible 56:52] words you guys might recognize: us-east-1 is down [laughter]. If you know, you know. I wish you could see the face that Kyle and Mike are making. MIKE: Yeah, instant recognition. 
You got to East, and I knew where you were going [laughs]. WILL: I don't think I'm allowed to use proper nouns, but us-east-1. Everybody knows who I'm talking about, and everybody knows what they do every other year. DAVE: Rack Shack, EV1 Servers was the one. 2008, they had a transformer explode. Sorry, 2003. And then it happened again in 2008. So, wow, wow. There's somebody who needed to be fired there, clearly. MIKE: Somebody needed to do the postmortem and take action. We're kind of reaching a good time to be shutting down. That cleanup matters. And maybe you determine, hey, you know, we can keep rolling on this donut for years [chuckles]. But you probably have some customers who need to be helped. And, you know, it's important not to neglect saying, "Okay, yeah, we've stopped the bleeding. What's the cleanup need to be?" Because there may be some important cleanup. And there's some, you know, people are going to care. People are going to care. Any final thoughts you all have? We've talked a lot about keeping our composure [chuckles], a lot about that, about the importance of having, like, an engineering sort of mindset. How do I fix this? Triaging, stopping the bleeding, fixing it, and pragmatically, you know, and then not neglecting the after, what comes after. Anything else you want to cover? DAVE: I have a strong religious belief that it is more important to be able to fix the problem than to correctly prevent the problem. Because if you correctly prevent a problem, you have not improved your capacity for dealing with something that you didn't correctly predict. But if you get good at solving the problems, you suddenly can stop worrying about missing something because you start to realize, "We'll handle it." You don't get cavalier. You don't deploy at 4:00 PM on a Friday and go home because you'll handle it. You don't get stupid, but it can help calm you down and say, "Yeah, this is what happens." 
JUSTIN: So, what you're saying, David, is, like, you should take prod down just a little, and then [laughs] and then that little inoculation [laughs]. DAVE: I wrote a tool called Tour Bus, which, over a conference room Wi-Fi, over a T1, so, like, 256K, with 200 people in the room surfing the internet over it. And I took out prod with it from my laptop while I was giving a talk on stress testing your server. And I did not get a talking to from the CTO because it was his pants that were down on the internet, not mine. And I didn't yank his pants down maliciously. I genuinely didn't think I would take out our prod servers. But there you go. So, give the emperor's new clothes a tug every once in a while. MIKE: So, Netflix, famously, I believe it was Netflix, correct me if I have anything wrong here, who had the tool called Chaos Monkey. DAVE: Chaos Monkey. MIKE: They would go and just break their system here and there, all the time, so that they knew their system would be resilient because unless they were testing it, they didn't know. WILL: I really like having a boring day at work [laughter]. DAVE: Me too. WILL: I like boring days at work [laughter]. I'm thinking I can ride with you on that one, Dave [laughter]. MIKE: I will say that, you know, they say, "Oh, it's always the thing you didn't think of." It doesn't matter how much preparation you do; there's going to be something you didn't think of. And we've talked some about monitoring along this and observability. I'm of a mindset that, given the choice between the two, observability is more important than hardening, not that they're not both important. But you're going to miss something. You're going to miss something when you're trying to prepare for whatever the attack is, because it's going to be some attack you weren't thinking of. And I say attack. It may not be malicious, right? Whatever bad thing happens, it's likely you didn't think about it. If you did think about it, you would've fixed it. 
But if you have really good systems to figure out what happened, you can solve that quickly, and if you don't, then you can't solve it quickly, and you're in a really bad spot. I've, for a long time, been of the strong belief that monitoring, that observability is the more important of the two. DAVE: Observability leads to good hardening. Good hardening does not lead necessarily to good observability. KYLE: Just to go along with your last point, I would say that monitoring is what you do to prevent historical events from re-happening. WILL: Ooh, I'm stealing that. I love that. MIKE: I like that. Hopefully, in your next production incident, you've taken something from this that helps you out. Until next time on the Acima Development Podcast.

March 18, 2026 - 1 h 11 min
