IT Horror Stories with Jack Smith
Conducted during a busy release weekend, the failover test exposed gaps not in the technology itself but in coordination and communication. Although production ultimately remained unaffected, the situation escalated quickly: subcontractors weren't aligned, assumptions didn't match reality, and information didn't flow when it mattered most. We unpack how a well-intentioned test turned into a coordination challenge in which timing, dependencies, and unclear responsibilities caused confusion across teams. It's a story about how resilience isn't only about systems and infrastructure, but also about people, processes, and making sure everyone is on the same page, especially when something is supposed to be "just a test."

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/episode-12-the-failover-that-failed-successfully/

All other links and socials: https://links.ithorrorstories.eu/

00:00 Welcome & Setup
01:34 Corporate Environments
03:30 Failover Planning
07:19 Double Disaster
09:08 Critical Failure
13:20 Realization Moment
15:28 Split Brain
17:34 The Recovery
21:13 Lessons Learned
31:32 Conclusion
19 episodes