IT Horror Stories with Jack Smith

The Failover That Failed Successfully - Lessons from a Successfully Failed Disaster Recovery and Failover Test

33 min · 6 Apr 2026

Description

Conducted during a busy release weekend, the failover test exposed gaps not in the technology itself, but in coordination and communication. While production ultimately stayed unaffected, the situation quickly escalated: subcontractors weren't aligned, assumptions didn't match reality, and information didn't flow when it mattered most. We unpack how a well-intentioned test turned into a coordination challenge, where timing, dependencies, and unclear responsibilities created confusion across teams. It's a story about how resilience isn't just about systems and infrastructure, but also about people, processes, and making sure everyone is on the same page, especially when things are supposed to "just be a test."

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/episode-12-the-failover-that-failed-successfully/
All other links and socials: https://links.ithorrorstories.eu/

00:00 Welcome & Setup
01:34 Corporate Environments
03:30 Failover Planning
07:19 Double Disaster
09:08 Critical Failure
13:20 Realization Moment
15:28 Split Brain
17:34 The Recovery
21:13 Lessons Learned
31:32 Conclusion

All episodes

20 episodes


Jack's Rants - The New Change Management - ITIL Change Control, Organizational Change, and Why Project Managers Suddenly Need Psychology Degrees

Once upon a time, Change Management meant raising an RFC, preparing a rollback plan, and surviving the Change Advisory Board. Today, it also means stakeholder engagement, communication plans, adoption metrics, workshops, and apparently having the emotional intelligence of a licensed therapist. Whether you're a project manager, systems engineer, change manager, or simply someone who's ever heard the phrase "we've always done it this way," this rant will probably feel uncomfortably familiar.

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/jacks-rants-the-new-change-management/

00:23 Introduction
01:23 Evolution of Change
02:39 Change Challenges
04:14 Human Factors
06:56 Practical Solutions
10:25 Conclusion

22 Jun 2026 · 12 min

GDPR Enters the Chat - The Day a Hiring Exercise Became an Information Security Incident

It was supposed to be a routine hiring exercise. A candidate receives a technical assignment, reviews the provided material, and prepares a solution. Then someone notices that the "anonymous" dataset isn't anonymous at all. What follows is an uncomfortable discovery involving confidential business information, questions about data handling, and an unexpected conversation with legal. In this episode of IT Horror Stories with Jack Smith, we discuss how good intentions, poorly sanitized data, and assumptions about anonymity can quickly transform a recruitment exercise into an information security incident. Along the way, we explore data governance, confidentiality, risk management, and what organizations should do when they discover they've shared information they never intended to expose.

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/gdpr-enters-the-chat/

00:00 Introduction
02:18 Security Breach
03:25 Social Engineering Risk
10:54 Company Response
12:46 Final Thoughts

15 Jun 2026 · 14 min

Spooling Out of Control - Enterprise Printing Explained and Why Printing Still Breaks IT

Printing is one of those technologies everyone assumes should have been solved years ago, until someone can't print invoices, shipping labels stop coming out, or 50,000 customer letters suddenly disappear into a print queue somewhere. In this episode, we take a tour through the strange and surprisingly complex world of printing: from small desktop printers and office multifunction devices to enterprise print servers and industrial mass-mailing environments that still move serious business. We explore why printing continues to survive every digital transformation, what actually happens behind the scenes when you click "Print," and why reliability suddenly becomes very important when paper turns into payroll, warehouse labels, customer communication, or operational downtime. Because nothing reminds people that IT exists faster than when the printer stops working.

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/episode-15-spooling-out-of-control/

00:00 IT Horror Stories
02:00 Industrial Printing
06:00 Printer Installation
11:00 Mission Critical Printing
15:00 High Volume Printing
17:30 DTP and Mac Systems
24:00 Printing Challenges

1 Jun 2026 · 26 min

Sleep Mode in Production - How a Laptop Took Down a Warehouse

Welcome to the story of a well-known, plainly visible laptop that somehow became critical production infrastructure inside a warehouse environment, without going through proper validation, testing, or operational review. The system worked, solved an immediate problem, and was quickly integrated into daily operations... right up until the laptop entered sleep mode and warehouse activities ground to a halt. What followed was a frantic investigation into why scanners stopped working, workflows froze, and logistics operations suddenly stalled because of a standard power-saving setting. It's a story about how quickly "temporary" operational solutions can become business-critical, and why governance, testing, and basic operational checks matter far more than people realize, especially when convenience quietly reaches production.

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/episode-14-sleep-mode-in-production/

00:00 Introduction
01:32 Early 2000s IT
03:07 Warehouse Down
05:07 Troubleshooting Begins
09:23 Cable Chaos
11:18 Director's Laptop
15:57 Lessons Learned
21:31 Conclusion

18 May 2026 · 23 min

Ransomware Lockdown - What a Real-World Ransomware Attack Looks Like: Incident Response and Recovery

We explore a hypothetical ransomware scenario that mirrors what many organizations could face today. Imagine a normal day where systems suddenly become inaccessible, data is encrypted, and the scope of the attack starts to unfold in real time. What follows is a race against the clock, with teams trying to understand what's happening while keeping critical operations alive. More importantly, we focus on the response and recovery: the trade-offs, the communication challenges, and the lessons organizations can take away before something like this actually happens. Because ransomware isn't just a technical problem; it's an operational and human one, where preparation and clarity matter as much as the tools in place.

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/episode-13-ransomware-lockdown/

00:00 Introduction
01:05 Ransomware Explained
03:08 Attack Entry Points
06:16 Employee Training
13:00 AI Phishing Threats
20:18 Data Leaks Aftermath
25:26 Cybersecurity in Mergers
28:18 Recovery Steps
33:04 Conclusion

4 May 2026 · 35 min