Black Swan in the Data Center or How to Prepare for a Disaster
In today’s blog, I’d like to step away from the technical aspects of disaster recovery and focus on the human impact. With that in mind, I’d like to introduce you to one of my favorite books on the subject of predictability and inevitability: The Black Swan by Nassim Nicholas Taleb.
The book describes Black Swan events as unexpected and unpredictable. However, a person or organization can plan for negative events, and by doing so strengthen their ability to respond, as well as exploit positive events. Taleb contends in his book that people in general — and specifically within companies and enterprises — are very vulnerable to hazardous Black Swan events and can be exposed to high losses if unprepared.
Not if, but when!
Why do most of us not acknowledge the phenomenon of black swans until after they occur? Part of the answer, according to Taleb, is that humans are hard-wired to learn specifics when they should be focused on generalities. We concentrate on things we already know, and time and time again we fail to take into consideration what we don’t know. We are therefore unable to truly estimate opportunities and not open enough to rewarding those who can imagine the impossible.
Now, let’s get back to your data center. There is an obvious parallel between the theory of Black Swan events and the need for disaster preparedness for your critical IT assets.
There is no way we can predict hardware or software failures, human error or neglect, natural calamities or terrorist acts. But once we acknowledge that some of these events will inevitably happen on our watch, we have already crossed the biggest chasm separating those who are destroyed by a disaster from those who survive it.
Assumption of inevitability, and preparedness – these two key steps will put you in a much better position.
Here are the key points to consider:
1. Control the Human Factor - Imagine a disaster hits. Your servers are down. You go to a previous snapshot just to make sure you have clean data. Your power has been disrupted, and you don’t know how reliable it will be in the coming days. You are getting distracting calls from your bosses, customers, and support - all asking the same question: When will we be up and running again? No pressure, but your job is in jeopardy. Will you (or your employees) calmly handle the situation, with confidence, while reliably executing all recovery tasks, mitigating risks, and repairing all the affected entities? Under the stress of this situation, humans tend to make mistakes. That’s a fact, no matter how good they are at their jobs. Especially since disaster is not really something you can practice. Or can you?
2. Regularly Test Recovery - You know that your IT environment is never the same, especially in the age of the software-defined data center (SDDC). Software upgrades and patches, hardware changes, new applications, employee turnover, organizational changes – any of these may render the best disaster recovery runbook useless. But you won’t know it until you actually try it. That’s why testing, and testing regularly, is critical to your recovery when the disaster is real.
3. Leverage Automation - Even if you test, how do you know all the complex recovery processes will execute exactly as they are supposed to? We already talked about the human factor. How do you mitigate it? Automation is the solution. If your DR plans are not in the form of Word or Excel files, but are in a form that can be executed at the push of a button, there is a much higher chance the solution will work exactly as designed.
4. Ensure Help is Available – If you are dealing with a disaster on at least a local or regional scale, will your staff be available? While your disaster recovery is critical to your business, your employees may be focused on protecting their families, saving their property, and otherwise making sure life in general is less impacted. Only after all these are taken care of will they pay attention to the business. And it may be too late by that point. Having someone off-prem, unaffected by the same calamity, who is capable of providing professional support while your people are dealing with their own problems is instrumental. Don’t underestimate the value of a strong DR partner.
5. Work with the Right Provider – Where do you find one? You likely do not have the time or resources to go and review all of them. Take a look at the latest Gartner Magic Quadrant for Disaster Recovery as a Service report. You can get a copy of the report here and see how Gartner views current trends and key providers in the market. You can also read a recent blog on the report from Mark Jameson, our VP and General Manager of Acronis Disaster Recovery.
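To make points 2 and 3 a little more concrete, here is a minimal sketch of what a recovery plan "in executable form" might look like, as opposed to a Word or Excel file. It is only an illustration, not how any particular DR product works: the `Step` and `run_runbook` names and the three placeholder steps are hypothetical, and in a real environment each step would call your own restore, startup, and health-check tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recovery task: a readable name plus a callable that returns True on success."""
    name: str
    action: Callable[[], bool]


def run_runbook(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns a log with one line per attempted step, so the same plan can be
    executed in a scheduled test and in a real recovery.
    """
    log: List[str] = []
    for step in steps:
        try:
            ok = step.action()
        except Exception as exc:  # a crashing step counts as a failed step
            log.append(f"FAILED {step.name}: {exc}")
            break
        if not ok:
            log.append(f"FAILED {step.name}")
            break
        log.append(f"OK {step.name}")
    return log


# Hypothetical placeholder steps; replace the bodies with real tooling.
def restore_snapshot() -> bool:
    return True


def start_services() -> bool:
    return True


def verify_health() -> bool:
    return True


RUNBOOK = [
    Step("restore latest clean snapshot", restore_snapshot),
    Step("start application services", start_services),
    Step("verify application health", verify_health),
]

if __name__ == "__main__":
    for line in run_runbook(RUNBOOK):
        print(line)
```

Because the plan is just code, the same runbook can be run on a schedule as a recovery test, and a failed step shows up in the log long before a real disaster does.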
I hope the guidelines above provide valuable insight as you examine your disaster recovery needs. I invite you to download the Gartner white paper to learn more. If you want to start your disaster recovery planning, The 7 Rules of IT Disaster Recovery webinar could be a valuable resource. And call us if you need help! We are there for you – exactly what you need in case of a disaster.