Just because a system and its data are backed up doesn't mean the underlying data protection methods will work if something goes wrong. To find out, IT must verify that the systems are in place and function correctly. Facebook recently took that advice to the extreme by shutting down one of its data centers to test how the infrastructure and systems would perform.
"This is tens of megawatts of power that basically we turned off for an entire day to test how our systems were going to actually respond," Jay Parikh, global head of engineering at Facebook, said at the @Scale Conference in San Francisco.
The team's motivation for performing the tests was to learn to embrace failure and to react and recover quickly, Parikh said.
Before switching off the data center, however, Facebook's engineering team held trial runs, because some employees feared that the test would disrupt the entire company. That preparation paid off, and shutting off the center was pretty uneventful, Parikh said. The test showed that the systems were working properly and identified several areas for improvement.
To be sure, it's a radical way to verify that systems work, but it's a reminder that IT teams need to test their own data protection and recovery plans, whether by shutting off a data center or by using some other verification method.