AFCOM November 07, 2008 by Ed Harnish Full text of original article at AFCOM web site
How to Recover From Disasters in a Virtual World

Bad things happen to good computers, from hurricanes and earthquakes to corrupted software installations and computer viruses. There is no way to avoid the natural and man-made disasters that inevitably occur, but there are ways to be prepared for them. In a corporate environment, work, not time, is money. Ensuring that work (digitized data, custom-built software applications, system configurations and all other computing resources) survives the next disaster is the only way to guarantee that a company does not fall victim to its own lack of preparation.

The key to surviving any corporate disaster is being able to restore computers to a known, working state in the shortest possible time: minutes rather than hours or days. Virtualization may be a new addition to data centers, but the motto of being prepared still rings true. Regardless of what is in a data center, surviving a disaster requires a lot more planning than simply making weekly, or even daily, backups. A backup of a hard disk will do no good if the data center manager cannot restore the system quickly.

File-based backups, while helpful for some uses, are inadequate if a server or workstation needs to be restored completely, since a file-based backup will not have a current copy of open system files. The system could be booted to DOS to make a file-based backup, but then it will be offline and the worker nonproductive. Not only does that defeat the purpose of having systems up 24/7, it also could introduce far more serious problems if the restoration is done incorrectly or incompletely.

Some vendors offer software that will image open system files but cannot restore those images to dissimilar hardware. This alone can defeat the disaster recovery process, since a backup that cannot be restored is of no use. Remember, no one can predict to what hardware an image will be restored. It might be a server already on the premises or it might be a box at a remote disaster recovery center.

Other vendors claim backups can be restored to different hardware, but first a network manager is needed to completely reconfigure the network, all user settings and configurations, plus additional system network settings. While a restore to different hardware is possible, this downtime defeats the purpose of a disaster recovery system. Remember, the system to which an image is restored might not be local. For instance, if the disaster recovery server is in Colorado and a company's office is in New York, it might not be possible to dispatch a technician to do the reconfiguration. At that point, the restoration simply would not work.

There are two ways to protect virtual servers. One popular method is to back up the files that make up the virtual machines from outside, on the host operating system. The other is to back up the virtual machines from within, effectively treating each virtual machine as if it were a distinct, physical server. This latter method uses the same approach to backup that most IT managers use today for their physical boxes.
In practice, backing up production virtual machines from outside is inefficient and not recommended, because running virtual machines keep state information in memory. With this approach, backing up only the .VMC, .VHD and other files will not capture the complete state of a running virtual machine. Although virtual machines can be backed up when they are not running, the benefit of having the machines running all the time is then lost, which is inefficient in a production environment.

That said, the preferred approach is to back up virtual machines from within and treat each as though it were its own physical box. That means the proper software licenses are needed not only for the operating system but also for the backup software. Some backup products are licensed by the physical machine, some by the number of CPUs per system, and others by the number of virtual machines.

The preferred choice is an agent-based backup system that can run both on the host and within virtual machines. This ensures that all VMs and the host are backed up correctly. Data center managers can back up on demand or schedule backups for the host and each virtual machine as necessary.

Creating an image of a virtual machine provides the same benefits as creating an image of any hard disk, whether it is in a workstation or an enterprise server. Images capture every bit on the disk, including open Windows files, encrypted data, configuration files and network settings. With an image of a VM or the host system, data center managers can be assured that if anything physical or virtual goes wrong on any disk, they can restore it to a known, good working state.

Virtualization provides significant cost and management benefits for corporate data centers. As more organizations move toward virtualization, they need to be aware that risks remain.
Virtual servers are subject to the same variety of loss scenarios as traditional servers, as well as some additional ones that arise from the nature of virtualization technology, including: complete hardware loss due to theft, fire or similar disasters; hard disk corruption or failure; compromise of the host and/or guest operating system, whether by virus or similar malware, software failure or intentional hacking; and human error, including accidental deletion or modification of a virtual machine, hard disk or its files on the host.

With that, here are best practices for recovering from a disaster in today's new virtual world:

- Develop a disaster recovery plan and test it frequently. Even with a plan in place, if staff cannot put it into operation or do not understand how it works, the plan is ineffective.
- Back up physical and virtual systems with a disk imaging solution that allows the image to be restored to dissimilar hardware. It is important to make sure the software is hardware and operating system neutral. It should be possible to add new hardware drivers during the restore process. It is also very likely that virtual assets and physical ones run on different operating systems.
- Make sure imaging software does not delete critical configuration information, such as the security ID number, network configurations, user configurations or other critical data that would require a network engineer to reconfigure.
- Restoring an image on a remote server should not require dispatching a technician. Make sure a remote server can be booted, even if its operating system has failed. The ideal scenario should allow data center managers to boot directly from the physical or virtual server's image.
- Today's enterprises recognize that the future is in 64-bit hardware and software. Disk imaging software should support both the emerging and legacy technologies.
- Corporate networks often are made up of multiple network domains. Ensure imaging software can create and store images across domains.
- Support for multiple databases is very important. At a minimum, imaging software should support Microsoft's Volume Shadow Copy Service. For compatible databases, this allows the database to be suspended during the creation of an image. It is also useful to have agents for other databases.
- Console-based management is critical. A management console must be able to manage both servers and workstations, including computer groups, backup policies, scheduling, notification and other policy issues.
- Imaging software should be able to store an image not only across the network, but also on the same physical disk being imaged, in a hidden partition. This can be useful for the quick restoration of a file or folder, but it is not recommended as the only location for storing an image, since it would be ineffective if the disk drive itself failed. For a software failure or accidental deletion, however, this hidden partition could provide the fastest recovery of a file or folder.
- It is important that all IT employees be familiar and comfortable with creating and restoring images. If only the most highly trained engineers can conduct backups, then personnel resources may be misallocated. During an emergency, the most skilled engineers should be solving the problem, not restoring the server.
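
One of the practices above, never keeping the only copy of an image on the disk being imaged, is easy to check automatically. The following Python snippet is a minimal sketch, not part of any imaging product: the ImageCopy record, the machine names and the location labels are all hypothetical, and it assumes an inventory of image copies is already available from the backup console.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageCopy:
    """One stored copy of a disk image (hypothetical inventory record)."""
    machine: str      # server or workstation the image was taken from
    location: str     # illustrative label for where the copy is stored
    same_disk: bool   # True if the copy lives on the disk being imaged


def machines_at_risk(copies):
    """Return machines whose every image copy sits on the imaged disk."""
    by_machine = {}
    for copy in copies:
        by_machine.setdefault(copy.machine, []).append(copy)
    return sorted(
        machine
        for machine, machine_copies in by_machine.items()
        if all(c.same_disk for c in machine_copies)
    )


# Illustrative inventory: web-01 has an off-disk copy, db-01 does not.
copies = [
    ImageCopy("web-01", "hidden partition", same_disk=True),
    ImageCopy("web-01", "network share", same_disk=False),
    ImageCopy("db-01", "hidden partition", same_disk=True),
]
print(machines_at_risk(copies))
```

Here db-01 is flagged because its only image would be lost along with its disk, while web-01 keeps the fast local copy and a second copy on the network.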