CUES, July 27, 2004. John Brozycki. Original on the CUES Website.
The Zen of Disaster Recovery
It's not just a plan; it's a state of awareness.
When developing a disaster recovery plan, one size does not fit all. A program that's appropriate for a local credit union with just one or two branches will not be the same as one for a regional credit union with a dozen branches, and it certainly would be inadequate for a credit union that serves thousands of members in a densely populated urban area. While having a plan is a start, it isn't enough. Metaphorically speaking, disaster recovery can be compared to a three-legged stool, with the legs representing the plan, the technology and the capability to implement it.
Key Strategies
At $1.8 billion Hudson Valley Federal Credit Union in Poughkeepsie, N.Y., we employ a program with a variety of components, ranging from offsite storage of backup tape with Iron Mountain to internal disaster recovery and disk imaging software from Acronis Inc. Each year the credit union's IT staff conducts a full-scale, off-site disaster test that entails restoring the systems identified as critical to the institution's operations. These systems include the core processing system, Internet banking services, key file and print servers, and Windows domain controller functionality, which manages user access to the network, including logging on, authentication, and access to the directory and shared resources. Remote branches connect to the recovery site and then test their ability to perform transactions normally.
Our environment is a mixture of UNIX- and Windows-based critical systems. The IT network staff is responsible for the Windows systems, while IT operations is responsible for the UNIX systems. (Some staff are dedicated to PCs and others to UNIX because users were originally on dumb terminals connected to mainframe hosts. As PCs moved into the mix, support staff was added for those systems.) All servers are configured for redundancy via a redundant array of independent disks (RAID), and most support disk drives that can be hot-swapped should one fail. Each year the number of critical systems and our reliance on them increase. Currently there are about a dozen critical systems, all housed at headquarters.
A key component of Hudson Valley FCU's disaster recovery plan is disk imaging. Having a tape backup of a system is important for backing up individual files, but it is the disk image that keeps a record of the state of the whole system. We regularly create backup images of each critical server using Acronis True Image Server.
We can also schedule incremental backups on a daily basis, so maintaining a current image of each server is fast and easy. The image can be used to restore the entire system in case of a disaster; tape is useful for restoring individual files. File-based backups cannot restore hidden or open Windows files, and therefore cannot be used for a bare-metal restore should a disk drive, or an entire RAID system, fail. Without an image, a bare-metal restore essentially requires rebuilding the entire system, including the operating system, all updates, patches, configuration files, user data and the like.
The images can be stored over the network on remote storage devices or locally on removable media, such as a writable CD or DVD. It's also possible to store images across the Internet using the Microsoft Server Message Block/Common Internet File System (SMB/CIFS) protocol, although that approach is not currently employed at our credit union. Because Acronis is able to image a live server, we can create the image without interrupting any server or user operations. Having a point-in-time snapshot of the server is critical to maintaining a complete record of not only the server's data, but also the operating system, all installed software, patches, security updates and configuration files.
Rebuilding a server from a bare-metal state could take days. File-based backups from tape can take several hours (more if you must restore incremental tapes as well). Our Acronis backup takes a matter of minutes. This is a critical consideration with high-availability systems.
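To make the relationship between a full image and its daily incrementals concrete, here is a minimal sketch in Python of how one might work out which images are needed to restore a server to a given date. It is an illustration only: the `Image` record, the `restore_chain` function and the file names are hypothetical and are not part of Acronis True Image Server or of the credit union's actual procedures.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Image:
    """One backup image; `kind` is either "full" or "incremental"."""
    taken_on: date
    kind: str
    name: str  # hypothetical file name, for illustration only


def restore_chain(images: List[Image], target: date) -> List[Image]:
    """Return the images needed to restore the server as of `target`.

    The chain is the most recent full image taken on or before `target`,
    followed by every incremental taken after that full image, up to and
    including `target`, in chronological order.
    """
    candidates = sorted(
        (img for img in images if img.taken_on <= target),
        key=lambda img: img.taken_on,
    )
    fulls = [img for img in candidates if img.kind == "full"]
    if not fulls:
        raise ValueError("no full image exists on or before the target date")
    base = fulls[-1]
    increments = [
        img for img in candidates
        if img.kind == "incremental" and img.taken_on > base.taken_on
    ]
    return [base] + increments


# Assumed schedule for the example: a weekly full image plus daily incrementals.
history = [
    Image(date(2004, 7, 4), "full", "srv1-full-0704"),
    Image(date(2004, 7, 5), "incremental", "srv1-inc-0705"),
    Image(date(2004, 7, 6), "incremental", "srv1-inc-0706"),
    Image(date(2004, 7, 11), "full", "srv1-full-0711"),
    Image(date(2004, 7, 12), "incremental", "srv1-inc-0712"),
]

for img in restore_chain(history, date(2004, 7, 6)):
    print(img.name)
```

Under this assumed schedule, a restore to July 6 needs the July 4 full image plus the two incrementals that follow it, while a restore to July 12 needs only the July 11 full image and one incremental. The shorter the chain, the fewer pieces must be read back during a recovery, which is one reason to refresh the full image regularly.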
Real-Life Example
In August 2003, much of the eastern portion of the United States was plunged into darkness by a massive power failure. Primary power was cut to the credit union's computer systems, forcing us to rely first on the uninterruptible power supplies protecting our servers and then on backup generators to supply our core systems. Power to non-essential systems was lost. We were able to back up all our core systems completely, exactly as our plan calls for, during the outage. Connectivity to the Internet was maintained using a dedicated, powered connection so that our Internet banking service, which handles more transactions than any of our physical branches, was unaffected. This was important in allowing our geographically dispersed membership to access their accounts, provided they had power and Internet connectivity.
Although power to our branch offices was lost, there were no critical systems housed in those offices. If the corporate office has a power or systems failure, uninterruptible power supplies and then external power generators are in place to fill the gap. If that's not enough, then IBM steps in to help with business continuity. All branch transactions are recorded in real time on the primary servers at the corporate offices. If computer connectivity is lost at a branch, the branch can still record transactions manually on paper. However, when power is lost, the branch is generally required to close because of physical security concerns.
To date, the credit union has yet to experience a data loss or an emergency in which it was not able to execute its disaster recovery plan. Having the plan, plus the appropriate hardware and software to carry it out, is critical to being prepared in case disaster strikes. But nothing takes the place of frequent practice and training to ensure staff is ready and able to execute the plan.