By author: Edward Nowlan

Disaster Recovery Plan

If a major disaster hit the Williams campus, what would happen to our computer networks? Which services and systems are essential and would need to be restored as quickly as possible, and which ones could wait longer? What steps could we take now to minimize the impact of a disaster?

Networks & Systems had a busy summer answering these questions and revising OIT’s disaster recovery plan. Jesup is a vital hub in the College’s network, and houses most of our servers and wireless controllers. If Jesup were partially or totally destroyed, the network would be crippled. Creating a redundant hub located away from Jesup that could replicate the College’s most critical services became our top priority in preparing for disaster recovery.

What if Jesup were destroyed?

Working with the College’s administrative offices, we determined which services and systems counted as critical, and prioritized them into categories:

Level 1 – critical services that must be restored within 72 hours with minimal data loss. Data is backed up regularly in a co-location facility in Albany.
Level 2a – critical services to be restored immediately following Level 1 services, with a larger tolerance for data loss. Data is backed up in a co-location facility in Albany, but less frequently than Level 1.
Level 2b – services that do not require off-site backups, to be restored after Level 2a systems.
Level 3 – systems that can wait until new equipment is purchased.
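The categories above can be sketched as a small data structure. This is only an illustration, not OIT's actual tooling: the RecoveryTier class, the restore_sequence function, and the service names are all hypothetical, and the off-site backup flag for Level 3 is left unset because the plan does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    restore_order: int              # lower values are restored first
    offsite_backup: Optional[bool]  # None where the plan does not say


# Tiers as described in the list above; the class and field names are assumptions.
TIERS = {
    "Level 1": RecoveryTier("Level 1", 1, True),
    "Level 2a": RecoveryTier("Level 2a", 2, True),
    "Level 2b": RecoveryTier("Level 2b", 3, False),
    "Level 3": RecoveryTier("Level 3", 4, None),
}


def restore_sequence(services: dict[str, str]) -> list[str]:
    """Order service names by tier, then alphabetically within a tier."""
    return sorted(
        services,
        key=lambda name: (TIERS[services[name]].restore_order, name),
    )


# Hypothetical services and tier assignments, for illustration only.
example_services = {
    "wiki": "Level 3",
    "payroll": "Level 1",
    "printing": "Level 2b",
    "email": "Level 1",
    "lms": "Level 2a",
}

print(restore_sequence(example_services))
```

Sorting on the tier's restore order keeps the priority rule in one place, so assigning a service to a different category only means changing its entry in the mapping.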

We are working with Facilities to prepare a data closet that will house enough servers and storage to replicate all Level 1 and 2a services. The redundant systems will not be as powerful as the main ones, but they will be adequate to carry us through a disaster until new hardware can be procured.

The data closet equipment will be a combination of physical and virtual servers. Properly configured virtual servers can be quickly and easily migrated from one environment to another. Level 1 and 2a systems will either have their data replicated in the closet or be synchronized with remote data sources so that they can continue to function. Although our original plan gave us 72 hours to get these systems up and running, our new configuration will allow us to restore all Level 1 services within 8 hours, and possibly all Level 2a systems by the next business day.

In addition to the servers and storage array, we will also be moving some core network gear into this facility. By moving a core router into the data closet, we should be able to keep about 70-80% of the buildings on our wired network online. After patching fiber, we could get all but a few buildings back online. We also plan to move half of our wireless controllers away from Jesup. That way, if we lost Jesup, we would still be able to support nearly 500 of our 800 access points. By purchasing additional controllers at the time of the disaster, we could increase that coverage to nearly 100%.
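As a rough check on the wireless figures, the arithmetic can be sketched as follows. The function names are hypothetical, and 500 is treated as an upper bound because the plan says "nearly 500" access points.

```python
def coverage_percent(supported: int, total: int) -> float:
    """Share of access points still supported, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * supported / total


def uncovered_access_points(supported: int, total: int) -> int:
    """Access points that additional controllers would need to pick up."""
    return total - supported


TOTAL_ACCESS_POINTS = 800
SUPPORTED_WITHOUT_JESUP = 500  # "nearly 500", so treat this as an upper bound

print(coverage_percent(SUPPORTED_WITHOUT_JESUP, TOTAL_ACCESS_POINTS))         # 62.5
print(uncovered_access_points(SUPPORTED_WITHOUT_JESUP, TOTAL_ACCESS_POINTS))  # 300
```

In other words, the remaining controllers would carry a little over 60% of the access points, and additional controllers bought after a disaster would need to cover roughly 300 more.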