Siting a disaster recovery location is a business process decision more than it is an IT issue. It’s a balance between business recovery requirements and the cost of achieving those requirements: risk versus reward.
IT’s role is to ascertain exactly what the organization’s recovery time objectives (RTO) are. How fast do critical operations need to be back online before the business is put at serious risk? The complementary consideration is the recovery point objective (RPO). To what point in time prior to the failure must data be recoverable: near real time, one hour or a day? With these goals in hand, the task is to present management with cost/recovery options and their associated tradeoffs, including human factors.
There will be more than one RTO and RPO. The goal is to determine which information and processes are most critical to your business and therefore need the shortest recovery times, then work back from there. Recovery times for different workloads could range, for example, from minutes to one week. RTO/RPO for data subject to industry or government compliance may be mandated, in which case there’s no choice. Doing this for a handful of servers is obviously a simpler exercise than doing it for dozens, but it is no less important. Not all workloads can be recovered simultaneously, and the more urgent the RTO/RPO, the higher the cost of achieving it.
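One way to make this triage concrete is to record each workload’s RTO and RPO and sort by urgency. The following is a minimal sketch, not a prescribed method: the workload names, the hour values and the `mandated` flag are all hypothetical and would come from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    rto_hours: float        # maximum tolerable downtime
    rpo_hours: float        # maximum tolerable window of lost data
    mandated: bool = False  # True if the targets are set by regulation


def recovery_order(workloads):
    """Return workloads most urgent first: shortest RTO, then shortest RPO."""
    return sorted(workloads, key=lambda w: (w.rto_hours, w.rpo_hours))


# Hypothetical inventory; every figure here is an illustrative assumption.
WORKLOADS = [
    Workload("payroll", rto_hours=72, rpo_hours=24),
    Workload("order-processing", rto_hours=0.25, rpo_hours=0.0),
    Workload("email", rto_hours=8, rpo_hours=1),
    Workload("records-archive", rto_hours=168, rpo_hours=24, mandated=True),
]

for w in recovery_order(WORKLOADS):
    note = " (mandated)" if w.mandated else ""
    print(f"{w.name}: RTO {w.rto_hours} h, RPO {w.rpo_hours} h{note}")
```

Sorting by RTO first reflects the "work back from the most critical" approach described above. The workloads at the top of the list are the ones that will drive both cost and site requirements.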
We’re assuming a worst-case scenario here, even though most disasters (power outage, fire, unrecoverable hardware failure, sabotage and so on) are isolated to a single company. If you plan for the worst, then less disruptive, non-catastrophic events will be covered as well. This brings us to considerations for selecting the best backup and recovery site at an acceptable cost in light of the requirements of the business.
This is a complex problem. Each organization must individually decide what solution best suits its needs. The steps that follow can help point you in the right direction but where you end up will be a matter of business analysis, budget constraints, staffing, technology, location, environmental and geographic considerations and other factors.
Far Away is Not as Close as it Used to Be
Post 9/11 the Federal Reserve advised that there be an appropriate level of geographic diversity between primary and backup sites for back-office operations and data centers. Exactly what constitutes “an appropriate level” is the issue at hand.
A relatively close site allows you to have tighter synchronization between the primary and secondary sites and to leverage your current staff and offsite backup services. However, a geographic disaster such as a hurricane or tornado could take out both primary and recovery sites. On the other hand, placing the DR site too far away can create replication issues for some systems and may even require a complementary workforce, which could end up costing a great deal.
Ideally, you have to be far enough away to be beyond the immediate threat you are planning for. Far enough is a regionally dependent term, based on the nature of disasters or risks in the region. What may work in the Great Plains or New England could be ill-advised for the West or Gulf coasts. At the same time, you should be close enough to get to the remote facility quickly (think RTO/RPO), preferably by car since airports may be closed and trains not running. Choosing a site accessible via a number of transportation options is preferable. Meeting all these parameters may not be practical, and risk versus benefit tradeoffs will need to be made.
Put Eggs in Many Baskets
Putting sufficient distance between primary and secondary sites can also include being on separate power grids, utilities and telecommunications services, ideally with redundant network connectivity and access between sites. Hurricane Sandy knocked out land-based and cellular phone services across large swaths of the Atlantic Coast, prompting many to consider adding redundant communications services such as satellite to DR requirements. The farther the separation, however, the higher telecommunication costs will be and the more limited your data replication options will be.
A recent report from Gartner (How to Identify the Best Data Center Locations for Disaster Recovery, January 10, 2014) noted that if achieving a stringent RTO requires remote data mirroring, then the two sites should be within 60 miles of each other because of network latency concerns. “At more than [60 miles], even with tuning capabilities, latency issues will begin to impact performance and make synchronous or active/active recovery model unrealistic,” the report cautioned.
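To see why distance matters for synchronous mirroring, it helps to estimate the propagation delay alone. The sketch below is a back-of-the-envelope illustration, not Gartner’s method. It assumes light travels through optical fiber at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum) and that each synchronous write waits for at least one round trip to the remote site. The `route_factor` parameter is a hypothetical multiplier for cable paths that are longer than the straight-line distance. Switching, protocol and storage overhead are ignored, so real latency will be higher.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # assumption: roughly two-thirds of the speed of light


def min_round_trip_ms(distance_miles, route_factor=1.0):
    """Lower bound on round-trip propagation delay, in milliseconds.

    route_factor is a hypothetical multiplier (>= 1.0) for fiber routes
    that run longer than the straight-line distance between sites.
    """
    km = distance_miles * KM_PER_MILE * route_factor
    return 2 * km / FIBER_KM_PER_MS


for miles in (10, 60, 300, 1000):
    delay = min_round_trip_ms(miles)
    print(f"{miles:>5} miles: at least {delay:.2f} ms added per synchronous write")
```

Under these assumptions, 60 miles adds about 1 ms to every synchronous write before any equipment delay is counted, and the penalty grows linearly with distance. Asynchronous replication avoids this wait at the cost of a looser RPO.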
When it comes to DR planning, there is no “finally.” However, for our purposes here we’ll conclude with people considerations. Distance is a big factor when it comes to staffing the recovery site. Backup staff may be unable or unwilling to travel long distances to the secondary site after a disaster is declared, either due to personal circumstances or transportation difficulties. Cross-training staff is essential; two to three teams should be prepared to deal with recovering critical business operations because you never know who’ll actually be available in the event of disaster.
Colocation or DR outsourcing may not be a panacea. However, as the costs of pretty much everything (including downtime) continue to escalate, owning, operating and maintaining your own secondary site and supporting infrastructure becomes less and less justifiable.
Once the business requirements are understood and satisfied and the siting parameters established, running the backup site itself is pretty routine. Even so, there are two significant advantages to outsourcing some or all of DR operations. First, you and your staff are freed to focus on the strategic business aspects of IT and on developing new revenue-generating solutions. Second, disaster recovery as a service (DRaaS) provides flexibility and adaptability that are not realistic if you have two owned facilities that must be kept in perfect synchronization to function successfully.
Evaluate all options and solicit ideas and advice from reputable and dependable sources. Your decision should be one that you, your users and your organization can happily live with for a long time.