Let’s be honest. When we talk about Disaster Recovery, what we are normally talking about is backups. Be they inline backups, off-site backups, air-gapped backups, or any of the other myriad ways to back up data, they are all just backups. When many administrators check on their backups, they see that the jobs completed and look no further. When management looks at the “DR” dashboard, they see that the backups are running and think all is well.
I would like to take a moment to flip the script. Has said admin done a restore test recently? Or ever? Is the data recoverable? How long does it take an organization to pull the data from the backup repository and restore the data and their services to the user community?
So, let’s talk about the recovery part of DR. This should be the most important part of the DR process, as a validated and rehearsed recovery plan will reduce the downtime and lost productivity when the disaster hits.
The importance of validating backups
The process of creating backups is not going to save your company, your job, or your reputation. It’s the restore. It is a simple process to create backup jobs, but do you spend enough time making sure they work? Also, keep in mind the restore time. If it takes you two weeks to transfer from your backups to your servers, and then bring the servers to an operational state, is this really a valid restore?
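As a minimal sketch of what "making sure they work" can look like during a test restore, the Python below compares every file in a source directory with its restored copy by SHA-256 checksum. The function names and the directory layout are assumptions made for illustration, not part of any particular backup product. It also assumes the original data is still available to compare against, which is true in a rehearsal but often not in a real disaster, where a stored checksum manifest would have to stand in for the source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return a list of problems found in the restored copy.

    Hypothetical helper: each entry is "missing: <path>" or "changed: <path>",
    with paths relative to source_dir. An empty list means every source file
    was restored with identical contents.
    """
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"changed: {rel.as_posix()}")
    return problems
```

Calling `verify_restore(Path("/data"), Path("/restore-test/data"))` (both paths are placeholders) after a test restore gives a concrete list to review, rather than a green "job completed" status.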
In backup terms, we have something called the “Recovery Time Objective” (RTO). There are a lot of articles out there that say, in so many words, that when someone in authority yells “DISASTER!”, you will have the business up and running in X amount of time. That time window is your RTO. It should include any time spent reacquiring data: retrieving off-site tape, downloading from a third-party DR provider, and transferring data to your servers all take time. It should also include restoring services to the user community, and all of that assumes you don’t need to acquire any new gear.
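To make the pieces of that window concrete, here is a small Python sketch that adds up per-stage estimates and compares the total with the RTO. The stage names and every duration are invented for illustration; real numbers should come from timing an actual restore.

```python
from datetime import timedelta

# Hypothetical stage estimates; replace with timings measured in a rehearsal.
stages = {
    "retrieve off-site media": timedelta(hours=4),
    "transfer data to servers": timedelta(hours=2),
    "bring servers to an operational state": timedelta(hours=3),
    "restore services to users": timedelta(hours=1),
}


def total_recovery_time(stages: dict[str, timedelta]) -> timedelta:
    """Sum the stage durations, assuming the stages run one after another."""
    return sum(stages.values(), timedelta())


def meets_rto(stages: dict[str, timedelta], rto: timedelta) -> bool:
    """True if the summed stage estimates fit inside the RTO window."""
    return total_recovery_time(stages) <= rto


rto = timedelta(hours=8)  # assumed objective, for illustration only
total = total_recovery_time(stages)
print(f"Estimated recovery time: {total}, RTO: {rto}")
print("Within RTO" if meets_rto(stages, rto) else "Exceeds RTO")
```

With these made-up figures the stages add up to 10 hours against an 8-hour objective, even though the data transfer itself is only 2 hours. The estimate is only as good as the stage timings fed into it.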
But how long will this take? Your DR tool may estimate two hours to transfer the data, but how long will it really take? That’s why we do the next step.
Tips for performing a Disaster Recovery Process Rehearsal
These events go by different names: DR Tests, disaster dry runs, Recovery Tests, and so forth. Industry-wide, the terms “Disaster Recovery Tests” or DRTs are used. I like the phrase “DR Process Rehearsal”, because I like the DR test to cover more than just which buttons you need to press on the servers to make them work. A DR Process Rehearsal would also include things like contact lists, what-if scenarios, standard scripts for when people ask about the status and state of recovery, and an actual event where the staff practices the recovery.
1. Ensure your Disaster Declaration is clear and defined
In the Rehearsal, you would practice a few basic items. The first may sound simple, but the act of declaring a disaster should be clearly defined (and yes, practiced) to ensure that it is calm, done with authority, and delivered with a clarity that implies that this is the correct course of action.
The person making the disaster declaration may be a single IT person, an owner, a Vice-President or director, or a committee. But it is important that the entity that can make this call be defined prior to the disaster. The other point is to have the staff know, in no uncertain terms, that when this person or group declares an official disaster, the declaration is broadcast and respected, and the corrective courses of action are known and begun immediately.
2. Execute your Recovery Plan to find areas to improve
So, once the disaster is officially declared, you can then jump into the recovery. This recovery should be documented, so there will not be guesswork. During this part of the DR Process Rehearsal, you should execute on the recovery plan exactly as it is written. It may be wrong. This is ok. Even if you know it is wrong, make a note of it and then do it exactly like it is in the plan. When the rehearsal is over, there will be a chance to discuss what went right and wrong, and then corrections to the plan can be completed for the next rehearsal. It may sound counterintuitive, but it is essential that the plan be followed exactly. The purpose of a DR Process Rehearsal is to identify errors in the plan and correct them so that the next Rehearsal (or a real DR event) will go as smoothly as possible.
3. Conduct regular DR Process Rehearsals
The DR Process Rehearsal should also be done on a regular basis. We may have a list of instructions in front of us that says what to do at each step. But, as humans, we can do things more confidently and quickly if we repeatedly practice the process. Football players practice their plays over and over again, so that they can execute them on game day. (Please ignore this analogy if your favorite team is coming off a season of four wins or fewer.) Actors will rehearse their scenes and lines many times before performing in front of a live audience. As people practice their plays, scripts, or DR Processes, they become more proficient and more comfortable with them. It becomes second nature, making it easier to execute the script under the duress of a live event.
By having a defined plan and a practiced DR Process, you and your staff will not only know the process but be comfortable with it. A disaster can be a chaotic time in your company. By having the plan in front of you and having everyone comfortable with the plan, you can be the pillar of strength and reason. Both are commonly missing during a disaster. You can be the calm in the storm and bring everything back to normal quickly, easily, and stress-free.
We at Anexinet would love to help you be that calm. Please reach out to us so we can discuss how. Check out our Disaster Recovery Kickstart to upgrade your DR Plan and eliminate vulnerabilities in just three short weeks.