Disaster recovery strategy: five action items
January 28 was European Data Protection Day. It reminds companies that they are increasingly being held accountable by regulators and customers when it comes to data protection. Companies should automate the protection and recovery of data anywhere in the organization while ensuring 24/7 availability of business-critical applications. The following article provides five action items for a successful disaster recovery strategy.
According to the Verizon Data Breach Investigations Report 2019, ransomware accounts for 24 percent of the malware incidents analyzed. In Switzerland, too, the number of attacks continues to rise: as recently as July, the ransomware trio Emotet, Trickbot and Ryuk crippled the IT systems of the Offix Group. It is therefore increasingly important for corporate decision-makers to put business continuity and data protection at the top of the agenda so that the company remains operational in the event of a disaster. Against this background, a disaster recovery strategy is urgently needed. The following five steps help companies prepare for a disaster scenario.
-
Set RTOs and RPOs
Not all business applications are equally critical, so recovery-time requirements differ from one application to the next. IT decision-makers should assess in advance which applications are mission-critical: if a business cannot function without a particular service, that service should not be down for more than 15 minutes. The Recovery Time Objective (RTO) defines the maximum time that may elapse before all elements of an application are operational again. The Recovery Point Objective (RPO), on the other hand, specifies the maximum amount of data, measured as the time span since the last recoverable copy, that may be lost without endangering business continuity. Any recovery strategy must be guided by the defined RTOs and RPOs.
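To make the two objectives concrete, the following minimal Python sketch records an RTO and an RPO per application, checks whether a planned backup interval can satisfy the RPO, and derives a restore order from the RTOs. The application names, the objective values and the function names are assumptions chosen for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    app: str          # hypothetical application name
    rto: timedelta    # maximum tolerated downtime
    rpo: timedelta    # maximum tolerated data-loss window


def backup_interval_meets_rpo(objective: RecoveryObjective, interval: timedelta) -> bool:
    # Worst case, the failure happens just before the next backup,
    # so up to one full interval of data is lost.
    return interval <= objective.rpo


def recovery_order(objectives: list[RecoveryObjective]) -> list[str]:
    # Applications with the tightest RTO are restored first.
    return [o.app for o in sorted(objectives, key=lambda o: o.rto)]


# Illustrative values only.
objectives = [
    RecoveryObjective("internal-wiki", rto=timedelta(hours=24), rpo=timedelta(hours=12)),
    RecoveryObjective("order-processing", rto=timedelta(minutes=15), rpo=timedelta(minutes=5)),
]
```

With these assumed values, an hourly backup would be sufficient for the wiki but not for order processing, whose RPO of five minutes calls for much more frequent copies or continuous replication.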
-
Automate processes
Automation is the be-all and end-all of a successful disaster recovery strategy. Recovery processes that are executed manually carry a significant risk of error. Companies should therefore establish automated processes, for example for failover and failback. Automated failover is particularly important for business continuity: if a mission-critical application fails, failover switches to another system that takes over for the failed one, so that in the best case users do not even notice the failure. Once the original system has been restored, failback switches the service back to it. In the process, the primary system is updated with the data that was produced on the secondary system in the meantime.
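The interplay of failover and failback can be sketched as a small state machine. The Python example below is a deliberately simplified illustration, not a description of how any specific product works: the `Site` and `FailoverController` classes are hypothetical, health is a plain flag instead of a real probe, and the case where the secondary site also fails is not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    healthy: bool = True
    writes: list = field(default_factory=list)


class FailoverController:
    def __init__(self, primary: Site, secondary: Site):
        self.primary = primary
        self.secondary = secondary
        self.active = primary
        self._pending_sync = []  # writes the secondary accepted during the outage

    def check(self) -> None:
        if self.active is self.primary and not self.primary.healthy:
            # Failover: the secondary takes over for the failed primary.
            self.active = self.secondary
        elif self.active is self.secondary and self.primary.healthy:
            # Failback: bring the primary up to date first, then switch back.
            self.primary.writes.extend(self._pending_sync)
            self._pending_sync.clear()
            self.active = self.primary

    def write(self, record) -> None:
        self.check()
        self.active.writes.append(record)
        if self.active is self.secondary:
            self._pending_sync.append(record)
```

A caller only ever uses `write`; which site serves the request is decided by the controller, which is the property that keeps a failover invisible to the user in the best case.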
-
Evaluate failure scenarios individually
Depending on the situation, companies must decide flexibly which applications and systems to restore. This may involve individual virtual machines, a large number of complex applications or an entire data center. The recovery strategy must be adaptable enough to handle different scenarios smoothly and to resume operations quickly. A detailed and comprehensive emergency plan helps with this.
-
Create unified policies for multi-cloud environments
If several different cloud environments are affected by a failure, the recovery processes can become more complex, for example because several employees who are each familiar with one of the clouds have to be involved. The result is higher operating costs and longer downtimes. In addition, the probability of data loss increases. Multi-cloud environments should therefore be controlled centrally. Snapshot-based cloud backup solutions, which have been specially developed for dynamic multi-cloud workloads, can help here. With a suitable snapshot solution, companies can apply uniform policies for seamless backup of different clouds.
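What a uniform policy means in practice can be illustrated with a short Python sketch: one centrally defined policy is compared against the settings actually in force in each cloud, and every deviation is reported. The `BackupPolicy` fields, the cloud names and the values are assumptions made for this example; a real backup solution would expose its own policy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPolicy:
    snapshot_interval_minutes: int
    retention_days: int


def find_policy_drift(central: BackupPolicy, per_cloud: dict[str, BackupPolicy]) -> list[str]:
    # Return the names of all clouds whose settings differ from the central policy.
    return sorted(name for name, policy in per_cloud.items() if policy != central)


# Illustrative values only.
central_policy = BackupPolicy(snapshot_interval_minutes=15, retention_days=30)
in_force = {
    "cloud-a": BackupPolicy(snapshot_interval_minutes=15, retention_days=30),
    "cloud-b": BackupPolicy(snapshot_interval_minutes=60, retention_days=30),
}
drifted = find_policy_drift(central_policy, in_force)
```

In this assumed setup, `drifted` names the one cloud whose snapshot interval was configured separately and no longer matches the central definition.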
-
Regularly put processes to the test
This step in particular is often neglected in practice. Companies must regularly test whether the established disaster recovery process works and how much time it takes. The tests should run in the background without disrupting ongoing operations. The Veritas Resiliency Platform, for example, allows customers to set up the entire disaster recovery process with drag-and-drop ease. Real-time analytics can be accessed via an integrated dashboard, which allows organizations to monitor whether the recovery time targets are being met. At the click of a mouse, they receive reliable figures on how long the entire switchover process takes and how much production data is lost. IT managers can thus use values determined under real conditions to describe the worst-case scenario, and crisis situations become calculable.
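Independently of any particular tool, the evaluation of a recovery test comes down to simple arithmetic on three timestamps. The following Python sketch shows one way to do it; the function name, the timestamps and the target values are assumptions for illustration and do not reflect the interface of the platform mentioned above.

```python
from datetime import datetime, timedelta


def evaluate_drill(
    last_recoverable_copy: datetime,
    failure_at: datetime,
    restored_at: datetime,
    rto: timedelta,
    rpo: timedelta,
) -> dict:
    downtime = restored_at - failure_at                    # achieved recovery time
    data_loss_window = failure_at - last_recoverable_copy  # achieved recovery point
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Illustrative drill: 20 minutes of downtime, 3 minutes of lost data.
result = evaluate_drill(
    last_recoverable_copy=datetime(2020, 1, 28, 9, 57),
    failure_at=datetime(2020, 1, 28, 10, 0),
    restored_at=datetime(2020, 1, 28, 10, 20),
    rto=timedelta(minutes=15),
    rpo=timedelta(minutes=5),
)
```

In this assumed drill the data-loss window stays within its target, but the 20-minute switchover exceeds the 15-minute recovery time target, which is exactly the kind of gap that regular testing is meant to reveal before a real incident does.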
Author:
Sascha Oehl is Director Technical Sales DACH at Veritas