I was having a drink with a supplier the other day and he was explaining that during the outage “all of our settings had been replicated to DR, so we couldn’t switch to DR either.” The implication was that automatically replicating settings to DR is a bad idea, because it leads to exactly this kind of scenario. Personally, I think it’s more about understanding purpose. What do you want to do with your DR solution?
- Deal with a complete physical loss of your primary site?
- Handle hardware failure on one of your machines?
- Handle a software failure on one of your systems?
Of these, only the first is actually disaster recovery. The second is known as redundancy and the third as rollback. “Switching to DR” is intended for disaster recovery scenarios only. The trouble is that all three are real problems, and all too often we try to solve them at once under the heading of “DR” without actually thinking about what the letters stand for. Sadly, this tends to lead to solutions that achieve none of the aims. It’s not a question of balancing competing priorities here; it’s a question of recognizing that the problems are separate, and so their solutions should be too.
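
To make the distinction concrete, here is a minimal sketch of the three problems and the response each one calls for. Everything in it is hypothetical and for illustration only: the names `Failure`, `Response`, `choose_response` and `dr_inherits_fault` are mine, and I am assuming that a bad settings change counts as a software failure, as in the supplier's story.

```python
from enum import Enum


class Failure(Enum):
    """The three kinds of problem from the list above (hypothetical names)."""
    SITE_LOSS = "complete physical loss of the primary site"
    HARDWARE_FAILURE = "hardware failure on one machine"
    SOFTWARE_FAILURE = "software failure on one system"


class Response(Enum):
    """One separate solution per problem."""
    DISASTER_RECOVERY = "switch to the DR site"
    REDUNDANCY = "fail over to a redundant machine"
    ROLLBACK = "roll back to the last known-good version"


# Each failure maps to exactly one response; "DR" is not a catch-all.
RESPONSE_FOR = {
    Failure.SITE_LOSS: Response.DISASTER_RECOVERY,
    Failure.HARDWARE_FAILURE: Response.REDUNDANCY,
    Failure.SOFTWARE_FAILURE: Response.ROLLBACK,
}


def choose_response(failure: Failure) -> Response:
    """Return the response designed for this kind of failure."""
    return RESPONSE_FOR[failure]


def dr_inherits_fault(failure: Failure, settings_replicated: bool) -> bool:
    """Assumption: if settings are replicated automatically, a software or
    settings fault is copied to the DR site, so switching to DR cannot fix it."""
    return failure is Failure.SOFTWARE_FAILURE and settings_replicated


if __name__ == "__main__":
    for failure in Failure:
        print(f"{failure.value}: {choose_response(failure).value}")
```

Seen this way, the supplier's outage is the case where `dr_inherits_fault(Failure.SOFTWARE_FAILURE, True)` is true: replication did its job for disaster recovery, and what was missing was a rollback path.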