Disaster Recovery & Business Continuity
Since last month's issue, we have suffered a minor disaster of our own. We recovered within 24 hours, and our critical customer-facing systems were unaffected throughout.
At lunchtime on 11th April we experienced a severe thunderstorm. Although our building wasn't struck by lightning, there was a tremendous thunderclap immediately overhead, at which point we lost our Internet access and the server froze.
Further investigation showed that the server was fine and just needed rebooting, but that our broadband router was completely dead. All of our critical equipment is protected from power surges by an APC SmartUPS, and there was no record in the logs of power abnormalities at the time of the thunderclap. However, both the server and the router were connected to the broadband 'phone line. (The server has a fax modem connected.) What had happened was an electrical surge on the 'phone line that had temporarily overloaded the connected equipment.
Our HP server is fairly robust and only the fax modem would have suffered from the surge, but in the event it turned out to be undamaged.
The router couldn't be coaxed back to life, and this not only meant that our Internet connection was down, but that our incoming Voice-over-IP (VoIP) calls wouldn't get through.
However, our Business Continuity strategy for 'phone lines means that the VoIP services are remotely hosted and were already set up to automatically re-route calls to our outsourced answering service in the event that our Internet connection became unavailable. This meant that customers were still able to call us on the usual published telephone numbers and speak to someone who answered in our company name and could take details to pass on to us.
Our public website is also remotely hosted and continued to run, so customers could still place orders online.
Our strategy for a router failure was to replace it as quickly as possible. We tried to contact the original supplier for a replacement, but it transpired that they had gone out of business, so we needed to research a new supplier without our usual Internet access. Fortunately, part of our broadband strategy is that our Internet Service Provider (Demon Internet) provides a fall-back dial-up service. We were therefore able to fire up a dial-up connection from a laptop and browse for a suitable supplier online, albeit a little more slowly than usual!
We purchased a replacement and arranged for next-day (Saturday) delivery. The new router arrived by 9 a.m. and was installed, with all our Internet services restored by lunchtime.
The remotely hosted VoIP services automatically switched to routing calls through our Internet connection and so within 24 hours, all was as it had been before disaster struck.
We have learned two lessons from this:
- Within two days of the problem we installed two APC SurgeArrest units on the connections between the broadband wall socket and the new router: one on the voice circuit used by the fax modem and one on the main ADSL link. This should protect us from any future surges on the 'phone line.
- The scale of our business does not justify holding standby equipment just in case of failure, but we now know that we need to maintain contact details of at least two current suppliers of any critical equipment that we might need to replace at short notice.
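The second lesson lends itself to a simple periodic check. The following is only a minimal sketch: the register layout, the equipment names and the supplier names are all hypothetical placeholders, and the threshold of two simply mirrors the rule stated above.

```python
# Hypothetical register: critical equipment -> suppliers known to be trading.
# All names below are illustrative placeholders, not real records.
MIN_SUPPLIERS = 2


def under_covered(register: dict[str, list[str]],
                  minimum: int = MIN_SUPPLIERS) -> list[str]:
    """Return the items that have fewer than `minimum` distinct suppliers."""
    return sorted(
        item
        for item, suppliers in register.items()
        if len(set(suppliers)) < minimum
    )


register = {
    "broadband router": ["Supplier A"],
    "fax modem": ["Supplier A", "Supplier B"],
    "UPS battery": [],
}

for item in under_covered(register):
    print(f"Needs another supplier: {item}")
```

Reviewing such a list on a regular schedule would also catch a supplier that has quietly ceased trading, which is what caught us out.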
We hope that you might learn something from our experience and review your own Business Continuity plans.
What is important is to assess the potential risks and develop a strategy for dealing with them. Clearly, the potential cost of circumventing or limiting the effects of a particular type of disaster needs to be weighed against the likelihood of the incident occurring and the consequences to your business that would result.
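One common way to do this weighing is to compare the yearly cost of a protective measure with the loss it is expected to avoid, where expected loss is likelihood multiplied by consequence. The sketch below assumes that simple model; the field names and every figure in it are hypothetical illustrations, not our own numbers.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_likelihood: float  # estimated chance of the incident in any one year (0 to 1)
    impact_cost: float        # estimated loss to the business if it happens
    mitigation_cost: float    # yearly cost of the protective measure
    residual_fraction: float  # share of the loss still expected with the measure in place (0 to 1)


def expected_annual_loss(risk: Risk) -> float:
    """Likelihood multiplied by consequence, with no protection in place."""
    return risk.annual_likelihood * risk.impact_cost


def net_benefit(risk: Risk) -> float:
    """Expected loss avoided by the measure, minus what the measure costs."""
    avoided = expected_annual_loss(risk) * (1.0 - risk.residual_fraction)
    return avoided - risk.mitigation_cost


def worth_mitigating(risk: Risk) -> bool:
    return net_benefit(risk) > 0


# Hypothetical figures, for illustration only.
surge_protection = Risk("phone-line surge protection", 0.25, 2000.0, 30.0, 0.2)
standby_router = Risk("standby router on the shelf", 0.25, 300.0, 150.0, 0.1)

for risk in (surge_protection, standby_router):
    verdict = "worthwhile" if worth_mitigating(risk) else "not justified"
    print(f"{risk.name}: net benefit {net_benefit(risk):.2f} per year ({verdict})")
```

With these made-up figures, surge protection pays for itself while a standby router does not, which is the same shape of conclusion we reached above. Real estimates will be rough, so treat the result as a prompt for discussion rather than a verdict.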
If you would like help or advice on Business Continuity, then please contact us.