Introduction and Background

In today’s world, business interruption, whether from an operational failure or a disaster, is a constant possibility. Businesses depend more than ever on the timely availability of information to run their operations; in effect, information is the business. If one part of the information infrastructure goes down, all other parts are affected. When a natural or man-made disaster interrupts business operations, the top priority is to get operations, facilities, and systems up and running in the shortest possible time, while keeping skeleton services running so that the organization stays in business. In this write-up I will try to explain:

- How to handle a disaster if one happens
- How to continue business during and after a disaster
- How to rebuild the IS facility after a disaster

Everybody has his/her own perception of what a disaster is, but broadly I think the following classification fits the definition.

Disaster Severity:

In this DRP document we have categorized disaster severity into three levels:

- Level 1 – Hardware/software/IPF failure (a disaster whereby a single user or a single group of users is affected)
- Level 2 – Partial unavailability of the data center (whereby multiple groups of users are affected)
- Level 3 – Unavailability of the entire IPF (the data center is down)

Actions:

Different courses of action are implemented depending on the severity of the disaster.

Level 1:

- If there is a hardware failure, the local vendor is called to report the failure and have an engineer dispatched to the site to fix the hardware problem. For some critical systems, an onsite hardware replacement is kept (in addition to the maintenance contract and Service Level Agreement).
- If there is a software (OS, database, or application) failure that cannot be fixed in-house, the local vendor is called to determine whether it can be resolved over the phone (the vendor is called immediately if the software falls under the critical category in your asset register). Otherwise, a local software/hardware specialist is dispatched to the site to fix the problem. If the failure still cannot be fixed, the global helpdesk for that particular application shall be contacted by email and telephone.
- If data is lost due to a damaged disk drive, the drive is replaced and the data is restored from the offsite backup. Data deleted accidentally by a user is also restored from the backup.

Level 2:

If the computer room is only partially operational due to an electrical failure, fire, water damage, or any other crisis, the administration head (or whoever else is responsible) is notified of the incident, and the IT department calls the appropriate vendors (electrical, water, air conditioning, hardware, etc.) to report the failure so that they can dispatch their personnel to fix the problem immediately.

Level 3:

If there is total facility damage due to a natural event or a human act, the recovery team is notified to implement the disaster recovery plan.

- Plan A – Inform the disaster recovery site about the disaster and dispatch the disaster recovery team to it.
- Plan B – If additional servers are needed, the first option is to rent servers from a rental company; the alternative is to borrow loaner servers from platform vendor partners.
- Plan C – If the facility cannot be brought up within the required time even with additional equipment and hardware, activate the designated warm site to provide service. An organization's designated warm or hot sites can be outside the city or even the country.

For details, please consult the organization’s business continuity manual.
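
The three severity levels and their first responses can be captured as a small decision table, for example to drive a helpdesk form or an alerting script. The following is a minimal Python sketch under stated assumptions: the `Incident` fields, the `classify` function, and the action texts are hypothetical, and treating "more than one affected user group" as Level 2 is one reading of the definitions above rather than a rule stated in this plan.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Disaster severity levels as defined in this DRP."""

    LEVEL_1 = 1  # hardware/software/IPF failure: one user or one group of users affected
    LEVEL_2 = 2  # partial unavailability of the data center: multiple groups affected
    LEVEL_3 = 3  # the entire IPF is unavailable: data center is down


@dataclass
class Incident:
    # Hypothetical fields for illustration; capture whatever your helpdesk records.
    affected_user_groups: int
    data_center_partially_down: bool
    data_center_fully_down: bool


def classify(incident: Incident) -> Severity:
    """Return the severity level, checking the worst case first."""
    if incident.data_center_fully_down:
        return Severity.LEVEL_3
    # Assumption: more than one affected user group is treated as Level 2.
    if incident.data_center_partially_down or incident.affected_user_groups > 1:
        return Severity.LEVEL_2
    return Severity.LEVEL_1


# First response per level, summarized from the "Actions" section (wording is illustrative).
FIRST_ACTION = {
    Severity.LEVEL_1: "Call the local vendor; dispatch an engineer or specialist if needed",
    Severity.LEVEL_2: "Notify the administration head; IT calls the facility vendors",
    Severity.LEVEL_3: "Notify the recovery team; start Plan A at the disaster recovery site",
}


if __name__ == "__main__":
    incident = Incident(
        affected_user_groups=3,
        data_center_partially_down=True,
        data_center_fully_down=False,
    )
    level = classify(incident)
    print(f"{level.name}: {FIRST_ACTION[level]}")
```

Checking the worst case first means an incident that matches several descriptions is always escalated to the highest applicable level.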

Objectives of BR & BCP

- Limit the magnitude of the loss.
- Minimize the extent of the interruption and the severity of the disaster.
- Define alternatives for continuing critical services.
- Establish in advance a method for recovering IT operations.
- Minimize decision making during the crisis.
- Rebuild the data processing facility, if necessary.

Scope of Work

A well-thought-out plan identifies the various kinds of disaster and how business continuity can be maintained during each of them. In addition, it minimizes the risk of damage from any disaster by defining the necessary preventive steps and assigning individuals the responsibility for carrying them out.

How to Avoid Disasters?

It is almost impossible to cover every aspect of disasters, as the nature and magnitude of a catastrophe are difficult to forecast. However, a study of common information-security-related disasters gives the following findings and key recommendations for avoiding disasters and minimizing their impact:

- Keep the physical network infrastructure secure from fire and unauthorized access.
- Properly plan and implement backup and restore procedures for the safety of data. Maintain a daily backup of the entire system and keep it offsite. Restore backups at random to make sure they have been taken properly.
- Carefully plan and properly test the procedures for handling a disaster.

Let me explain these key terms one by one.

Physical Security:

Management and related departments should observe physical security procedures in the facility and encourage increased security when appropriate through formal and informal meetings. It should be ensured that the data center is the most secure area, with restricted access that does not allow “tailgating” (an unauthorized person following an authorized person through a controlled entrance).

Measures that can be taken in this regard are as follows:

- Keeping the physical network infrastructure secure from fire and unauthorized access.
- Keeping areas clean and free of obstructions and fire hazards.
- Evacuation drills for office staff in case of emergency (at least once a year).
- Training some staff in health and safety.
- Badge systems for employees.
- Biometric access controls for restricted areas.
- Checking of visitors' and employees' vehicles.
- Checking of visitors' and employees' belongings.
- Installation of fire extinguishers (automatic, if possible).
- Regular checking of electrical circuits (look for, and eliminate or balance, any obviously overloaded circuits).
- Backup files and documentation stored both within the premises and offsite.
- Fences or gates wherever possible.
- Security guards, day and night, to watch for any suspicious activity.

Above all, user training is very important for enforcing these controls. Please refer to “Physical and Environmental Controls and information related Information Security Zones within Your Organization” for details of the measures above.

Identification of Critical Assets (Software/Hardware/Applications):

Probably the most important aspect of the BCP is knowing what is important to the organization. Data, of course! But where exactly is the data? The IT department should not rely on its own judgment to identify critical data; identification should be done by the relevant stakeholders/business units. The following are what I feel could be broad categories of critical software/applications/drivers, etc.:

- Internet proxy/browsing
- Customized: any application that is specific to your organization only
- Protection software: antivirus/anti-spam, etc.
- Device drivers: all kinds of device drivers for all important servers

Identification of Critical Hardware

The following is a list of critical hardware devices:

- Main computers: servers, etc.
- Network infrastructure: IPS, firewalls, routers, switches, etc.
- Users’ equipment: workstations/laptops, etc.
- Wireless connections/access points
- Biometric: barcode reader (Metrologic)

The best approach is to maintain an asset register containing all of the above, highlighting the most critical items, and then to carry out a business impact analysis (BIA).
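
One way to keep the asset register and the BIA results together is to record, for each asset, whether the business owners marked it critical and the maximum downtime they can tolerate, and then sort the register into a recovery order. The following is a minimal sketch assuming Python 3.9+; the `Asset` fields, the `recovery_order` function, and the sample entries are hypothetical and should be replaced by the columns of your own register.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # Hypothetical asset-register columns; adapt to your own register.
    name: str
    category: str              # e.g. "Software", "Hardware", "Application"
    owner: str                 # business unit that identified the asset
    critical: bool             # marked critical by the stakeholders
    max_downtime_hours: float  # tolerable outage agreed during the BIA


def recovery_order(register: list[Asset]) -> list[Asset]:
    """Sort critical assets first, then by shortest tolerable downtime."""
    # False sorts before True, so `not a.critical` puts critical assets first.
    return sorted(register, key=lambda a: (not a.critical, a.max_downtime_hours))


if __name__ == "__main__":
    # Illustrative sample entries only.
    register = [
        Asset("Wireless access points", "Hardware", "IT", False, 72),
        Asset("ERP application", "Application", "Finance", True, 4),
        Asset("Core firewall", "Hardware", "IT", True, 2),
    ]
    for position, asset in enumerate(recovery_order(register), start=1):
        print(position, asset.name)
```

Sorting on a simple (criticality, downtime) pair keeps the rule easy to explain to the business units that supplied the figures.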

Critical Hardware/Software/Applications/Documentation

Please refer to the organization’s asset register for details.

There should be a repository on mapped drives where individual users store their critical data (for example, H: for personal files, K: for common files, and G: for files shared within the user's own department). All users should be briefed on where to store their data so that the IT department can recover it easily (backup/restore).

Network Topology Diagram (LAN/WAN):

An up-to-date network diagram of your company should be maintained. I would recommend preparing a comprehensive network reference book (you could call it your Network Bible) containing every possible detail of your network, to facilitate recovery in case the main people responsible for recovery are unavailable for any reason.

Backup Details:

The most critical data should be located on the data repository server. Carefully compile a list of the data to be backed up in consultation with the different departments, and make arrangements to update the list regularly.

Precautionary Measures:

Backups:

Scope: Maintain a daily backup of the system and keep it offsite.
Purpose: Data is the most critical component of IT operations and should be protected from any site loss. Our current practice is to take backups on both tape cartridges and hard disks.
Please refer to Appendix I, “Backup and restore procedure”, for details.
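
The recommendation to restore backups at random only helps if the restored data is actually compared with what was backed up. A minimal sketch, assuming Python 3.9+ and file-level backups: record a SHA-256 manifest of the data when the backup is taken, then check a test restore against it. The function names `build_manifest` and `verify_restore` are hypothetical, and the directory copy in the demo only stands in for your real backup software's restore.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root; run this when the backup is taken."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Compare a test restore with the manifest and list every problem found."""
    problems = []
    for relative, expected in manifest.items():
        restored = restored_root / relative
        if not restored.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(restored) != expected:
            problems.append(f"differs: {relative}")
    return problems


if __name__ == "__main__":
    # Throwaway directories stand in for the data repository and the restore target.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        (data / "finance").mkdir(parents=True)
        (data / "finance" / "ledger.csv").write_text("id,amount\n1,100\n")
        manifest = build_manifest(data)             # taken at backup time
        restored = Path(tmp) / "restore-test"
        shutil.copytree(data, restored)             # placeholder for the real restore
        print(verify_restore(manifest, restored) or "restore verified")
```

Comparing against a manifest taken at backup time, rather than against the live data, avoids false alarms from files that changed after the backup ran.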

Restore:

Restore the entire system, or part of it, on the backup server.
Action: Recommended at least once every two months, or as needed.
The backup is restored as needed, or otherwise once every four months (or as appropriate to your requirements), and each restore should be logged as described in Appendix I.
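
Because each restore test should be logged, a simple log also makes it easy to see when the next test is due. The sketch below assumes Python 3.9+ and a hypothetical CSV log with one row per test (date, scope, pass/fail, operator); `RestoreTest`, `append_to_log`, `read_log`, and `next_restore_due` are illustrative names. The interval is a parameter so it can be set to whatever frequency your policy requires, and only successful tests reset the clock.

```python
import csv
import tempfile
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path


@dataclass
class RestoreTest:
    # Hypothetical log record: one row per restore test.
    when: date
    scope: str       # "full" or "partial"
    passed: bool
    operator: str


def append_to_log(log_file: Path, test: RestoreTest) -> None:
    """Append one restore test to the CSV log."""
    with log_file.open("a", newline="") as handle:
        csv.writer(handle).writerow(
            [test.when.isoformat(), test.scope, "pass" if test.passed else "fail", test.operator]
        )


def read_log(log_file: Path) -> list[RestoreTest]:
    """Load every logged restore test."""
    with log_file.open(newline="") as handle:
        return [
            RestoreTest(date.fromisoformat(day), scope, result == "pass", operator)
            for day, scope, result, operator in csv.reader(handle)
        ]


def next_restore_due(history: list[RestoreTest], interval_days: int, today: date) -> date:
    """Last successful test plus the interval; due today if no test has passed yet."""
    successes = [test.when for test in history if test.passed]
    if not successes:
        return today
    return max(successes) + timedelta(days=interval_days)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "restore_log.csv"
        append_to_log(log_file, RestoreTest(date(2024, 1, 10), "full", True, "IT admin"))
        due = next_restore_due(read_log(log_file), interval_days=60, today=date(2024, 3, 5))
        print("Next restore test due:", due.isoformat())  # Next restore test due: 2024-03-10
```

A failed test does not reset the clock, so a string of failed restores shows up as an overdue test rather than silently passing.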

Offsite:

Keep backup tape cartridges/hard drives and original CDs offsite.
Action: Always keep a backup of the data offsite, in whatever form it is taken (tape cartridges or hard drives). Keep one original copy of the software offsite.
Please refer to Appendix II for details of the offsite operations.

Site disaster:

A procedure should be documented in full detail for bringing a new site into operation in case the original site is down.

Objective:

The purpose of this exercise is to:

- Create a temporary (if not permanent) server setup, so that at least the applications with the highest business impact (as identified in the BIA document) keep operating.
- Create a temporary (if not permanent) network, so that key/line managers and important users working on important projects can stay in touch with customers, assure them that the organization is still there, and keep the business running.

Appendix I

Offsite

Purpose:

The purpose of offsite storage is that, if your actual site suffers a disaster, you still have enough resources to rebuild your IT infrastructure and data processing facilities.

Objectives

It should be a secure place (protected by surveillance cameras) with very limited access, used to store and retrieve data on a daily basis. A fireproof steel cabinet is recommended at the offsite location. The location should be away from your current office and readily available to the IT department whenever required. All critical hardware/software/applications/documents should be stored offsite.

What to Store Offsite:

The following important items could be stored offsite:

- Important IT files (hardware/software inventory, licensing information, etc.)
- Hardware recovery document
- IT reference book/network manual

The following is a list of important software and documents that should be kept offsite.

Appendix II

Server Specification

Recommended Configuration for Servers

The following is an example of what IT could require to rebuild a specific service:

- Processor: Intel quad-core
- Motherboard: Intel/Asus
- Hard drive: 300 GB (at least; the bigger the better)
- Network card: 10/100/1000 Mbps
- Keyboard and mouse: standard

Minimum Configuration

- Processor: Intel Pentium 4, 3 GHz (or AMD equivalent)
- Motherboard: Asus/MSI, etc.
- Network card: any standard 10/100 Mbps

Appendix III

How to Handle a Disaster?

Disaster Recovery Management Team

Disaster recovery operations shall be handled by the disaster recovery management team, which shall consist of technical support personnel and senior management team members (to facilitate decisions).

Responsibility Matrix

Although the whole IT team is responsible for the recovery of crashed systems, there should be a responsibility matrix that sets out the major responsibilities of each individual involved in a recovery, according to his/her functional area.

- Business Application/ERP Manager

Write down the details of the responsibilities of all of the above.

For example, the following could be some of the responsibilities of various roles:

Finance Manager:

He/she shall make sure that the funds required by the IT department are available for arranging servers, restoring the systems, and keeping company business running smoothly during the temporary network building phase.

Head of IT:

Coordinate with other departments to obtain their requirements and formulate an emergency working plan for the smooth running of company IT operations.
The Head of IT also coordinates with management for quick decisions on the resources required for the data recovery operations and on acquiring new equipment on a permanent basis to convert the temporary network into a permanent one. It shall be his/her responsibility to keep management up to date on the latest status of IT services continuity and recovery.

Important Phone Numbers

List the names and contact details of the important people who have a role to play in disaster recovery and business continuity for your organization.

All Vendors' Contact Details:

(Emergency phone numbers of the nearest fire brigade/police station and hospitals are displayed at appropriate places and are available from the admin department.)