You are welcome to reproduce these articles in internal organisational publications provided the source is clearly identified
A critical question : Is your job really necessary in a crisis? Continuity Volume 7, Issue 1
Stuck in the tunnel! Blueprint (the Journal of the Emergency Planning Society) March 2003
How has 9/11 affected Business Continuity thinking and outlook? Business Continuity Management Forum 2002
The reality of Worst Case Scenarios : Continuity (the Journal of the Business Continuity Institute) Volume 4, Issue 4
Is Business Continuity relevant to Emergency Planning? : Blueprint (the Journal of the Emergency Planning Society) June 2000
Justifying the Business Continuity Project : Continuity Volume 2, Issue 1
Risk Evaluation and Control : The Definitive Handbook of Business Continuity Management ed. A.Hiles and P.Barnes (Wiley)
Is risk management relevant to the BC Manager? : Continuity Volume 2, Issue 4
Making a success of BCM - The role of the Independent Consultant : Facilities Management Today, March 2000.
‘The critical operational and/or business support activities (either provided internally or outsourced) without which the organisation would quickly be unable to achieve its business objective(s) i.e. services and/or products’.
The term ‘critical’ has a number of meanings but the one we have presumably borrowed is that from physics as in ‘critical mass’ which refers to a minimum amount of fissile material required to maintain a chain reaction, rather than the alternative ‘rigorously discriminating’. Unfortunately common usage has added the implication of ‘extreme importance’ to these dictionary definitions and it is this that causes difficulties in using the term ‘mission critical activities’.
A Business Impact Analysis should enable us to identify these ‘Mission Critical Activities’ but when thoroughly undertaken what emerges from the analysis is a complex web of interaction between business functions and organisational objectives. The connections of some functions to business objectives may be subtle but it would be rash to declare them unimportant and not necessary to consider in a resumption plan. (There is a parallel here with the original approach to IT recovery, in which data restoration would be limited to key data sets - but this was rapidly discredited once the complex interrelationships that exist between datasets were mapped.) It should come as no surprise that every function and every person in an organisation is ‘critical’ in some sense after economic constraints have led to years of down-sizing. If there are people doing work which is not important to your organisation, then why are you still employing them?
In trying to determine an employee’s mission criticality it is tempting to ask them ‘How critical is your function to the business?’. We understand the purpose of the question, but does the interviewee? It could easily be interpreted by them as ‘Is your job really necessary (if not then you could be made redundant)’ in which case there is every incentive for them to exaggerate the importance of what they do.
The use of ‘critical (or key) functions’ can also create problems in implementing a continuity management programme. An Alex cartoon (in the Telegraph) showed the pinstriped character walking along the road with a colleague complaining that ‘I don’t know how I can show my face in the office again.... the indignity....I have just been designated as non-critical by the BC Manager’. By being divisive about there being ‘important’ and, by implication, ‘unimportant’ jobs and staff we are potentially creating rifts which an incident may ruthlessly expose. Successfully managing an incident will rely on the co-operation of all staff including those who are apparently ‘non-critical’ even if their role is just to keep out of the way for a while.
Figure: Percentage of staff required for resumption after an incident
To resolve this problem we must recognise on what criteria we are trying to differentiate functions. As an example, the function of ‘actuary’ is critical to the success, or otherwise, of a life insurance company (as any Equitable Life pension holder will agree) but the office cleaners are, surely, not important. However, the timing of the implementation of actuarial decisions is unimportant on a scale of weeks or months whereas an uncleaned office could become a health and safety hazard in days. So the criterion on which we are differentiating functions is actually their urgency, not their perceived importance or status. Some quite low-status functions, such as sorting the mail, need to be resumed urgently whereas some high-status functions, such as strategic planning, can wait for a while until their continued absence makes them urgent too.
The BIA will usually identify a continuum of resumption requirements over time across all business functions, not a split into critical and non-critical. The strategy developed from the BIA will typically require a small team to resume the most urgent functions, then staff numbers will need to increase as further functions are added, usually forming an ‘S’ shaped curve (as shown in the graph) with a tail made up of the least-urgent strategic functions. The challenge is then to match the provision of building and equipment resources to this growing requirement over time until all functions are resumed. By only considering the requirements of supposed ‘mission critical functions’ an effective limit is placed on the length of interruption for which the strategy is appropriate and there remains the possibility that a low-status but urgent task has been overlooked. It may not be possible to acquire suitable space quickly enough to accommodate those undertaking tasks which, as time has elapsed since the incident, have now become vital.
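The build-up described above can be tabulated directly from BIA results. The following is a minimal sketch only, assuming each function has been given a latest resumption day and a headcount; the function names, days and staff numbers are invented for illustration and are not taken from any real analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    resume_by_day: int  # latest day after the incident by which the function must be running
    staff: int          # people needed to run the function


def staff_build_up(functions, checkpoints):
    """Return {day: percentage of total staff needed by that day}."""
    total = sum(f.staff for f in functions)
    if total == 0:
        raise ValueError("no staff recorded in the BIA data")
    return {
        day: round(100 * sum(f.staff for f in functions if f.resume_by_day <= day) / total)
        for day in checkpoints
    }


# Hypothetical BIA output, invented purely for illustration.
EXAMPLE_FUNCTIONS = [
    Function("mail sorting", 1, 5),
    Function("customer service", 3, 20),
    Function("office cleaning", 5, 5),
    Function("accounts", 14, 40),
    Function("actuarial", 60, 20),
    Function("strategic planning", 90, 10),
]

if __name__ == "__main__":
    days = [1, 3, 7, 14, 30, 60, 90]
    for day, pct in staff_build_up(EXAMPLE_FUNCTIONS, days).items():
        print(f"day {day:>2}: {pct:>3}% of staff required")
```

Ordering by urgency rather than status means that, in this invented data, mail sorting appears on day one while strategic planning forms the tail of the curve. The same table can be used to check that the space and equipment available at each checkpoint keeps pace with the staff required.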
The simple solution to this difficulty is for us to use terminology which is less ambiguous to those outside the discipline. We could ask how urgent a particular task is and be easily understood. Alternatively if current usage of ‘critical function’ is too engrained, then could we preface it with the vital qualifier and use ‘Time-critical function’ instead? Continued use will act as a constant and timely reminder to us, and those we work with, of the critical parameter of our discipline.
A trip to the International Symposium of Business Continuity proved to be an unexpected practical experience of disaster management.
The 12:27 London to Brussels on 17/10/01 entered the tunnel at about 2:00 with the usual announcement that we would be in the tunnel for about 20 minutes. However within a few minutes the brakes came on hard and we came quickly to a halt.
There followed frequent announcements of 'final checks being made' but it was forty minutes before we moved again, very slowly. However within a few minutes we stopped again. A further announcement was made that 'we may have to terminate our mission' a use of terminology which several passengers found very worrying.
The problem had been caused by part of the brake assembly on the front coach which had snapped causing the brakes to come on. The attempt to move the train slowly despite the jammed brakes caused the front coach to fill with eye-watering fumes which prompted the staff to order the evacuation of the front of the train. This was hampered by a man in first class who deliberately blocked the gangway for reasons that could not be discerned.
There was then an announcement that the whole train was to be evacuated with only hand-luggage to be removed (some warning of this would have considerably reduced the chaos when we reached Brussels). The evacuation of all passengers through the rear carriage was handled very professionally as it has, no doubt, been well rehearsed. We were led through a short corridor into the service tunnel which runs in parallel between the two train tunnels. There was a reassuring presence of many emergency service personnel and paramedics though, apparently, there was a fatality due to heart failure.
Around two hundred passengers stood in the service tunnel for about two hours with little information apart from very loud and unintelligible tannoy announcements which were quite frightening. Eventually we were led through into the other train tunnel to board a replacement train to continue the journey to Brussels, now about five hours later than planned.
On arrival at Brussels the fun started as a couple of staff tried to sort out a catalogue of missed connections, lost luggage and lack of accommodation. A few budget travellers spent the night on the benches in the Eurostar terminal. Those at the conference had to shop for clothes in between frequent trips to the station for news. Fortunately our story became a talking point rather than a reason for exclusion from the conference for inappropriate dress. A newspaper reporter had been on the train so there was plenty of coverage of the incident each day. Luggage was finally returned three days later having been shuttled between Calais, Waterloo and Brussels.
Although train breakdowns in the tunnel do happen, apparently this was the first where it was impossible to move the train. As in many such incidents the emergency response was exemplary but the subsequent attempt to 'return to normal' (business continuity) showed the organisation unprepared for this easily predictable situation. Kits of essentials and meal vouchers handed out on arrival would have demonstrated preparedness and control. Also missed was an opportunity to control the press coverage (mobile phones didn't work in the tunnel, much to some passengers' surprise) but the news was spread very quickly once we surfaced. Far from putting me off this mode of travel the experience has increased my confidence in the safety of using the tunnel to reach the continent but I now make sure that I carry essentials in hand luggage and label all my bags.
Presented at the Business Continuity Management Forum 2002 - The London Chamber of Commerce. 26th April 2002.
The seven months since September 11th have seen the publication of many stories and reams of comment on the impact of those events. I do not intend to go through those events in detail nor do I claim special knowledge of any of the organisations caught up in the disaster. Instead I plan to examine how some of the stories that have emerged from the events in Manhattan could be influencing your continuity planning and strategy, particularly if you are based in a city centre.
Within an hour of the first plane striking the World Trade Centre, I had a call from a Financial Times reporter asking for information on how the risks for tall buildings were assessed. It appeared that the risk of working in tall buildings had suddenly increased. Should all high buildings be immediately and permanently evacuated or would the risk have decreased tomorrow? A number of tall buildings in London were evacuated immediately after the attack as a precaution but were rapidly reoccupied. This was followed by urgent discussions about whether or not to relocate in a number of company boardrooms and, I suspect, in many staff meeting places too.
The comment from the FT correspondent highlights the problem of using risk analysis to try to plan for rare but catastrophic events. There have been very few aircraft collisions with tall buildings and, fortunately, few major fires or explosions in skyscrapers so there is little historical record to go on. Any analysis tool which claims to be useful or scientific should give stable and replicable results, but the reliance of risk analysis on historical events and personal perception for an assessment of probability and the impossibility of identifying all threats are fatal weaknesses of this method. It sounds undeniable in theory but breaks down when you try to apply it in practice. The result of the method is dramatic swings from low risk to high risk after single incidents with a gradual falling off until the next incident. While perhaps suitable for determining levels of alert for setting daily policing and surveillance, risk analysis does not provide a stable platform on which to base the longer term decisions such as facility location and IT strategy.
However the perception at the moment is that tall buildings may be dangerous so why do we build them?
Building tall is partly a response to the higher cost of land in the centre of cities, known as the bid-rent curve. Because the land they have bought is expensive, developers maximise their revenues by providing as much space as possible at the highest rents. However as the building gets beyond about 50 floors there are diminishing returns as the space taken up by lifts and the complexity of the utilities for the top floors take a disproportionately high proportion of the floor space of the lower floors.
But developers build higher than pure economics justifies - the WTC was nearly 100 storeys. The driving force for going higher is prestige - effectively by taking space you are advertising your company's strength and stability to your customers.
So is working in a giant billboard safe? It would seem intuitively obvious that a tall building, due to the lack of opportunity for firefighting and escape, is more dangerous. However risk analysis may tell you that you are probably safer from some of the hazards that afflict those closer to the ground, for example burglars, ram raiders and fires from adjacent industrial processes.
So I think that, barring a spate of copycat strikes, this demand for high rise office locations will continue as long as Boards decide that the intuitive risk evaluation equation comes down heavily in favour of prestige and advertising. Certainly there is no apparent evidence in London of a slowdown, with a new development planned for London Bridge, nor has there been a mass exodus from the City, though a quiet removal of critical equipment may be underway.
So given that occupation of tall buildings will continue what are the business continuity issues they pose and how can these be mitigated?
There are a number of fairly obvious resilience-related statements about an organisation that occupies a tall building or is adjacent to tall buildings.
So what can we learn about these issues from the events of September 11th?
There is no doubt that the regular evacuation drills instituted by companies after the 1993 bombing were instrumental in saving the 18,000 people who escaped before the buildings collapsed. Compare this with the six hours it took after the 1993 explosion. The owners of the building, the Port Authority of New York and New Jersey, had taken a strong lead in encouraging drills and installing emergency lighting. This is in contrast to the indifference with which some landlords treat the complex issues of building security in some UK multi-occupancy buildings. Does your building have a rehearsed evacuation plan?
For those that escaped, the initial confusion was exacerbated because most companies had nominated an assembly point for staff that was too close to the building. So once the danger was appreciated staff fled to other parts of Manhattan or went home, which made it impossible (given the loss of communications) to account for those who might have been in the building. In contrast, Morgan Stanley had chosen a site 10 blocks (about 800m) away in Verrick Street which, though still just within the area of flying debris, appears to have been sufficiently far away on this occasion. Congregating in one place will have enabled the company to quickly identify survivors and possible casualties, give support, and devise and rapidly disseminate a response.
The map shows the extent of the damage in concentric rings around the WTC. Within 200m the devastation was total from collapse and fire. Within 400 metres there was structural damage to buildings from flying debris and extensive dust damage to air-conditioning and electrical equipment. Beyond this, up to 800 metres, there was damage to glazing and roofs from smaller projectiles and dust, with power cuts lasting up to five days. Significant quantities of dust reached up to three kilometres from the site.
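The rings can be turned into a simple lookup for testing a proposed assembly point or control centre. This is an illustrative sketch only: it uses the distances reported above for this one incident, treats each ring boundary as inclusive (an assumption), and the function name `damage_at` is made up for the example. It is not a predictive model for other events.

```python
from bisect import bisect_left

# Outer radius of each ring in metres, with the damage reported for Manhattan.
DAMAGE_RINGS = [
    (200, "total devastation from collapse and fire"),
    (400, "structural damage from flying debris; dust damage to equipment"),
    (800, "damage to glazing and roofs; power cuts of up to five days"),
    (3000, "significant quantities of dust"),
]
BEYOND = "outside the reported damage rings"


def damage_at(distance_m):
    """Return the damage reported at a given distance (metres) from the centre."""
    if distance_m < 0:
        raise ValueError("distance cannot be negative")
    radii = [radius for radius, _ in DAMAGE_RINGS]
    # bisect_left finds the first ring whose outer radius is >= the distance.
    index = bisect_left(radii, distance_m)
    return DAMAGE_RINGS[index][1] if index < len(DAMAGE_RINGS) else BEYOND


if __name__ == "__main__":
    for metres in (50, 350, 800, 2400, 5000):
        print(f"{metres:>5} m: {damage_at(metres)}")
```

Measuring the distance from your building to the nominated assembly point and looking it up in this way gives a quick check on whether the point would still have been usable in an event of this scale.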
This detailed map shows the actual buildings and the extent to which they were damaged. Those in blue in the centre have collapsed but there was major damage to a number of the key financial buildings around the World Trade Centre - and also to the Police HQ (bottom right) and the NY Telephone Building (top left).
This map shows the same concentric rings of damage on the same scale but centred on Bank station. If your offices are in the Square Mile then try to place your staff assembly point on the map. Would it be accessible if a similar event happened in London?
Locating a suitable assembly point for staff in advance of a major destructive incident is always problematic. If it is a false alarm or a minor fire you don't want the disruption of your workforce crossing half the City on foot. However if it is a major incident the usual assembly point 'across the road' will be overtaken by events - staff will be dispersed by the Emergency Services to what they consider a safe distance, which may be up to 800m for a large explosive device, and police cordons will prevent access thereafter.
It should also not be where everyone else is planning to go - a survey in the City showed that the Monument was the chosen assembly point of companies with a combined total of 150,000 staff. In the City the cordon area is pre-planned therefore it is possible to nominate two locations - one close and the other outside the cordon if the first is inaccessible with clear instructions to staff.
Given the English weather the chosen location should, ideally, be under cover and offer refreshment facilities - quite a tall order but such places do exist.
Similar criteria of distance must apply to your Emergency Control Centre (ECC) or incident room. This is a facility from which you manage the disaster if your main site is inaccessible. It may need to be no more than a single room that contains contact information and communications equipment but it is obviously important that it remains unaffected by the incident that has caused it to be brought into use.
In the case of Manhattan the Emergency Services' control centre was at the base of the building, protected in a bomb-proof bunker which had apparently proved itself by surviving the 1993 bombing. With the World Trade Centre being a prestige location, a known target and a key facility, it was putting a great deal of faith in 'bomb-proofness' to use this site. It might reasonably be anticipated that a lower profile, more peripheral location would be more appropriate in a wider range of circumstances than a single bomb. The loss of this control room significantly hampered control in the early stages of the disaster until the back-up location could be brought on line.
Likewise the response of a number of organisations in the twin towers was severely curtailed by the choice of location for their back-up media and facilities, compromising their resilience with a desire for operational convenience. They located facilities either on a different floor of their tower, in the other tower or in other buildings close by. None of these would qualify under any business continuity definition as 'off-site'.
So where do you locate your emergency control centre? It is not just damage that has to be considered. Any incident is bound to create gridlock and problems for public transport in the vicinity so half an hour's walk might be a good guide - about 1.5 miles.
One should also consider the location of other facilities and resources required post-incident. One company in the twin towers had to wait two days for access to their back-up tapes because their data storage company was too close to the incident site and so had its own problems to sort out before it was able to assist others. There is an ever-present tussle between the convenience of day to day operations and the survivability of the company in an extreme event.
A number of communication issues were exposed by the damage in Manhattan. In the hours following the attack almost all fixed line and mobile networks in the vicinity of the WTC were inoperable. One large switching centre was destroyed and one heavily damaged. This then exposed a number of choke points which were unable to cope with the extra traffic forced through them. The cellular system, always the obvious emergency fall-back in BC plans, was severely overloaded both because of the number of calls and the damage to mobile masts and repeaters in the area. Sprint helpfully led two "COWs" into the area to replace damaged mobile masts but there were continuing problems. When I first read this I hadn't come across this acronym before and I couldn't see how two bovines were going to assist - until it was explained they were mobile 'Cells on Wheels'.
Of course there was wider disruption to businesses due to the power failures which knocked out telecoms equipment. Many key items of equipment will have been protected against power spikes and short outages. However Uninterruptible Power Supplies provide protection only for a limited time - until their batteries have discharged - and generators only continue while they have sufficient fuel. Air conditioning units, which may be vital to provide a safe working temperature for equipment, are often not connected to the protected supply so their failure causes the equipment to overheat and fail. In Manhattan many generators and air conditioning systems were put out of action by dust and debris blocking air intakes and jamming cooling fans.
These communication difficulties have led to discussion about the need for the US Financial service industries to have a dedicated, bomb-proof network. In my experience there is still much that could be done, relatively easily, to ensure that existing telecoms systems are installed and maintained with higher resilience standards before further expensive systems are considered.
The loss of communications caused huge difficulties in accounting for staff. As already shown, the nomination of an assembly point can reduce this reliance but further resilience can be gained by giving and rehearsing standing instructions to staff about what to do in an evacuation, if they hear police advice not to travel or if they reach a cordon on the way to work.
The internet, whose origins were in providing a decentralised system that could withstand nuclear attack, might be expected to have provided the ultimate in resilient networks in this sort of crisis. Yet a massive surge in Internet usage around the world overloaded servers and routers and found the bottlenecks created by damage to the network. As a result most people went back to the TV for information.
Those directly affected by the telecoms outages saw their websites (where these were at a single location) out of service for days after the disaster, resulting in major losses for e-commerce related businesses in the area. Many of the rules of diversity of equipment and network providers have been neglected in the rush to e-commerce.
In the first few weeks an intense spotlight was on the performance of third-party recovery services. IBM supported over 100 clients, Comdisco 46 and SunGard 30. None lost any recovery facilities in the incident. On the whole the feedback has been positive and all client invocations were 'facilitated'. However behind the press statements lie a number of concerns:
Each of those clients will have expected a full service and planned to be using the recovery centre nearest to the World Trade Centre. A DR manager who called eight minutes after the first plane struck was told he was 11th on the list. Yet these people have signed contracts that stipulate an allocation of resources on a 'first-come, first-served' basis. Certainly the disruption for relocated staff will have been significantly more than anticipated. What happened to exclusion zones that are supposed to protect clients against multiple invocations?
The other contract clause that many will have signed relates to guaranteed occupancy of the recovery facility for a maximum of eight weeks. This term is a historical hangover from the time when contracts covered IT equipment only. After that, if another client invokes, you have four hours to vacate the facility - in other words you ought to have moved into other premises before that deadline. In discussion with one of the suppliers last summer I was assured that no disaster goes beyond eight weeks and that that is plenty of time to sort any problem out. While it may be true of most IT equipment, it's not a long time to find new premises and equip them as well as recover business processes. It will be interesting to see whether the experience of this event forces recovery companies or their customers to review that stance.
Paradoxically one recovery supplier, Sema, which in contrast operates an 'equitable share' policy for concurrent invocations and a maximum six-month tenure, received no confirmed invocations though it was put on standby by several customers. With the agreement of its own clients, it offered its facilities to clients of the other recovery service companies.
The other area where companies slipped up in New York is in seeking to provide alternative facilities for IT staff or those identified as 'critical staff' only. Not only is this somewhat divisive (as illustrated in the cartoon seen a few years ago in the Daily Telegraph's Alex strip), it ignores the fact that ALL functions in an organisation are critical - else why are you doing them? The missing dimension here is time. Everyone in the organisation is critical, but some are more time-critical than others. I do feel this is a key mistake in BC planning. A manager asked if he is 'critical' will immediately go on the defensive and try to justify his importance; asked instead if his role is time-critical he is likely to give a much more considered and realistic answer. As an example take the skill of an actuary in a pensions company. His experience is absolutely critical to the success of that company but if that function is not available for a few weeks the impact will be minimal. However, ignore it for a few months or more and the effect on the company could be catastrophic.
So the result of the loss of the buildings in New York and the severe underestimate of space required was a desperate search for alternative premises, and hotel accommodation was snapped up rapidly. About 16 million square feet of offices (about 20% of the total) were destroyed and another 12 million were in the cordoned area. The economic downturn meant that a similar area of office space was available in Manhattan; unfortunately it was mostly suitable for small to mid-size firms, and could not accommodate the trading floors, newsrooms and other large open spaces needed by major companies.
The practical advice to glean from this is to have thought about the build up of staff from day one onwards for up to three months - or however long you think it will take to find, buy, equip and network a suitable replacement office. Even if you cannot afford to lease the required space and keep it empty, at least you will have a headstart in knowing what you require even if other companies are chasing the same facilities.
In New York the Regulatory Authorities are considering whether to designate certain areas of the city for backup sites in an attempt to get companies to return to Manhattan yet to reduce overall exposure... and this from the land of free enterprise. Tim O'Brien from the FSA actually raised the concentration of recovery centres in Docklands as a specific risk to City companies.
There have been few public details about the effect on computer systems though one recovery supplier said it was using just about every platform available. What did occur even in the UK was panic buying of equipment. What price recovery plans based on purchasing equipment post-disaster through normal channels? Anything that disrupts production or distribution is likely to make things worse.
Accounts of technical recovery problems have also been understandably few but there have been general comments that shortcomings in many companies' back-up regimes have been highlighted. Either data was found to be missing or corrupt or there were synchronisation problems - that is, the back-ups had been taken at different times on different systems so there were incompatibilities between the restored files. These stories suggest that there has not been enough invested in installing the required quality of back-up regime (such as offsite mirroring) and end-user testing to identify these problems.
The Achilles heel of many recoveries is the loss of paperwork. Pictures from previous events have shown papers strewn all over the area - confidential documents and work in progress. Marsh's heavy investment in imaging paid off with little more than that day's work lost.
The Cantor story is one of the saddest corporate tragedies to come from this attack. They reportedly lost 70% of their workforce. It is difficult to envisage any plan that can address a loss on that scale. Another company lost nearly all their worldwide business continuity staff who were at a conference that day.
However there are many companies who could be crippled by losing one or two key staff with specialist skills or knowledge. Whatever those skills, be it in their contacts, knowledge or a technical area, it makes sense to have a clear programme of multi-skilling and the spreading of information. This has obvious benefits in a crisis where the ideal person for a task may not be available. On a training course I ran, one delegate related the anecdote that his CEO was heard to threaten that if any member of staff made themselves indispensable then they would be fired.
Should we apply the same logic to staff as we do to technology and make sure that there is some duplication and overlap at multiple locations? Perhaps the aim should be to create an organisation that is geographically and functionally diverse enough that the loss of one location cannot cripple the business. In the current economic climate this may be difficult to argue but it is the logical outcome of the requirement to create resilience. Maybe the 'Distributed Headquarters' is the structure for the future but perhaps communications (both data and transport) will have to become more reliable before that can be realised.
All of the reports of organisations recovering highlight the crucial part played by their staff and suppliers - causing 'miracles' to be accomplished. Contrast this with the general lack of support for recovery activities and personnel from management level in companies prior to the attack - and, I suspect, in most organisations again now that things have apparently returned to 'normal'.
How many recovery plans depend on the ability to move staff or computer back-ups rapidly to a remote location? In the US that usually means an internal flight and I know of at least one UK company where internal flights are a key element of the plan. Movement of people away from Manhattan was severely hampered: the subway was disrupted, the road system was gridlocked, bridges were closed and all flights were grounded. The situation was made worse by rumours, hoax calls and general unease. Recovery plans that rely heavily on rapid relocation are always questionable because there are so many factors outside your control.
The initial grounding of aircraft in the US and the subsequent hurried implementation of security measures led to severe disruption of air freight. Cargoes were held up for long periods while equipment was installed and staff were trained to meet the hurriedly-drafted new rules. Ford, reliant on single sourced JIT supply, was forced to close some production lines due to lack of parts. There were many other significant delays to deliveries across the world - in November an African country was running dangerously low on phone cards due to delays with scanning equipment at Heathrow.
There is still a reluctance to fly in the US which I find illogical - it's the problem with risk analysis again. Americans used to board aircraft with the same level of ground security as boarding a bus. Yet my town's Easter youth music festival has been cancelled because the US school bands still won't fly due to the perceived danger. This is despite the improvements in security which now approach what we were used to in Europe. Still, there have been some environmental benefits from the reduced CO2 emissions of fewer planes! The railways in the US have had a sudden resurgence of passenger numbers - but their infrastructure is even worse than that in the UK - there have been two fatal crashes recently.
As a result of travelling difficulties there has been an upsurge in video conferencing and telecommuting which has been a lifeline to struggling telecoms companies. Home working is often pointed to as a neglected area of BC strategy but there are significant logistical and managerial problems. It is only really a suitable recovery strategy if it is also a business strategy. The problems need to be ironed out in normal working conditions if there is to be any chance that it will work in a crisis.
There are increasing pressures on all businesses, but particularly those in the financial sector, to prove their resilience. Turnbull is usually quoted as the driving force here but by focusing on all risks to an organisation there is a danger that catastrophic threats become discarded into the 'very unlikely' category in the risk analysis and thus ignored because of the serious implications of implementing appropriate counter-strategies.
There may also be pressures from staff, expressed through resignations or recruitment difficulties, concerned about their personal safety when working in high buildings. After the Docklands Bomb two thirds of the employees of one company whose building was seriously damaged resigned on the grounds that they no longer felt safe in Docklands and blamed the company's managers for not protecting them. A survey by globalcontinuity.com shortly after the tragedy showed 43% of respondents strongly against working in high-rise offices, though as memories fade this will have already dropped significantly. Yet who is going to be the first person to sue their company for the stress caused by making them work in a tall building?
So many financial and management pressures conspire to lead businesses to consolidate and agglomerate, it may require significant pressures to encourage diversification - perhaps this is an area where insurance companies could take a lead in looking at underwriting policies that discourage undue concentration of facilities in a single location.
The explosion at Canary Wharf did severe damage within about 200m, the Manchester bomb about 400m and severe damage stretched 800m in Manhattan - and now the threat of a nuclear or biological device? So how big will the next one be - what is our worst case scenario, or are we wasting money and effort on trying to be bomb-proof?
To keep this in proportion I think you need to consider how, in an extremely serious incident, the Emergency Services, your staff and other companies will react. Work and possible relocation comes well down the priorities when human life or family welfare is at stake. A major incident may lead to the intervention of the military and the taking of emergency powers. A catastrophe may destroy or alter your market. For these reasons there is a sensible limit to the size of incident for which it is worth planning, what I term a 'Maximum Survivable Incident'. Like the advice given if you stumble across a number of bears while walking in the woods, it might be best to act 'dead' for a while until the market is ready to resume.
Those companies who had such excellent plans that they were ready to trade on 12th September on the NY Stock market might have been rather annoyed that the Stock market was closed for several days. This is an area where agreement between regulators, agencies and financial services companies on the conditions under which trading would be suspended could save substantial investment in facilities that could be unnecessarily resilient.
I have concentrated, as expected, on the lessons from the fallout of September 11th but I want to end with a caution against allowing the focus of continuity to concentrate on bombs and explosions. We must not forget that London's environmental foot (boot?) print is huge; it depends on services and people from a considerable area. A major utility disruption such as a contaminated water supply, or a prolonged power or transport failure, could cause a much more widespread and disruptive incident than a single explosion. As well as a terrorist surprise, man-made and natural disasters have plenty of potential to cause us challenges in the near future. The only prediction I am prepared to make is that the next major incident when it comes will be unexpected but the companies with the flexible rehearsed plans will be able to cope with whatever the incident throws up.
I set out to analyse how September 11th had changed BC thinking and outlook. In illustrating what lessons can be learned from the incident and subsequent recovery efforts, I conclude from the many recovery successes that the core principles of BC, such as resilience, duplication, dispersion, planning and testing, held up well, though the reminder of the crucial role of staff was timely. The problems came where the rigorous implementation of these principles was compromised by an unsound risk assessment, demands for day-to-day operational efficiencies and a neglect of thorough user testing.
I hope I have pointed out some significant lessons from the information presented that you can take away with you to enhance your own Business Continuity planning.
Thank you for listening. I am happy to respond to any questions or comments.
© Continuity Systems Ltd
'Worst case scenario' is a term which is regularly used as a convenient shorthand by BC practitioners to describe the severest incident covered by the continuity plans they develop. However with an apparent increase in the severity of extreme weather conditions, and the ever-present threats of tectonic and man-made catastrophes, this term may need to be used more carefully or even discarded. We should consider whether its use could give an organisation a misleading impression of the scope of the incident for which their continuity plan provides protection.
When assessed on a scale that starts with obliteration of the earth by an asteroid, through global pollution by radiation, a massive earthquake event to widespread flooding, the isolated loss of an organisation's head office by fire or flood is a smallish disaster which can scarcely be called 'worst case'. Its impact, while serious for the organisation, is at the local rather than regional level.
We make similar assumptions about loss of life. When faced with the difficulty of envisaging a catastrophic incident, we tend to use the 'denial of access' scenario and call it a 'worst case'. But if there are many staff casualties it may not be possible to recover the organisation successfully; an attempt at a rapid recovery may even look heartless.
Most BC plans are designed to cope with a local incident by providing existing staff with alternative facilities within daily travelling distance of the main site. The Ice Storm in Quebec prompted a reappraisal of assumptions about the optimum distance of a recovery centre from a site as the weather paralysed a region 300 miles in diameter for six weeks. This raises the question of whether an organisation should maintain expensive continuity plans that can cope with a major regional or even national emergency or select a less-than-worst-case scenario and hope that a more serious calamity will not happen.
One approach would be to decide to exclude catastrophic incidents on the grounds that their probability of occurrence is so low as to make them not worth considering. This risk analysis approach is unsatisfactory both because we cannot accurately measure the probability of rare events and because their rarity is of little comfort if one does actually happen. Extensive power cuts hit Scotland twice in one week after Christmas 1998 due to storms described each time as 'once in thirty years' and there is no historical precedent for the recent flooding in SE England.
A more pragmatic approach is to examine the experience of organisations faced with regional scale disasters such as hurricanes and earthquakes. In this scale of event, an organisation's ability to recover may be hampered by the response of the emergency services, whose priority of public safety may conflict with the organisation's need for staff to get to work. If the incident has already caused heavy casualties or is threatening further destruction, then even key recovery staff may feel that their priorities lie at home rather than at work. In an extreme case, facilities such as generators, communications and buildings set aside by the organisation for recovery use may be commandeered by the public authorities for community priority needs.
It is suggested that the solution is to remember that the aim of the continuity plan is to provide a means by which an organisation's objectives will continue to be met even when an incident threatens to derail them. There is an assumption here that business objectives will remain the same despite the interruption, and thus a predetermined recovery strategy can be adopted. It follows that if the incident is of such magnitude or such wide extent that the current organisational objectives are no longer relevant, then a pre-planned recovery strategy is likely to be inappropriate. A local council officer in Yorkshire illustrated this succinctly when, questioned by a councillor on the plans to collect council rents in the event of a nuclear attack, he replied that there were none because no-one would be around to worry about it.
This reasoning can also be applied when considering how far apart buildings have to be before they can be considered to provide resilience for each other. For city centre buildings more than a few hundred metres apart to be destroyed simultaneously (other than by sabotage) would require a major explosion, flood or weather event and could result in many casualties. The emergency response to this magnitude of event could last for several days and would have priorities of safety and welfare which would severely impede any business recovery efforts at either location however carefully this has been planned. In addition staff living in the area may be having to cope with damage to their own property. Should this organisation have recovery facilities, or a second head office, at a distance? And if so, then how far distant should they be - in a different region or even another country? And even if it had those facilities would the organisation's reputation survive the press coverage?
It is suggested that each organisation should consider the extent of its 'Maximum Survivable Incident' (MSI) - the most serious and extensive disaster it wishes to plan for, beyond which no predetermined strategy can be expected to be appropriate. This MSI should then be used in place of any 'worst case scenario' in the Business Impact Assessment, which will then ensure that an appropriate recovery strategy is chosen.
Some organisations, particularly the Emergency Services and local authorities, will necessarily have a large MSI since they have welfare responsibilities to a large area and will need to spread their facilities appropriately to prevent a single incident rendering all their resources inoperable. However at some point on the scale the incident will become serious enough for national and even international resources to be deployed and to replace local efforts.
For commercial organisations the Business Continuity Strategy should be developed around a Board-agreed Maximum Survivable Incident scenario. If the Board demands an ability to survive a regional or national crisis, even if this involves major casualties, then the organisation's resources must be dispersed accordingly. If it accepts a more limited disaster as its MSI then appropriate continuity plans can be implemented without the additional overheads of trying to provide for major incidents which would take the means of recovery out of its control and make its business objectives obsolete.
Knowing when to admit defeat in a disaster is vital for an organisation. Expending effort on an attempt at recovery from a catastrophe greater than an organisation's MSI is futile and may mean that opportunities opened up by the incident are overlooked.
Ian Charters is an independent Business Continuity Planner with six years'
experience in assisting companies and other organisations to develop appropriate
recovery plans. He also presents workshops and seminars for Survive.
Ian is a Member of the Business Continuity Institute and a member of the
Emergency Planning Society's Business Continuity group.
P&O Stena Line's computer systems that manage the loading of ferries at Dover Docks are highly resilient, being split between two data centres, two miles apart with back-up circuits. However when a technical fault crashed the system in September 1999, police invoked part of their Operation Stack emergency plan which involved parking all the lorries on the M20. The interruption was significantly prolonged because the temporary lorry park and resulting traffic chaos delayed both technical staff and replacement equipment from reaching either site.
From the perspective of ensuring Business Continuity in an organisation, understanding the plans and powers of the local authority and emergency services could mean the difference between recovery success and business failure in an emergency. Premises owners can be denied access to a building and its environs by the emergency services where there is concern for safety or where evidence of a crime may be destroyed. As a last resort, equipment and facilities can be commandeered, though usually a request for voluntary assistance is the preferred route, as in the 1999 French storms.
One of the vaguest points in most business continuity plans is how the organisation's staff will work with the emergency services to handle an incident and then retrieve control of the site from them.
There is a wealth of practical experience of these issues in the heads of the country's Emergency Planning Officers gained from recent experiences and in many cases augmented by colourful past exploits. Were James Bond ever to retire, he would not find himself out of place in that sector. EPOs may find banana skins lurking around every corner but they also share with the Business Continuity profession a determination to learn from every incident - even if it is dealing with something as unexpected as a dead beached whale. Inviting the local EPO to attend and comment on your company's Business Continuity exercise can be a sobering, even depressing, experience. However the practical experience they freely offer can only enhance the company's ability to cope if an incident occurs.
Responding to the media scrum can become the Achilles heel of a company's attempt to recover from an incident. Many companies have a press officer or nominate a senior manager to brief the press but few have the training, experience or backup to handle the immediate demand for statements. Many companies will lack the media contacts and be unaware of the correct local procedures for issuing press releases, such as the Scottish Lord Advocate's guidelines. Here again there is a wealth of knowledge on these issues in the public sector where disasters, albeit other people's incidents, are handled more frequently.
The BCI standards, formerly 13 in number, were redefined recently to 10 in consultation with the DRII. Some topics were combined but two new ones were added to reflect the experience of recent incidents: Public Relations and Co-ordination with Public Authorities. This identified the need for companies to improve their recovery planning by collaboration with the public sector services.
The Manchester bomb demonstrated the reliance that businesses place on the local authority in the aftermath of a major incident. The larger companies were able to call on their own resources and plans to look after their staff and their business. The smaller enterprises could only look to the City council to assist with access, salvage, insurance and relocation issues. The prosperity of an area and its tax base depends on its local businesses so councils have a real interest in assisting businesses to survive.
However it should not be seen as a one-sided relationship. The most significant contribution that private companies can offer to the public sector is their site and staff for use in emergency exercises. They also have the money to sponsor these exercises, since the failure of their plans can usually be shown to have serious financial implications and exercising them may be a statutory requirement.
Whilst many local authorities have had experience in handling disasters within their community, few have addressed the impact of an incident affecting their own buildings and staff. As local authorities are encouraged to view the running of council services as a 'business', the business continuity expertise developed to identify and protect critical functions in the private sector is valuable. Maintaining an ability to provide an acceptable level of service to customers is both a public and private imperative.
Where hazards are highly visible this mutual interest can lead to a close and on-going co-operation between public and private sectors. Since 1969 the chemical companies that surround the town of Grangemouth have worked in partnership with the local council (now Falkirk Council) and the emergency services. Leaflets telling householders how toxic gas escapes will be notified to them are written and funded by the companies and distributed by the council. Regular training and exercises are conducted with premises provided by the companies. In the event of an incident, the council's emergency control team is supplemented by representatives from each company authorised to contribute their company's resources - for example their private fire engines, a generator or specialist staff - as required. Companies share their experience of an incident candidly with each other so that each can learn lessons from it. With ten major incidents in the last thirty years such co-operation is vital especially with the limited resources at the council's disposal since the fragmentation of local government in the most recent reorganisation.
By making contact with the appropriate local public bodies and working with them, a company can ensure that it understands the responsibilities of the various organisations it must deal with. It can ensure that its particular needs are known and that it knows how to press its case in an incident. The public bodies can develop and rehearse their plans with an understanding of business needs and be in a position to determine the best balance between public and commercial interests in responding to a major incident.
Introduction
Business Continuity Conferences nearly always feature at least one seminar on the theme of gaining Board 'approval' for the Continuity Project. Budgets are forced out of reluctant directors by describing the impact of a variety of serious but unlikely events on the organisation. However convincing these scenarios, they are always open to being refuted by the observation that business failures are rare and are far more likely to be the result of poor product design, marketing mistakes or financial mismanagement.
Whilst assisting organisations to develop Business Continuity Plans I have observed that organisational changes have resulted from the project beyond those specifically intended. This has convinced me that a more positive approach to the Board, stressing the benefits of the project to the organisation and the 'bottom line', is more likely to be accepted. However, to achieve this the BC planner may need to interpret terms of reference a little more generously than is customary.
A Business Continuity project will work with key personnel and functions within a business to develop a recovery plan. This is not a one-way process of information collection - these key areas will also be changed by the process. Structures and procedures are put in place in the organisation to enable recovery in the event of an interruption. Many of these are of value in normal business operations too and some are highlighted below.
Communication
To be effective a recovery plan needs to be communicated to all staff so that everyone works in the same direction during an incident. Few companies seem to have an effective means of disseminating information to their more junior employees but once briefings are set up as a pre-requisite for recovery planning, their use for more general discussions comes naturally. If these new channels are used well this should result in staff being better informed and feeling more involved.
Teamwork
A telling point was made during a recent presentation about the recovery of a Scottish distillery from a flood. The speaker said that the teamwork encouraged by the salvage company assisting the recovery had been so impressive that it continued after normal production was resumed and had significantly improved employee relations and productivity. Why do we need a serious interruption to prove the benefits of working together?
Retention of staff
Whilst most business activities are predictable, organisations need to retain at least a few 'dynamic' staff who can react to the unexpected and create new opportunities. Retaining these is a challenge to which the Business Continuity project can offer at least a partial solution since they are often suitable recruits for one of the recovery teams. Joining a team that dons hard hats and high-vis jackets and simulates exciting scenarios can give sufficient challenge to retain them.
Many junior staff in larger companies would say that they feel under-valued. Making clear to staff that key objectives of the development of a recovery plan are to ensure their safety and preserve their jobs should improve their perception.
Recruitment and induction training costs per employee are substantial. Any measure that reduces staff turnover is valuable.
Awareness
Many disasters could be prevented at minimal or no cost by timely action. Risks, obvious to a visitor, are ignored daily by staff because it is someone else's responsibility or too much trouble to do something about it. Raising the awareness of all staff to the possible impact on their own job of a serious interruption could lead to them taking action that prevents a disaster.
Marketing
Manufacturers operating JIT production are beginning to realise their vulnerability to supply interruptions. A Business Continuity plan and the guarantees that it allows you to offer and demonstrate to your customers can be used by the marketing department to increase market share, increase prices or boost reputation. An IT facilities management company is now developing a new sales strategy emphasising its ability to provide a service even if its main facilities are inaccessible.
Resources
Stand-by buildings and computer equipment are often significant costs in a recovery plan. Alternative uses, as long as they can be cancelled at short notice, can justify this expenditure. A recovery centre can make an excellent training facility with which staff will then be familiar when required for its emergency function. An in-house stand-by computer facility in another location can be used to save an upgrade when year-end or month-end reporting is the only peak that overloads the production machine's capacity.
Quantification
The results of a Business Impact Review contain information of use beyond its original scope. It attempts to quantify how the organisation will be affected by the loss of key functions. The insight this exercise gives into the operation of the organisation and its reliance on various functions should interest all senior management considering organisational changes or major projects.
Strategy
Stretching the Business Continuity brief to its utmost, the BC planner should have an input into all strategic business decisions. Contingency measures are always easier to build into a new facility than to devise afterwards. Capital investment is more likely to be available and at lower rates if the strategy is demonstrably resilient to external variables.
Plans to consolidate and centralise facilities may reduce resilience and flexibility to respond both to incidents and market opportunities and this can be highlighted by the BC planner at an early stage before decisions are irrevocable. One specialist reinsurance company shelved its plans to relocate all of its functions in one City building following the presentation of a Business Impact Review. Alternatively a planned reorganisation may free equipment or facilities that can resource a contingency plan at minimal cost.
Conclusion
The above examples suggest there are many demonstrable benefits to
the organisation from undertaking a Business Continuity project which can
be used to win support from the Board though all fall outside the BC planner's
usual remit. Are we prepared to become more involved in normal business strategy
and management to support our case for continuity planning?
This paper was written in response to an article entitled 'The ethics of
fear in booming continuity sales' by Chris Needham-Bennet (WHERE PUBLISHED?)
which defines the role of a BC professional as a risk manager with a little
extra scope to their responsibility. His risk manager should have relevant
data on probabilities and cost analyses for the industry sector and tackle
each risk accordingly. The author expresses concern that business continuity
professionals are allowing their enthusiasm to 'lead to a distortion or exaggeration
of facts and a reliance on fear in their sales methodology' by putting forward
improbable, catastrophic scenarios and are in danger of becoming 'risk folk
devils and bogeymen'.
In proposing a strategy for risk management that consists of the identification, prioritisation and mitigation of likely risks, he both ignores the weaknesses of this risk management approach and shows a misunderstanding of the objectives of Business Continuity Planning.
The techniques and concepts of risk analysis come from two main disciplines - insurance underwriting and engineering. They both aim to measure risk in terms of an impact multiplied by its probability. By this measure the 'plane landing on the building' scenario is so improbable as to merit no attention.
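As a rough illustration of that measure, the sketch below scores two threats as impact multiplied by probability. All of the figures are invented assumptions chosen only to show the arithmetic; they are not drawn from any real data set.

```python
# Hypothetical threats: impact is an assumed loss in pounds,
# probability an assumed chance of occurrence in one year.
threats = {
    "server room power failure": {"impact": 50_000, "probability": 0.2},
    "aircraft strikes building": {"impact": 100_000_000, "probability": 0.000001},
}


def risk_score(impact, probability):
    """Risk measured as impact multiplied by probability."""
    return impact * probability


# Score every threat, then rank from highest to lowest score.
scores = {
    name: risk_score(t["impact"], t["probability"])
    for name, t in threats.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:,.0f}")
```

On these assumed figures the catastrophic threat scores far below the routine one, even though its impact is two thousand times greater, which is why this measure pushes it off the list.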
Insurance underwriters need to make sure that policy premiums from their portfolio will cover claims with a profit margin. To this end they will use historical statistics on the industries and locations of the insured. They cover the inadequacies of this data by aiming for a spread of risk types and locations and can still get their sums wrong even when aggregating thousands of risks. To seek to apply these aggregated generic historical statistics to the future of a specific site in a specific industry with its unique methods of working cannot be an acceptable methodology.
The engineer takes a small sample of each component of a piece of equipment, tests it to destruction then aggregates the results to calculate a probability of failure of the complete machine. No business can be understood in such mechanistic terms. They are far too complex, change rapidly and are affected by many outside influences. Staff, buildings and procedures are fortunately rarely tested to destruction and therefore estimated failure rates are unavailable.
Given these limitations most risk studies of business interruption derive the probability of the occurrence of each threat from estimates given by respondents to risk questionnaires. The resulting grids, graphs and pie-charts can look impressive but can obscure the fact that the figures are based on guesses however well-informed.
Though many risks are easily identified, analyses of actual disasters show that many result from factors such as human error, a combination of unfortunate circumstances or temporary conditions such as building works. These risks are difficult to identify in advance and impossible to assign realistic probabilities to. The cause of some disasters can even be traced to risk management 'solutions' which have failed or have led to impacts where risk reduction methods led to unexpected failures in other areas. That those in business continuity can always find a pertinent recent disaster example suggests that these threats are not imagined or even unlikely.
The most common causes of business failure are lack of sales and cash flow problems. To this business continuity has no remedy, nor should it since this is the business' core competence. Once the business is on a secure financial footing, however, there is an investment to be protected. The role of the business continuity manager, in my opinion, is to protect the business from adverse events in areas outside that core competence (which is of course different for each industry).
The Board give the BC manager the responsibility for ensuring continuity of core business through any adverse circumstances. This may be required either by statute, pressure from customers or accepted best business practice. The challenge is to devise and implement an appropriate strategy that will allow the business to provide a near-continuous service from its critical functions in a worst-case scenario. To determine the meaning of 'near-continuous', 'critical' and 'worst-case' within the company is the first challenge the manager must face.
One necessary simplification made to make planning possible is to consider just a few generic incident scenarios. The choice of realistic and comprehensive scenarios is a further challenge requiring experience. Listing the wide variety of causes and trying to calculate the probabilities that could lead to that type of situation do not assist the development of the continuity plan in any way, though they could point to measures which might increase resilience.
In practice the task of managing risks is often given to the BC manager but it should be clearly understood from the Board which is the principal role. A list of key actions drawn up using risk management techniques may differ markedly in content and priority from that evolved through business continuity methods. This is because one is aimed at reducing exposure and losses from known threats, the other at surviving a worst-case scenario for which it is impossible to predict a cause or to assign a probability. Both lists are valid and defensible and the measures finally implemented will usually involve some compromise between the two. However it should be remembered that a risk reduction measure may work in isolation, but a half-built continuity plan will almost certainly fail.
So are BC professionals just 'risk devils and bogeymen' picking over
and relating each new disaster with relish? The enthusiasm with which we
analyse disasters is not to heighten fear in our clients; rather there is
a willingness to learn lessons from the failures (and successes) and to pass
this experience on for others to avoid the same misfortune.
'How do you know that you have created a successful BC plan if the organisation
doesn't experience a disaster?' This is a real issue in a business environment
which expects success to be measurable. Without a real incident how can
one be sure that all the time-critical functions have been identified and
that adequate preparations have been made for them?
For the consultant who is invited into a company to 'do' Business Continuity for them because they lack internal resources or expertise, it is an even bigger question. How can the consultant know that they have succeeded? Exercising the plan is vital to assess and improve the readiness of an organisation but is there a level of preparedness at which point the project should be deemed to be finished?
There have been attempts to create a common objective standard for assessing an organisation's preparedness, but such a standard has so far proved elusive. Perhaps it will remain so because there is no such thing as a standard organisation and, for the same reason, it is impossible to develop a successful off-the-shelf recovery plan.
The first stage of any BC project must be a Business Impact Assessment from which an appropriate continuity strategy can be determined. To treat this stage as a solely technical and statistical exercise is to lose much of its value. It presents opportunities for the consultant to raise awareness amongst the staff being interviewed and to start to get 'under the skin' of the organisation. Discussing the experience of past incidents can clarify the inherent responsiveness of the organisation and identify individuals who may be able to take on BC roles. It also provides an opportunity to understand how proposals are presented and decisions made by the organisation. Much of this valuable information will be lost if there is no continuity of personnel into the implementation stage of the project.
The success of the implementation of BCM in an organisation by an external consultant will depend to a large degree on how well he has understood its culture. It is almost necessary to 'go native' to understand, for example, in what form a plan should be documented and how staff may react in a crisis. In this respect one of the small number of independent Business Continuity consultants will have the edge since they don't bring the baggage of their own company's structure with them. They are not answerable to anyone except the client and they can rely on their experience rather than their employer's standard methodology. Being solo they provide a continuity of personnel throughout the project yet through an informal network they can call on additional specialist skills if required. The result is often a more inventive and cost-effective solution which makes best use of the organisation's resources.
Experience shows that the wide range of organisational cultures makes such a flexible approach vital. A financial services organisation will spend precious weeks deliberating on the exact number of desks to contract for, only to find its decision overtaken by other events. A chemicals company reckons it can face any crisis without a plan since the management's engineering background limits its perception of a disaster to a plant explosion. A car components factory has to reuse a ramshackle portakabin as an off-site store to keep costs down. An insurance company is used to paying out on other people's disasters, but reluctant to admit it could have one of its own. Within departments of the same company too there can be a variety of attitudes; we have all met IT departments who consider a mainframe recovery plan sufficient for all eventualities. To respond to these situations requires a detailed knowledge of personalities and procedures within the organisation, combined with the external 'expert' view.
So how, after working in so many complex situations, can the consultant claim success and move on? To hand over a smartly bound and detailed plan is easy but of questionable value. It creates the impression of a job completed, whereas this plan is really only the first step on an endless journey. What he should leave behind is a structure that will ensure that the organisation's ability to maintain business continuity will continue to grow and develop. This can be achieved in many ways but its long-term success will depend on the extent to which the Business Continuity project becomes part of the specific culture of this particular organisation.
Success usually involves the general raising of awareness but also the identification and training of a Business Continuity team, some of whom may wish to develop their knowledge and experience and go on to seek BCI certification. As the team takes shape, the consultant, originally in charge of the project, must increasingly take a back seat and empower the new team to assert their independence, even if they make a few mistakes.
The success of a management consultancy is often rather cynically
judged by how long they can attach themselves to an organisation - with considerable
savings on their marketing budget. Some consultancies offering Business Continuity
services appear to want to provide a recovery team on a permanent basis to
an organisation despite the issues this raises of availability, knowledge and responsibility.
However my own personal criterion of success is when the new in-house Business
Continuity Team decides that they now have the confidence to take the project
forward on their own. With a call of 'See you in six months at the next exercise'
I can depart reflecting on a job well done.