Critical systems and bottlenecks

A critical civilian system goes down – it’s a scenario that evokes apocalyptic pictures of destruction and mayhem; remember, for instance, “Die Hard 4.0”? In the film, a group of motivated hackers breaks into government and commercial computers and brings down all the traffic lights in a major city. Unsettling. In fact, every business has its own “critical” system that is necessary for its continued existence. Such a system should be well defended and resilient to threats, and architectural errors may diminish this resilience heavily.

Threatpost ran an article on an April incident involving one of the mainstays of American society: the 911 service.

“In the early hours of April 10, a series of errors led to a massive, multi-state outage in the emergency call management centers (ECMCs) that handle 911 calls in seven geographically dispersed states. The incident originated at an obscure but critical call routing hub in Englewood, Colo., and ended up knocking out the emergency communication infrastructure for more than 11 million citizens”, wrote Brian Donohue at Threatpost. Well, this sounds bad. The 911 service is a critical system that is supposed to work without interruption. An outage this massive, affecting possibly 3.5% of the US population, is a clear emergency. 87% of the 911 calls made during the outage failed; it seems like a miracle that there were no deaths as a result.

The detailed analysis of the incident is available at Threatpost; in short, it looks like a number of factors combined to cause the outage, namely a software error, ageing equipment, some human errors and, most of all, a big architectural deficiency:

“The enormous breadth and the geographic dispersion of the outage, the FCC says, was in part attributable to an architecture that consolidated critical 911 functions in two locations serving multiple states, without adequate safeguards in place,” Donohue writes.

Now, this is a thought-provoking situation. A system of that scope has an apparent bottleneck, and once it is stuck, the entire system crumbles.

This scenario is applicable to just about any corporate IT infrastructure. As noted above, almost every company has its own “critical” system, required for uninterrupted operations and, essentially, survival. And every corporate system has its own “bottlenecks”: the main gateway, for instance, or even a website that is vital to the business.
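One rough way to look for such bottlenecks is to model the infrastructure as a graph and check which single component, if it fails, cuts users off from the critical service. The sketch below is only an illustration under stated assumptions: the topology, the node names (`callers`, `hub_a`, `hub_b`, `router`, `call_center`) and the helper functions are hypothetical and are not taken from the FCC report or from any real network.

```python
from collections import deque


def reachable(graph, start, goal, removed=None):
    """Breadth-first search: can `goal` be reached from `start`
    when the node `removed` is considered to be down?"""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, start, goal):
    """Return the intermediate nodes whose individual failure
    disconnects `start` from `goal`."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    candidates = nodes - {start, goal}
    return sorted(n for n in candidates
                  if not reachable(graph, start, goal, removed=n))


# Hypothetical topology: two redundant hubs, but one shared router.
topology = {
    "callers": ["hub_a", "hub_b"],
    "hub_a": ["router"],
    "hub_b": ["router"],
    "router": ["call_center"],
}

print(single_points_of_failure(topology, "callers", "call_center"))
```

In this toy topology the two hubs back each other up, yet the shared router is still reported as a bottleneck. Redundancy in one layer does not help if everything funnels through a single component further down the path.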

Recently a fellow IT worker told a story he witnessed a year ago: a small company’s main website had been infected with malware. The malware was easily detected by an antimalware solution and wasn’t too harmful, and it took one day of downtime to remove it without a trace.

But while the malware was there, the search engines pushed the website’s rankings down so low that it took weeks, even months, to restore its previous position. This proved to have ghastly consequences for the company’s business: it was on the brink of closing, staff had been cut in half and finances – what finances? A scary story, but unfortunately a real one. The website was vital, and it had been “poisoned” with malware (possibly by competitors). A tailspin followed, which the company barely survived.

Yet another possible scenario, one that some admins have experienced recently: Cryptolocker ransomware slips into the corporate network and onto the backup servers, encrypting everything within its grasp. If the network and storage have architectural deficiencies, i.e. they aren’t segmented, and the ransomware can reach everywhere, guess what happens next?
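The effect of segmentation can be sketched as a question of reachability: starting from an infected segment, which other segments can be written to? The example below is a simplified model with hypothetical segment names and flow rules. It covers only which segment may initiate connections to which, not how any particular ransomware actually spreads.

```python
def blast_radius(allowed_flows, infected):
    """Return every segment reachable from `infected`
    by following the allowed connection flows."""
    reached = {infected}
    stack = [infected]
    while stack:
        segment = stack.pop()
        for target in allowed_flows.get(segment, ()):
            if target not in reached:
                reached.add(target)
                stack.append(target)
    return reached


# Hypothetical flat network: every segment can write to every other one.
flat = {
    "workstations": ["file_server", "backup"],
    "file_server": ["workstations", "backup"],
    "backup": ["workstations", "file_server"],
}

# Hypothetical segmented network: nothing may initiate a connection
# to the backup segment; the backup server pulls data itself.
segmented = {
    "workstations": ["file_server"],
    "file_server": [],
    "backup": ["file_server"],
}

print("backup" in blast_radius(flat, "workstations"))       # True
print("backup" in blast_radius(segmented, "workstations"))  # False
```

In the flat layout an infection that starts on a workstation can reach the backups; in the segmented one it stops at the file server. Real networks are far messier, but the same question (what can this segment reach?) is worth asking of any design.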

In a nutshell, architectural errors can cause problems on their own, and in an emergency they are very likely to aggravate the situation. It is just like a fire exit blocked by old furniture, rubbish or a broken lock: unless there are other fire exits, the consequences will be disastrous. Likewise, unless a system has an extra margin of safety, its security can’t be guaranteed.