Debunking 5 9’s

I recently came across the following article: EMC SAN failure blamed for Intermedia hosted email outages

At approximately 6:15 a.m. PT on Thursday 4/16, a hardware failure occurred on one of the EMC storage area networks (SANs) located in Intermedia’s New Jersey data center. The service processor for one of the controller nodes had a failure. This failure caused the entire load for that SAN to be shifted to the service processor on the redundant controller node.

The spare capacity on the single service processor was not enough to handle the entire load of all systems connected to the SAN, which caused a degradation of performance … The degradation of performance on the SAN in turn impacted the overall system’s ability to process email messages creating a queuing of several hundred thousand messages within the system. The back log was large enough that it took 32 hours for it to clear after the original event.

The vendor [EMC] determined that the service processor failure occurred due to a unique bug in the specific version of firmware on the system. This bug caused the service processor to “panic” and automatically take itself off line.

I cite this article not to pick on EMC, which is the market leader in shared storage, but to prove a point. There is a concept touted in the IT industry, mainly by software and hardware vendors, called 5 9’s of availability. This refers to systems, networks, or applications being available 99.999% of the time in a given year. Everyone claims that their product guarantees 5 9’s or is engineered for 5 9’s, but what does that really mean? Take a look at the table below:

5 9s Table

As you can see above, 5 9’s of availability is the equivalent of just under 26 seconds of downtime per month. 26 seconds! One reboot on one server already kicks you over that mark!
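The arithmetic behind that figure is easy to check. The short Python sketch below is only an illustration: it assumes a 30-day month and a 365-day year, and the function name downtime_seconds is made up for this example. With those assumptions, 99.999% over 30 days works out to 25.92 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def downtime_seconds(availability_percent, period_days):
    """Allowed downtime, in seconds, for a given availability over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY

if __name__ == "__main__":
    # Assumption: a 30-day month and a 365-day year.
    for availability in (99.0, 99.9, 99.99, 99.999):
        per_month = downtime_seconds(availability, 30)
        per_year = downtime_seconds(availability, 365)
        print(f"{availability}%: {per_month:,.2f} s per month, "
              f"{per_year / 60:,.2f} min per year")
```

Each extra 9 cuts the allowed downtime by a factor of ten, which is why the jump from 99.9% to 99.999% is so hard to deliver in practice.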

The lesson, as always, is to hope for the best and prepare for the worst: make sure your hardware and software are up to date; keep your maintenance and support contracts current; implement a solid change control process; build redundancy into your infrastructure; have a good backup strategy.
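Redundancy only helps if the surviving component can carry the full load, which is exactly what went wrong in the Intermedia outage quoted earlier. A rough way to sanity-check this is sketched below in Python. It assumes identical nodes and that the load of a failed node is simply redistributed to the survivors; the function name and the load figures are hypothetical, not taken from the incident.

```python
def survives_single_failure(node_loads, node_capacity):
    """Return True if the remaining nodes can absorb the total load
    when any one node fails (assumes identical nodes)."""
    total_load = sum(node_loads)
    surviving_capacity = node_capacity * (len(node_loads) - 1)
    return total_load <= surviving_capacity

if __name__ == "__main__":
    # Hypothetical figures: load and capacity in percent of one controller.
    print(survives_single_failure([70, 70], 100))  # two controllers at 70% each
    print(survives_single_failure([45, 45], 100))  # two controllers at 45% each
```

In a two-controller pair, this means each controller has to stay at or below half of its capacity during normal operation, otherwise a failover turns into a performance degradation.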

While bugs are typically associated with Microsoft products (and here I am intentionally picking on Microsoft), they do, and will, happen to all vendors:

Why did McAfee Goof? It was Automatic

VMware bug bombs virtual servers

Y2.01K bug trips up Symantec

Microsoft confirms 17-year-old Windows bug

Jorge Azcuy
Director of Technical Services

Posted on April 27th, 2011. Filed under Industry Updates, Popular Posts.