Wednesday, September 29, 2010

Naive Navitaire: Virgin on the Ridiculous part two

Following my 30 or so hour delay on a 90 minute trip, I've been investigating this failure a little bit. I guess I am still unsure how a single server failure could take out a major Australian airline for a day. Then there are the questions about why they don't have better backup and manual systems in place.

There is a bit more information around since my earlier post. Virgin Blue updated its press release on Monday afternoon, and it now says:
"We are advised by Navitaire that while they were able to isolate the point of failure to the device in question relatively quickly, an initial decision to seek to repair the device proved less than fruitful and also contributed to the delay in initiating a cutover to a contingency hardware platform."
Both Virgin Blue and The Register also report that the failure was in a solid-state storage array.

So basically, the SAN array died, the Navitaire guys thought they could fix it, that didn't work, and it then took them 21 hours to get the backup system working in its place (or to get the hardware replaced and the data restored; I'm not sure which). I am flabbergasted that a service provider that services every Australian airline and another 70 or so airlines around the world could have such a terrible response to the failure.

From what I can gather around the net, the New Skies system is based on .NET and, I presume, some sort of SQL back-end. This sort of setup lends itself very well to redundancy via data mirroring and load balancing across a group of servers. So why wasn't there a redundant data server sitting there ready? Early quotes (that I can't seem to put my hands on now) indicate that Virgin Blue has a "cheaper" backup solution than Qantas, one that should have kicked in within three hours. Obviously 21 hours is a lot longer than 3.
A Virgin Blue spokesperson told iTWire that Navitaire was supposed to have a parallel system in place and in case of disaster this would go live within three hours. However, it did not actually come into play until almost a full day after the first incident.
But why? I guess we'll possibly never know. There is, however, one thing I do know: if Navitaire had had that backup system working within 3 hours, then while I would probably still have been delayed, it certainly wouldn't have taken more than 30 hours for me to get home.
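
For anyone wondering what I mean by a redundant data server "sitting there ready", here is a rough sketch in Python. To be clear, everything in it is made up for illustration: the `DataServer` and `FailoverPool` classes, the server names and the query are all hypothetical, and I have no idea how New Skies is actually built internally. It only shows the general idea of falling over to a mirror when the primary dies.

```python
class DataServer:
    """Hypothetical stand-in for one database server in a mirrored group."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def query(self, sql):
        # A dead server refuses the connection instead of answering.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled: {sql}"


class FailoverPool:
    """Tries the primary first, then each mirror in the order given."""

    def __init__(self, servers):
        self.servers = list(servers)

    def query(self, sql):
        for server in self.servers:
            try:
                return server.query(sql)
            except ConnectionError:
                continue  # this one is dead, try the next mirror
        raise RuntimeError("all data servers are down")


primary = DataServer("primary")
mirror = DataServer("mirror")
pool = FailoverPool([primary, mirror])

primary.healthy = False  # simulate the storage array dying
result = pool.query("SELECT * FROM bookings")
print(result)
```

The sketch assumes the mirror already holds an up-to-date copy of the data, which is the hard part in real life. Still, the point stands: with a live mirror the switch is a matter of moments, not 21 hours.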

So for that I say screw you Navitaire! Where is my compensation?
