I am appalled by several features of the failure that has affected many Jaguar servers in the last 24 hours. This has affected many people and is dragging on for a very long time. Those unfortunates, like myself, who are on skywalker look likely to suffer at least 24 hours of downtime.
Given the seriousness of the incident, Jaguar owes its clients some explanations, and so far none has been forthcoming.
Although it is clear that efforts need to be directed to fixing problems, there is no excuse for the lack of a clear statement saying truthfully what exactly the nature of this incident was. How was it caused, and how could it have had such disastrous effects despite claims of 100% uptime, UPS protection, generators, and so on? Vague statements are not good enough.
The level of information provided has been extremely poor. At no time has enough been said to allow clients to make informed decisions about how to cope with the problem. Never has any estimate been given of the time to fix. Sometimes hours have gone by with no information whatsoever. Even now, there is only the barest indication of how long skywalker users will continue to suffer.
I appreciate that it is hard to make predictions, and that they may go wrong. But that is no excuse for the total failure to provide any guidance.
Now that it is apparent that skywalker will be down for some time, I have made some limited improvement by reconfiguring my DNS (which is thankfully not at Jaguar) so that people looking for web sites that are down will at least see an apology and explanation, rather than just a blank screen.
It is extremely annoying that there was no warning yesterday of the possible extent of the downtime. I would have put up an apology message 15 hours ago had there been any clue that the failure might last this long. As it was, given that it takes time for DNS changes to propagate, it did not make sense to redirect if the server was likely to be up again within an hour or two.
And of course that still leaves the question of why Jaguar is unable to restore a server within a reasonable time. If the data centre is vulnerable to major failures, then far more rapid restoration of backups is essential. Why is this such a major problem, and why was it not planned for?
So far the answers given have been sketchy or non-existent. Jaguar owes it to its clients to be more open and to provide far better information.

