Normally my VPS provider, LiquidWeb, does a very good job. In my one year with them, I had experienced zero downtime, until this week. While it's true that no provider is perfect, I think the company could have done a better job of handling a recent hardware problem.

For about a week, the parent server my VPS is hosted on was rebooted repeatedly, sometimes several times a day. Each reboot took my sites down for around ten minutes. This wasn't the end of the world, but it was affecting my traffic figures. If a site isn't reliable, people will stop visiting.

So I contacted LiquidWeb about the problem. The support staff was very friendly and responded quickly, but I wasn’t so pleased with the response:

…we've been experiencing some problems with the parent server your VPS is hosted on. We are aware of the reboots, and will replace any necessary hardware if it comes to that. While I agree that it's frustrating to have the server reboots occur, we do have to balance maintenance with keeping services available for all of the customers on this parent server.

Later that day, I received an update saying the server would be offline for some time in order to replace a broken RAID controller. I’m upset about two things:

  1. LiquidWeb took a wait-and-see approach to server maintenance.
  2. Its excuse didn't hold up. There's some merit to giving customers advance notice about downtime, but if something needs to be fixed, it needs to be fixed. LiquidWeb knows this, and in the end it gave very little notice before the RAID controller replacement.

What does this amount to? It seems to me that the company waited as long as possible to replace a failing part in order to keep costs down and uptime figures high. That bet didn't pay off. The controller had to be replaced anyway, and in the meantime its customers had to put up with repeated reboots. It would have been better if LiquidWeb had simply done the maintenance and accepted the downtime instead of waiting until the last possible moment. Next time, why not just replace the part preemptively?

