outage – Internetblog.org.uk
https://www.internetblog.org.uk
Web hosting, Domain names, Dedicated servers

Twitter has 99.1% uptime for June
https://www.internetblog.org.uk/post/1497/twitter-has-991-uptime-for-june/
Mon, 05 Jul 2010 19:54:03 +0000

According to the uptime monitoring service Pingdom, social networking site Twitter had a June uptime figure of 99.17%. Although this sounds high, it is actually low by industry standards, especially for a large site like Twitter with so many resources at its disposal.

The 0.83% downtime figure equates to 5 hours and 43 minutes of lost Tweeting. Network configuration issues as well as spikes of traffic due to the World Cup and NBA Finals caused the downtime.
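As a rough check of that conversion, assuming the measurement period is all 30 days of June (720 hours):

```latex
\[
0.83\% \times 720\,\text{h} = 0.0083 \times 720\,\text{h} \approx 5.98\,\text{h} \approx 5\,\text{h}\;59\,\text{min}
\]
```

That is about 16 minutes more than the 5 hours 43 minutes quoted (which on its own works out to roughly 0.79% of the month), so Pingdom's percentage and duration figures were presumably rounded or measured slightly differently.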

Unfortunately, Twitter fanatics will not be able to get the lost time back. Maybe the site will have better uptime this month?

Photo | Flickr

What to do when your server goes down
https://www.internetblog.org.uk/post/1427/what-to-do-when-your-server-goes-down/
Mon, 14 Jun 2010 20:57:52 +0000

First of all: do not panic. What appears to be an outage may actually be an issue with your own network connection or Internet congestion. Once you have eliminated the usual suspects, there are a few steps you can take to resolve the issue quickly and get your dedicated server back up and running.

1. Test an SSH connection. If you can still SSH into your server, you most likely just have a software issue. If your web server application (such as Apache) has crashed, a simple restart may fix the problem. If you notice it crashing routinely, every day or every week, that may be a sign of a security exploit.

2. If you cannot SSH into your server, try to ping and traceroute the server. If the traceroute gets all the way to your server’s network but you still cannot connect to the server itself, the network is fine, but the physical server may have crashed or been shut down. Follow the normal procedure for rebooting. If your server is remote, you can ask your web host to reboot it; some hosts also provide reboot switches that you can activate remotely. If something is wrong with the network, check with your host. They may already be diligently trying to fix the problem.

3. If rebooting does not fix the problem and you still cannot access your server, your host may offer you a KVM (keyboard, video, mouse) connection so that you can troubleshoot your server’s network settings from its console.

4. If your host cannot even get the server to start in order to use KVM, they will probably have to re-image your box. This will erase everything, and you will be thankful at this point that you have kept backups of all websites on your server.
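The first two checks can be partly scripted. Below is a minimal Python sketch, assuming that a successful TCP connection to port 22 is an acceptable stand-in for "can I SSH in?" and port 80 for "is the web server up?". The function names `port_open` and `next_step` and the hostname are hypothetical, and the ping/traceroute result is entered by hand rather than measured:

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    An open port 22 only shows that something is listening; it does not prove
    that an SSH login will work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or hostname did not resolve
        return False


def next_step(ssh_ok: bool, web_ok: bool, traceroute_reaches_host: bool) -> str:
    """Map the checks from steps 1 and 2 to a suggested action (simplified)."""
    if ssh_ok and not web_ok:
        # Step 1: the machine is up, so this is most likely a software issue.
        return "Software issue: restart the web server (e.g. Apache) and check its logs."
    if ssh_ok and web_ok:
        # Both ports answer from here: the outage may have been on your side or transient.
        return "Server and web port respond: check your own connection and retest."
    if traceroute_reaches_host:
        # Step 2: the network path is fine, so the machine itself may be down.
        return "Server unreachable on a working path: reboot it or ask your host to."
    # Step 2: something between you and the server is broken.
    return "Network problem: check with your host."


if __name__ == "__main__":
    host = "server.example.com"  # hypothetical hostname; replace with your own
    ssh_ok = port_open(host, 22)
    web_ok = port_open(host, 80)
    # Ping/traceroute are not run here; fill this in from the system tools.
    traceroute_reaches_host = False
    print(next_step(ssh_ok, web_ok, traceroute_reaches_host))
```

Steps 3 and 4 (KVM access and re-imaging) depend on your host, so they are left out of the sketch.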

Photo Source: stock.xchng

Amazon EC2 cloud service experiences power outage… again
https://www.internetblog.org.uk/post/1312/amazon-ec2-cloud-service-experiences-power-outage-again/
Fri, 14 May 2010 13:49:12 +0000

Earlier this week, Amazon’s EC2 cloud service experienced yet another power outage. This time, a car crashed into a local utility pole and knocked out the power. The generator transfer switch failed. A number of East Coast customers lost service for about an hour.

A very similar incident occurred in 2007 at a Rackspace data center. Even so, Amazon needs to get its act together. Why didn’t the server load transfer over to the generators properly?

The cloud computing provider surely won’t be signing up very many new customers if these power outages continue. Meanwhile, current EC2 users must be very upset about this and worried about Amazon’s long-term reliability.

Germany's .de experiences major outage
https://www.internetblog.org.uk/post/1310/germanys-de-experiences-major-outage/
Thu, 13 May 2010 22:36:57 +0000

The majority of the Internet’s 13.6 million .de domains were unavailable between 1:30pm and 2:50pm German time yesterday. DENIC, the .de registry operator, reports that the names went “kaputt” (broken) after empty zone files were accidentally uploaded to its .de name servers.

Information on the number of names affected varies. According to one source, every .de name starting with the letters “a” through “o” saw downtime. DENIC is still investigating the outage.

Source | The Register

Amazon addresses cloud computing power issues
https://www.internetblog.org.uk/post/1295/amazon-addresses-cloud-computing-power-issues/
Mon, 10 May 2010 19:28:38 +0000

After power outages on Amazon’s EC2 cloud computing service resulted in a loss of service for some users on May 4 and May 8, Amazon has announced that it is working on a change in its power distribution to address the issue. The company said the changes will “significantly reduce the number of instances that can be affected by failures like we have seen in the last week.”

The outages were caused by the failure of several electrical components as well as human error. Several disgruntled users report experiencing data loss as well.

While most EC2 users were unaffected by the power failures, this just goes to show that cloud computing isn’t perfectly reliable and there is still a lot of progress to be made in the field of distributed computing.

THAT caused a web host outage?
https://www.internetblog.org.uk/post/1275/that-caused-a-web-host-outage/
Tue, 04 May 2010 16:26:11 +0000

Usually when a web host goes down, the cause is something very mundane. Maybe a router went offline or a hardware upgrade didn’t go as planned. In the case of Rackspace in 2007, however, something no one could have expected knocked one of its data centers out: a truck.

In a dizzying domino effect, a truck crashed into a utility pole. The pole then crashed into a nearby transformer, blowing it up. The power went out and Rackspace’s generators couldn’t handle the equipment load. All of its dedicated server clients were taken offline.

It took around 12 hours for service to be restored. The outage was costly and inconvenient, but Rackspace takes the cake for the coolest data center outage cause.

Source | Randomkitty.net
Photo | jsnward

Customers of The Planet experience outages
https://www.internetblog.org.uk/post/1270/customers-of-the-planet-experience-outages/
Mon, 03 May 2010 20:18:02 +0000

Dedicated server owners at The Planet’s facility in Houston, Texas, experienced an outage lasting around 90 minutes last night. Customers were pleased to once again have access to their sites, only to experience more downtime this morning.

Since then, The Planet has brought all its servers back online. The hosting company says a router issue in its core network caused both outages. While the problem may be solved now, some customers wish the company had done more to keep them updated on the situation.

Whatever happens, hosts have an obligation to keep their customers updated. It’s unclear whether The Planet did a good job of this last night and this morning, but when choosing a host, check what sort of communication lines it has with customers. You definitely don’t want to be left in the dark if there is ever an issue.

Source | Data Center Knowledge

Why uptime matters
https://www.internetblog.org.uk/post/1160/why-uptime-matters/
Thu, 01 Apr 2010 10:34:02 +0000

Ever notice that most hosts advertise 99.9% uptime? There’s a good reason they try so hard to keep things running. While a percentage point here or there might not seem like a big deal, over the course of a year the difference really adds up. Just take a look at the numbers:

99.9% uptime = 8.76 hours of downtime per year
99.5% uptime = 43.8 hours
99.0% uptime = 87.6 hours
97.0% uptime = 262.8 hours

Even a host that has 99.5% uptime still experiences nearly two days of downtime per year! For some, that may not be a problem. But keep in mind that falling uptime figures are a very slippery slope. A seemingly decent uptime of 97% translates to 262.8 hours of outage time, or a little under 11 days.
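The figures above come from multiplying the downtime fraction by the 8,760 hours in a 365-day year. A small Python sketch of that arithmetic follows; `downtime_hours` is a hypothetical helper name, and leap years are ignored, as in the list above:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * period_hours


if __name__ == "__main__":
    for pct in (99.9, 99.5, 99.0, 97.0):
        hours = downtime_hours(pct)
        print(f"{pct:.1f}% uptime = {hours:.2f} hours ({hours / 24:.1f} days) of downtime per year")
```

Passing a different `period_hours` gives the figure for a shorter window, such as 720 hours for a 30-day month.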

Thanks, LiquidWeb
https://www.internetblog.org.uk/post/1152/thanks-liquidweb/
Tue, 30 Mar 2010 18:59:29 +0000

Normally my VPS provider, LiquidWeb, does a very good job. In my year with the company, I had experienced zero downtime until this week. While it’s true that no provider is perfect, I think the company could have done a better job of handling a recent hardware problem.

For about a week, the parent server my VPS is hosted on was rebooted multiple times, sometimes several times a day. Each reboot took my sites down for around ten minutes. This wasn’t the end of the world, but it was affecting my traffic figures. If a site isn’t reliable, people will stop visiting.

So I contacted LiquidWeb about the problem. The support staff was very friendly and responded quickly, but I wasn’t so pleased with the response:

…we’ve been experiencing some problems with the parent server your VPS is hosted on. We are aware of the reboots, and will replace any necessary hardware if it comes to that. While I agree that it’s frustrating to have the server reboots occur, we do have to balance maintenance with keeping services available for all of the customers on this parent server.

Later that day, I received an update saying the server would be offline for some time in order to replace a broken RAID controller. I’m upset about two things:

  1. LiquidWeb took a wait-and-see approach to server maintenance.
  2. Its excuse didn’t hold true. There’s some merit to giving customers advance notice about downtime, but if something needs to be fixed, it needs to be fixed. LiquidWeb knows this and did not give much notice about the RAID controller replacement.

What does this amount to? It seems to me as though the company waited as long as possible to replace a failing part in order to keep costs down and keep uptime figures high. This bet ended up not paying off. The controller had to be replaced anyway. But in the meantime, its customers suffered a lot of hassle with the reboots. It would have been better if LiquidWeb had simply done the maintenance and incurred the downtime instead of waiting until the last possible moment. Next time, why not just replace the part preemptively?

Photo | Flickr

WordPress.com blog hosting suffers outage
https://www.internetblog.org.uk/post/1020/wordpresscom-blog-hosting-suffers-outage/
Mon, 22 Feb 2010 04:50:27 +0000

The WordPress.com blog hosting service suffered a two-hour-long outage today. The downtime had nothing to do with the WordPress CMS itself; instead, it made the 10 million sites on the free blog hosting service unavailable.

The cause of the outage is still being investigated, but right now it seems as though one router caused all the ruckus. Apparently someone at one of the four data centers where WordPress.com rents space made a configuration change to a core router. This blocked access not only to the blogs at that particular facility but also to those at the other three data centers.

Photo | ozdv8