
Sat, 14 Jun 2003

My hosting server was down for more than a day

Do you get what you pay for? RCTHost, my hosting provider, had been quite good and incredibly reasonable until recently, but what just happened is amazing...

It started a few months ago with the upgrade of cPanel, the interface that lets you control your website when you don't have shell access (telnet or ssh).
It seems that they tried to upgrade that module, but the server couldn't take it and suffered from high CPU usage, instability, reboots several times a day, etc.

That was still OK at the time, considering that I only use this server to host my homepage and receive email. The only painful thing for me was that I had no access to my website's statistics, to check the bandwidth utilization or the popularity of my homepage! Obviously, nothing serious.

However, what happened 2 days ago is just unbelievable...

In the afternoon, I noticed that the server was not responding anymore. I assumed it was just one more reboot and let it go.
A few hours later, the server was still not responding. I was getting upset because I was expecting some emails, so I dropped them a note through their helpdesk interface to find out what had happened.
The next day, the ticket was "closed" (there is apparently no status for acknowledged, or at least, they don't seem to use it much).
The message said that the server had been compromised (read: hacked) by a Trojan horse.

As a result, they were not able to boot the server properly and had to do a full reinstallation of the OS and then restore the accounts and data. Needless to say, this is a lengthy process, which lasted about 36 hours in my case.
Want to hear more of this fascinating story? When they tried to restore users' data, they realized that the earlier problems had prevented them from backing up data properly, so the latest backup they had was the one made on April 19th!

They could have taken the physical hard disk out of the crashed server and mounted it on a healthy one to copy the data over and preserve its integrity...
Instead, they went for a full reinstallation, assuming that they had a recent full backup that would allow them to recover the data (which is IMHO a bad choice anyway, since data is likely to have changed since the latest backup...).
Well, to be honest, they claim they tried to convince the engineers actually in charge of their datacenter to do so, but those engineers just didn't listen to them... right...

Fortunately, I had made a backup of my site 10 days before the catastrophe happened, so I "just" had to upload the files again and redo what I had changed since that backup.
However, I do not know how many emails I've lost because people didn't try to resend their message, or tried and gave up since the server was down for more than 24 hours :(

I'm still among the lucky ones. Some people are running a business there, and every hour of unavailability means more money lost, not to mention the reputation of their business, which is way more difficult to restore than just data on a server.
Some others don't have local backups for the last 2 years and were relying on the hosting server's stability or hoping that the backups would work... Big mistake!
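
For what it's worth, a local backup doesn't have to be fancy. Here is a minimal sketch of the kind of thing I have in mind, assuming you already keep a copy of your site's files on your own machine. The function name backup_site, the directory names and the "keep the last 5 archives" rule are all my own assumptions for illustration, not anything RCTHost provides.

```python
import tarfile
import time
from pathlib import Path


def backup_site(site_dir, backup_dir, keep=5):
    """Archive site_dir into a timestamped .tar.gz inside backup_dir.

    Hypothetical helper: the naming scheme and the retention count
    (keep >= 1) are assumptions made for this sketch.
    """
    site = Path(site_dir).resolve()
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Timestamped name, so archives sort chronologically by filename.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{site.name}-{stamp}.tar.gz"

    # Store everything under a top-level folder named after the site.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(site, arcname=site.name)

    # Delete the oldest archives beyond the retention limit.
    archives = sorted(dest.glob(f"{site.name}-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()

    return archive
```

Calling something like backup_site("homepage", "backups") from a scheduled job would be enough for a small site like mine. It obviously does nothing for mail sitting on the server, and the archives should live on a different machine than the one you are worried about.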

I will stay with RCTHost for now, because I'm not running anything critical on their server and because of their very low price, but I'll make sure I make backups on a regular basis. I was just lucky this time but it could have been much worse.
After the accounts were available again, RCTHost claimed that the server should be stable and running without problems (although I had just started uploading my files when I got an error message and realized that the server had rebooted again).

Yet I can't help wondering about the actual stability of the company behind this service, and I do not know whether this was just an exceptional situation or whether this is how it starts, until the customers start leaving because data is lost...
