As you might have noticed, we’ve been struck by a bit of bad luck lately.
About 6 weeks ago, a major filesystem crash occurred on one of the main iRail servers. As this machine was quite old (7 years, an old P4), our host decided to decommission it and replace it with a brand new Xeon server.
However, transferring all the data took a couple of days, and the IBBT was kind enough to provide us with hosting during this transition. This also gave us the opportunity to run some load tests on their servers.
Not 3 weeks later, a problem occurred on another server. Xen (which was admittedly outdated, but could not be updated at that time) froze the entire server, and refused to restart its networking after resetting the device. I then decided to go for a clean install (upgrading to Xen 4.x and switching from Ubuntu to Debian). This outage affected minor services such as the blog. The API stayed up during this time.
Long story short, everything has been back up since yesterday evening. We’ll be checking out cloud solutions in the near future to prevent these issues from happening again.
There will be additional changes in the future (a dedicated VM for a few iRail services), but this is what the setup looks like now:
Aleph (dom0) proxies HTTP requests (using nginx) to 2 different VMs:
- a dedicated BeLaws VM
- an Apache server running the API and iRail.be (and a few other iRail services such as Trac).
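
For those curious what such a setup can look like on the nginx side, here is a minimal sketch. It is not our actual configuration: the hostnames and the internal VM addresses are placeholders made up for illustration, and the blocks are assumed to sit inside the `http` context of nginx.conf.

```nginx
# Hypothetical sketch of a reverse proxy on dom0.
# Hostnames and 10.0.0.x addresses are placeholders, not real iRail values.

# VM 1: the dedicated BeLaws VM
server {
    listen 80;
    server_name belaws.example.org;

    location / {
        proxy_pass http://10.0.0.2;
        # Pass the original host and client address on to the backend
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# VM 2: the Apache server (API, main site and other services)
server {
    listen 80;
    server_name irail.example.org api.irail.example.org;

    location / {
        proxy_pass http://10.0.0.3;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Forwarding the `Host` header lets Apache on the second VM pick the right virtual host, so several services can share one backend.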
In the near future we’ll re-enable caching (it is disabled at the moment); it will still be managed by TheDataTank.