iRail server move and bugfixing
I’m Bert, a 21 year old student at Ghent University. I came into contact with iRail when I fixed an existing but broken application to view realtime train data on Qt devices (mostly for Nokia’s Symbian OS). A few years back I ran into Pieter again at Open Summer of Code, and I’ve been contributing to various iRail projects since. Last week I started a 3 week student job to fix issues with the API, improve performance and move servers. In the meanwhile, I’m testing with my own Hyperrail Android application.
In the past week we have been rather busy setting up a new server for the iRail project. This new server will handle all API requests and all requests to irail.be. While moving servers, we also tried to optimize the code in order to get faster API responses – with success. Response times dropped from between 500 and 5000ms to an average of 250ms. I’ll try to explain what changes were made, and how they affect the API.
Requests from our scraper to the NMBS website are now cached, meaning that every response from the NMBS is kept in memory for 15 seconds. Especially the liveboard endpoint is impacted by this, as most requests for big stations will be able to use cached NMBS data. By caching inflatable water slide at the NMBS data level, the cache isn’t affected by the requested output format (xml or json), resulting in more hits and faster responses.
Caching station lookups
A while ago, the `&fast=true` parameter was introduced to produce faster responses. When this parameter was supplied, station names received from the NMBS would be passed on to the clients without looking up station ids or standardizing the name. This resulted in faster queries, but this only addressed the symptoms, instead of the cause.
By caching the station lookups (where a station name from the NMBS is looked up in stations.csv and all information is returned), the time required to look up a station almost dropped to 0. Furthermore, the number of stations is limited, and this data doesn’t change. This means that once a station has been cached, it doesn’t ever expire (unless data is updated manually). As a result, within the first 100 queries after starting PHP, all big stations will be cached, and further requests won’t have cache misses. At this moment, the `&fast=true` parameter no longer affects the speed, as the station lookups are instant now. A comparison of tests with and without station caching can be found in Github issue #111.
Spitsgids has had a round of bugfixing. Some issues with posting data, and some with viewing the occupancy for trains have been resolved.
Not only spitsgids, but also the API got its round of bugfixes. As a result, the API should be more reliable. Most important here are some changes to the vehicle endpoint. The name field will now always be formatted as “BE.NMBS.xxxxx” where xxxxx is a train id as used by the NMBS (like IC817, S51245, …), and a shortname, which is just the id as used by the nmbs (for example IC817). Earlier, the name field contained the id which was passed in the url. If you build applications on top of the iRail API, be sure to test if this doesn’t break your application and update your application if necessary.
Furthermore the connections endpoint includes some additional information: vias not only include the direction and id of the train arriving in this via, but also the direction and id of the train which leaves at this via station. This should allow to use the API more easily, and should help in building applications which post data to spitsgids.
New documentation and uptime monitoring
In the past, users notified us whenever the API broke down. From now on, all API endpoints are monitored and notifications are sent out when one of them goes offline. Everyone can check the current status on status.irail.be.
The documentation which was previously hosted on our blog got outdated over time, and wasn’t as clear as it could be. Therefore, the documentation has been rewritten, and is available on docs.irail.be, is available on github.
I’ll be working 2 more weeks on the iRail project. During these weeks I’ll work on the definition of good URIs as identifiers, and on the use of LinkedConnections data as data source for the iRail API. I’ll try to fix as many issues for the iRail API and spitsgids as well.
A special thanks goes out to Ghent University: they’re funding my work during these three weeks so volunteers can work more efficiently on iRail, and I will look at better publishing mechanisms for transport data. This way we can make sure we can keep accessing the valuable research data.
Pingback: One million daily requests: where do they come from, and how we’ll cope with them – iRail Blog