While the iRail API has been receiving quite a few updates, we remained silent about our open data on stations, mostly because there wasn’t really anything to talk about. Today is different.
What we had before
Before today, we provided a list of stations with their ID, translations, coordinates and ‘importance’ (e.g., Brussels West would be less important than Brussels South). While this enabled a lot of people to create nice things, it wasn’t enough. In the past, we’ve received questions about station addresses, and at UGent, researchers are working on route planning that takes accessibility and the disabilities of travelers into account. We didn’t have this data, even though it’s clearly valuable.
And what we offer now
We have built a scraper that visits every station page, based on our existing stations.csv file. The results are stored in facilities.csv. The following data is now publicly available:
- Address of the station
- If ticket vending machines are present
- If luggage lockers are present
- If there is free parking, taxi, bicycle parking, bluebike, bus, tram or metro
- If there are wheelchairs available
- If there is a ramp to roll wheelchairs on a train
- If there are parking spots reserved for people with a disability
- Whether the platform is elevated, whether there are escalators going up and/or down, and whether there is an elevator to the platform
- If there is a hearing aid signal
- Opening hours for ticket sales
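For readers who want to query the new file, here is a minimal Python sketch that lists the stations offering a given set of facilities. It assumes facilities.csv has a header row, a URI column identifying each station, and facility columns that hold "1" or "0"; the column names used here (URI, ticket_vending_machine, luggage_lockers) are assumptions, so check the actual header in the irail/stations repository before relying on them.

```python
import csv
import io

# Assumed name of the station identifier column; verify against the real header.
URI_COLUMN = "URI"


def stations_with(csv_text: str, *facilities: str) -> list[str]:
    """Return the identifiers of stations whose row holds "1" in every
    requested facility column (the "1"/"0" encoding is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = []
    for row in reader:
        # A missing column counts as "not present" rather than raising.
        if all(row.get(name, "").strip() == "1" for name in facilities):
            matches.append(row[URI_COLUMN])
    return matches
```

With the real file, you would read its contents (for example with `open("facilities.csv", encoding="utf-8").read()`) and call `stations_with(text, "ticket_vending_machine", "luggage_lockers")`, substituting the column names the file actually uses.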
All code and data are open source in the irail/stations repository.
Now go create something awesome with it!
Days are getting shorter, and iRail API responses are getting longer. We’ve added some more data to various API endpoints, so you can build even better applications. We’ve also improved support for HTTP caching, and are closing in on a release for LinkedConnections. Let’s check it out!
For a while, we weren’t able to provide the status of trains in a connection. We didn’t have that data, and we were at peace with that. But everything changed when the NMBS decided to delete a file from their site. A minute or two later, our status monitoring told us we were in trouble. That file was the endpoint we used to query connections, and now that it was gone, we had to get our data somewhere else. With tens of thousands of API requests a day, we had better fix it fast. I continued a previous investigation into the NMBS mobile app API and quickly figured out how it worked, and we were able to get the API back online within 4 hours and 19 minutes. Within a few days the new code was mostly bug-free, and the bugs found in the last weeks were quickly fixed, thanks to some other developers who were happy to assist.
With this bad news also came some good news. While the old endpoint didn’t provide data about trains arriving and leaving the next station in your route, the new one does. Therefore, we are now able to provide you with the left and arrived status of trains in a connection. Make something beautiful with it.
We also have alert data, pinpointed to specific parts of the route. This way end users can be even better informed about where in their journey they may run into disruptions. To demonstrate what’s possible with this new data, I’ve taken some screenshots using the Hyperrail for Android application, which has already been updated to use these parameters.
Last but not least, there can now be walking parts in a route. This is rare, but in some cases it’s necessary, for example when travelers get off at Haren but need to catch their next train at Haren-Zuid, which is within walking distance, while no trains connect those two stations. For the techies: a new attribute ‘isWalking’ has been added to every departure/arrival of a connections object. This way you can easily distinguish those routes. Just don’t try to parse a train or destination for those walking parts; you can read all about that in our documentation.
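A client can branch on the new attribute before touching any train fields. The Python sketch below assumes each departure/arrival is a dictionary in which isWalking is encoded as "1"/"0" (or as a boolean), and that it carries station, vehicle and direction as plain strings; those three field names and this flat layout are hypothetical simplifications, so consult the documentation for the actual response shape.

```python
def is_walking(stop: dict) -> bool:
    """True if this departure/arrival belongs to a walking part.
    Accepts "1"/"0" strings or booleans, since the exact encoding is assumed."""
    value = stop.get("isWalking", "0")
    return value is True or str(value) == "1"


def describe(stop: dict) -> str:
    """Human-readable summary of one departure/arrival.
    The station/vehicle/direction keys are hypothetical placeholders."""
    if is_walking(stop):
        # Walking parts have no meaningful train or destination, so don't read them.
        return "walk to " + stop["station"]
    return f'{stop["vehicle"]} towards {stop["direction"]}'
```

Checking the flag first means a walking part never triggers a lookup of a train that doesn’t exist.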
I’ll keep it short here: You can now see if a stop is part of the ‘normal’ route of this train, or if it’s an extra stop which was added last minute by traffic control.
Log and feedback data
All iRail API log data, and spitsgids feedback, is now automatically published on gtfs.irail.be every morning at 3:00. This way everyone can instantly make use of the most recent spitsgids data. Feel free to scrape this site if you need data for multiple days. The formats are documented in the API documentation.
Conditional GET requests
The API supports caching and conditional GET requests. Conditional GET requests allow you to tell the server which version of the data you have. If newer data is available, the server returns an HTTP 200 status code along with the new data. If no new data is available, the server returns an HTTP 304 Not Modified status code with an empty body. For now we only support the If-None-Match header; more details are in the docs.
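To show what this looks like from the client side, here is a minimal Python sketch using only the standard library. It assumes the server sends an ETag header whose value is echoed back in If-None-Match; the ConditionalClient class and its injectable send function are hypothetical helpers, not part of any iRail library.

```python
import urllib.error
import urllib.request


class ConditionalClient:
    """Remembers the last ETag and body per URL and revalidates with If-None-Match."""

    def __init__(self, send=None):
        self._cache = {}  # url -> (etag, body)
        # send(url, headers) -> (status, etag, body); injectable for testing.
        self._send = send or self._urllib_send

    @staticmethod
    def _urllib_send(url, headers):
        request = urllib.request.Request(url, headers=headers)
        try:
            with urllib.request.urlopen(request) as response:
                return response.status, response.headers.get("ETag"), response.read()
        except urllib.error.HTTPError as error:
            # urllib raises on 304; treat it as "not modified" rather than a failure.
            if error.code == 304:
                return 304, error.headers.get("ETag"), b""
            raise

    def get(self, url):
        headers = {}
        cached = self._cache.get(url)
        if cached and cached[0]:
            headers["If-None-Match"] = cached[0]
        status, etag, body = self._send(url, headers)
        if status == 304 and cached:
            return cached[1]  # our copy is still current; the 304 body is empty
        if etag:
            self._cache[url] = (etag, body)
        return body
```

A call such as `ConditionalClient().get("https://api.irail.be/liveboard/?station=Gent-Sint-Pieters&format=json")` downloads the full body the first time and, on repeated calls, only receives a 304 while the data is unchanged.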
Having all these new fields doesn’t mean we’re done here. We’ve almost gotten LinkedConnections ready for production, and we’ll be opening up a lot of data about stations and their accessibility. Make sure to follow @iRail on Twitter to stay up to date!
I’m Bert, a 21-year-old student at Ghent University. I came into contact with iRail when I fixed an existing but broken application to view real-time train data on Qt devices (mostly for Nokia’s Symbian OS). A few years back I ran into Pieter again at Open Summer of Code, and I’ve been contributing to various iRail projects ever since. Last week I started a 3-week student job to fix issues with the API, improve performance and move servers. In the meantime, I’m testing with my own Hyperrail Android application.
In the past week we have been rather busy setting up a new server for the iRail project. This new server will handle all API requests and all requests to irail.be. While moving servers, we also tried to optimize the code in order to get faster API responses – with success. Response times dropped from between 500 and 5000ms to an average of 250ms. I’ll try to explain what changes were made, and how they affect the API.
Requests from our scraper to the NMBS website are now cached, meaning that every response from the NMBS is kept in memory for 15 seconds. The liveboard endpoint benefits most from this, as most requests for big stations will be able to use cached NMBS data. Because we cache at the level of the NMBS data, the cache isn’t affected by the requested output format (XML or JSON), which results in more cache hits and faster responses.
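The idea of caching the upstream response, rather than the formatted output, can be sketched in Python as a small time-to-live cache. The TTLCache class, the liveboard function and the fetch_nmbs/render callables are hypothetical names for illustration; iRail’s actual implementation may look quite different.

```python
import time


class TTLCache:
    """Keeps upstream responses in memory for a fixed number of seconds."""

    def __init__(self, ttl_seconds=15.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry and entry[0] > now:
            return entry[1]  # still fresh: no request to the upstream site
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value


def liveboard(cache, station, output_format, fetch_nmbs, render):
    # The cache key deliberately omits output_format, so XML and JSON
    # requests for the same station share one cached NMBS response.
    raw = cache.get_or_fetch(("liveboard", station), lambda: fetch_nmbs(station))
    return render(raw, output_format)
```

Keying on the upstream request is what lets an XML request and a JSON request for the same station within the same 15 seconds trigger only one request to the NMBS.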
Caching station lookups
A while ago, the `&fast=true` parameter was introduced to produce faster responses. When this parameter was supplied, station names received from the NMBS were passed on to the clients without looking up station IDs or standardizing the name. This resulted in faster queries, but it only addressed the symptoms rather than the cause.
By caching the station lookups (where a station name from the NMBS is looked up in stations.csv and all its information is returned), the time required to look up a station dropped to almost zero. Furthermore, the number of stations is limited, and this data doesn’t change. This means that once a station has been cached, it never expires (unless the data is updated manually). As a result, within the first 100 queries after starting PHP, all big stations will be cached, and further requests won’t have cache misses. The `&fast=true` parameter no longer affects the speed, as station lookups are now instant. A comparison of tests with and without station caching can be found in GitHub issue #111.
Spitsgids has had a round of bugfixing. Some issues with posting data, and some with viewing the occupancy for trains have been resolved.
Not only spitsgids, but also the API got its round of bugfixes. As a result, the API should be more reliable. The most important changes concern the vehicle endpoint. The name field will now always be formatted as “BE.NMBS.xxxxx”, where xxxxx is a train ID as used by the NMBS (like IC817, S51245, …), and a shortname field contains just the ID as used by the NMBS (for example IC817). Previously, the name field contained the ID that was passed in the URL. If you build applications on top of the iRail API, be sure to test whether this breaks your application, and update it if necessary.
Furthermore, the connections endpoint includes some additional information: each via now includes not only the direction and ID of the train arriving at that via station, but also those of the train departing from it. This should make the API easier to use, and should help in building applications that post data to spitsgids.
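If your application used to read the train ID from the name field, a small compatibility helper can cope with both the old and the new behavior. The Python sketch below assumes the vehicle response has been parsed into a flat dictionary with name and an optional shortname key; the train_id function and that flat layout are hypothetical.

```python
PREFIX = "BE.NMBS."


def train_id(vehicle: dict) -> str:
    """Return the NMBS train ID (e.g. IC817) from a parsed vehicle response,
    tolerating both the new and the old format of the name field."""
    if vehicle.get("shortname"):
        return vehicle["shortname"]
    name = vehicle["name"]
    # New format: "BE.NMBS.IC817"; old format: whatever ID was passed in the URL.
    return name[len(PREFIX):] if name.startswith(PREFIX) else name
```

Preferring shortname when it is present avoids string manipulation altogether, while the prefix check keeps responses without that field working.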
New documentation and uptime monitoring
In the past, users notified us whenever the API broke down. From now on, all API endpoints are monitored and notifications are sent out when one of them goes offline. Everyone can check the current status on status.irail.be.
The documentation that was previously hosted on our blog got outdated over time, and wasn’t as clear as it could be. Therefore, the documentation has been rewritten and is available on docs.irail.be as well as on GitHub.
I’ll be working 2 more weeks on the iRail project. During these weeks I’ll work on the definition of good URIs as identifiers, and on the use of LinkedConnections data as a data source for the iRail API. I’ll also try to fix as many issues in the iRail API and spitsgids as possible.
A special thanks goes out to Ghent University: they’re funding my work during these three weeks so that volunteers can work more efficiently on iRail, and so that I can look at better publishing mechanisms for transport data. This way we can make sure the valuable research data remains accessible.
As of today, you can request the official GTFS real-time feeds of SNCB! This is great news: our real-time feeds were only a scraped version of what was announced on their websites. If you want a high-quality data feed of train disturbances and train delays, you can now rely on their new feeds. Request this over here:
With iRail, we are allowed to use the GTFS data within our API and to redistribute it. We are slowly migrating our datasets to this new data. Anyone up for a day of hacking?
* You still need to sign a contract and thus cannot call this open data just yet.
Tomorrow is the Open Belgium 2017 conference. We are hosting a session on the state of play of Open Transport Data in Belgium. We look forward to welcoming Arnaud Wattiez from SNCB, Sébastien Goffin from STIB and Bert Van Hemelen from De Lijn, who will each give a presentation on their recent activities and plans in the Open Data domain. Afterwards, Peter Defreyne (Antwerp Management School) and I will put questions to the panel to discuss the next steps.
Of course, many of these datasets are also available through the iRail API under the CC0 waiver, or through our unofficial GTFS dumps at gtfs.irail.be. We are only a third party that gives out data without warranty, and we cannot guarantee sustainability (although we have been online for 9 years this year). The legality of iRail remains a gray zone for the time being: we believe no sui generis database right applies to the data. However, much also depends on the intentions of the public transit companies themselves: do they want to encourage maximum reuse of their data? The quality of the open data would rise tremendously if the transport companies published official Open Data.
After all this time, I’m still impressed with TEC. They are not going to be at Open Belgium 2017 tomorrow. They said: “we will not be able to tell you something new after being at Open Belgium 2014 and 2015”. They provide both historic dumps and their currently planned schedules for all of Wallonia. Historic schedules are very relevant for, e.g., mobility studies: how is the mobility of a certain area evolving? None of the other data providers make this possible.
In any case, TEC gets it right with http://opendata.tec-wl.be/. With its simple and pragmatic Open Data portal, it has been a frontrunner in Belgium for a couple of years now. They also have plans to open up their real-time datasets.
You can request access to the files of SNCB, STIB and De Lijn, both for the planned schedules and for the real-time data (the latter still to come for SNCB). In each case, a non-open contract needs to be signed. Therefore we call this data sharing, not Open Data. Apart from their licenses not being compliant with the Open Definition, we also see practical issues with this data for use within a Web ecosystem: when these datasets are added to an open data portal, the portal cannot link a machine directly to the files.
Next, we also do not want transport agencies to open up an API for, e.g., real-time data; we want them to share the raw data files needed to create APIs. APIs are often rate-limited, which defeats the purpose of maximum reuse and of rewarding successful third parties. When real-time data needs to be shared, a couple of files that are updated every 30 seconds are still easy to host, and they give third parties the flexibility to implement any use case instead of being tied to the functionality an API exposes. At the moment, MIVB-STIB has the most complex system for reaching the data, with API keys and rate limits on top of simple data files.
Yesterday, the news broke that SNCB is giving Olympus Mobility a monopoly on reselling its tickets. At iRail, we had already requested access to ticket reselling, as this is the only reason people still use the SNCB app rather than apps like Railer or BeTrains. Alexander De Croo, who will also be speaking tomorrow, agrees with us: open data reusers should become partners in ticket selling instead of competitors. Tomorrow, we will ask Arnaud Wattiez from SNCB to elaborate on this: are there plans to give others access to ticket reselling, just like De Lijn is already doing?
People should be able to easily find the fastest route and buy tickets. NMBS and innovative mobility apps are allies. https://t.co/NopnfNJL6L
— Alexander De Croo (@alexanderdecroo) March 4, 2017
In any case, tomorrow will be a day on which we learn a lot about the future of Open Transport Data in Belgium from the transport companies themselves. See you there!