One million daily requests: where do they come from, and how we’ll cope with them

Since improving iRail server performance back in October, the iRail servers have been serving everyone without any major issues. The servers handled around 300.000 requests every day, which left quite some margin. Back then, we estimated the server capacity at 1 million request per day.

Today is that day. Our server monitoring alerted us of increasing CPU usage, and we found the origin: we’re now serving over 900.000 requests every day. Due to the fast increase in the number of requests, where half of the requests are coming from only one IP address, we temporary started enforcing stricter rate limiting, meaning that instead of 5 requests per second, only 3 will be allowed. This shouldn’t affect you, but if it does, please get in touch with us to work out a solution.

In order to know where requests are coming from, and how this evolved over time, we analyzed log files from the past 74 days. After analyzing the 18gb of data, I decided to created an infographic. This way everyone can see the current state and how we evolved. It’s interactive, so you can enable or disable any client by clicking it in the graph legend.

Infogram

Why clients use iRail

Most of the clients can be grouped into one of three categories:

Routeplanning applications for users
Digital signage, applications meant to be shown on TVs to inform people passing by of nearby public transport departures
Gathering information on delayed trains

The first one is a smaller category, which need an API like iRail. The second category also needs an API, but is more specific: they usually just need the departing trains for one or more nearby stations the next hour or two. The last category is different compared to the previous two: they need data which isn’t available through an API requests. In most cases they need to resort to loading the departures for all stations over and over again. This means they need a lot of requests (about 600) to update their information, which takes 2 minutes when obeying the old rate limiting system. Using the stricter rate limiting rules, this will take three to four minutes.

Linked Connections

It’s clear we need to do better. therefore, we can proudly announce that we’re currently beta testing Linked Connections, a brand new way of publishing transport data. Our new linked connections API will publish a list of all departing trains in Belgium, sorted by departure time.

What does this mean for you?

Offering users a routeplanning app? You will be able to implement your own route planning and support offline searches. As always, we’ll publish some open-source examples so you don’t have to figure it all out on your own. Does this sound scary to you? We’ll keep providing a traditional API.
Wanting to show the next departures? Just load the list for the time period you want to get the departures for, and filter out the departures at the station(s) you’re interested in. Again, we’ll provide some open-source examples and we’ll keep providing a traditional API.
Want to retrieve all delays? This category is the big winner. Just load the list for an hour or two, which is about 12 requests. That’s 50 times less requests!

The new API will serve the same list to everyone, meaning that with 1 million daily requests, you’ll have between 80% and 90% chance to hit a cache. Caching means we can serve more requests and we can server them faster – this also means rate limiting will be quite loose (around 10 requests per second, or more), so you can refresh your data often enough.

A few last words

Creating this infographic was only possible thanks to the user-agents included in the requests. However, still 5 of the top-8 clients don’t include a clear user agent. This means that we could not inform them of the new stricter rate limiting, or other updates to the iRail API (We only included the application name in the infographics above).

If possible, set the user-agent header string. Include the name of your application, and a way to contact you in case something would go wrong. An example user agent string format is <application name>/<application version> (<website>; <mail>)
, which could result in the following string: irail/1.2.0 (irail.be; hello@irail.be). The use of a user agent string like this isn’t obligated or enforced, but definitely allows for better communication.

Do you have a great concept?

If you have a great idea which needs railway data, you can instantly get started with our free to use API. You can find all information on our API at docs.irail.be. Want to analyze data yourself? We’re publishing API logs and occupancy logs at gtfs.irail.be.

Stay up to date by following us on twitter, joining our gitter channel, or make sure to follow this blog.

About Bert

Server administrator and developer for iRail

16. March 2018 by Bert
Categories: iRail | Tags: api, irail, statistics | 3 comments

iRail Blog

Search