What are the schedules for my transport? What is the fastest way to get from my home to work? Here are two examples of questions that developers of journey planner solutions had to answer. To do this, developers rely on journey planner algorithms with different specificities, but all of them improve drastically the journey planning experience.
Every second, 77 people are taking public transportation in France. Today, many solutions allow us to get around easily in cities: Google Maps, OpenTripPlanner or our “Lyko APIs“. In the age of digital and Big Data, these applications need to be more and more powerful. They are therefore capable of providing us with precise itineraries while taking into account the user’s time constraints. Whether by bus, metro or tramway, there are many ways to travel. An exponential volume of data is surfing the Internet and making these journeys easier. Focus on the 4 main journey planner algorithms.
The first journey planner algorithm: Dijkstra
In 1959, Edsger Dijkstra, a famous mathematician and computer pioneer, published in an article of only 2 pages, the first journey planner algorithm. This algorithm answers a simple problem: to determine the shortest path in a graph, starting from a point x to a point y. Note that all the vertices of this graph are interconnected (like a transportation network), and that each segment has a weight represented by a number.
Understanding with an example
Here is a model of this graph. To better understand the principle of this algorithm, we will start with an example: Find the shortest path from point S to M. To do this, we will have to create a subgraph. It will rank the different vertices in an ascending order, from their minimum distance to the starting vertex. At the end, we will calculate the total distance of the path. This will correspond to the total sum of the weights of the segments used (it is important to note that S is 0).
To better understand this example, we build a table. Indeed, this table gathers all the vertices of the graph, and will help us to visualize the algorithmic process. This process is done turn by turn.
Vertex S, being the starting vertex, can be saved in the table because we won’t spend any more time there.
We notice 4 vertices connected to S, vertex T at 8 distance, vertex L at 5 distance, vertex N at 8 distance, and finally vertex E at 10 distance. In a table, we will therefore record these different vertices represented by a number (weight of the distance) and a letter (represents the point of origin of the connection).
We thus choose the vertex L. Indeed, it is the segment outside the subgraph that has the minimum distance (5). We won’t come back to it, so we can save it in the table.
|.||||5S||||||||0 + 5|
Three vertices are neighbors of vertex L: N, E and M. The relative distances of these vertices with respect to the initial vertex S are 7, 13 and 12 (taking into account the value of the previous distance which is 5).
Looking at the table, we can see that the shortest distance is indeed the summit N located at distance 7 from the summit S. The algorithm will therefore choose the vertex N, and record it in the table. We will no longer go through this vertex.
|Round 2||.||||.||7L||5 + 2|
Vertex M is the only neighbor of vertex N and it is not derived from the subgraph. It is located at distance 11 from the initial vertex S (taking into account the value of the previous distance which is 7).
|Round 2||.||||.||7L||11N||7 + 4|
We can therefore save the vertex M in the table.
The problem stated at the outset is therefore solved. We thus arrived at summit M from S by the shortest way! To answer this problem, we need to take again the table by crossing it from the end to the beginning. We arrived at the summit M by passing by the summit N, the summit L and the summit S. Moreover, the distance between each summit was cumulated with the one of the previous summit. We finally notice the total distance of the trip. It is of 11.
Only one point is directly connected to this point E and it is point M, located at distance 22 from vertex S.
The problem initially presented is therefore solved. We thus arrived at vertex M from S by the shortest way! To answer this problem, it is only necessary for us to take again the table by going through it from the end to the beginning. We thus placed ourselves on the column M. What is the smallest weight to arrive at M? It is 11N! So we can go up: the smallest weight of N is 7L, the smallest weight of L is 5S and the smallest weight of S? This is our starting point. So by reversing our table, we find the shortest path: S -> L -> N -> M.
To go further
Here is a very complete explanatory video that will allow you to understand in more detail, the functioning of the Dijkstra journey planner algorithm. It is also interesting for better understand journey planner algorithms in general.
RAPTOR, an efficient journey planner algorithm
RAPTOR (RoundbAsed Public Transit Optimized Router) was described in 2012 by Daniel Delling in a Microsoft article. This algorithm quickly became popular, and was integrated into travel planners such as OpenTripPlanner. Indeed, RAPTOR is one of the most efficient journey planner algorithms. It offers quite surprising results in terms of performance. For example, it is able to calculate all possible routes between two random locations of the London network (20,000 stops and 5 million daily departures) in just 8 milliseconds!
But how does he do it?
Daniel Delling’s paper presents the idea of this algorithm rather well. Indeed, first, for each given route (from point A to Z), RAPTOR categorizes it into an itinerary, taking into account the path it follows. Then, for each route, the earliest arrival time is calculated for each stop. Finally, after exploring all these routes, the results are saved as rounds. We then move on to the next one. With this step, RAPTOR gets quite close to the logic of the Dijkstra algorithm, seen in the previous example.
If we take the example above, we can see a route from station A to station C on a direct path. The arrival time at station C will therefore be 12:00. On the second round, we notice that this arrival time has been improved. Indeed, the algorithm proposes us directly to go to station B to take a train and to finally arrive at station C at 11:00. The algorithm works during the whole improvement process. Once all the routes have been checked and the arrival time cannot be improved anymore, the mechanism stops and the request ends.
It is important to remember that RAPTOR operates on a Time of Arrival Improvement Cycle. At the beginning of each round, it examines the stops. If the arrival time at these stops has been improved by the previous round, it will explore the stops crossed by the paths.
To go further
Understanding journey planner algorithms can be quite complicated. Nevertheless, if you want to know more about it, do not hesitate to read the Microsoft article explaining the entire algorithmic process. Moreover, Linus Norton’s article has strongly inspired my explanation of the RAPTOR algorithm.
The Connection Scan Algorithm
In 2013, 54 years after the publication of the Dijkstra algorithm, Julian Dibbelt, Thomas Pajor, Ben Strasser and Dorothea Wagner presents the Connection Scan Algorithm (CSA) in the article Intriguingly Simple and Fast Transit Routing. Known to be an even more powerful algorithm than RAPTOR, it has the advantage of being simpler to understand and implement than the latter.
How does it work?
It does not take many lines to integrate the connection algorithm. The CSA (Connection Scan Algorithm) simply scans a pre-calculated timetable corresponding to all connections. Indeed, a possible journey between two stations is represented by a connection. It will therefore only select the optimal solution in terms of travel time. The timetable that the CSA will go through must be, beforehand, sorted by ascending the departure date.
Here is an example of a time table :
|Departure station||Arrival station||Departure time||Arrival time|
We notice that the table is composed of one column :
- Departure station
- Arrival station
- Departure time
- Arrival time
and of 9 connections represented by the rows of this table.
The objective is to get from a starting point x to an arrival point y by leaving at time t0. To do this, we will go through this connection list considering the time and the arrival station. If they are optimal, we will keep them.
Understanding with an example
Starting from the example in the time-table above, let’s imagine that we want to join C from A, leaving at hour 5.
We will detail this example with the help of a table shown below:
|Connection||N / A|
This table is composed of the different stations of the time-table, the departure times associated with these stations and the established connections. We leave from station A at time 5, so there is no connection at the start.
Let’s start by browsing the time-table from the start. Lines with a start time smaller than 5 are not taken into account. We ignore them.
We thus arrive at the connection (A, B, 5, 6) which enters in our criteria.
Indeed, the departure time of this connection (5) is equal to the departure time of A (5 ≤ 5). We therefore update the table :
|Connection||N / A||(A, B, 5, 6)|||
Let’s move on to the connection (B, D, 5, 6). It does not match our time criteria. Indeed, the departure time of this connection (5) is smaller than the departure time of B (recorded above), which is 6.
Finally, let’s go to the last line of the time table with the connection (B, C, 6, 7). This connection fits perfectly in our time criteria. Indeed, the starting point of this connection is point B, at hour 6. It is the same as station B that we recorded earlier.
Let’s update the table.
|Connection||N / A||(A, B, 5, 6)||(B, C, 6, 7)|
The time-table has been entirely browsed. As a result, the algorithm stops. To get the desired route, just browse the table from the destination. That means from C. We find the connection (B, C, 6, 7). Now, look for the entry associated with the start of the connection. In this case, it’s B. We find (A, B, 5, 6) as the entry point of our connection and our starting point. We have finally completed the search.
You can thus find the route to be taken:
- (A, B, 5, 6)
- (B, C, 6, 7)
The Connection Scan Algorithm thus allowed us to answer the initial problem.
The example presented above is based on a small number of stations, which makes it quite simple to understand and solve. Nevertheless, it should be noted that this journey planner algorithm has a complexity proportional to the number of stations studied.
To go further
If you wish to go further in the understanding of journey planner algorithms do not hesitate to consult the original article.
The transfer model algorithm
The transfer model algorithm was developed in 2010 at Google’s engineering office in Zurich, Switzerland. It was developed by researcher Hannah Blast alongside a number of Google engineers. This algorithm has a higher computational performance compared to other existing algorithms. It is also fast enough to update the result of a search, while you drag the end points of two stops on Google Maps.
A simple concept
The basic concept of the transfer pattern algorithm is rather simple to understand. In a first step, let’s take all the stations of the city of London, for each of these stops the algorithm will pre-calculate all the possible routes between them. Then, it will store each of these trips as transfer patterns. In other words, it will store all the transfers for a trip x to y in a database. For example, for a path from A to B, we will store :
A, C, D, B
A, E, B
A, C, B
When searching between a point x and y, we retrieve all the corresponding patterns between x and y. Then, for each pattern, we retrieve the timetables between each pair of stations making up the pattern. And finally, for each pattern calculated with the schedules, we take the one that arrives the earliest.
In order to understand the algorithmic process in more detail, we will go through a step-by-step explanation.
We assume that beforehand we have the maximum possible journey for a given territory, for example for the city of London this would correspond to millions of pre-calculated journeys.
The initial approach of the algorithm would be that for each path x and y, we will store these transfer patterns in a database.
Let’s take a route from A to K. It will contain the following models :
A, C, D, K
A, D, E, K
A, D, K
A, E, C, K
Now imagine the same type of transfer patterns for all routes in a given territory.
Once the transfer models are stored in a database, they will be easily accessible and reusable when searching for a route. But how can these models be reused ?
When searching for a route, the passenger will have the possibility to use the transfer patterns presented above to reach his destination. All the models corresponding to the A to K trip will therefore be retrieved and processed in order to meet the request. As a reminder, here is our model :
A, C, D, K
A, D, E, K
A, D, K
A, E, C, K
However, retrieving only the patterns is not enough. Indeed, the algorithm will go through a phase of model processing. For this, each of them will be transformed into a list of trips. The algorithm processes this list of transfers into a list of origin and destination pairs, representing the different stages of the trip.
Let’s take the transfer patterns in the list above.
First, for A -> C -> D -> K, the list of pairs would be:
[A, C], [C, D], [D, K]
Then, for A -> D -> E -> K, the list of pairs would be:
[A, D], [D, E], [E, K]
After that, for A -> D -> K, the list of pairs would be:
[A, D], [D, K]
Finally, for A -> E -> C -> K, the list of pairs would be:
[A, E], [E, C], [C, K]
After transforming our transfer patterns into a list of pairs, the algorithm retrieves the list of schedules corresponding to each of them :
A->C C->D D->K
9:00, 9:20 9:20, 9:50 9:40, 10:10
10:00, 10:20 10:20, 10:50 10:40, 11:10
11:00, 11:20 11:20, 11:50 11:40, 12:10
12:00, 12:20 12:20, 12:50 12:40, 13:10
13:00, 13:20 13:20, 13:50 13:40, 14:10
14:00, 14:20 14:20, 14:50 14:40, 15:10
A->D D->E E->K
8:50, 9:20 10:00, 10:20 10:30, 11:00
11:00, 11:20 11:10, 11:30 11:30, 12:00
12:00, 12:20 12:10, 12:40 12:30, 13:00
12:30, 12:50 13:10, 13:40 13:30, 14:00
13:00, 13:20 14:40, 15:00 14:30, 15:00
14:00, 14:20 15:20, 15:30 15:30, 16:00
A->D D->K 8:50, 9:20 9:40, 10:10 11:00, 11:20 10:40, 11:10 12:00, 12:20 11:40, 12:10 12:30, 12:50 12:40, 13:10 13:00, 13:20 13:40, 14:10 14:00, 14:20 14:40, 15:10 A->E E->C D->K 10:00, 10:30 10:10, 10:40 9:40, 10:10 11:00, 11:30 10:50, 11:20 10:40, 11:10 12:00, 12:30 11:40, 12:00 11:40, 12:10 13:00, 13:30 12:20, 13:00 12:40, 13:10 14:00, 14:30 13:20, 14:00 13:40, 14:10 15:00, 15:30 14:20, 15:00 14:40, 15:10
As we have just seen, the algorithm has built a transfer pattern for each trip. This pattern corresponds to all the stops on which the passenger will potentially need to stop. We then transformed these patterns into a list of origin and destination pairs. Finally, for each of these pairs, we assigned a list of schedules.
So what do we have left to do? The last step is to retrieve the pattern with the earliest schedules.
If we take a look at the time-tables above, we notice that two transfer patterns have a pair that indicates its arrival time at 10:10. This seems to be the earliest arrival time to K among all the patterns present. Nevertheless, for the model A->E->C->K, the earliest times of the second pair E->C are : 10:10, 10:40. Indeed, it does not correctly precede the earliest time designated above. Therefore, we do not use this pattern. The pattern above has two pairs, so the schedules follow each other correctly to arrive earlier: at 10:10. When searching for the user, the model retrieved will be : A->D->K.
Corresponding schedules are : 8:50, 9:20 -> 9:40, 10:10
To go further
The transfer model is the fastest journey planner algorithm. It allows you to retrieve results in real time. Nevertheless, this “peer-to-peer” approach has its limitations. Indeed, when it comes to calculating routes for large cities, for a country, or even for an entire continent, it proves to be very expensive. Moreover, it has limitations in terms of long-distance connections (Paris – Madrid, for example). To address this problem, a variant of the transfer model algorithm has been created: the scalable transfer model algorithm. For more information on this subject, please read this article.
The original article presenting the algorithm is also interesting to study if you want to go further in the understanding of journey planner algorithms.
Finally, we saw a special way of storing transfer models, allowing later to speed up requests in real time. Nevertheless, a more complex technique exists, but which allows using less storage memory: the use of acyclic cycles.
Classification of journey planner algorithms
About Lyko and its journey planner algorithms
At Lyko, we have developed our own intermodal trip calculator. Of course, each of the algorithms presented above has these limitations. It was therefore necessary to base ourselves on a “home-made” algorithm, combining the Dijkstra algorithm and the Connection Scan Algorithm (CSA).
The combination of these two algorithms has enabled us to develop a high-performance, resource-efficient route planner.
Now it’s your turn!
In this article, I have just presented the main algorithms for public transport journey planner. However, there are many other algorithms, so you have a lot of choice to start working. Try to find the right balance between performance and power consumption. Indeed, some algorithms require high-performance CPU. Also think about how your data is processed and how to include it correctly in the development of your program.
Discover also the Lyko’s Public Transit API
to integrate easily all GTFS datas in your platform and provide the best intermodal route