M'DECRYPT

Mobilitech, Public Transit

27/09/2021

Public transport: The main data exchange formats

When building a MaaS (Mobility as a Service) application, data is a key resource: it underpins every exchange between mobility actors. In a previous article, we talked about intermodal trip planners, an indispensable tool in any MaaS solution. The algorithms behind these planners are partly powered by public transport data, and without a suitable standard this data would be impossible to integrate. The more varied the data, the greater the scope of a MaaS application: each new source widens the route planner's search space and makes it more customizable.

Europe has several million public transport stops and stations across metro, bus, tramway and other networks. The associated data, which we will call transport data, is varied and serves several purposes: it describes the elements that make up a public transport network, such as schedules, lines and stops. This data falls into two categories: theoretical (scheduled) transport data and real-time transport data.

In this article, we take a closer look at transport data and at how it has been formalized to make exchanges between the different mobility actors easier.

Transmodel, the reference model

Before presenting the different transport data formats, it is important to introduce the reference model, Transmodel. When developing public transport information systems, two systems often need to exchange data or be integrated with each other. However, this integration can be difficult to achieve because one system may be unable to understand the data of the other.

Here is a concrete diagram that demonstrates the complexity of the exchange of different business applications in a public transport information system.

Here the different business applications communicate with each other in pairs, so every new application needs its own interface to each existing one, which makes the system very hard to scale.

To address this problem, Transmodel provides, through its documentation, a conceptual data model (CDM). This CDM describes the main data structures used in public transport information systems, including trip planning, timetable planning, fares and real-time data. As a result, Transmodel becomes the key to system interoperability. In 2005, Transmodel V5.1 was adopted as the European standard EN12896 by the European Committee for Standardization (CEN). As a reference standard, its documentation describes the semantics of public transport data. The model is useful for two types of audience: public transport authorities, and software and system designers.

Transmodel therefore acts as a central interface that facilitates exchanges between the various business applications. Its conceptual data model is a single reference model for the data and covers all of the operator’s business areas.
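
To make the scalability argument concrete, here is a minimal sketch that counts interfaces in both architectures. It assumes the worst case in which every pair of applications exchanges data directly; the function names are ours and purely illustrative.

```python
def point_to_point_links(n_systems: int) -> int:
    """Interfaces needed when every pair of systems talks directly."""
    return n_systems * (n_systems - 1) // 2


def shared_model_links(n_systems: int) -> int:
    """Adapters needed when each system only maps to the shared model."""
    return n_systems


# Compare both approaches for a few system counts.
for n in (3, 5, 10):
    print(n, point_to_point_links(n), shared_model_links(n))
```

With pairwise exchanges the number of interfaces grows quadratically, whereas with a shared reference model it grows linearly with the number of applications.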

The data exchange formats that we are going to see are all based (with some exceptions) on the conceptual reference model that Transmodel V6 provides.

Theoretical transport data

The first reference formats we will discuss are the so-called static formats. They carry theoretical transport data: schedules, stations, stops, lines, etc. This type of data changes only infrequently.

NeTEx

Based on the Transmodel abstract model, NeTEx (Network Timetable Exchange) is a format for exchanging theoretical public transport data, defined at European level. To make systems interoperable and harmonize information exchange, NeTEx provides a means of exchanging information such as stops, routes, schedules and fares between different IT systems. It sets common rules for disseminating and exchanging this data between actors such as transport operators, local authorities and application developers. NeTEx offers a very wide functional scope, notably wider than that of GTFS.

As explained on the portal of standards for public transport offer data, NeTEx data is exchanged as XML files that together describe the whole network: stops, lines, timetables, fares, etc. Each European country specifies how it wants to exchange these XML files through “profiles”, which select the parts of the standard to use. France has defined 5 NeTEx profiles:

The “Stops” profile, which is the exchange profile for the description of public transport stops.
The “Networks” profile, which presents the topology of public transport networks.
The “Schedules” profile, which is an exchange profile for the description of public transport schedules.
The “Accessibility” profile, which is the exchange profile for the description of the accessibility of public transport networks.
The “Fares” profile covers fare information, including complex fare models.
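
To give an idea of what reading such XML looks like, here is a minimal sketch using Python's standard library. The fragment is hand-written and heavily simplified: it is not a schema-valid NeTEx delivery and does not follow any national profile (real files nest stop places inside frames and carry many more attributes). The identifiers and coordinates are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment (assumption): NOT a schema-valid delivery.
NETEX_SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <dataObjects>
    <StopPlace id="EX:StopPlace:1">
      <Name>Central Station</Name>
      <Centroid><Location>
        <Longitude>2.32</Longitude>
        <Latitude>48.84</Latitude>
      </Location></Centroid>
    </StopPlace>
    <StopPlace id="EX:StopPlace:2">
      <Name>Town Hall</Name>
      <Centroid><Location>
        <Longitude>2.35</Longitude>
        <Latitude>48.86</Latitude>
      </Location></Centroid>
    </StopPlace>
  </dataObjects>
</PublicationDelivery>"""

# NeTEx elements live in this XML namespace.
NS = {"n": "http://www.netex.org.uk/netex"}


def read_stop_places(xml_text: str) -> list:
    """Return id, name and coordinates of every StopPlace in the document."""
    root = ET.fromstring(xml_text)
    stops = []
    # ".//" searches at any depth, so deeper nesting would also be found.
    for stop_place in root.iterfind(".//n:StopPlace", NS):
        stops.append({
            "id": stop_place.get("id"),
            "name": stop_place.findtext("n:Name", namespaces=NS),
            "lat": float(stop_place.findtext(".//n:Latitude", namespaces=NS)),
            "lon": float(stop_place.findtext(".//n:Longitude", namespaces=NS)),
        })
    return stops


stop_places = read_stop_places(NETEX_SAMPLE)
```

A production reader would validate the file against the XSD schema of the relevant profile rather than rely on element names alone.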

The NeTEx format is already widely used in Europe, and several organizations have adopted it. In England it was used for the London Olympics, and other European countries are already finalising their profiles.

GTFS

In the mid-2000s, Google developed a new unofficial standard, the General Transit Feed Specification (GTFS). It defines a file format for transit schedules and associated geographic information. In the USA, it is considered the baseline standard, including for publishing transit data as open data. A whole community of developers and tools has grown up around this format, and it is now the most widely known and used format in the world. GTFS is also known as static GTFS to differentiate it from GTFS-realtime, the specification for real-time data.

The standard is built around what are called GTFS feeds: a feed is a set of text files grouped in a ZIP archive. Each of these files is in CSV format and describes one type of data (stops, schedules, etc.). GTFS is considerably simpler than NeTEx.
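
Because a feed is just a ZIP archive of CSV files, it can be read with a few lines of standard-library Python. The sketch below builds a tiny archive in memory so that it runs without downloading anything; the stop data is invented, and the helper names are ours.

```python
import csv
import io
import zipfile


def build_sample_feed() -> bytes:
    """Create a tiny in-memory ZIP with one GTFS file (illustrative data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Central Station,48.84,2.32\n"
            "S2,Town Hall,48.86,2.35\n",
        )
    return buffer.getvalue()


def read_table(feed: bytes, filename: str) -> list:
    """Read one CSV file of the feed as a list of row dictionaries."""
    with zipfile.ZipFile(io.BytesIO(feed)) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig tolerates the byte-order mark that some feeds carry.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


stops = read_table(build_sample_feed(), "stops.txt")
```

To read a real feed, the same `read_table` function can be given the bytes of a downloaded ZIP file.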

GTFS feeds

As mentioned earlier, a GTFS feed is made up of several text files. Each of these files models one specific aspect of public transport information (stops, routes, trips and schedule information). A public transport agency can produce a GTFS feed to share this information, and developers who want to integrate it into their applications can use the feed. Let’s take the example of Lyko: to build its intermodal trip planner, it used open data, notably via the transport.data.gouv platform, and this GTFS data feeds its route calculator. Here are the main files of a GTFS feed:

agency.txt: defines the transport agencies.

stops.txt: lists all stop points.

routes.txt: defines the routes; a route is a group of trips presented to passengers as a single line.

trips.txt: presents the different trips for each route (each trip serves at least two stops).

stop_times.txt: lists the arrival and departure times at each stop for each trip.
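
These files reference each other through identifiers: a stop time points to a trip, and a trip points to a route. As a minimal sketch, the example below joins three of them to list the departures at a given stop. The CSV content is invented and reduced to a few columns, and the function names are ours.

```python
import csv
import io

# Tiny illustrative extracts of three GTFS files (invented data).
ROUTES = "route_id,route_short_name\nR1,12\n"
TRIPS = "route_id,service_id,trip_id\nR1,WEEK,T1\nR1,WEEK,T2\n"
STOP_TIMES = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,S1,1\n"
    "T1,08:10:00,08:10:00,S2,2\n"
    "T2,08:30:00,08:30:00,S1,1\n"
    "T2,08:40:00,08:40:00,S2,2\n"
)


def rows(text: str) -> list:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def departures_at(stop_id: str) -> list:
    """Return (departure_time, line name) pairs for one stop, sorted by time."""
    route_name = {r["route_id"]: r["route_short_name"] for r in rows(ROUTES)}
    trip_route = {t["trip_id"]: t["route_id"] for t in rows(TRIPS)}
    result = [
        (st["departure_time"], route_name[trip_route[st["trip_id"]]])
        for st in rows(STOP_TIMES)
        if st["stop_id"] == stop_id
    ]
    # Zero-padded HH:MM:SS strings sort correctly as plain text.
    return sorted(result)


departures = departures_at("S1")
```

A real planner would also take calendar.txt or calendar_dates.txt into account to know on which days each trip runs.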

Other files may be present but are optional; they are listed in the GTFS reference documentation. GTFS is very relevant and is the most widespread format for open data. Nevertheless, it remains much less rich than a format like NeTEx.

Real-time transportation data

Unlike static transport data, real-time transport data is updated continuously. There are two main formats: SIRI, a standard based on Transmodel, and GTFS-realtime, a popular standard developed by Google. Both meet the need for real-time exchanges, in particular for route calculation applications, and both improve the relevance of MaaS solutions.

SIRI

The first real-time exchange format we are going to see is the SIRI (Service Interface for Real-time Information) standard, which is based on the Transmodel reference model. In October 2006, the CEN (European Committee for Standardization) established it as a European standard. SIRI allows servers to exchange real-time information from public transport operators in XML format. This information serves various precise uses: updating stop schedules in real time on the web, on mobile, etc., distributing status messages about services, or coordinating bus movements across a territory. SIRI offers 8 different web services, called profiles, that comply with the standard and meet a set of identified needs:

SM – Stop Monitoring, schedules by stop
GM – General Message, information messages
ET – Estimated Timetable, schedules by line
VM – Vehicle Monitoring, vehicle tracking
PT – Production Timetable, planned schedules by line
CM – Connection Monitoring, monitoring of connections between services
SX – Situation Exchange, detailed information on disruptions
FM – Facility Monitoring, condition of the equipment (accessibility)

For a detailed explanation of these profiles, you can visit the VDV website (leading CEN group).
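
As an illustration of the Stop Monitoring service, here is a minimal sketch that reads a simplified response and computes the delay of each upcoming departure. The XML is hand-written and trimmed to a handful of elements (a real response carries many more, plus the SOAP envelope when SOAP is used); the values are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hand-written, trimmed Stop Monitoring response (assumption: illustrative only).
SIRI_SAMPLE = """<Siri xmlns="http://www.siri.org.uk/siri" version="2.0">
  <ServiceDelivery>
    <StopMonitoringDelivery>
      <MonitoredStopVisit>
        <MonitoredVehicleJourney>
          <LineRef>12</LineRef>
          <MonitoredCall>
            <AimedDepartureTime>2021-09-27T08:00:00+02:00</AimedDepartureTime>
            <ExpectedDepartureTime>2021-09-27T08:03:00+02:00</ExpectedDepartureTime>
          </MonitoredCall>
        </MonitoredVehicleJourney>
      </MonitoredStopVisit>
    </StopMonitoringDelivery>
  </ServiceDelivery>
</Siri>"""

# SIRI elements live in this XML namespace.
NS = {"s": "http://www.siri.org.uk/siri"}


def delays_in_seconds(xml_text: str) -> list:
    """Return (line, delay in seconds) for each monitored stop visit."""
    root = ET.fromstring(xml_text)
    result = []
    for visit in root.iterfind(".//s:MonitoredStopVisit", NS):
        line = visit.findtext(".//s:LineRef", namespaces=NS)
        aimed = datetime.fromisoformat(
            visit.findtext(".//s:AimedDepartureTime", namespaces=NS))
        expected = datetime.fromisoformat(
            visit.findtext(".//s:ExpectedDepartureTime", namespaces=NS))
        result.append((line, int((expected - aimed).total_seconds())))
    return result


delays = delays_in_seconds(SIRI_SAMPLE)
```

Here the aimed time is the scheduled one and the expected time is the real-time prediction, so a positive difference means the vehicle is late.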

SIRI is above all a thoroughly specified, complex and very comprehensive standard. The last update of its specifications was made in 2015. It is composed of 3 parts:

Context and Framework (CEN TS 15531-1:2015)
Communications infrastructure (CEN TS 15531-2:2015)
Functional service interfaces (CEN TS 15531-3:2015)

Two additional parts were published in 2010:

Functional service interfaces – Facility Monitoring (CEN/TS 15531-4)
Functional service interfaces – Situation Exchange (CEN/TS 15531-5)

You can access the SIRI specifications and XSD schema through the Transit Supply Data Standards Portal.

The SIRI standard is therefore useful for system-to-system exchanges. It uses the SOAP protocol (Simple Object Access Protocol) in XML format and encompasses a set of subdomains (profiles).

@wikipedia | Multimodal screen using SIRI

SIRI Lite

As SIRI is not well suited to open data and web application development, a derived standard has been developed: SIRI Lite. Real-time information is just as useful for application development, in particular to display upcoming departures, disruption alerts, etc. SIRI Lite is exposed as a REST API, an approach used almost systematically in open data contexts, which makes it easier to integrate. To stay compact, SIRI Lite keeps only a few of the SIRI services:

SM – Stop Monitoring, schedules by stop
GM – General Message, information messages
ET – Estimated Timetable, schedules by line
VM – Vehicle Monitoring, vehicle tracking
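
In practice, calling such a service comes down to an HTTP GET followed by parsing the response. The sketch below shows the idea without making any network call. The base URL is hypothetical, and both the query parameter and the JSON layout are assumptions modelled on the SIRI element names: publishers differ in the details, so always check the documentation of the API you are using.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL documented by the data publisher.
BASE_URL = "https://example.org/siri-lite/stop-monitoring"


def stop_monitoring_url(stop_ref: str) -> str:
    """Build the request URL for one stop (parameter name is an assumption)."""
    query = urlencode({"MonitoringRef": stop_ref})
    return f"{BASE_URL}?{query}"


# Invented response, shaped after the SIRI element names (assumption).
SAMPLE_RESPONSE = json.dumps({
    "Siri": {"ServiceDelivery": {"StopMonitoringDelivery": [{
        "MonitoredStopVisit": [
            {"MonitoredVehicleJourney": {
                "LineRef": "12",
                "MonitoredCall": {
                    "ExpectedDepartureTime": "2021-09-27T08:03:00+02:00"},
            }},
        ],
    }]}}
})


def next_departures(payload: str) -> list:
    """Extract (line, expected departure time) pairs from a JSON response."""
    document = json.loads(payload)
    deliveries = document["Siri"]["ServiceDelivery"]["StopMonitoringDelivery"]
    result = []
    for delivery in deliveries:
        for visit in delivery.get("MonitoredStopVisit", []):
            journey = visit["MonitoredVehicleJourney"]
            result.append((journey["LineRef"],
                           journey["MonitoredCall"]["ExpectedDepartureTime"]))
    return result


url = stop_monitoring_url("S1")
departures = next_departures(SAMPLE_RESPONSE)
```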

GTFS-realtime

GTFS-RT, or GTFS-realtime, is an extension of GTFS (General Transit Feed Specification). It gives users real-time updates on public transport: through GTFS-realtime, public transport agencies provide application developers with real-time updates about their fleets. This requires a static GTFS feed to be published first, with consistent identifiers between planned and real-time data. GTFS-realtime is a feed specification. The term may seem complex, but it simply denotes a data feed that transit agencies update at regular intervals.

Transit agencies therefore publish GTFS-realtime feeds, which contain 3 types of feed entities:

Trip updates: delays, cancellations, modified routes.
Service alerts: displaced stops, unexpected events affecting a station, a route or the entire network.
Vehicle positions: information about vehicles, including their location and congestion level.
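
The sketch below illustrates why the real-time feed must stay consistent with the static one: a trip update only makes sense if its trip and stop identifiers match those of the published schedule. Plain dictionaries stand in for the decoded messages here (an assumption made to keep the example dependency-free); their keys mirror the `trip_id`, `stop_id` and `delay` fields of the specification, and the data is invented.

```python
def to_seconds(hms: str) -> int:
    """Convert a GTFS HH:MM:SS time to seconds since the start of service."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert seconds back to HH:MM:SS (hours may exceed 23, as in GTFS)."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Scheduled departures from the static feed, keyed by (trip_id, stop_id).
SCHEDULED = {
    ("T1", "S1"): "08:00:00",
    ("T1", "S2"): "08:10:00",
    ("T2", "S1"): "23:59:30",
}

# Stand-ins for decoded trip updates; delays are in seconds.
TRIP_UPDATES = [
    {"trip_id": "T1", "stop_id": "S2", "delay": 120},
    {"trip_id": "T2", "stop_id": "S1", "delay": 60},
    {"trip_id": "T9", "stop_id": "S1", "delay": 30},  # unknown trip: ignored
]


def expected_times(scheduled: dict, updates: list) -> dict:
    """Apply each delay to the matching scheduled time."""
    result = dict(scheduled)
    for update in updates:
        key = (update["trip_id"], update["stop_id"])
        if key in scheduled:
            result[key] = to_hms(to_seconds(scheduled[key]) + update["delay"])
    return result


expected = expected_times(SCHEDULED, TRIP_UPDATES)
```

The update for the unknown trip `T9` cannot be matched to anything, which is exactly the kind of inconsistency the prior GTFS exchange is meant to avoid.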

If you want to know more about the different feed entities, see the GTFS Realtime reference documentation.

Unlike GTFS, GTFS-RT does not use CSV files: it is based on Protocol Buffers, a binary serialization format. This makes it much more compact and optimized than SIRI, but it remains much less suited to public procurement contracts.
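
To show where the compactness comes from, here is a minimal sketch that hand-encodes a tiny message following the Protocol Buffers wire format and compares its size with an XML equivalent. The message layout (field 1 for a stop identifier, field 2 for a delay) and the XML element names are invented for the comparison; they are not the actual GTFS-realtime or SIRI definitions, and real code would use generated bindings instead of encoding bytes by hand.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Length-delimited field: key (wire type 2), length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    """Varint field: key (wire type 0), then the value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# Hypothetical message: field 1 = stop id, field 2 = delay in seconds.
binary_message = encode_string_field(1, "S2") + encode_int_field(2, 120)

# The same information in an invented XML form.
xml_message = "<Update><StopRef>S2</StopRef><Delay>120</Delay></Update>".encode("utf-8")

print(len(binary_message), "bytes in binary,", len(xml_message), "bytes in XML")
```

Field names never travel on the wire, only small numeric keys, which is why the binary form is so much shorter than the tagged XML.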

Between standardized formats and popular standards

In this article, we have seen the main data formats. On one side, we have the NeTEx and SIRI formats based on Transmodel, a standardized model with complex specifications. Large institutions have promoted and developed these formats and want to see them widely adopted: they are pushing public transport agencies to use them when publishing their data.

On the other side, we have the popular GTFS standard, developed and promoted by the giant Google. This standard was originally set up to power the Google Maps tool, and a large community of developers then helped to carry the project. It is now the most widely used format and the best suited to open data, application development and route planners. Lyko has decided to make GTFS its exchange format to feed its intermodal trip planner.

So we can see a parallel between standardized formats and popular standards, public institutions and private institutions, European groupings and GAFA members. Both sides nevertheless share a common goal: making transport data as accessible as possible in order to promote the development of mobility and MaaS.

Conclusion

In this article, we have reviewed the main data exchange formats for public transport. These exchanges between the different mobility actors foster the evolution of the MaaS concept and of new intermodal trip planners. Public transport data is not the only kind that contributes, however. In 2014, a group of public and private sector organizations decided to develop GBFS, a data format for shared mobility. The General Bikeshare Feed Specification is a uniform standard for real-time data exchange covering free-floating mobility and car sharing. The North American Bikeshare Association (NABSA) is the organization promoting this standard.

Find the specifications of GBFS on GitHub.
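
As a quick taste of GBFS, here is a minimal sketch that reads a station status document and keeps the stations where a bike is currently available. The JSON is invented and reduced to a few fields; a real feed starts from a discovery file that lists the URLs of all the other files.

```python
import json

# Invented, trimmed station status document (illustrative only).
STATION_STATUS = json.dumps({
    "last_updated": 1632729600,
    "ttl": 60,
    "data": {"stations": [
        {"station_id": "A", "num_bikes_available": 4, "num_docks_available": 6},
        {"station_id": "B", "num_bikes_available": 0, "num_docks_available": 10},
    ]},
})


def stations_with_bikes(payload: str) -> list:
    """Return the ids of stations that currently have at least one bike."""
    document = json.loads(payload)
    return [
        station["station_id"]
        for station in document["data"]["stations"]
        if station["num_bikes_available"] > 0
    ]


available = stations_with_bikes(STATION_STATUS)
```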