Some time ago Deutsche Bahn, the German state-owned railway company, started an Open-Data project including access to their timetable API. At the moment this service is available for long-distance traffic only.

The REST interface offers XML and JSON data including

I’ll use the JSON data only.

The team of DB-Open-Date issues API keys to anyone sending an E-Mail to dbopendata@deutschebahn.com.

Search for stations

The service Location.name supplies the specific IDs to stations matching a certain search string. These stopIDs as I will call them are needed other services I’ll present later. Briefly, you cannot query the API using the name of a station, you’ll always need the respective stopID.

Assumend you want to know the stopID for “Berlin Hbf” (Berlin Main Station), the respective API call would be:

https://open-api.bahn.de/bin/rest.exe/location.name?authKey=xxx&lang=de&input=Berlin+Hbf&format=json

Let’s see how I implemented this in R:

library(RJSONIO)
library(dplyr)
library(magrittr)

getStopID <- function(station = "Berlin Hbf", authKey) {
  station %<>% gsub(" ", "+", .)
  api_url <- paste("http://open-api.bahn.de/bin/rest.exe/location.name?authKey=",authKey,"&lang=de&input=",station,"&format=json", sep="")
  stopID_json <- fromJSON(api_url)
  stopID <- stopID_json$LocationList$StopLocation[[1]][c("name","id")] %>%
    t() %>%
    data.frame(stringsAsFactors = FALSE)
  return(stopID)
}

As you can see, getStopID requires two arguments: the search pattern (which is “Berlin Hbf” by default) and the authentication key authKey. Whitespaces in the search pattern get replaced by + and then the JSON data is fetched from the API. The API actually provides several results for stations that match the search pattern. So when you look for “Berlin” you may get quite a lot of stations and IDs, simply because they have the pattern “Berlin” in their names. Therefore I restricted the function to return the first result only, which is always (?) the most obvious or the main station.

getStopID("Berlin", authKey)
##         name        id
## 1 Berlin Hbf 008011160

Now we have a stopID we can use for further services.

id <- getStopID("Berlin", authKey)$id

Departure boards

The Stationboard service provides the next 20 departures at a given station (represented by it’s stopID) at a given time and date. Moreover this service also provides access to train-specific URLS from which journey details of that train can be fetched (-> later!).

So there are two tasks here:

departureBoard <- function(stopID = "008011160", date = substr(Sys.time() + 1800, 1, 10), time = substr(Sys.time() + 1800, 12, 16), authKey, refs = FALSE) {
  time %<>% gsub(":","%3a",.)
  api_url <- paste("http://open-api.bahn.de/bin/rest.exe/departureBoard?authKey=",authKey,"&lang=de&id=",stopID,"&date=",date,"&time=",time,"&format=json", sep="")
  json <- api_url %>% fromJSON()
  json <- json$DepartureBoard$Departure
  dep <- data.frame(name = sapply(json, "[[", "name") %>% as.character() %>% unlist(),
                    type = sapply(json, "[[", "type") %>% as.character() %>% unlist(),
                    stopid = sapply(json, "[[", "stopid") %>% as.character() %>% unlist(),
                    stop = sapply(json, "[[", "stop") %>% as.character() %>% unlist(),
                    time = sapply(json, "[[", "time") %>% as.character() %>% unlist(),
                    date = sapply(json, "[[", "date") %>% as.character() %>% unlist(),
                    direction = sapply(json, "[[", "direction") %>% as.character() %>% unlist(),
                    track = sapply(json, "[[", "track") %>% as.character() %>% unlist(),
                    JourneyDetailRef = sapply(json, "[[", "JourneyDetailRef") %>% as.character() %>% unlist(),
                    stringsAsFactors = F)
  dep[dep == "NULL"] <- NA
  board <- dep %>%
    select(Date = date, From = stop, To = direction, Time = time, Train = name, Track = track)
  if (refs == TRUE) {
    refs <- dep %>%
      select(Date = date, Time = time, Train = name, Ref = JourneyDetailRef)
    return(list(departureBoard = board, departureBoardRef = refs))
  } else {
    return(board)
  }
}

By default departurBoard() looks for trains leaving Berlin Main Station on Sys.Date() at Sys.time() plus 30 minutes. Further arguments are the necessary authKey and a logical value determining whether the train-specific URLs shall be returned or not.

The way I convert the JSON data to data.frame is probably not the fastest one but it avoids trouble with missing values. Once you have the raw data tidied up a bit the actual work is done. As you can see departureBoard() returns a data.frame representing the departure board when refs = FALSE but returns a list containing the departure board and a second table. This additional table holds the train-specific information.

departureBoard(authKey = authKey)
##          Date              From                 To  Time    Train   Track
## 1  2016-05-04        Berlin Hbf        München Hbf 13:34  ICE 691      13
## 2  2016-05-04 Berlin Hbf (tief)     Hamburg-Altona 13:42 ICE 1682       8
## 3  2016-05-04 Berlin Hbf (tief)     Ostseebad Binz 13:42  IC 2255       7
## 4  2016-05-04 Berlin Hbf (tief)     Düsseldorf Hbf 13:49  ICE 848 2 D - G
## 5  2016-05-04 Berlin Hbf (tief)           Köln Hbf 13:49  ICE 858 2 A - D
## 6  2016-05-04 Berlin Hbf (tief)         Aachen Hbf 13:57  IC 2223       7
## 7  2016-05-04 Berlin Hbf (tief)        München Hbf 14:28 ICE 1683       1
## 8  2016-05-04        Berlin Hbf          Basel SBB 14:31  ICE 375      14
## 9  2016-05-04        Berlin Hbf Amsterdam Centraal 14:34   IC 142      13
## 10 2016-05-04        Berlin Hbf      Gdynia Glowna 14:37    EC 55      11
## 11 2016-05-04 Berlin Hbf (tief)     Hamburg-Altona 14:39  ICE 800       8
## 12 2016-05-04 Berlin Hbf (tief)     Düsseldorf Hbf 14:52  ICE 546 4 D - G
## 13 2016-05-04 Berlin Hbf (tief)        Koblenz Hbf 14:52  ICE 556 4 A - D
## 14 2016-05-04 Berlin Hbf (tief)        Praha hl.n. 15:00   EC 177       1
## 15 2016-05-04 Berlin Hbf (tief) Frankfurt(Main)Hbf 15:03 ICE 1538       2
## 16 2016-05-04 Berlin Hbf (tief)           Köln Hbf 15:03  IC 1921       7
## 17 2016-05-04 Berlin Hbf (tief)     Hamburg-Altona 15:06   EC 174       5
## 18 2016-05-04 Berlin Hbf (tief)        Leipzig Hbf 15:30  IC 2185    <NA>
## 19 2016-05-04 Berlin Hbf (tief)           Köln Hbf 15:34  IC 1952       3
## 20 2016-05-04        Berlin Hbf        München Hbf 15:34  ICE 693      13

For later, let’s store the train-specific information:

dep <- departureBoard(authKey = authKey, refs = TRUE)

Arrival boards

This is pretty much the same as departureBoard() but providing the next 20 arrivals.

arrivalBoard <- function(stopID = "008011160", date = substr(Sys.time() + 1800, 1, 10), time = substr(Sys.time() + 1800, 12, 16), authKey, refs = FALSE) {
  time %<>% gsub(":","%3a",.)
  api_url <- paste("http://open-api.bahn.de/bin/rest.exe/arrivalBoard?authKey=",authKey,"&lang=de&id=",stopID,"&date=",date,"&time=",time,"&format=json", sep="")
  json <- api_url %>% fromJSON()
  json <- json$ArrivalBoard$Arrival
  arr <- data.frame(name = sapply(json, "[[", "name") %>% as.character() %>% unlist(),
                    type = sapply(json, "[[", "type") %>% as.character() %>% unlist(),
                    stopid = sapply(json, "[[", "stopid") %>% as.character() %>% unlist(),
                    stop = sapply(json, "[[", "stop") %>% as.character() %>% unlist(),
                    time = sapply(json, "[[", "time") %>% as.character() %>% unlist(),
                    date = sapply(json, "[[", "date") %>% as.character() %>% unlist(),
                    origin = sapply(json, "[[", "origin") %>% as.character() %>% unlist(),
                    track = sapply(json, "[[", "track") %>% as.character() %>% unlist(),
                    JourneyDetailRef = sapply(json, "[[", "JourneyDetailRef") %>% as.character() %>% unlist(),
                    stringsAsFactors = F)
  arr[arr == "NULL"] <- NA
  board <- arr %>%
    select(Date = date, Stop = stop, Train = name, From = origin, Time = time, Track = track)
  if (refs == TRUE) {
    refs <- arr %>%
      select(Date = date, Time = time, Train = name, Ref = JourneyDetailRef)
    return(list(arrivalBoard = board, arrivalBoardRef = refs))
  } else {
    return(board)
  }
}
arrivalBoard(authKey = authKey)
##          Date              Stop    Train                 From  Time
## 1  2016-05-04 Berlin Hbf (tief) ICE 1682          München Hbf 13:33
## 2  2016-05-04        Berlin Hbf    EC 54        Gdynia Glowna 13:43
## 3  2016-05-04 Berlin Hbf (tief)  ICE 847             Köln Hbf 14:09
## 4  2016-05-04 Berlin Hbf (tief)  ICE 857             Köln Hbf 14:09
## 5  2016-05-04 Berlin Hbf (tief) ICE 1683       Hamburg-Altona 14:19
## 6  2016-05-04 Berlin Hbf (tief)  IC 2252       Ostseebad Binz 14:19
## 7  2016-05-04        Berlin Hbf  ICE 692          München Hbf 14:25
## 8  2016-05-04 Berlin Hbf (tief)  IC 2186          Leipzig Hbf 14:30
## 9  2016-05-04 Berlin Hbf (tief)   EC 177       Hamburg-Altona 14:55
## 10 2016-05-04 Berlin Hbf (tief)   EC 174          Praha hl.n. 14:58
## 11 2016-05-04 Berlin Hbf (tief) ICE 1537   Frankfurt(Main)Hbf 15:01
## 12 2016-05-04 Berlin Hbf (tief)  ICE 547       Düsseldorf Hbf 15:06
## 13 2016-05-04 Berlin Hbf (tief)  ICE 557             Köln Hbf 15:06
## 14 2016-05-04 Berlin Hbf (tief)  IC 1952 Berlin Gesundbrunnen 15:18
## 15 2016-05-04 Berlin Hbf (tief)  ICE 709       Hamburg-Altona 15:21
## 16 2016-05-04        Berlin Hbf   IC 143   Amsterdam Centraal 15:22
## 17 2016-05-04        Berlin Hbf  ICE 372       Interlaken Ost 15:28
## 18 2016-05-04 Berlin Hbf (tief) ICE 1208        Innsbruck Hbf 15:33
## 19 2016-05-04 Berlin Hbf (tief)  IC 2257      Berlin Südkreuz 15:38
## 20 2016-05-04        Berlin Hbf    EC 44   Warszawa Wschodnia 15:43
##      Track
## 1        8
## 2       14
## 3  6 D - G
## 4  6 A - D
## 5        1
## 6        2
## 7       12
## 8        6
## 9        1
## 10       5
## 11       6
## 12 3 D - G
## 13 3 A - D
## 14       3
## 15       1
## 16      11
## 17      12
## 18       7
## 19       6
## 20      14

Journey details

Now we will take a closer look on that train-specific information I talked about earlier. To any train listed in a departure or arrival board there’s a specific URL that calls the Journey Details service from the API. There you get more detailed information on the route of the train containing all stops, arrival and departure times as well as the coordinates of the respective stations.

That’s how I used it:

journeyDetails <- function(refs, Train) {
  ref_url<- refs[refs$Train == Train,]$Ref
  journey <- ref_url %>%
    as.character() %>%
    fromJSON()
  journey <- journey$JourneyDetail$Stops$Stop
  journey <- data.frame(name = journey %>% sapply("[", "name") %>% as.character(),
                        id = journey %>% sapply("[", "id") %>% as.character(),
                        lon = journey %>% sapply("[", "lon") %>% as.numeric(),
                        lat = journey %>% sapply("[", "lat") %>% as.numeric(),
                        routeIdx = journey %>% sapply("[", "routeIdx") %>% as.numeric(),
                        arrTime = journey %>% sapply("[", "arrTime") %>% as.character(),
                        arrDate = journey %>% sapply("[", "arrDate") %>% as.character(),
                        depTime = journey %>% sapply("[", "depTime") %>% as.character(),
                        depDate = journey %>% sapply("[", "depDate") %>% as.character(),
                        track = journey %>% sapply("[", "track") %>% as.character(),
                        stringsAsFactors = F)
  return(journey)
}

journeyDetails requires two arguments: First the stored references and second the name of he respective train. So just take a look back at dep and choose a train name:

journeyDetails(dep$departureBoardRef, "ICE 556")
##                    name      id       lon      lat routeIdx arrTime
## 1  Berlin Gesundbrunnen 8011102 13.388515 52.54896        0    <NA>
## 2     Berlin Hbf (tief) 8098160 13.369548 52.52559        1   14:46
## 3        Berlin-Spandau 8010404 13.197530 52.53447        2   15:01
## 4          Hannover Hbf 8000152  9.741016 52.37676        3   16:28
## 5         Bielefeld Hbf 8000036  8.532722 52.02926        4   17:20
## 6           Hamm(Westf) 8000149  7.807823 51.67808        5   17:48
## 7             Hagen Hbf 8000142  7.460246 51.36274        6   18:22
## 8         Wuppertal Hbf 8000266  7.149543 51.25436        7   18:39
## 9              Köln Hbf 8000207  6.958729 50.94303        8   19:09
## 10             Bonn Hbf 8000044  7.097136 50.73201        9   19:32
## 11            Andernach 8000331  7.404837 50.43454       10   19:59
## 12          Koblenz Hbf 8000206  7.588342 50.35093       11   20:11
##       arrDate depTime    depDate   track
## 1        <NA>   14:41 2016-05-04 7 A - D
## 2  2016-05-04   14:52 2016-05-04 4 A - D
## 3  2016-05-04   15:03 2016-05-04 3 D - G
## 4  2016-05-04   16:31 2016-05-04      12
## 5  2016-05-04   17:22 2016-05-04       4
## 6  2016-05-04   17:54 2016-05-04   10A-C
## 7  2016-05-04   18:24 2016-05-04       7
## 8  2016-05-04   18:41 2016-05-04       1
## 9  2016-05-04   19:12 2016-05-04       7
## 10 2016-05-04   19:37 2016-05-04       3
## 11 2016-05-04   20:01 2016-05-04       1
## 12 2016-05-04    <NA>       <NA>       4

The returned table lists all stations the train will head for on its way from Berlin Gesundbrunnen to Koblenz Hbf with the respective arrival and departure times, the order of stops and the stations’ coordinates.

Make some plots

journeyDetails() offers the opportunity to plot a train’s route on a map including some additional information. For a brief example I will take the journey details from above.

journey <- journeyDetails(dep$departureBoardRef, "ICE 556")
journey$time <- ifelse(!is.na(journey$depTime), journey$depTime, journey$arrTime)

library(ggplot2)
library(ggmap)
library(ggrepel)

ger <- get_map(location = c(5.14,47.13,15.36,55.25), zoom = 6, maptype = "toner-background")

ggmap(ger) +
  geom_point(data = journey, aes(x = lon, y = lat), size = 3, colour = "blue", alpha= 0.6) +
  geom_path(data = journey, aes(x = lon, y = lat), size = 1, colour = "blue", alpha = 0.4) +
  geom_text_repel(data = journey, aes(x = lon, y = lat, label = paste(name)), 
                  colour = "red2", box.padding = unit(0.2, "lines"), fontface = "bold", point.padding = unit(0.3, "lines"))

Well this is definitely not a nice plot, but it surely reveals the possibilities you have.

DBopen

In the next days I’ll try to make a package (DBopen) out of these functions and maybe add some further functionality. As always: Feel free to contribute!