Some time ago Deutsche Bahn, the German state-owned railway company, started an Open-Data project including access to their timetable API. At the moment this service is available for long-distance traffic only.
The REST interface offers XML and JSON data including
I’ll use the JSON data only.
The team of DB-Open-Date issues API keys to anyone sending an E-Mail to dbopendata@deutschebahn.com.
The service Location.name
supplies the specific IDs to stations matching a certain search string. These stopID
s as I will call them are needed other services I’ll present later. Briefly, you cannot query the API using the name of a station, you’ll always need the respective stopID
.
Assumend you want to know the stopID
for “Berlin Hbf” (Berlin Main Station), the respective API call would be:
https://open-api.bahn.de/bin/rest.exe/location.name?authKey=xxx&lang=de&input=Berlin+Hbf&format=json
Let’s see how I implemented this in R:
library(RJSONIO)
library(dplyr)
library(magrittr)
getStopID <- function(station = "Berlin Hbf", authKey) {
station %<>% gsub(" ", "+", .)
api_url <- paste("http://open-api.bahn.de/bin/rest.exe/location.name?authKey=",authKey,"&lang=de&input=",station,"&format=json", sep="")
stopID_json <- fromJSON(api_url)
stopID <- stopID_json$LocationList$StopLocation[[1]][c("name","id")] %>%
t() %>%
data.frame(stringsAsFactors = FALSE)
return(stopID)
}
As you can see, getStopID
requires two arguments: the search pattern (which is “Berlin Hbf” by default) and the authentication key authKey
. Whitespaces in the search pattern get replaced by +
and then the JSON data is fetched from the API. The API actually provides several results for stations that match the search pattern. So when you look for “Berlin” you may get quite a lot of stations and IDs, simply because they have the pattern “Berlin” in their names. Therefore I restricted the function to return the first result only, which is always (?) the most obvious or the main station.
getStopID("Berlin", authKey)
## name id
## 1 Berlin Hbf 008011160
Now we have a stopID
we can use for further services.
id <- getStopID("Berlin", authKey)$id
The Stationboard
service provides the next 20 departures at a given station (represented by it’s stopID
) at a given time and date. Moreover this service also provides access to train-specific URLS from which journey details of that train can be fetched (-> later!).
So there are two tasks here:
departureBoard <- function(stopID = "008011160", date = substr(Sys.time() + 1800, 1, 10), time = substr(Sys.time() + 1800, 12, 16), authKey, refs = FALSE) {
time %<>% gsub(":","%3a",.)
api_url <- paste("http://open-api.bahn.de/bin/rest.exe/departureBoard?authKey=",authKey,"&lang=de&id=",stopID,"&date=",date,"&time=",time,"&format=json", sep="")
json <- api_url %>% fromJSON()
json <- json$DepartureBoard$Departure
dep <- data.frame(name = sapply(json, "[[", "name") %>% as.character() %>% unlist(),
type = sapply(json, "[[", "type") %>% as.character() %>% unlist(),
stopid = sapply(json, "[[", "stopid") %>% as.character() %>% unlist(),
stop = sapply(json, "[[", "stop") %>% as.character() %>% unlist(),
time = sapply(json, "[[", "time") %>% as.character() %>% unlist(),
date = sapply(json, "[[", "date") %>% as.character() %>% unlist(),
direction = sapply(json, "[[", "direction") %>% as.character() %>% unlist(),
track = sapply(json, "[[", "track") %>% as.character() %>% unlist(),
JourneyDetailRef = sapply(json, "[[", "JourneyDetailRef") %>% as.character() %>% unlist(),
stringsAsFactors = F)
dep[dep == "NULL"] <- NA
board <- dep %>%
select(Date = date, From = stop, To = direction, Time = time, Train = name, Track = track)
if (refs == TRUE) {
refs <- dep %>%
select(Date = date, Time = time, Train = name, Ref = JourneyDetailRef)
return(list(departureBoard = board, departureBoardRef = refs))
} else {
return(board)
}
}
By default departurBoard()
looks for trains leaving Berlin Main Station on Sys.Date()
at Sys.time()
plus 30 minutes. Further arguments are the necessary authKey
and a logical value determining whether the train-specific URLs shall be returned or not.
The way I convert the JSON data to data.frame
is probably not the fastest one but it avoids trouble with missing values. Once you have the raw data tidied up a bit the actual work is done. As you can see departureBoard()
returns a data.frame
representing the departure board when refs = FALSE
but returns a list containing the departure board and a second table. This additional table holds the train-specific information.
departureBoard(authKey = authKey)
## Date From To Time Train Track
## 1 2016-05-04 Berlin Hbf München Hbf 13:34 ICE 691 13
## 2 2016-05-04 Berlin Hbf (tief) Hamburg-Altona 13:42 ICE 1682 8
## 3 2016-05-04 Berlin Hbf (tief) Ostseebad Binz 13:42 IC 2255 7
## 4 2016-05-04 Berlin Hbf (tief) Düsseldorf Hbf 13:49 ICE 848 2 D - G
## 5 2016-05-04 Berlin Hbf (tief) Köln Hbf 13:49 ICE 858 2 A - D
## 6 2016-05-04 Berlin Hbf (tief) Aachen Hbf 13:57 IC 2223 7
## 7 2016-05-04 Berlin Hbf (tief) München Hbf 14:28 ICE 1683 1
## 8 2016-05-04 Berlin Hbf Basel SBB 14:31 ICE 375 14
## 9 2016-05-04 Berlin Hbf Amsterdam Centraal 14:34 IC 142 13
## 10 2016-05-04 Berlin Hbf Gdynia Glowna 14:37 EC 55 11
## 11 2016-05-04 Berlin Hbf (tief) Hamburg-Altona 14:39 ICE 800 8
## 12 2016-05-04 Berlin Hbf (tief) Düsseldorf Hbf 14:52 ICE 546 4 D - G
## 13 2016-05-04 Berlin Hbf (tief) Koblenz Hbf 14:52 ICE 556 4 A - D
## 14 2016-05-04 Berlin Hbf (tief) Praha hl.n. 15:00 EC 177 1
## 15 2016-05-04 Berlin Hbf (tief) Frankfurt(Main)Hbf 15:03 ICE 1538 2
## 16 2016-05-04 Berlin Hbf (tief) Köln Hbf 15:03 IC 1921 7
## 17 2016-05-04 Berlin Hbf (tief) Hamburg-Altona 15:06 EC 174 5
## 18 2016-05-04 Berlin Hbf (tief) Leipzig Hbf 15:30 IC 2185 <NA>
## 19 2016-05-04 Berlin Hbf (tief) Köln Hbf 15:34 IC 1952 3
## 20 2016-05-04 Berlin Hbf München Hbf 15:34 ICE 693 13
For later, let’s store the train-specific information:
dep <- departureBoard(authKey = authKey, refs = TRUE)
This is pretty much the same as departureBoard()
but providing the next 20 arrivals.
arrivalBoard <- function(stopID = "008011160", date = substr(Sys.time() + 1800, 1, 10), time = substr(Sys.time() + 1800, 12, 16), authKey, refs = FALSE) {
time %<>% gsub(":","%3a",.)
api_url <- paste("http://open-api.bahn.de/bin/rest.exe/arrivalBoard?authKey=",authKey,"&lang=de&id=",stopID,"&date=",date,"&time=",time,"&format=json", sep="")
json <- api_url %>% fromJSON()
json <- json$ArrivalBoard$Arrival
arr <- data.frame(name = sapply(json, "[[", "name") %>% as.character() %>% unlist(),
type = sapply(json, "[[", "type") %>% as.character() %>% unlist(),
stopid = sapply(json, "[[", "stopid") %>% as.character() %>% unlist(),
stop = sapply(json, "[[", "stop") %>% as.character() %>% unlist(),
time = sapply(json, "[[", "time") %>% as.character() %>% unlist(),
date = sapply(json, "[[", "date") %>% as.character() %>% unlist(),
origin = sapply(json, "[[", "origin") %>% as.character() %>% unlist(),
track = sapply(json, "[[", "track") %>% as.character() %>% unlist(),
JourneyDetailRef = sapply(json, "[[", "JourneyDetailRef") %>% as.character() %>% unlist(),
stringsAsFactors = F)
arr[arr == "NULL"] <- NA
board <- arr %>%
select(Date = date, Stop = stop, Train = name, From = origin, Time = time, Track = track)
if (refs == TRUE) {
refs <- arr %>%
select(Date = date, Time = time, Train = name, Ref = JourneyDetailRef)
return(list(arrivalBoard = board, arrivalBoardRef = refs))
} else {
return(board)
}
}
arrivalBoard(authKey = authKey)
## Date Stop Train From Time
## 1 2016-05-04 Berlin Hbf (tief) ICE 1682 München Hbf 13:33
## 2 2016-05-04 Berlin Hbf EC 54 Gdynia Glowna 13:43
## 3 2016-05-04 Berlin Hbf (tief) ICE 847 Köln Hbf 14:09
## 4 2016-05-04 Berlin Hbf (tief) ICE 857 Köln Hbf 14:09
## 5 2016-05-04 Berlin Hbf (tief) ICE 1683 Hamburg-Altona 14:19
## 6 2016-05-04 Berlin Hbf (tief) IC 2252 Ostseebad Binz 14:19
## 7 2016-05-04 Berlin Hbf ICE 692 München Hbf 14:25
## 8 2016-05-04 Berlin Hbf (tief) IC 2186 Leipzig Hbf 14:30
## 9 2016-05-04 Berlin Hbf (tief) EC 177 Hamburg-Altona 14:55
## 10 2016-05-04 Berlin Hbf (tief) EC 174 Praha hl.n. 14:58
## 11 2016-05-04 Berlin Hbf (tief) ICE 1537 Frankfurt(Main)Hbf 15:01
## 12 2016-05-04 Berlin Hbf (tief) ICE 547 Düsseldorf Hbf 15:06
## 13 2016-05-04 Berlin Hbf (tief) ICE 557 Köln Hbf 15:06
## 14 2016-05-04 Berlin Hbf (tief) IC 1952 Berlin Gesundbrunnen 15:18
## 15 2016-05-04 Berlin Hbf (tief) ICE 709 Hamburg-Altona 15:21
## 16 2016-05-04 Berlin Hbf IC 143 Amsterdam Centraal 15:22
## 17 2016-05-04 Berlin Hbf ICE 372 Interlaken Ost 15:28
## 18 2016-05-04 Berlin Hbf (tief) ICE 1208 Innsbruck Hbf 15:33
## 19 2016-05-04 Berlin Hbf (tief) IC 2257 Berlin Südkreuz 15:38
## 20 2016-05-04 Berlin Hbf EC 44 Warszawa Wschodnia 15:43
## Track
## 1 8
## 2 14
## 3 6 D - G
## 4 6 A - D
## 5 1
## 6 2
## 7 12
## 8 6
## 9 1
## 10 5
## 11 6
## 12 3 D - G
## 13 3 A - D
## 14 3
## 15 1
## 16 11
## 17 12
## 18 7
## 19 6
## 20 14
Now we will take a closer look on that train-specific information I talked about earlier. To any train listed in a departure or arrival board there’s a specific URL that calls the Journey Details
service from the API. There you get more detailed information on the route of the train containing all stops, arrival and departure times as well as the coordinates of the respective stations.
That’s how I used it:
journeyDetails <- function(refs, Train) {
ref_url<- refs[refs$Train == Train,]$Ref
journey <- ref_url %>%
as.character() %>%
fromJSON()
journey <- journey$JourneyDetail$Stops$Stop
journey <- data.frame(name = journey %>% sapply("[", "name") %>% as.character(),
id = journey %>% sapply("[", "id") %>% as.character(),
lon = journey %>% sapply("[", "lon") %>% as.numeric(),
lat = journey %>% sapply("[", "lat") %>% as.numeric(),
routeIdx = journey %>% sapply("[", "routeIdx") %>% as.numeric(),
arrTime = journey %>% sapply("[", "arrTime") %>% as.character(),
arrDate = journey %>% sapply("[", "arrDate") %>% as.character(),
depTime = journey %>% sapply("[", "depTime") %>% as.character(),
depDate = journey %>% sapply("[", "depDate") %>% as.character(),
track = journey %>% sapply("[", "track") %>% as.character(),
stringsAsFactors = F)
return(journey)
}
journeyDetails
requires two arguments: First the stored references and second the name of he respective train. So just take a look back at dep
and choose a train name:
journeyDetails(dep$departureBoardRef, "ICE 556")
## name id lon lat routeIdx arrTime
## 1 Berlin Gesundbrunnen 8011102 13.388515 52.54896 0 <NA>
## 2 Berlin Hbf (tief) 8098160 13.369548 52.52559 1 14:46
## 3 Berlin-Spandau 8010404 13.197530 52.53447 2 15:01
## 4 Hannover Hbf 8000152 9.741016 52.37676 3 16:28
## 5 Bielefeld Hbf 8000036 8.532722 52.02926 4 17:20
## 6 Hamm(Westf) 8000149 7.807823 51.67808 5 17:48
## 7 Hagen Hbf 8000142 7.460246 51.36274 6 18:22
## 8 Wuppertal Hbf 8000266 7.149543 51.25436 7 18:39
## 9 Köln Hbf 8000207 6.958729 50.94303 8 19:09
## 10 Bonn Hbf 8000044 7.097136 50.73201 9 19:32
## 11 Andernach 8000331 7.404837 50.43454 10 19:59
## 12 Koblenz Hbf 8000206 7.588342 50.35093 11 20:11
## arrDate depTime depDate track
## 1 <NA> 14:41 2016-05-04 7 A - D
## 2 2016-05-04 14:52 2016-05-04 4 A - D
## 3 2016-05-04 15:03 2016-05-04 3 D - G
## 4 2016-05-04 16:31 2016-05-04 12
## 5 2016-05-04 17:22 2016-05-04 4
## 6 2016-05-04 17:54 2016-05-04 10A-C
## 7 2016-05-04 18:24 2016-05-04 7
## 8 2016-05-04 18:41 2016-05-04 1
## 9 2016-05-04 19:12 2016-05-04 7
## 10 2016-05-04 19:37 2016-05-04 3
## 11 2016-05-04 20:01 2016-05-04 1
## 12 2016-05-04 <NA> <NA> 4
The returned table lists all stations the train will head for on its way from Berlin Gesundbrunnen to Koblenz Hbf with the respective arrival and departure times, the order of stops and the stations’ coordinates.
journeyDetails()
offers the opportunity to plot a train’s route on a map including some additional information. For a brief example I will take the journey details from above.
journey <- journeyDetails(dep$departureBoardRef, "ICE 556")
journey$time <- ifelse(!is.na(journey$depTime), journey$depTime, journey$arrTime)
library(ggplot2)
library(ggmap)
library(ggrepel)
ger <- get_map(location = c(5.14,47.13,15.36,55.25), zoom = 6, maptype = "toner-background")
ggmap(ger) +
geom_point(data = journey, aes(x = lon, y = lat), size = 3, colour = "blue", alpha= 0.6) +
geom_path(data = journey, aes(x = lon, y = lat), size = 1, colour = "blue", alpha = 0.4) +
geom_text_repel(data = journey, aes(x = lon, y = lat, label = paste(name)),
colour = "red2", box.padding = unit(0.2, "lines"), fontface = "bold", point.padding = unit(0.3, "lines"))
Well this is definitely not a nice plot, but it surely reveals the possibilities you have.
In the next days I’ll try to make a package (DBopen
) out of these functions and maybe add some further functionality. As always: Feel free to contribute!