Bus, from the Latin omnibus, meaning “to or for, by, with or from everyone”
This is an R Markdown Notebook. Like the other Notebooks in this package, it contains a self-learning data science tutorial. This series of tutorials is based around public transport data. We will work with data from Southeast Queensland, Australia, but you can probably run many of the the examples on data from your city.
We will start with ‘static GTFS data’. GTFS is a standard format specification for public transit timetables and routes. The SE Queensland data are available from Translink. They are also in the file ../static_data/SEQ_GTFS.zip. The data are in a series of CSV files (but with .txt extensions)
First, we unzip the supplied file into a temporary directory. The temporary direction will be deleted when you quit R.
staticGTFS<-tempdir()
unzip("../static_data/SEQ_GTFS.zip",exdir=staticGTFS) #FIXME will use package.file() eventually
list.files(staticGTFS)
## [1] "agency.txt" "calendar_dates.txt" "calendar.txt"
## [4] "feed_info.txt" "routes.txt" "shapes.txt"
## [7] "stop_times.txt" "stops.txt" "trips.txt"
stops<-read.csv(paste(staticGTFS,"stops.txt",sep="/"))
summary(stops)
## stop_id stop_code
## 1 : 1 Min. : 1
## 10 : 1 1st Qu.: 4577
## 100 : 1 Median :300210
## 1000 : 1 Mean :171094
## 10000 : 1 3rd Qu.:311491
## 10001 : 1 Max. :600854
## (Other):12852 NA's :197
## stop_name stop_desc
## Beams Rd at St Flannans : 4 Mode:logical
## Cleveland Redland Bay Rd near Beveridge Rd: 4 NA's:12858
## Grieve Rd near Pillinger Rd : 4
## Miles Platting Rd near Gardner Rd : 4
## Mount Cotton Rd near Killarney Cr : 4
## Musgrave Ave near Botanical Dr : 4
## (Other) :12834
## stop_lat stop_lon zone_id
## Min. :-28.17 Min. :152.1 2 :4829
## 1st Qu.:-27.62 1st Qu.:153.0 3 :2473
## Median :-27.50 Median :153.1 1 :2421
## Mean :-27.48 Mean :153.1 5 :1181
## 3rd Qu.:-27.40 3rd Qu.:153.1 6 : 752
## Max. :-26.16 Max. :153.5 4 : 541
## (Other): 661
## stop_url location_type
## : 320 Min. :0.00000
## http://translink.com.au/stop/000001/gtfs/: 1 1st Qu.:0.00000
## http://translink.com.au/stop/000002/gtfs/: 1 Median :0.00000
## http://translink.com.au/stop/000003/gtfs/: 1 Mean :0.01532
## http://translink.com.au/stop/000004/gtfs/: 1 3rd Qu.:0.00000
## http://translink.com.au/stop/000005/gtfs/: 1 Max. :1.00000
## (Other) :12533
## parent_station platform_code
## :12231 :12205
## place_QSBS : 17 1 : 190
## place_INTGCY: 15 2 : 178
## place_INTCAR: 11 3 : 49
## place_INTCHE: 10 A : 44
## place_INTCAN: 9 B : 39
## (Other) : 565 (Other): 153
We can plot the stops to get the crudest possible map
plot(stop_lat~stop_lon,data=stops)
This isn’t a very good map: the points are ugly, and the aspect ratio is wrong. A unit of longitude is larger than a unit of latitude except at the equator. The scale factor is the cosine of the latitude: in Brisbane, about 0.89
We can also use the zone_id variable to colour the points, and can make them smaller.
plot(stop_lat~stop_lon,data=stops, asp=1/0.89, col=zone_id,pch=19,cex=.5)
The zones don’t match what you’d expect: the colours seem to be reused in each population centre. Let’s try a table
table(stops$zone_id)
##
## 1 1/2 2 2/3 3 3/4 4 4/5 5 5/6 6 6/7 7 8
## 201 2421 126 4829 24 2473 3 541 9 1181 7 752 1 85 205
Oh. There are stops on the zone boundaries. The data are more complicated than we expected.
For a simpler map, I’ll recode the zones so each stop is listed only in its further-out zone.
stops$outzone<-factor(c(1,2,2,3,3,4,4,5,5,6,6,7,7,8)[stops$zone_id])
stops$inzone<-factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,8)[stops$zone_id])
plot(stop_lat~stop_lon,data=stops, asp=1/0.89, col=outzone,pch=19,cex=.5)
But it’s going to be useful to see where these stops really are: let’s try to get them on a street map. A quick search finds a package called “RgoogleMaps”
library("RgoogleMaps")
center <- with(stops, c(mean(stop_lat), mean(stop_lon)))
zoom <- with(stops, min(MaxZoom(range(stop_lat), range(stop_lon))))
MyMap <- GetMap(center=center, zoom=zoom, destfile = "MyTile1.png")
tmp <- PlotOnStaticMap(MyMap, lat = stops$stop_lat,lon = stops$stop_lon,destfile = "MyTile1.png", cex=1.5,pch=20,col=palette()[stops$outzone], add=FALSE)
The stops overlay southeast Queensland as we expect, so the data are broadly reasonable
Now lets zoom in a bit more, and make the points partially transparent
MyMap <- GetMap(center=center, zoom=zoom+3, destfile = "MyTile1.png")
tmp <- PlotOnStaticMap(MyMap, lat = stops$stop_lat,lon = stops$stop_lon,destfile = "MyTile1.png", cex=1.5,pch=20,col=adjustcolor(palette()[stops$outzone],alpha.f=.3), add=FALSE)
The railway stations at the airport and in central Brisbane are in the right places, and the buses follow the roads. The data look basically reasonable and we’re happy.