- Introductory comments
- An example: the Propensity to Cycle Tool
- dplyr vs tibbles
- Discussion: can GDS save the world?
2018-04-11, Institut für Geographie - Friedrich-Schiller-Universität Jena
Many ways of saying the same thing:
My definition: building an evidence base for sustainable systems.
Code example:
library(tibble) # provides tribble(); frame_data() is its deprecated alias
d = tribble(
  ~Attribute, ~GIS, ~GDS,
  "Home disciplines", "Geography", "Geography, Computing, Statistics",
  "Software focus", "Graphical User Interface", "Code",
  "Reproducibility", "Minimal", "Maximal"
)
knitr::kable(d)
| Attribute | GIS | GDS |
|---|---|---|
| Home disciplines | Geography | Geography, Computing, Statistics |
| Software focus | Graphical User Interface | Code |
| Reproducibility | Minimal | Maximal |
Reasoning:
“Their very spirit undergoes a pervasive transformation,” and they finally end up as “experts at exchanging smiles, handshakes, and favors.” (Reclus 2013, original: 1898)
Source: Brent Toderian

~200 km cycle network in Seville, Spain. Source: WHO report at ATFutures/who
The front page of the open source, open access Propensity to Cycle Tool (PCT).


| Tool | Scale | Coverage | Public access | Format of output | Levels of analysis | Software licence |
|---|---|---|---|---|---|---|
| Propensity to Cycle Tool | National | England | Yes | Online map | A, OD, R, RN | Open source |
| Prioritization Index | City | Montreal | No | GIS-based | P, A, R | Proprietary |
| PAT | Local | Parts of Dublin | No | GIS-based | A, OD, R | Proprietary |
| Usage intensity index | City | Belo Horizonte | No | GIS-based | A, OD, R, I | Proprietary |
| Cycling Potential Tool | City | London | No | Static | A, I | Unknown |
| Santa Monica model | City | Santa Monica | No | Static | P, OD, A | Unknown |
"The PCT is a brilliant example of using Big Data to better plan infrastructure investment. It will allow us to have more confidence that new schemes are built in places and along travel corridors where there is high latent demand."
"The PCT shows the country’s great potential to get on their bikes, highlights the areas of highest possible growth and will be a useful innovation for local authorities to get the greatest bang for their buck from cycling investments and realise cycling potential."
Included in the Cycling and Walking Investment Strategy (CWIS)


Aim: tackle the challenge that cycling uptake is often limited by infrastructural barriers which could be remediated cost-effectively, yet investment is often spent on less cost-effective interventions, based on assessment of only a few options.


See: https://www.cyipt.bike (password protected)


\[ logit(pcycle) = \alpha + \beta_1 d + \beta_2 d^{0.5} + \beta_3 d^2 + \gamma h + \delta_1 d h + \delta_2 d^{0.5} h \]
logit_pcycle = -3.9 + (-0.59 * distance) + (1.8 * sqrt(distance) ) + (0.008 * distance^2)
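The value computed above is on the logit scale, so it has to pass through the inverse logit before it can be read as a proportion of trips cycled. A minimal sketch of that step, using only the distance terms from the line above (the hilliness terms in the equation are left out, as they are in that line). The function name `pcycle_from_distance`, the example distances, and the assumption that distance is in km are illustrative, not taken from the PCT code:

```r
# Hypothetical helper (not part of the PCT codebase): distance-only model,
# coefficients copied from the line above; distance is assumed to be in km
pcycle_from_distance = function(distance) {
  logit_pcycle = -3.9 + (-0.59 * distance) + (1.8 * sqrt(distance)) + (0.008 * distance^2)
  plogis(logit_pcycle) # inverse logit: exp(x) / (1 + exp(x))
}

distances = c(1, 2, 5, 10, 20) # illustrative trip distances
pcycle = pcycle_from_distance(distances)
round(data.frame(distance = distances, pcycle = pcycle), 3)
```

With these coefficients the quadratic term makes the curve turn upward again beyond roughly 25 km, so the sketch is only meaningful for the range of distances the model was fitted to.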

Data analysts and 'scientists': don't wrangle, munge or 'hack' your valuable datasets. Use #datacarpentry: https://t.co/gXrlIJH91R
Source: Robin Lovelace (@robinlovelace) on Twitter, February 20, 2017
The humble data frame is at the heart of most analysis projects:
d = data.frame(x = 1:3, y = c("A", "B", "C"))
d
##   x y
## 1 1 A
## 2 2 B
## 3 3 C
Under the hood a data frame is a list of equal-length columns, so many functions work column by column:
summary(d)
##        x         y
##  Min.   :1.0   A:1
##  1st Qu.:1.5   B:1
##  Median :2.0   C:1
##  Mean   :2.0
##  3rd Qu.:2.5
##  Max.   :3.0
plot(d)
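That a data frame is a list of columns can be checked directly. A small sketch in base R; `stringsAsFactors = FALSE` is set explicitly here (an addition, not in the slide) so the result does not depend on the R version:

```r
d = data.frame(x = 1:3, y = c("A", "B", "C"), stringsAsFactors = FALSE)

is.list(d)                     # a data frame is a list ...
length(d)                      # ... with one element per column
col_classes = sapply(d, class) # list-style iteration visits each column in turn
col_classes
d[["x"]]                       # list-style extraction returns the column vector
```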
In base R, there are many ways to subset:
d[1, ] # the first row
##   x y
## 1 1 A
d[, 1] # the first column, as a vector
## [1] 1 2 3
d$x # the first column, by name
## [1] 1 2 3
d[1] # the first column, as a data frame
##   x
## 1 1
## 2 2
## 3 3
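One practical consequence of having so many ways to subset is that the same `[` call can return different types depending on how many columns are selected. A short sketch in base R, using the `drop` argument of `[` for data frames:

```r
d = data.frame(x = 1:3, y = c("A", "B", "C"))

v = d[, 1]                  # one column selected: simplified to a vector
df1 = d[, 1, drop = FALSE]  # drop = FALSE keeps the data frame
df2 = d[, 1:2]              # two columns selected: still a data frame

class(v)
class(df1)
class(df2)
```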
Recently the data frame has been extended:
library("tibble")
dt = tibble(x = 1:3, y = c("A", "B", "C"))
dt
## # A tibble: 3 x 2
##       x y
##   <int> <chr>
## 1     1 A
## 2     2 B
## 3     3 C
It comes down to efficiency and usability
Like tibbles, dplyr has advantages over older ways of doing things.
Dedicated functions, not the [ operator, do everything:
ghg_ems %>%
  filter(!grepl("World|Europe", Country)) %>%
  group_by(Country) %>%
  summarise(Mean = mean(Transportation),
            Growth = diff(range(Transportation))) %>%
  top_n(3, Growth) %>%
  arrange(desc(Growth))
# dplyr must be loaded with library(dplyr)
wb_ineq %>%
  filter(grepl("g", Country)) %>%
  group_by(Year) %>%
  summarise(gini = mean(gini, na.rm = TRUE)) %>%
  arrange(desc(gini)) %>%
  top_n(n = 5)
vs
top_n(
  arrange(
    summarise(
      group_by(
        filter(wb_ineq, grepl("g", Country)),
        Year),
      gini = mean(gini, na.rm = TRUE)),
    desc(gini)),
  n = 5)
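The two versions are equivalent because a pipe passes its left-hand side as the first argument of the function on its right. A minimal illustration with base R functions only; it uses the native pipe `|>`, which assumes R 4.1 or later (the slides use `%>%`, which behaves the same way in this case):

```r
x = c(1, 4, 9)

piped = x |> sqrt() |> sum()  # read left to right: take x, then sqrt, then sum
nested = sum(sqrt(x))         # read inside out

identical(piped, nested)
```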
Only one way to do each task, which makes life simpler:
select(dt, x) # select columns
## # A tibble: 3 x 1
##       x
##   <int>
## 1     1
## 2     2
## 3     3
slice(dt, 2) # 'slice' rows
## # A tibble: 1 x 2
##       x y
##   <int> <chr>
## 1     2 B
u_pct = "https://github.com/npct/pct-data/raw/master/west-yorkshire/l.Rds"
if(!file.exists("l.Rds")) {
  # mode = "wb" keeps the binary .Rds file intact on Windows
  download.file(u_pct, "l.Rds", mode = "wb")
}
library(stplanr)
l = readRDS("l.Rds")
plot(l)
sel_walk = l$foot > 9
l_walk = l[sel_walk, ]
plot(l)
plot(l_walk, add = T, col = "red")
library(dplyr) # for next slide...
l_walk1 = l %>% filter(All > 10) # fails: dplyr verbs have no methods for sp (Spatial*) objects
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
l_sf = st_as_sf(l)
plot(l_sf[6])
l_walk2 = l_sf %>% filter(foot > 9)
plot(l_sf[6])
plot(l_walk2, add = T)
## Warning in plot.sf(l_walk2, add = T): ignoring all but the first attribute
## Warning in classInt::classIntervals(na.omit(values), min(nbreaks, n.unq), :
## n same as number of different finite values\neach different finite value is
## a separate class
l_sf$distsf = as.numeric(st_length(l_sf))
l_drive_short2 = l_sf %>%
  filter(distsf < 1000) %>%
  filter(car_driver > foot)
library(tmap)
tmap_mode("view")
## tmap mode set to interactive viewing
qtm(l_drive_short2)
It is clear that geographical research can have large policy impacts.
But many questions remain:
r.lovelace@leeds.ac.uk or @robinlovelace
Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). doi:10.5198/jtlu.2016.862.
Reclus, Elisée. 2013. Anarchy, Geography, Modernity: Selected Writings of Elisée Reclus. Edited by John Clark and Camille Martin. Oakland, CA: PM Press.