1 Introduction

The PCT is not only a web tool; it is also a research and open data project that has produced many megabytes of valuable data (Lovelace et al. 2017). In this training session we hope you will learn how to download and use these open datasets, which may be of use to anyone interested in data-driven planning for sustainable and active travel futures.

This guide supports workshops on advanced usage and development of the Propensity to Cycle Tool (PCT).

Beginner and intermediate PCT events focus on using the PCT via the web application hosted at www.pct.bike and the data provided by the PCT in QGIS.

The focus here is on analysing cycling potential in the open source statistical programming language R. We use R because the PCT was developed in, and can be extended with, R code. Using open source software with a command-line interface reduces barriers to entry, enabling the development of open access transport models for more citizen-led and participatory transport planning, including integration with the A/B Street city simulation and editing software (Lovelace 2021).

To view a video of our previous advanced training workshop at the Cycle Active City 2021 Conference, see https://www.youtube.com/watch?v=OiLzjrBMQmU.

To see the ‘marked up’ contents of the vignette (with results evaluated) see here.

2 Preparation

If you are new to R, you should install R and RStudio before the course. For instructions on that, see the download links at cran.r-project.org and RStudio.com.

R is a powerful statistical programming language for data science and a wide range of other applications and, like any language, takes time to learn. To get started we recommend the following free resources:

If you want to calculate cycle routes from within R, we recommend signing up for a CycleStreets API key. See here to apply and see here for instructions on creating an ‘environment variable’ (recommended for experienced R users only).
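As a minimal sketch of the environment variable approach, the code below checks whether a key is visible to R. The helper `api_key_is_set()` is hypothetical, and the variable name `CYCLESTREETS` is an assumption based on the convention used by the cyclestreets package; check that package's documentation if routing fails.

```r
# Hypothetical helper: is an environment variable set and non-empty?
api_key_is_set = function(var_name = "CYCLESTREETS") {
  nzchar(Sys.getenv(var_name))
}

# To set the key for the current session only (placeholder value, not a real key):
# Sys.setenv(CYCLESTREETS = "your-key-here")

api_key_is_set()
```

To keep the key between sessions, a line of the form `CYCLESTREETS=your-key-here` can be added to your `.Renviron` file, which R reads at start-up.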

If you are not familiar with the PCT, it may also be worth reading about it before the course starts.

2.1 Prior reading

In addition to the hardware (a laptop) and software (an up-to-date R set-up and experience using R) prerequisites, you should have read, or at least have a working knowledge of the contents of, the following publications, all of which are freely available online:

2.2 Prerequisites

To ensure your computer is ready for the course, you should be able to run the following lines of R code on your computer:

install.packages("remotes")
pkgs = c(
  "cyclestreets",
  "mapview",
  "pct",
  "sf",
  "stats19",
  "stplanr",
  "tidyverse",
  "devtools"
)
remotes::install_cran(pkgs)
# remotes::install_github("ITSLeeds/pct")
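
If you are unsure whether the installation worked, a quick check along the following lines may help. This is a sketch using base R only; the object name `pkgs_missing` is ours, not part of any package.

```r
# Packages needed for the workshop (same list as above)
pkgs = c(
  "cyclestreets", "mapview", "pct", "sf",
  "stats19", "stplanr", "tidyverse", "devtools"
)

# Compare against the packages currently installed in your R library
pkgs_missing = pkgs[!pkgs %in% rownames(installed.packages())]

if (length(pkgs_missing) > 0) {
  message("Still to install: ", paste(pkgs_missing, collapse = ", "))
} else {
  message("All workshop packages are installed")
}
```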

To test your computer is ready to work with PCT data in R, you can also try running the code hosted at https://raw.githubusercontent.com/ITSLeeds/pct/master/inst/test-setup.R to check everything is working:

source("https://github.com/ITSLeeds/pct/raw/master/inst/test-setup.R") 

If you have any questions before the workshop, feel free to ask a question on the package’s issue tracker (requires a GitHub login): https://github.com/itsleeds/pct/issues

3 Agenda

Preliminary timings:

The guide covers:

4 Getting and exploring PCT data

In this section you will learn about the open datasets provided by the PCT project and how to use them. While the most common use of the PCT is via the interactive web application hosted at www.pct.bike, there is much value in downloading the data, e.g. to identify existing cycling infrastructure in close proximity to routes with high potential, and to help identify roads in need of interventions from a safety perspective, using data from the constantly evolving and community-driven global geographic database OpenStreetMap (OSM) (Barrington-Leigh and Millard-Ball 2017).

In this session, which assumes you have experience using R, you will learn how to:

4.1 Getting PCT data from the PCT website

In this example we will use data from North Yorkshire, a mixed region containing urban areas such as York and many rural areas. You can use the PCT, which works at the regional level, for North Yorkshire or any other region by clicking on the area you’re interested in on the main map at https://www.pct.bike. If you know the URL of the region you’re interested in, you can navigate straight there, in this case by typing in or clicking on the link https://www.pct.bike/m/?r=north-yorkshire.

From there you will see a map showing the region. Before you download and use PCT data, it is worth exploring it on the PCT web app.

Exercise: explore the current level and distribution of cycling:

  • Explore different data layers contained in the PCT by selecting different options from the dropdown menus on the right.
  • Look at the different types of Cycling Flows options and consider: which visualisation layer is most useful?

4.1.1 Using ‘Freeze Lines’

You can use the little-known ‘Freeze Lines’ functionality in the PCT’s web app to identify the origin and destination zones of trips that would use improvements in a particular place. To do this, select the Fast Routes option from the Cycling Flows menu, zoom into the area of interest, and then tick the Freeze Lines checkbox to prevent the selected routes from changing when you zoom back out.

  • Use this technique to find the areas that would benefit from improved cycling provision on Clifton Bridge, 1 km northwest of central York over the River Ouse (see result in Figure 4.1).

Figure 4.1: Areas that may benefit from improved cycle provision on Clifton Bridge, according to the PCT.

4.2 Downloading data from the PCT in GeoJSON form

In the PCT web app, click on the Region data tab, shown at the top of Figure 4.1, just beneath the ‘north’ in the URL. You should see a web page like that shown in Figure 4.2, which highlights the Region data tab alongside the Map, Region stats, National Data, Manual, and About page links.


Figure 4.2: The Region data tab in the PCT.

  • Download the Zones (LSOA) dataset in GeoJSON format.

Data downloaded in this way can be imported into GIS software such as QGIS, for analysis and visualisation. However, the PCT was built in R so the best way to understand and modify the results is using R, or a similar language for data analysis. The subsequent sections demonstrate using R to access, analyse, visualise and model datasets provided by the pct package.
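
As a first step in that direction, a GeoJSON file downloaded from the Region data tab can be read into R with `sf::read_sf()`. The sketch below is self-contained: it writes a tiny stand-in file with one made-up zone (the zone code and counts are illustrative, not PCT data) and reads it back. For a real download, replace `geojson_file` with the path to the file on your computer.

```r
library(sf)

# Stand-in for a downloaded file: a tiny GeoJSON with one made-up zone
geojson_txt = '{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"geo_code":"Z1","all":100,"bicycle":5},"geometry":{"type":"Polygon","coordinates":[[[-1.1,53.9],[-1.0,53.9],[-1.0,54.0],[-1.1,54.0],[-1.1,53.9]]]}}]}'
geojson_file = tempfile(fileext = ".geojson")
writeLines(geojson_txt, geojson_file)

# Read the file as an sf data frame and inspect it
zones_demo = read_sf(geojson_file)
names(zones_demo)

# Percentage of trips made by bicycle in each zone
zones_demo$pcycle = zones_demo$bicycle / zones_demo$all * 100
zones_demo$pcycle
```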

4.3 Getting PCT data with R

We will get the same PCT datasets as in previous sections but using the R interface. If you have not already done so, you will need to install the R packages we will use for this section (and the next) by typing and executing the following command in the R console: install.packages(c("pct", "sf", "tidyverse", "tmap")). Note that the package names must be wrapped in c() so that they are passed as a single character vector.

  • After you have the necessary packages installed, the first stage is to load the packages we will use:
library(pct)
library(sf)          # key package for working with spatial vector data
library(tidyverse)   # data manipulation and plotting (dplyr, ggplot2, tidyr, ...)
library(tmap)        # thematic maps
tmap_options(check.and.fix = TRUE) # tmap setting

The pct package has been developed specifically for use with PCT data. To learn more about this package, see https://itsleeds.github.io/pct/.

  • We are now ready to use R to download PCT data. The following commands set the name of the region we are interested in (to avoid re-typing it many times) and download commute data for this region, in the four main forms used in the PCT:
region_name = "north-yorkshire"
zones_all = get_pct_zones(region_name)
lines_all = get_pct_lines(region_name)
# note: the next command may take a few seconds
routes_all = get_pct_routes_fast(region_name)
rnet_all = get_pct_rnet(region_name)
  • Check the downloads worked by plotting them:
plot(zones_all$geometry)
plot(lines_all$geometry, col = "blue", add = TRUE)
plot(routes_all$geometry, col = "green", add = TRUE)
plot(rnet_all$geometry, col = "red", lwd = sqrt(rnet_all$bicycle), add = TRUE)

4.4 Getting school route network data

The PCT provides a school route network layer that can be especially important when planning cycling interventions in residential areas (Goodman et al. 2019). Due to the sensitive nature of school data, we cannot make route-level or OD-level data available. However, the PCT provides travel to school data at zone and route network levels, as shown in Figure 4.3. (Note: to get this data from the PCT website you must select School travel in the Trip purpose menu before clicking on Region data.)

  • Get schools data from the PCT with the following commands
zones_school = get_pct_zones(region = region_name, purpose = "school")
rnet_school = get_pct_rnet(region = region_name, purpose = "school")

As we will see in Section 6, combining school and commute network data can result in a more comprehensive network.


Figure 4.3: Open access data on cycling to school potential from the PCT, at zone (left) and route network (right) levels. These datasets can support planning interventions, especially ‘safe routes to school’ and interventions in residential areas. To see the source code that generates these plots, see the ‘source’ link at the top of the page.

Exercise: Explore the datasets you have downloaded. Use functions such as plot() or qtm() to visualise these datasets, and try out different colour schemes

5 Modelling change

This section is designed for people with experience with the PCT and cycling uptake estimates who want to learn more about how uptake models work and how to generate new scenarios of change. Reproducible and open R code will be used to demonstrate the concepts so knowledge of R or other programming languages is recommended but not essential, as there will be conceptual exercises covering the factors linked to mode shift. In it you will:

5.1 PCT scenarios

One of the benefits of the PCT is its ability to generate scenarios that model where people might cycle in future. Several cycling uptake scenarios are included on the PCT website. We also have R functions for these scenarios. For example, the PCT’s ‘Government Target’ scenario allows us to calculate the cycling uptake that would be required to correspond to a scenario in which we meet the government’s aim to double cycling levels by 2025, using a 2013 baseline.

The following code chunk uses the R function uptake_pct_govtarget_2020() (from the pct package) to recreate this ‘Government Target’ scenario.

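# Current cycling mode share, as a percentage (same units as the scenario below)
lines_all$pcycle = lines_all$bicycle / lines_all$all * 100
# Straight-line length of each desire line in metres
lines_all$euclidean_distance = as.numeric(sf::st_length(lines_all))
# Scenario mode share (%): modelled increase (a proportion, hence * 100) plus current share
lines_all$pcycle_govtarget = uptake_pct_govtarget_2020(
  distance = lines_all$rf_dist_km,
  gradient = lines_all$rf_avslope_perc
  ) * 100 + lines_all$pcycle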
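
To get a feel for how the uptake function responds to its inputs, it can be called on a few made-up trips. The sketch below assumes the pct package is installed and that, as in the code above, `distance` is in km and `gradient` is in percent; the trip values are illustrative only.

```r
library(pct)

# Made-up trips: every combination of three distances and two gradients
trips_toy = expand.grid(
  distance_km = c(2, 5, 10),
  gradient_perc = c(1, 4)
)

# Modelled increase in the proportion of trips cycled for each toy trip
trips_toy$uptake = uptake_pct_govtarget_2020(
  distance = trips_toy$distance_km,
  gradient = trips_toy$gradient_perc
)
trips_toy
```

Comparing rows with the same distance but different gradients shows how strongly hilliness affects the modelled uptake.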

Exercise: Generate a ‘Go Dutch’ scenario for North Yorkshire using the function uptake_pct_godutch(). (Hint: the process is very similar to that used to generate the ‘Government Target’ scenario.)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.505   6.881  20.750  22.367  36.265  56.052

Figure: Percent cycling currently (left) and under a ‘Go Dutch’ scenario (right) in North Yorkshire.

  • Think of alternative scenarios that would be useful for your work
  • Advanced: look at the source code of the function uptake_pct_godutch() - how could it be modified?

5.2 Developing new scenarios of change

Let’s develop a simple model representing the government’s aim that “half of all journeys in towns and cities will be cycled or walked” by 2030. We will assume that this means that half of all journeys made in urban areas, as defined by the Office for National Statistics, will be made by these active modes. The data we downloaded covers commuting only, which we treat here as a proxy for overall mode share.

The first stage is to identify urban areas in North Yorkshire. We use data from the House of Commons Research Briefing on City and Town Classifications to define areas based on their town/city status. The code chunk below shows the benefits of R in terms of being able to get and join data onto the route data we have been using:

# Get data on the urban_rural status of LSOA zones
urban_rural = readr::read_csv("https://researchbriefings.files.parliament.uk/documents/CBP-8322/oa-classification-csv.csv")
ggplot(urban_rural) +
  geom_bar(aes(citytownclassification)) +
  coord_flip()

# Join this with the PCT commute data that we previously downloaded
urban_rural = rename(urban_rural, geo_code = lsoa_code)
zones_all_joined = left_join(zones_all, urban_rural)
routes_all_joined = left_join(routes_all, urban_rural, by = c("geo_code1" = "geo_code"))
tm_shape(zones_all_joined) +
  tm_polygons("citytownclassification")

Figure 5.1: Classification of areas in Great Britain (left) and North Yorkshire (right).
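
The route-level join above matches the `geo_code1` column of the routes (the origin zone) to the `geo_code` column of the classification table. A toy sketch of this kind of join, with made-up codes and values, shows what happens when a route has no match: the joined columns are filled with NA.

```r
library(dplyr)

# Made-up routes and classification table (codes are illustrative)
routes_toy = tibble(
  geo_code1 = c("A", "B", "C"),
  all = c(10, 20, 30)
)
classification_toy = tibble(
  geo_code = c("A", "B"),
  citytownclassification = c("Large Town", "Other City")
)

# Join on differently named key columns: left name = right name
routes_toy_joined = left_join(
  routes_toy, classification_toy,
  by = c("geo_code1" = "geo_code")
)
routes_toy_joined
```

All three routes are kept, and the route starting in zone "C" has a missing classification, so it would be dropped by any later filter on that column.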

After the classification dataset has been joined, the proportion of trips made by walking and cycling in towns and cities across North Yorkshire can be calculated as follows.

# Select only routes starting in zones whose `citytownclassification` contains the word "Town" or "City"
routes_towns = routes_all_joined %>% 
  filter(grepl(pattern = "Town|City", x = citytownclassification)) 
round(sum(routes_towns$foot + routes_towns$bicycle) / sum(routes_towns$all) * 100)
## [1] 34

Currently, only around 34% of commute trips in the region’s ‘town’ areas are made by walking and cycling (27% across all zones in North Yorkshire, and a much lower proportion in terms of distance). We explore this in more detail by looking at the relationship between trip distance and mode share for existing commuter journeys, as shown in Figure 5.2 (a).
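
The same filter-then-share calculation can be followed on a toy table. The class names and trip counts below are made up for illustration and are not taken from the PCT or the classification dataset.

```r
library(dplyr)

# Made-up route-level trip counts by area type
routes_toy = tibble(
  citytownclassification = c("Large Town", "Other City", "Village or small community"),
  all = c(100, 200, 50),
  foot = c(20, 50, 2),
  bicycle = c(5, 15, 1)
)

# Keep rows whose classification contains "Town" or "City"
routes_towns_toy = routes_toy %>%
  filter(grepl(pattern = "Town|City", x = citytownclassification))

# Walking and cycling trips as a percentage of all trips in those rows
active_share = round(
  sum(routes_towns_toy$foot + routes_towns_toy$bicycle) /
    sum(routes_towns_toy$all) * 100
)
active_share
```

Here 90 of the 300 trips in the two urban rows are made on foot or by bicycle, giving 30%.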

We will create a scenario representing the outcome of policies that incentivise people to replace car trips with walking and cycling. This focuses on the red boxes in Figure 5.2. In this scenario, we replace 50% of car trips of less than 1 km with walking, and replace 10% of car trips of 1-2 km length with walking. Many of the remaining car trips will be replaced by cycling, with the percentages of trips that switch for each OD determined by the uptake function in the Go Dutch Scenario of the PCT. The results of this scenario are shown in Figure 5.2 (b).

# Reduce the number of transport mode categories 
routes_towns_recode = routes_towns %>% 
  mutate(public_transport = train_tube + bus,
         car = car_driver + car_passenger,
         other = taxi_other + motorbike 
         ) %>% 
  dplyr::select(-car_driver, -car_passenger, -train_tube, -bus) 

# Set distance bands to use in the bar charts
routes_towns_recode$dist_bands = cut(x = routes_towns_recode$rf_dist_km, breaks = c(0, 1, 3, 6, 10, 15, 20, 30, 1000), include.lowest = TRUE)

# Set the colours to use in the bar charts
col_modes = c("#fe5f55", "grey", "#ffd166", "#90be6d", "#457b9d") 

# Plot bar chart showing modal share by distance band for existing journeys 
base_results = routes_towns_recode %>%
  sf::st_drop_geometry() %>% 
  dplyr::select(dist_bands, car, other, public_transport, bicycle, foot) %>% 
  tidyr::pivot_longer(cols = matches("car|other|publ|cy|foot"), names_to = "mode") %>% 
  mutate(mode = factor(mode, levels = c("car", "other", "public_transport", "bicycle", "foot"), ordered = TRUE)) %>% 
  group_by(dist_bands, mode) %>% 
  summarise(Trips = sum(value))
g1 = ggplot(base_results) +
  geom_col(aes(dist_bands, Trips, fill = mode)) +
  scale_fill_manual(values = col_modes) + ylab("Trips")
g1

# Create the new scenario: 
# First we replace some car journeys with walking, then replace some of the
# remaining car journeys with cycling
routes_towns_recode_go_active = routes_towns_recode %>% 
  mutate(
    foot_increase_proportion = case_when(
      # specifies that 50% of car journeys <1km in length will be replaced with walking
      rf_dist_km < 1 ~ 0.5, 
      # specifies that 10% of car journeys 1-2km in length will be replaced with walking
      rf_dist_km >= 1 & rf_dist_km < 2 ~ 0.1, 
      TRUE ~ 0
      ),
    # Specify the Go Dutch scenario we will use to replace remaining car trips with cycling
    bicycle_increase_proportion = uptake_pct_godutch_2020(distance = rf_dist_km, gradient = rf_avslope_perc), 
    # Make the changes specified above
    car_reduction = car * foot_increase_proportion,
    car = car - car_reduction,
    foot = foot + car_reduction,
    car_reduction = car * bicycle_increase_proportion,
    car = car - car_reduction,
    bicycle = bicycle + car_reduction
    )

# Plot bar chart showing how modal share has changed in our new scenario
active_results = routes_towns_recode_go_active %>%
  sf::st_drop_geometry() %>% 
  dplyr::select(dist_bands, car, other, public_transport, bicycle, foot) %>% 
  tidyr::pivot_longer(cols = matches("car|other|publ|cy|foot"), names_to = "mode") %>% 
  mutate(mode = factor(mode, levels = c("car", "other", "public_transport", "bicycle", "foot"), ordered = TRUE)) %>% 
  group_by(dist_bands, mode) %>% 
  summarise(Trips = sum(value))
g2 = ggplot(active_results) +
  geom_col(aes(dist_bands, Trips, fill = mode)) +
  scale_fill_manual(values = col_modes) + ylab("Trips")
g2

Figure 5.2: Relationship between distance (x axis) and mode share (y axis) in towns and cities in North Yorkshire. (a) left: existing mode shares; (b) right: mode shares under high active travel uptake scenario.
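
The order of operations in the scenario code matters: walking replaces a share of car trips first, and cycling then replaces a share of the car trips that remain. The sketch below traces this for one hypothetical OD pair. The trip counts are made up, and the value 0.2 is a placeholder for the output of uptake_pct_godutch_2020().

```r
# One hypothetical OD pair: trip counts by mode and route distance
car = 100
foot = 10
bicycle = 5
rf_dist_km = 0.8

# Step 1: share of car trips switched to walking (same bands as the scenario)
foot_increase_proportion = if (rf_dist_km < 1) 0.5 else if (rf_dist_km < 2) 0.1 else 0
car_reduction = car * foot_increase_proportion
car = car - car_reduction
foot = foot + car_reduction

# Step 2: share of the remaining car trips switched to cycling
# (0.2 is a made-up value standing in for uptake_pct_godutch_2020())
bicycle_increase_proportion = 0.2
car_reduction = car * bicycle_increase_proportion
car = car - car_reduction
bicycle = bicycle + car_reduction

c(car = car, foot = foot, bicycle = bicycle)
```

In this example 50 car trips switch to walking and 10 of the remaining 50 switch to cycling, leaving 40 car trips, 60 walking trips and 15 cycling trips. The total number of trips is unchanged at 115.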

Exercise: Instead of a scenario in which all types of car journey (i.e. both car drivers and car passengers) are replaced by walking or cycling, can you create a scenario in which solely journeys by car drivers are replaced by walking or cycling? The scenario we just created applies only to urban areas - can you adapt it so that the same changes in walking and cycling uptake are applied across the whole of North Yorkshire, including both urban and rural areas?

The scenario outlined above may sound ambitious, but it only just meets the government’s aim for walking and cycling to account for 50% of trips in towns and cities, at least when looking exclusively at single-stage commutes in a single region. Furthermore, while the scenario represents a ~200% (3 fold) increase in the total distance travelled by active modes, it only results in a 17% reduction in car km driven in towns. The overall impact on energy use, resource consumption and emissions is much lower for the region overall, including rural areas.

In the context of the government’s aim of fully decarbonising the economy by 2050, the analysis above suggests that more stringent measures focussing on long distance trips, which account for the majority of emissions, may be needed. However, it is still useful to see where there is greatest potential for car trips to be replaced by walking and cycling, as shown in Figure 5.3.


Figure 5.3: Illustration of route network based on car trips that could be replaced by bicycle trips, based on Census data on car trips to work and the Go Dutch uptake function used in the PCT.

6 Joining commute and school data

The PCT is not limited to commuter data: it also provides school data for each region in England and Wales, which can be downloaded with relative ease. In the example below, we set the purpose argument of the get_pct_rnet() function to "school". This allows us to get estimates of cycling potential on the road network for school trips, commuter trips, and school and commuter trips combined. Note in the figure below that the combined route network provides a more comprehensive (yet still incomplete) overview of cycling potential in the study region.

# get pct rnet data for schools
rnet_school = get_pct_rnet(region = region_name, purpose = "school")
rnet_school = subset(rnet_school, select = -c(`cambridge_slc`)) # subset columns for bind
rnet_all = subset(rnet_all, select = -c(`ebike_slc`,`gendereq_slc`,`govnearmkt_slc`)) # subset columns for bind 

rnet_school_commute = rbind(rnet_all, rnet_school) # bind commute and schools rnet data
rnet_school_commute$duplicated_geometries = duplicated(rnet_school_commute$geometry) # flag segments that appear more than once
rnet_school_commute$geometry_txt = sf::st_as_text(rnet_school_commute$geometry) # text version of each geometry, used for grouping

rnet_combined = rnet_school_commute %>% 
  group_by(geometry_txt) %>% # group segments with identical geometry
  summarise(across(bicycle:dutch_slc, ~ sum(.x, na.rm = TRUE))) # sum flows on segments present in both networks

Figure 6.1: Comparison of commute, school, and combined commute and school route networks, under the Go Dutch scenario.
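
The bind-then-sum approach used above can be followed on two toy networks that share one segment. This is a sketch with made-up geometries and flows, assuming the sf and dplyr packages are installed; no coordinate reference system is set, to keep the example minimal.

```r
library(sf)
library(dplyr)

# Two made-up segments; seg_a appears in both toy networks
seg_a = st_linestring(rbind(c(0, 0), c(1, 0)))
seg_b = st_linestring(rbind(c(1, 0), c(2, 0)))
rnet_commute_toy = st_sf(bicycle = c(10, 20), geometry = st_sfc(seg_a, seg_b))
rnet_school_toy = st_sf(bicycle = 5, geometry = st_sfc(seg_a))

# Bind the networks and create a text key for each geometry
rnet_both_toy = rbind(rnet_commute_toy, rnet_school_toy)
rnet_both_toy$geometry_txt = st_as_text(rnet_both_toy$geometry)

# Sum flows on segments with identical geometry
rnet_combined_toy = rnet_both_toy %>%
  group_by(geometry_txt) %>%
  summarise(bicycle = sum(bicycle, na.rm = TRUE))
rnet_combined_toy
```

The shared segment ends up with a combined flow of 15 (10 commute plus 5 school), while the segment that appears only in the commute network keeps its flow of 20. Note that this approach only merges segments whose geometries are exactly identical.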

References

Barrington-Leigh, Christopher, and Adam Millard-Ball. 2017. “The World’s User-Generated Road Map Is More Than 80% Complete.” PLOS ONE 12 (8): e0180698. https://doi.org/10.1371/journal.pone.0180698.
Goodman, Anna, Ilan Fridman Rojas, James Woodcock, Rachel Aldred, Nikolai Berkoff, Malcolm Morgan, Ali Abbas, and Robin Lovelace. 2019. “Scenarios of Cycling to School in England, and Associated Health and Carbon Impacts: Application of the Propensity to Cycle Tool.” Journal of Transport & Health 12 (March): 263–78. https://doi.org/10.1016/j.jth.2019.01.008.
Lovelace, Robin. 2021. “Open Source Tools for Geographic Analysis in Transport Planning.” Journal of Geographical Systems, January. https://doi.org/10.1007/s10109-020-00342-2.
Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). https://doi.org/10.5198/jtlu.2016.862.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. CRC Press. https://geocompr.robinlovelace.net/.