Using {sf} to calculate catchment areas

knitr::opts_chunk$set(echo = TRUE)

suppressPackageStartupMessages({
  library(tidyverse)
  library(readxl)
  library(glue, include.only = "glue_data")
  library(scales, exclude = c("discard", "col_factor"))
  library(sf)
  library(leaflet)
  library(leaflet.extras)
})

Firstly, I have prepared a dataset containing the 23 A&E departments that are Adult Major Trauma Centre’s (MTC’s) in England and Wales. This dataset is an {sf} object which contains a special column called geometry - this column contains the locations of the hospitals. The data is sourced from this, the Organisation Data Service (ODS) and the NHS Postcode Database.

When we print this table out we get a bit of information telling us how many features (rows) and fields (columns) there are, along with what the type of the geometry is (in this case, we have points). We also get the “bounding box” (bbox) - this is like a rectangle that encloses all of our geometries. Finally, we get the coordinate reference system (CRS) that is being used.

Below this we see a table of data - like any other dataframe in R.

major_trauma_centres <- read_sf(
  paste0(
    "https://gist.githubusercontent.com/tomjemmett/",
    "ac111a045e1b0a60854d3d2c0fb0baa8/raw/",
    "a1a0fb359d1bc764353e8d55c9d804f47a95bfe4/",
    "major_trauma_centres.geojson"
  ),
  agr = "identity"
)
major_trauma_centres

## Simple feature collection with 23 features and 2 fields
## Attribute-geometry relationship: 0 constant, 0 aggregate, 2 identity
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -4.113684 ymin: 50.41672 xmax: 0.1391145 ymax: 54.98021
## geographic CRS: WGS 84
## # A tibble: 23 x 3
##    name                                           org_id                geometry
##    <chr>                                          <chr>              <POINT [°]>
##  1 ADDENBROOKE'S HOSPITAL                         RGT01     (0.1391145 52.17374)
##  2 DERRIFORD HOSPITAL                             RK950     (-4.113684 50.41672)
##  3 HULL ROYAL INFIRMARY                           RWA01    (-0.3581464 53.74411)
##  4 JAMES COOK UNIVERSITY HOSPITAL                 RTDCR     (-1.214805 54.55176)
##  5 JOHN RADCLIFFE HOSPITAL                        RTH08     (-1.219806 51.76387)
##  6 KING'S COLLEGE HOSPITAL (DENMARK HILL)         RJZ01   (-0.09391574 51.46808)
##  7 LEEDS GENERAL INFIRMARY                        RR801     (-1.551744 53.80145)
##  8 NORTHERN GENERAL HOSPITAL                      RHQNG      (-1.455966 53.4098)
##  9 NOTTINGHAM UNIVERSITY HOSPITALS NHS TRUST - Q~ RX1RA      (-1.185957 52.9438)
## 10 QUEEN ELIZABETH HOSPITAL                       RRK02     (-1.938476 52.45318)
## # ... with 13 more rows

This dataframe can be used like any other dataframe in R, for example, we can use dplyr to filter rows out we aren’t interested in, or join to another dataframe to bring in extra information.

For example, we can load the LSOA population estimates and join it to the LSOA boundaries file from the ONS geoportal. If you aren’t familiar with Output Areas see the ONS Census Geography documentation.

First, let’s load the population estimate file.

# download the LSOA population estimates from ons website: for some reason they
# provide this download as an excel file in a zip file, so we need to download
# the zip then extract the file, but we don't need to keep the zip after. withr
# handles this temporary like file for us
if (!file.exists("SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx")) {
  withr::local_file("lsoa_pop_est.zip", {
    download.file(
      paste0(
        "https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/",
        "populationandmigration/populationestimates/datasets/",
        "lowersuperoutputareamidyearpopulationestimates/mid2019sape22dt2/",
        "sape22dt2mid2019lsoasyoaestimatesunformatted.zip"
      ),
      "lsoa_pop_est.zip",
      mode = "wb"
    )
    unzip("lsoa_pop_est.zip")
  })
}

lsoa_pop_estimates <- read_excel(
  "SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx",
  "Mid-2019 Persons",
  skip = 3
) %>%
  select(LSOA11CD = `LSOA Code`, pop = `All Ages`)

head(lsoa_pop_estimates)

## # A tibble: 6 x 2
##   LSOA11CD    pop
##   <chr>     <dbl>
## 1 E01011949  1954
## 2 E01011950  1257
## 3 E01011951  1209
## 4 E01011952  1740
## 5 E01011953  2033
## 6 E01011954  2210

Now we can load the boundary data and join to our population estimates.

lsoa_boundaries <- read_sf(
  "https://opendata.arcgis.com/datasets/e9d10c36ebed4ff3865c4389c2c98827_0.geojson"
) %>%
  # there are other fields in the lsoa data, but we only need the LSOA11CD field
  select(LSOA11CD) %>%
  inner_join(lsoa_pop_estimates, by = "LSOA11CD") %>%
  # this helps when combining different sf objects and get's rid of some messages
  st_set_agr(c(LSOA11CD = "identity", pop = "aggregate"))

We can now use ggplot to create a simple plot showing the populations of the LSOA’s and add points to show the MTC’s.

ggplot() +
  # first plot the lsoa boundaries and colour by the population
  geom_sf(data = lsoa_boundaries, aes(fill = pop), colour = NA) +
  geom_sf(data = major_trauma_centres) +
  scale_fill_distiller(type = "div", palette = "Spectral",
                       # tidy up labels in the legend, display as thousands
                       labels = number_format(accuracy = 1e-1, scale = 1e-3, suffix = "k")) +
  theme_void() +
  theme(legend.position = c(0, 0.98),
        legend.justification = c(0, 1)) +
  labs(fill = "Population (thousands)")

We can now try to calculate the population estimates for each of the MTC’s. We can take advantage of {sf} to perform a spatial join between our two datasets. A spatial join is similar to a join in a regular Sql database, except instead of joining on columns we join on geospatial features.

In this case we want to take each LSOA and find which MTC it is closest to, so we can use the st_nearest_feature predicate function with st_join. This will give us a dataset containing all of the rows from lsoa_boundaries, but augmented with the columns from major_trauma_centres.

We can then use standard {dplyr} functions to group by the MTC name and org_id fields, and calculate the sum of the pop column.

{sf} is clever - it knows that when we are summarising a geospatial table we need to combine the geometries somehow. What it will do is call st_union and return us a single geometry per MTC.

One thing we need to do before joining our data though is to transform our data temporarily to a different CRS. Our data currently is using latitude and longitude in a spherical projection, but st_nearest_points assumes that the points are planar. That is, it assumes that the world is flat. This can lead to incorrect results.

But, we can transform the data to use the British National Grid. This instead specifies points as how far east and north from an origin. This has a CRS number of 27700. Once we are done summarising our data we can again project back to the previous CRS (4326). This article explains these numbers in a bit more detail.

mtcv_pop <- lsoa_boundaries %>%
  st_transform(crs = 27700) %>%
  # assign each LSOA to a single, closest, MTC
  st_join(
    st_transform(major_trauma_centres, crs = 27700),
    join = st_nearest_feature
  ) %>%
  # now, summarise our data frame based on MTC's to find the total population
  group_by(name, org_id) %>%
  # note: summarise will automatically call st_union on the geometries
  # this gives us the lsoa's as a combined multipolygon
  summarise(across(pop, sum), .groups = "drop") %>%
  st_transform(crs = 4326)

We can now visualise this map. This time we will use the {leaflet} package to create an interactive html map.

First we need a function for colouring the areas.

pal_fun <- colorNumeric("Spectral", NULL, n = 7, reverse = TRUE)

As this is an interactive map we can add popups to the area’s to make it easier to identify which MTC we are looking at and to show us the estimated population.

p_popup <- glue_data(mtcv_pop,
                     "<strong>{name}</strong> ({org_id})",
                     "Estimated Population: {comma(pop)}",
                     .sep = "\n")

Now we can create our map

mtcv_pop %>%
  leaflet() %>%
  # add the areas that have been assigned to each MTC
  addPolygons(stroke = TRUE,
              color = "black",
              opacity = 1,
              weight = 1,
              fillColor = ~pal_fun(pop),
              fillOpacity = 1,
              popup = p_popup) %>%
  # add markers for the MTC's
  addCircleMarkers(data = major_trauma_centres,
                   color = "black",
                   stroke = TRUE,
                   fillColor = "white",
                   weight = 1,
                   radius = 3,
                   opacity = 1,
                   fillOpacity = 1) %>%
  setMapWidgetStyle(list(background= "white"))

This method certainly is not perfect. For example, we can see that the closest MTC for most of Somerset is Cardiff, but in order to reach Cardiff you need to cross the Bristol Channel! Not so easy by a Land Ambulance!

A better approach would be to use some method to calculate travel time from each LSOA to the MTC’s, an example of using the Google Maps Travel Time API is covered here. However, as a quick and dirty approach this method doesn’t produce too bad results.

Using {sf} to calculate catchment areas

Tom Jemmett

2020-02-19