Data visualisation in R: extras

Preamble
Background
Hex maps
Bivariate maps
Annotation

Preamble

This page contains the course material for a workshop hosted jointly between the Methods@Manchester, the UK Data Service and the University of Leeds. All material and associated scripts are available on GitHub.

Background

This worksheet covers some additional mapping skills in R that you may find useful.

First, we take a look at how to create ‘hex maps’. Rather than visualise the original boundaries of say, neighbourhoods or countries, we use regular shapes, such as hexagons and squares, to represent each polygon. This is particularly useful when we are mapping regions consisting of areas which vary considerably in their shape and size. For some additional background information, you may want to read this article, which uses some examples in R, including the one we demonstrate here.

Secondly, we show how to make bivariate choropleth maps. Research problems in criminology often involve modelling or testing the relationship between two variables. For instance, we might ask: “what is the relationship between neighbourhood deprivation and crime?”. Building on some of the skills we picked up earlier, we demonstrate a method to visualise two variables simultaneously in the same map.

Thirdly, we cover annotations. So far, we’ve relied on standard components (e.g. X-axis, Y-axis and the legend) to give readers the pointers needed to interpret our graphics. Sometimes, however, you might want to draw people’s attention to specific areas or add clarification to the base graphic using annotations such as text, lines or arrows.

Hex maps

To make a regular hex grid, we are going to make use of a package called geogrid. So, ensure you have the package installed using install.packages("geogrid") and loaded using library(geogrid)

library(geogrid)

This is by no means the only way of creating hex maps. Your choice of method might depend on the data (e.g. how many areas you have, their size and shape) and what you are trying to achieve (e.g. maintaining spatial clustering). Different methods perform differently with respect to how readers interpret the underlying information, so it’s important to consider the different options available. Unfortunately, what looks good is not necessarily the best option when it comes to conveying information accessibly and accurately!

With that in mind, let’s get the neighbourhood (LSOA) level data from Manchester loaded in from earlier, check how many areas we have, and remind ourselves of what it looks like. Remember, you’re more than welcome to use your own data for this. But, for the purposes of this example, I would recommend using a study area with a relatively small number of areas, unless you have a very good computer!

# Load data.
manc_sf <- st_read(dsn = "data/burglary_lsoa.shp")

# Ensure BNG CRS.
manc_sf <- st_transform(manc_sf, crs = 27700)

# Number of LSOAs.
nrow(manc_sf) # N = 282

# Plot raw boundaries with deprivation deciles.
ggplot(data = manc_sf) +
  geom_sf(mapping = aes(fill = as.factor(IMDdeci)))

As we’ve already discussed, you will notice that some LSOAs are excessively large, while others are almost invisible due to their small size. This issue can be particularly salient when it comes to mapping deprivation using LSOAs, which are designed to be fairly uniform by population size, because highly deprived areas are often densely populated. This is where regular grids come in.

The geogrid method works by first creating a regular set of hexagons or squares which match the study region in terms of the number of units (i.e., N = 282) and overall shape (i.e., the outer boundaries of Manchester). We do this using the calculate_grid() function. Note that we specify the shape as hexagonal (the other option is ‘regular’ for squares), and we set the seed (in this case 1612 so that the example is reproducible.

manc_grid <- calculate_grid(shape = manc_sf, grid_type = "hexagonal", seed = 1612)

We can see what this output looks this using the plot() as the manc_grid object is a geogrid-specific class (not sf).

plot(manc_grid)

Now we have our regular grid, we need to assign the original LSOAs to each hexagon. Currently it’s just an empty shape. This assignment step is pretty important. As we can see from the above visual using the raw boundaries, Manchester has quite a distinct spatial patterning of deprivation. So, ideally we’d want to maintain this spatial patterning with the hex map. To assign the ‘real’ LSOAs to the regular grid, we can use the function assign_polygons(). There’s a good explanation (and links to further resources) of the assignment method in the package documentation. Crudely, the algorithm is attempting to assign LSOAs to the grid in a manner which minimises the distance between the location of the original LSOA and the new hexagon.

Note that depending on the data and your computer this assignment can take quite a long time. If it’s taking too long, we have saved a copy of the output as a shapefile (‘geogrid_manc.shp’). So, instead, of running the below code chunk, you can just load in the data as we have previously using st_read(). Generally speaking, the smaller and simpler the data, the quicker this assignment process will take. On my fairly standard laptop the below task took 5-10 minutes.

manc_grid_results_sf <- assign_polygons(shape = manc_sf, new_polygons = manc_grid)

Each LSOA from the original data has now been assigned to a hexagon in the regular grid, along with the corresponding variables. So, we can now visualize the deprivation deciles using the new arrangement. Here, we plot them side-by-side to compare, making use of the cowplot package introduced earlier, and some small additions that we have already covered.

depriv_orig <- ggplot(data = manc_sf) +
  geom_sf(mapping = aes(fill = as.factor(IMDdeci))) +
  scale_fill_viridis_d(option = "magma") +
  labs(fill = NULL) + 
  theme(legend.position = "bottom")

depriv_grid <- ggplot(data = manc_grid_results_sf) +
  geom_sf(mapping = aes(fill = as.factor(IMDdeci))) +
  scale_fill_viridis_d(option = "magma") +
  labs(fill = NULL) + 
  theme(legend.position = "bottom")

plot_grid(depriv_orig, depriv_grid, labels = c("Raw boundaries", "Regular grid"), scale = 0.9)

How do the two maps compare? Has the spatial patterning been maintained? Are we representing the underlying data accurately? What areas are problematic? It is possible that we could improve things by altering some of the parameters in geogrid (e.g. using different seed). We might also consider alternatives, such as hexograms which have been shown to out perform tiled grids in some scenarios.

To me, the new map successfully captures the important information. We can clearly identify the widespread deprivation to the east of the city centre, and the pockets of wealthier areas in the south. But, you might disagree! And importantly, whether the regular grid is suitable or not will very much depend on the purpose of the visualisation, and who the audience is.

Bivariate maps

A fairly new package called biscale can be used to create bivariate maps in conjunction with sf, which we’ve covered already. As usual, ensure you have this extra package installed and loaded using install.packages() and library().

library(biscale)

The first thing we need to do is create a new classification for the two variables we want to visualise. For this example, we will continue to use our Manchester data (manc_sf), but as always, please feel free to replicate this example using you’re own data. We’ll begin be loading in the data, just in case you haven’t run through hex map stuff above. Just for fun, we’re going to use the geogrid version of LSOAs rather than the raw boundaries.

manc_grid_results_sf <- st_read("data/geogrid_manc.shp")

Next, we’ll create the classes. Recall that we have information about burglary and deprivation for each LSOA in the city of Manchester. In my data, we have brglry_ (the burglary count for January 2017) and incscor (the income component of the deprivation measure). What might the relationship be between these variables, and how might it vary spatially across Manchester?

Using the bi_class() function, we can specify the sf object, the variables we are interested in, the style of classification (i.e. how to classify the data), and the number of dimensions (i.e. 2x2 or 3x3). Because the package creates a new variable bi_class by default, we can simply assign the output to the existing sf object.

manc_grid_results_sf <- bi_class(manc_grid_results_sf, x = incscor, y = brglry_, style = "quantile", dim = 3)

Here, we chose a quantile classification, but this decision is not trivial. In reality, we would probably explore a few different options (e.g. jenks) depending on the distribution of the data and the purpose of the visualisation. You can read a bit more about classification in maps on the Axis Maps website.

We can now pretty much mimic our existing skills in ggplot2 to create a map of this variable. One change is the use of a different aesthetic layer, in this case, bi_scale_fill() instead of something like scale_fill_brewer(). The use of this new layer is not strictly necessarily, but it does make it easier to choose and customise your palettes and overall visual.

ggplot(data = manc_grid_results_sf) +
  geom_sf(mapping = aes(fill = bi_class)) +
  bi_scale_fill(pal = "GrPink", dim = 3)

As it stands, interpreting this legend is not exactly straightforward. We might guess that “1-1” represents LSOAs with low burglary and a low income deprivation score, and vice-versa for “3-3” but it’s certainly not intuitive or particularly clear at the moment. With a few more tweaks using the biscale package we can generate a meaningful legend, and stick it together with this map.

First, we re-plot the map without a legend, and apply the ‘void’ theme to create a blank slate for our new legend. We can use bi_legend() to generate the custom legend. Note that this should match the details we specify in bi_scale_fill(). We assign each of these new graphics to objects.

my_hexmap <- ggplot(data = manc_grid_results_sf) +
  geom_sf(mapping = aes(fill = bi_class), show.legend = FALSE) +
  bi_scale_fill(pal = "GrPink", dim = 3) +
  theme_void()

my_legend <- bi_legend(pal = "GrPink",
                       dim = 3,
                       xlab = "More income deprivation",
                       ylab = "More burglary",
                       size = 9)

The final step is to arrange the two objects using ggdraw(). You might find this function a bit fiddly. That’s because it is, and I am sorry. However, it is a useful way of generating completely customisable arrangements and graphics. We accept the default positioning of the map in my_hexmap but then manually select the location and dimensions of the legend object my_legend. The x-value (from 0 to 1) specifies horizontal, and the y-value (from 0 to 1) specifies the vertical. So for instance, x = 0.5, y = 0.5 will place the object in the middle of the graphic.

ggdraw() +
  draw_plot(my_hexmap) +
  draw_plot(my_legend,
            x = 0.1, y = 0.65, width = 0.2, height = 0.2,
            scale = 1.5)

So, we’ve now created a hex map visualising the relationship between income deprivation and burglary victimisation in Manchester. I think we can all agree that visualisations like this can tell powerful and accessible stories about crime and inequality. But we should also remain critical and self-aware of the decisions and techniques underlying the visual. We might consider the following questions:

What story are we trying tell?
Are we conveying the underlying data accurately?
How might people (experts and non-experts) interpret the visual?
What might the shortcomings be?
What are the implications for the reader?
How might we mitigate against potential shortcomings?

Annotation

There are a few ways to annotate your ggplot2 outputs, but one common method is by using an annotation layer. Using this layer, we can manually add text and lines (amongst other things) to our ggplot2 graphics.

Here, we add a text label with an accompanying arrow to the above map by way of an example to readers, helping them understand the legend. Annotations are arranged on existing plots by specifying x-y coordinates. Often, this will directly correspond to the axis in your graphic, but for maps, it can be used as above (i,e, x-y scales from 0 to 1).

To run the below code, we assume that the above map has been saved to an object named p1.

p1 +
  annotate(geom = "text",
           x = 0.75, y = 0.3,
           label = "Low income deprivation \n and low burglary",
           size = 3) +
  annotate(geom = "curve",
           x = 0.65, xend = 0.54, y = 0.3, yend = 0.4,
           curvature = -0.3,
           arrow = arrow(length = unit(1, "mm"))) +
  annotate(geom = "text",
           x = 0.8, y = 0.8,
           label = "High income deprivation \n and high burglary",
           size = 3) +
    annotate(geom = "curve",
           x = 0.76, xend = 0.65, y = 0.75, yend = 0.65,
           curvature = -0.1,
           arrow = arrow(length = unit(1, "mm")))