Introduction

We are lucky to have population estimates for London going back to the 1801 Census, and finding more effective ways to visualise this data has been a long-standing obsession of mine.

I initially started writing this notebook with the aim of creating an animated ‘bar chart race’ of borough populations over time, in the style of some brilliant recent charts by John Burn-Murdoch and Tom Forth but using R (specifically the wondrous gganimate package by Thomas Lin Pedersen). That turned out to be a hot mess so I’ve ended up with a less dynamic chart with a fixed view, which I think probably works better for a case where we are interested in the overall pattern of values between categories at each point in time.

We’ll be using the new pivot functions in tidyr as explained here. They’re in the development version at the time of writing but won’t necessarily be in either that or the CRAN version by the time you read this, so buyer beware.

Setup

First let’s load the packages we’ll be using.

# devtools::install_github("tidyverse/tidyr") # uncomment if you need to install the dev version
library(tidyr)
library(dplyr)
library(readr)
library(stringr)
library(ggplot2)
library(gganimate)
library(ggthemes)

Import and wrangle the data

Our data is a London Datastore file containing the population of each borough at each Census between 1801 and 2011.

It’s in ‘wide’ format so we use the new pivot_long function to get it into a tidy format we can work with.

data_long <- data %>%
  pivot_long(cols = -c(`Area Code`, `Area Name`),
             names_to = "Year",
             values_to = "Persons")

Let’s strip the text out of our year column and convert it to a numeric variable.

data_long$Year <- as.numeric(str_sub(data_long$Year, start = 9, end = 12))

We’d like to identify which boroughs are in Inner or Outer London, so let’s download a spreadsheet which includes that breakdown, filter it and join it on.

download.file(url="https://files.datapress.com/london/dataset/london-borough-profiles/2017-01-26T18:50:00/london-borough-profiles.xlsx", destfile="london-borough-profiles.xlsx", mode = "wb")
lookup <- readxl::read_excel("london-borough-profiles.xlsx", 
                             sheet = "Data", range = "C3:D35", col_names = FALSE)
names(lookup) <- c("Area Name", "Inner-Outer")
data_long <- data_long %>% 
  left_join(lookup, by = "Area Name")

The key comparison I’d like to make is between boroughs that were relatively high density in 1801 and those that were lower density. This is roughly correlated with distance from the city centre. We therefore need to download a file containing each borough’s land area (in hectares) and calculate the density as population in 1801 divided by area. I’m going to assume the area in 1801 was the same as in 2011, as there hasn’t been much land reclamation in London over the years.

area <- read_csv("https://data.london.gov.uk/download/land-area-and-population-density-ward-and-borough/77e9257d-ad9d-47aa-aeed-59a00741f301/housing-density-borough.csv") %>%
  filter(Year == 2011) %>%
  select(c("Name", "Hectares" = "Inland_Area _Hectares"))

data_long <- data_long %>%
  left_join(area, by = c("Area Name" = "Name"))

# Calculate density
data_long <- data_long %>%
  mutate(Density = Persons / Hectares)

# Create a table ranking the boroughs by density in 1801
rank_1801 <- data_long %>% 
  filter(Year == 1801) %>%
  mutate(Rank_1801 = min_rank(desc(Density))) %>%
  select(c(`Area Name`, Rank_1801))

# Join that table on to our original data
data_long <- data_long %>%
  left_join(rank_1801, by = c("Area Name")) 

Animated chart

Now let’s create our animated chart. This shows how densities fell in central and Inner London and rose in Outer boroughs as successive waves of transport improvements allowed more Londoners to live further out, and how population growth resumed more or less everywhere in the 90s and into the 21st century.

anim <- data_long %>%
  ggplot(aes(x = reorder(`Area Name`, -Rank_1801), y = Persons, fill = `Inner-Outer`)) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(labels = scales::comma) +
  coord_flip() +
  labs(title="London borough population, {round(frame_along,0)}",
       subtitle = "Ordered by population density in 1801",
       caption = "Chart by @geographyjim, data from London Datastore, gganimate package by @thomasp85",
       x = "",
       y = "",
       fill = "") +
  transition_reveal(along = Year) +
  ease_aes("linear") +
  scale_color_fivethirtyeight() +
  theme_fivethirtyeight() 

animate(nframes = 200, anim, start_pause = 10, end_pause = 50, height = 500)

That’s pretty good, but ideally I’d like the moving bars to leave behind a ‘trace’ of their peak value, perhaps in the form of a line. The only way I could work out of doing that involved using a point that appears from teh start, which wasn’t quite the effect I was looking for.

maxdata <- data_long %>%
  group_by(`Area Name`) %>%
  mutate(MaxPop = max(Persons))

animax <- data_long %>%
  ggplot(aes(x = reorder(`Area Name`, -Rank_1801), y = Persons, fill = `Inner-Outer`)) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(labels = scales::comma) +
  stat_summary(data = maxdata,
             aes(x = reorder(`Area Name`, -Rank_1801), y = MaxPop),
             fun.y = "mean", colour = "grey", size = 2, geom = "point") +
  coord_flip() +
  labs(title="London borough population, {round(frame_along,0)}",
       subtitle = "Ordered by population density in 1801",
       x = "",
       y = "") +
  transition_reveal(along = Year) +
  ease_aes("linear") +
  scale_color_fivethirtyeight() +
  theme_fivethirtyeight() 

animate(nframes = 200, animax, start_pause = 10, end_pause = 50, height = 500)