Introduction

This note uses the nomisr package to extract data on workforce jobs in London and the other regions of England, and then uses ggplot2 to create an animated line chart. For the data extraction I’ll broadly follow the steps set out in this article by Evan Odell.

Load packages

To begin with we load the packages we need: nomisr, tidyverse (for various data munging tasks, and for visualisation using ggplot2), anytime (for parsing textual dates into R-friendly dates), gganimate (for adding animation to plots) and kableExtra (for styling tables).

library(nomisr)
library(tidyverse)
library(anytime)
# devtools::install_github('thomasp85/gganimate') # run once if gganimate is not already installed
library(gganimate)
library(kableExtra)

Finding data

To find the data we want we can use the nomis_search function.

nomis_search(name = "*workforce*")
## # A tibble: 4 x 14
##   agencyid id      uri     version annotations.annotat~ components.attrib~
## * <chr>    <chr>   <chr>     <dbl> <list>               <list>            
## 1 NOMIS    NM_5_1  Nm-5d1        1 <data.frame [5 x 2]> <data.frame [7 x ~
## 2 NOMIS    NM_52_1 Nm-52d1       1 <data.frame [6 x 2]> <data.frame [7 x ~
## 3 NOMIS    NM_130~ Nm-130~       1 <data.frame [8 x 2]> <data.frame [7 x ~
## 4 NOMIS    NM_131~ Nm-131~       1 <data.frame [8 x 2]> <data.frame [7 x ~
## # ... with 8 more variables: components.dimension <list>,
## #   components.primarymeasure.conceptref <chr>,
## #   components.timedimension.codelist <chr>,
## #   components.timedimension.conceptref <chr>, description.value <chr>,
## #   description.lang <chr>, name.value <chr>, name.lang <chr>

Getting information about datasets

The description.value field should be a decent guide to what’s in each dataset. Using the dataset ID you can then dig into the annotations and components to find out more about what it contains.

nomis_data_info("NM_130_1") %>%
  unnest(components.dimension) %>%
  glimpse() 
## Observations: 5
## Variables: 14
## $ agencyid                             <chr> "NOMIS", "NOMIS", "NOMIS"...
## $ id                                   <chr> "NM_130_1", "NM_130_1", "...
## $ uri                                  <chr> "Nm-130d1", "Nm-130d1", "...
## $ version                              <dbl> 1, 1, 1, 1, 1
## $ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE",...
## $ components.timedimension.codelist    <chr> "CL_130_1_TIME", "CL_130_...
## $ components.timedimension.conceptref  <chr> "TIME", "TIME", "TIME", "...
## $ description.value                    <chr> "This dataset provides qu...
## $ description.lang                     <chr> "en", "en", "en", "en", "en"
## $ name.value                           <chr> "workforce jobs by indust...
## $ name.lang                            <chr> "en", "en", "en", "en", "en"
## $ codelist                             <chr> "CL_130_1_GEOGRAPHY", "CL...
## $ conceptref                           <chr> "GEOGRAPHY", "INDUSTRY", ...
## $ isfrequencydimension                 <chr> NA, NA, NA, NA, "true"

You can also use nomis_overview() to, well, get an overview.

nomis_overview("NM_130_1") %>%
  unnest(name) %>%
  glimpse()
## Observations: 20
## Variables: 2
## $ value <list> [[<c("1", "3"), c("NM_130_1", "NM_130_3"), c("standard:...
## $ name  <chr> "analyses", "analysisname", "analysisnumber", "contact",...

Getting ‘concepts’

To understand exactly what to query, it’s useful to explore the ‘concepts’ of a dataset.

nomis_get_metadata(id = "NM_130_1")
## # A tibble: 5 x 3
##   codelist           conceptref isfrequencydimension
## * <chr>              <chr>      <chr>               
## 1 CL_130_1_GEOGRAPHY GEOGRAPHY  false               
## 2 CL_130_1_INDUSTRY  INDUSTRY   false               
## 3 CL_130_1_ITEM      ITEM       false               
## 4 CL_130_1_MEASURES  MEASURES   false               
## 5 CL_130_1_FREQ      FREQ       true

You can then specify a particular concept to get first the types of values and then the list of values for each type.

nomis_get_metadata(id = "NM_130_1", concept = "GEOGRAPHY", type = "type")
## # A tibble: 2 x 3
##   id      label.en  description.en
##   <chr>   <chr>     <chr>         
## 1 TYPE480 regions   regions       
## 2 TYPE499 countries countries
nomis_get_metadata(id = "NM_130_1", concept = "GEOGRAPHY", type = "TYPE480")
## # A tibble: 12 x 4
##    id         parentCode label.en                 description.en          
##    <chr>      <chr>      <chr>                    <chr>                   
##  1 2013265921 2092957699 North East               North East              
##  2 2013265922 2092957699 North West               North West              
##  3 2013265923 2092957699 Yorkshire and The Humber Yorkshire and The Humber
##  4 2013265924 2092957699 East Midlands            East Midlands           
##  5 2013265925 2092957699 West Midlands            West Midlands           
##  6 2013265926 2092957699 East                     East                    
##  7 2013265927 2092957699 London                   London                  
##  8 2013265928 2092957699 South East               South East              
##  9 2013265929 2092957699 South West               South West              
## 10 2013265930 2092957700 Wales                    Wales                   
## 11 2013265931 2092957701 Scotland                 Scotland                
## 12 2013265932 2092957702 Northern Ireland         Northern Ireland
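
The id column is what a query can filter on. As a small sketch (the name english_region_ids is mine, and the commented-out call is illustrative rather than something run for this note), the nine English regions occupy a contiguous run of ids in the table above, so they can be built as a character vector:

```r
# Ids for North East (2013265921) through South West (2013265929),
# taken from the TYPE480 listing above
english_region_ids <- as.character(2013265921:2013265929)

# Assumption: a vector like this could be supplied to the geography
# argument instead of NULL, for example:
# nomis_get_data(id = "NM_130_1", geography = english_region_ids,
#                measures = "20100", item = "1")
english_region_ids
```

In the rest of this note I download every geography and filter afterwards instead.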

Get data

Armed with the info we’ve gleaned, it’s finally time to download some data!

d <- nomis_get_data(id = "NM_130_1", 
                    time = c("first", "latest"), # download the entire time period
                    geography = NULL, # all geographies
                    measures = "20100", # value
                    item = "1") # total workforce jobs

We can use the anytime package to parse the DATE column (as it saves us having to specify the exact conversion to use).

d$Date <- anydate(d$DATE)
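
For comparison, specifying the conversion by hand might look like the sketch below. It assumes the DATE column holds year-month strings such as "1996-03"; that format is an assumption for illustration, so check the actual values before relying on it.

```r
# Hypothetical year-month strings standing in for d$DATE
date_strings <- c("1996-03", "1996-06", "2018-09")

# Base R needs a complete date, so append a day before converting
parsed <- as.Date(paste0(date_strings, "-01"), format = "%Y-%m-%d")
parsed
```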

Visualisation

Now it’s time to illustrate this data with some visuals. First, let’s create a filtered and tweaked version of our data, limiting it to only English regions.

data <- d %>%
  filter(GEOGRAPHY_TYPE == "regions" & !is.na(OBS_VALUE)) %>%
  filter(!GEOGRAPHY_NAME %in% c("Wales", "Scotland", "Northern Ireland")) # %in% tests membership; != would recycle the vector
data$region <- as.factor(data$GEOGRAPHY_NAME)
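
A note on excluding several values at once: comparing a column to a vector with != recycles the vector element by element rather than testing membership, so %in% is the safer choice. A small sketch with made-up rows (not Nomis data) shows the difference:

```r
library(dplyr)

# Made-up rows, not Nomis data
toy <- tibble(place = c("Wales", "Scotland", "London",
                        "Scotland", "Wales", "London"))
excluded <- c("Wales", "Scotland", "Northern Ireland")

# != pairs row 1 with "Wales", row 2 with "Scotland", row 3 with
# "Northern Ireland", then starts again, so some rows slip through
wrong <- toy %>% filter(place != excluded)

# %in% checks each row against the whole set
right <- toy %>% filter(!place %in% excluded)

nrow(wrong)
nrow(right)
```

Because the number of rows here is a multiple of the vector length, R recycles silently and gives no warning.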

Now let’s plot it! I decided to go for an animated line chart here (it took me a while to work out that for a line chart you should use transition_reveal as that retains the line for previous dates).

ggplot(data = data, aes(x = Date, y = OBS_VALUE, colour = region, group=region)) +
  geom_line() +
  geom_segment(aes(xend = max(Date), yend = OBS_VALUE), linetype = 2, colour = 'grey') + 
  geom_text(aes(x = max(Date), label = region), hjust = 0) + 
  geom_point() + 
  scale_y_continuous(labels = scales::comma) +
  expand_limits(y = 0) + # Ensure the y axis includes zero
  theme_minimal() +
  labs(title = "Workforce jobs by region, 1996-2018",
       subtitle = "Chart by @geographyjim, data from ONS via Nomis",
       x = "Year",
       y="") +
  guides(colour = "none") + # drop the legend
  transition_reveal(Date) +
  coord_cartesian(clip = 'off') + 
  theme(plot.margin = margin(5.5, 120, 5.5, 5.5)) # create space on the right for labels
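
To write an animation like this to a file, one option is to assign the plot to an object and pass it to animate() and anim_save(). The sketch below uses a toy dataset rather than the Nomis data, and the frame count, frame rate and file name are arbitrary choices of mine. Rendering is only attempted if the gifski package is installed.

```r
library(ggplot2)
library(gganimate)

# Toy quarterly series for two made-up regions
toy <- data.frame(
  Date = rep(seq(as.Date("2000-01-01"), by = "quarter", length.out = 8), times = 2),
  region = rep(c("A", "B"), each = 8),
  OBS_VALUE = c(1:8, 8:1)
)

# transition_reveal() draws the data progressively along Date,
# keeping the part of each line that has already been revealed
p <- ggplot(toy, aes(x = Date, y = OBS_VALUE, colour = region, group = region)) +
  geom_line() +
  transition_reveal(Date)

# Render and save to a temporary directory, if a GIF renderer is available
if (requireNamespace("gifski", quietly = TRUE)) {
  anim <- animate(p, nframes = 50, fps = 10, renderer = gifski_renderer())
  anim_save(file.path(tempdir(), "toy_animation.gif"), animation = anim)
}
```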

There are a number of interesting trends that jump out and are worth exploring further. First, it looks like London saw the biggest increase in jobs over this period, but we can test that by calculating a table of percentage increases by region between the first and last dates.

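change <- data %>% 
  arrange(Date) %>% # make sure first() and last() pick the earliest and latest dates
  group_by(region) %>%
  summarise(first = first(OBS_VALUE), last = last(OBS_VALUE)) %>%
  mutate(change = last/first - 1) %>% # proportional change, e.g. 0.5 = 50%
  arrange(desc(change))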
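
To see what this calculation does, here is the same pipeline on a few made-up numbers (the regions "A" and "B" and their values are invented for illustration):

```r
library(dplyr)

# Invented values for two regions at three dates
toy <- tibble(
  region = rep(c("A", "B"), each = 3),
  Date = rep(as.Date(c("1996-03-01", "2007-03-01", "2018-03-01")), times = 2),
  OBS_VALUE = c(100, 120, 150, 200, 210, 220)
)

toy_change <- toy %>%
  arrange(Date) %>%
  group_by(region) %>%
  summarise(first = first(OBS_VALUE), last = last(OBS_VALUE)) %>%
  mutate(change = last / first - 1) %>%
  arrange(desc(change))

toy_change
```

Region A goes from 100 to 150, a change of 0.5 (50%), while region B goes from 200 to 220, a change of 0.1 (10%), so A sorts first even though B has more jobs.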

Now we can plot the percentage change by region.

change %>%
  ggplot(aes(x=reorder(region, change), y=change)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Percentage change in workforce jobs by region, 1996 to 2018",
       x = "",
       y = "Percentage change")

Next I’d like to create a table comparing the change in the number of jobs in London with the change in the rest of England put together.

data %>%
  mutate(London = recode(region, "London" = "London", .default = "Rest of England")) %>%
  group_by(London, Date) %>%
  summarise(sum = sum(OBS_VALUE)) %>%
  group_by(London) %>%
  summarise(first = first(sum), last = last(sum)) %>%
  mutate(change = last - first) %>%
  arrange(desc(change)) %>%
  kable() %>%
  kable_styling()
London              first      last   change
Rest of England  19717021  24103524  4386503
London            3974885   5983205  2008320

So London accounted for just under a third (about 31%) of all the jobs growth in England over this period.
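
The share can be checked directly from the change column of the table (the variable names here are just for illustration):

```r
# Changes in workforce jobs, copied from the table above
london_change <- 2008320
rest_change <- 4386503

# London's share of total jobs growth in England
london_share <- london_change / (london_change + rest_change)
round(100 * london_share, 1)
```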

Finally, let’s make a line chart comparing jobs growth rates in London and the rest of England over time. The first step is to calculate the growth rates.

annual_growth <- data %>% 
  mutate(London = recode(region, "London" = "London", .default = "Rest of England")) %>%
  group_by(London, Date) %>%
  summarise(sum = sum(OBS_VALUE)) %>%
  group_by(London) %>%
  mutate(annual_change = (sum / lag(sum, n = 4)) - 1) # compare each quarter with the same quarter a year earlier
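
The lag of four works because there is one row per quarter, so four rows back is the same quarter a year earlier. A toy series with invented values shows the effect, including the missing values in the first year:

```r
library(dplyr)

# Invented quarterly totals covering two years
toy <- tibble(
  Date = seq(as.Date("2000-03-01"), by = "quarter", length.out = 8),
  sum = c(100, 102, 104, 106, 110, 112, 114, 116)
)

toy_growth <- toy %>%
  mutate(annual_change = sum / lag(sum, n = 4) - 1)

toy_growth
```

The first four rows have no value a year earlier, so annual_change is NA there; the fifth row compares 110 with 100, giving 0.1.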

And now let’s plot it.

annual_growth %>% 
  ggplot(aes(x = Date, y = annual_change, colour = London)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  guides(colour = guide_legend("", reverse=TRUE)) +
  theme_minimal() +
  labs(title = "Annual change in number of workforce jobs, 1997-2018",
       subtitle = "Chart by @geographyjim, data from ONS via Nomis",
       x = "Year",
       y="")