This note uses the nomisr package to extract data on workforce jobs in London and the other regions of England, and then uses ggplot2 to create an animated line chart. For the data extraction part I’ll broadly follow the steps set out in this article by Evan Odell.
To begin with we load the packages we need: nomisr (for querying Nomis), tidyverse (for various data munging tasks, and for visualisation using ggplot2), anytime (for parsing textual dates into R-friendly dates), gganimate (for adding animation to plots) and kableExtra (for styling tables).
library(nomisr)
library(tidyverse)
library(anytime)
devtools::install_github('thomasp85/gganimate') # one-off install; skip this line if gganimate is already installed
library(gganimate)
library(kableExtra)
To find the data we want we can use the nomis_search function.
nomis_search(name = "*workforce*")
## # A tibble: 4 x 14
## agencyid id uri version annotations.annotat~ components.attrib~
## * <chr> <chr> <chr> <dbl> <list> <list>
## 1 NOMIS NM_5_1 Nm-5d1 1 <data.frame [5 x 2]> <data.frame [7 x ~
## 2 NOMIS NM_52_1 Nm-52d1 1 <data.frame [6 x 2]> <data.frame [7 x ~
## 3 NOMIS NM_130~ Nm-130~ 1 <data.frame [8 x 2]> <data.frame [7 x ~
## 4 NOMIS NM_131~ Nm-131~ 1 <data.frame [8 x 2]> <data.frame [7 x ~
## # ... with 8 more variables: components.dimension <list>,
## # components.primarymeasure.conceptref <chr>,
## # components.timedimension.codelist <chr>,
## # components.timedimension.conceptref <chr>, description.value <chr>,
## # description.lang <chr>, name.value <chr>, name.lang <chr>
The description.value field should be a decent guide to what’s in each dataset. Using the dataset ID you can then dig into the annotations and components to find out more about what it contains.
nomis_data_info("NM_130_1") %>%
  unnest(components.dimension) %>%
  glimpse()
## Observations: 5
## Variables: 14
## $ agencyid <chr> "NOMIS", "NOMIS", "NOMIS"...
## $ id <chr> "NM_130_1", "NM_130_1", "...
## $ uri <chr> "Nm-130d1", "Nm-130d1", "...
## $ version <dbl> 1, 1, 1, 1, 1
## $ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE",...
## $ components.timedimension.codelist <chr> "CL_130_1_TIME", "CL_130_...
## $ components.timedimension.conceptref <chr> "TIME", "TIME", "TIME", "...
## $ description.value <chr> "This dataset provides qu...
## $ description.lang <chr> "en", "en", "en", "en", "en"
## $ name.value <chr> "workforce jobs by indust...
## $ name.lang <chr> "en", "en", "en", "en", "en"
## $ codelist <chr> "CL_130_1_GEOGRAPHY", "CL...
## $ conceptref <chr> "GEOGRAPHY", "INDUSTRY", ...
## $ isfrequencydimension <chr> NA, NA, NA, NA, "true"
You can also use nomis_overview() to, well, get an overview.
nomis_overview("NM_130_1") %>%
  unnest(name) %>%
  glimpse()
## Observations: 20
## Variables: 2
## $ value <list> [[<c("1", "3"), c("NM_130_1", "NM_130_3"), c("standard:...
## $ name <chr> "analyses", "analysisname", "analysisnumber", "contact",...
To understand exactly what to query, it’s useful to explore the ‘concepts’ of a dataset.
nomis_get_metadata(id = "NM_130_1")
## # A tibble: 5 x 3
## codelist conceptref isfrequencydimension
## * <chr> <chr> <chr>
## 1 CL_130_1_GEOGRAPHY GEOGRAPHY false
## 2 CL_130_1_INDUSTRY INDUSTRY false
## 3 CL_130_1_ITEM ITEM false
## 4 CL_130_1_MEASURES MEASURES false
## 5 CL_130_1_FREQ FREQ true
You can then specify a particular concept to get first the types of values and then the list of values for each type.
nomis_get_metadata(id = "NM_130_1", concept = "GEOGRAPHY", type = "type")
## # A tibble: 2 x 3
## id label.en description.en
## <chr> <chr> <chr>
## 1 TYPE480 regions regions
## 2 TYPE499 countries countries
nomis_get_metadata(id = "NM_130_1", concept = "GEOGRAPHY", type = "TYPE480")
## # A tibble: 12 x 4
## id parentCode label.en description.en
## <chr> <chr> <chr> <chr>
## 1 2013265921 2092957699 North East North East
## 2 2013265922 2092957699 North West North West
## 3 2013265923 2092957699 Yorkshire and The Humber Yorkshire and The Humber
## 4 2013265924 2092957699 East Midlands East Midlands
## 5 2013265925 2092957699 West Midlands West Midlands
## 6 2013265926 2092957699 East East
## 7 2013265927 2092957699 London London
## 8 2013265928 2092957699 South East South East
## 9 2013265929 2092957699 South West South West
## 10 2013265930 2092957700 Wales Wales
## 11 2013265931 2092957701 Scotland Scotland
## 12 2013265932 2092957702 Northern Ireland Northern Ireland
Armed with the info we’ve gleaned, it’s finally time to download some data!
d <- nomis_get_data(id = "NM_130_1",
                    time = c("first", "latest"), # download the entire time period
                    geography = NULL,            # all geographies
                    measures = "20100",          # value
                    item = "1")                  # total workforce jobs
We can use the anytime package to parse the DATE variable, as it saves us having to specify the exact conversion format.
d$Date <- anydate(d$DATE)
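If you would rather not depend on anytime, base R can do the same job once you know the string format. The sketch below assumes the DATE column holds year-month strings such as "1996-03" (that format is my assumption, so check the downloaded data first), and `date_strings` is just a stand-in for `d$DATE`.

```r
# Stand-in for d$DATE; the "YYYY-MM" format is an assumption
date_strings <- c("1996-03", "1996-06", "2018-09")

# as.Date() needs a day component, so append the first of the month
parsed <- as.Date(paste0(date_strings, "-01"), format = "%Y-%m-%d")

class(parsed)
```

The trade-off is that this breaks if the format differs, which is exactly the guesswork anydate() takes off our hands.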
Now it’s time to illustrate this data with some visuals. First, let’s create a filtered and tweaked version of our data, limiting it to only English regions.
data <- d %>%
  filter(GEOGRAPHY_TYPE == "regions" & !is.na(OBS_VALUE)) %>%
  # %in% tests each name against the whole vector; != would recycle the vector element-wise
  filter(!GEOGRAPHY_NAME %in% c("Wales", "Scotland", "Northern Ireland"))
data$region <- as.factor(data$GEOGRAPHY_NAME)
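A quick aside on excluding several names at once, since it is an easy thing to get wrong. Comparing a column against a vector with `!=` recycles the vector row by row instead of testing membership, so it can silently keep rows you meant to drop. Here is a small base R illustration on made-up values (the `regions` vector is purely hypothetical):

```r
# Hypothetical column of geography names
regions  <- c("London", "Wales", "Scotland", "North East", "Northern Ireland", "Wales")
excluded <- c("Wales", "Scotland", "Northern Ireland")

# != pairs regions[i] with the recycled excluded[i]; this is not a membership test
recycled <- regions != excluded

# %in% checks every element against the whole excluded set
keep <- !regions %in% excluded

regions[recycled] # nothing is dropped here
regions[keep]     # only the non-excluded names remain
```

Whether `!=` happens to give the right answer depends entirely on how the rows line up with the recycled vector, which is why `%in%` is the safer choice.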
Now let’s plot it! I decided to go for an animated line chart here (it took me a while to work out that for a line chart you should use transition_reveal as that retains the line for previous dates).
ggplot(data = data, aes(x = Date, y = OBS_VALUE, colour = region, group = region)) +
  geom_line() +
  geom_segment(aes(xend = max(Date), yend = OBS_VALUE), linetype = 2, colour = 'grey') +
  geom_text(aes(x = max(Date), label = region), hjust = 0) +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  expand_limits(y = 0) + # Ensure the y axis includes zero
  theme_minimal() +
  labs(title = "Workforce jobs by region, 1996-2018",
       subtitle = "Chart by @geographyjim, data from ONS via Nomis",
       x = "Year",
       y = "") +
  guides(colour = "none") + # drop the legend (colour = FALSE is deprecated in recent ggplot2)
  transition_reveal(Date) +
  coord_cartesian(clip = 'off') +
  theme(plot.margin = margin(5.5, 120, 5.5, 5.5)) # create space on the right for labels
There are a number of interesting trends that jump out and are worth exploring further. First, it looks like London saw the biggest increase in jobs over this period, but we can test that by calculating a table of percentage increases by region between the first and last dates.
change <- data %>%
  arrange(Date) %>% # make sure first() and last() pick the earliest and latest dates
  group_by(region) %>%
  summarise(first = first(OBS_VALUE), last = last(OBS_VALUE)) %>%
  mutate(change = last / first - 1) %>%
  arrange(desc(change))
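One thing worth knowing about this calculation is that first() and last() simply take the first and last rows in each group, so the result depends on the rows being in date order. Here is a minimal sketch on toy data (two hypothetical regions, "A" and "B", with made-up values and deliberately shuffled rows) that sorts by date before summarising:

```r
library(dplyr)

# Toy data: hypothetical regions and values, rows deliberately out of date order
toy <- data.frame(
  region = c("A", "B", "A", "B", "A", "B"),
  Date = as.Date(c("2018-12-01", "1996-03-01", "1996-03-01",
                   "2018-12-01", "2007-06-01", "2007-06-01")),
  OBS_VALUE = c(150, 200, 100, 220, 120, 210)
)

toy_change <- toy %>%
  arrange(Date) %>%                      # earliest rows first within each region
  group_by(region) %>%
  summarise(first = first(OBS_VALUE), last = last(OBS_VALUE)) %>%
  mutate(change = last / first - 1) %>%  # proportional change from first to last date
  arrange(desc(change))

toy_change
```

Region A goes from 100 to 150 (a 50% rise) and region B from 200 to 220 (10%), so A is listed first.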
Now we can plot the percentage change by region.
change %>%
  ggplot(aes(x = reorder(region, change), y = change)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Percentage change in workforce jobs by region, 1996 to 2018",
       x = "",
       y = "Percentage change")
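The reorder() call is what sorts the bars. It rebuilds the factor so that its levels run from the smallest to the largest value of the second argument, and coord_flip() then draws the first level at the bottom. A tiny base R sketch with made-up regions and values shows the level order it produces:

```r
# Hypothetical regions and percentage changes
toy <- data.frame(
  region = factor(c("A", "B", "C")),
  change = c(0.2, 0.5, 0.1)
)

# Levels are re-sorted by increasing value of change
ordered_region <- reorder(toy$region, toy$change)

levels(ordered_region)
```

Here the levels come out as C, A, B, so in a flipped bar chart B (the largest change) would sit at the top.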
Next I’d like to create a table comparing the change in the number of jobs in London with the change in the rest of England put together.
data %>%
  mutate(London = recode(region, "London" = "London", .default = "Rest of England")) %>%
  group_by(London, Date) %>%
  summarise(sum = sum(OBS_VALUE)) %>%
  group_by(London) %>%
  summarise(first = first(sum), last = last(sum)) %>%
  mutate(change = last - first) %>%
  arrange(desc(change)) %>%
  kable() %>%
  kable_styling()
| London | first | last | change |
|---|---|---|---|
| Rest of England | 19717021 | 24103524 | 4386503 |
| London | 3974885 | 5983205 | 2008320 |
So London accounted for just under one third (about 31%) of all the jobs growth in England over this period.
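For anyone who wants to check the arithmetic, London's share is its change divided by the combined change for England. The figures below are copied straight from the table; the variable names are just for illustration.

```r
# First and last values taken from the table above
london_change <- 5983205 - 3974885    # 2,008,320
rest_change   <- 24103524 - 19717021  # 4,386,503

# London's share of total jobs growth in England
london_share <- london_change / (london_change + rest_change)

round(london_share, 3)
```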
Finally, let’s make a line chart comparing jobs growth rates in London and the rest of England over time. The first step is to calculate the growth rates.
annual_growth <- data %>%
  mutate(London = recode(region, "London" = "London", .default = "Rest of England")) %>%
  group_by(London, Date) %>%
  summarise(sum = sum(OBS_VALUE)) %>%
  group_by(London) %>%
  # the data are quarterly, so a lag of 4 is the same quarter one year earlier
  mutate(annual_change = (sum / lag(sum, n = 4)) - 1)
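To see what the lag is doing, here is a minimal sketch on a made-up quarterly series (the `jobs` values are invented). Each value is divided by the value four rows earlier, so the first four rows have nothing to compare against and come out as NA.

```r
library(dplyr)

# Hypothetical quarterly series: six quarters starting March 2017
quarterly <- data.frame(
  Date = seq(as.Date("2017-03-01"), by = "quarter", length.out = 6),
  jobs = c(100, 101, 102, 103, 105, 106.05)
)

growth <- quarterly %>%
  arrange(Date) %>% # lag() works on row position, so the rows must be in date order
  mutate(annual_change = jobs / lag(jobs, n = 4) - 1)

growth
```

The fifth and sixth rows both show 5% growth (105 against 100, and 106.05 against 101). This is also why the chart of annual change can only start a year after the first observation.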
And now let’s plot it.
annual_growth %>%
  ggplot(aes(x = Date, y = annual_change, colour = London)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  guides(colour = guide_legend("", reverse = TRUE)) +
  theme_minimal() +
  labs(title = "Annual change in number of workforce jobs, 1997-2018",
       subtitle = "Chart by @geographyjim, data from ONS via Nomis",
       x = "Year",
       y = "")