This exercise replicates and modifies plots from a New York Times article published after the June 2021 jobs report.
The jobs report data from BLS are originally in a tough to viz format for most software. They are “wide” (the timeseries runs across the rows) rather than “long” (each series running down the columns). I generally prefer to wrangle data like this in R over any other software. This document demonstrates how to take the raw data extracted from BLS here and reshape them for visualization in R (with examples) or any software you like. At the end, I organize the data in event time format so that you can recreate and modify the visualization from the New York Times.
The chunk below displays the required packages for replicating this code in R.
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
##
## Attaching package: 'flextable'
## The following object is masked from 'package:purrr':
##
## compose
I’ve extracted the original non-farm employment series, with monthly periodicity, and all available metrics (raw numbers, changes, percentage changes) from BLS and saved it as an Excel file. The code below reads in just the section of the Excel workbook that contains data and displays the first 6 rows and 8 columsn to demonstrate the problem we’re trying to fix. In total, the data contain 200 columns.
jobs <- read_excel("bls_nonfarm_emp.xlsx", range = "A4:GR11")
head(jobs) %>%
select(1:8) %>%
flextable()
Series ID | View Description | Jan
| Feb
| Mar
| Apr
| May
| Jun
|
CES0000000001 | Original Data Value | 132,774.0 | 133,032.0 | 133,156.0 | 133,518.0 | 133,690.0 | 133,942.0 |
CES0000000001 | 1-Month Net Change | 150.0 | 258.0 | 124.0 | 362.0 | 172.0 | 252.0 |
CES0000000001 | 3-Month Net Change | 327.0 | 529.0 | 532.0 | 744.0 | 658.0 | 786.0 |
CES0000000001 | 12-Month Net Change | 2,005.0 | 2,207.0 | 2,014.0 | 2,107.0 | 1,996.0 | 2,149.0 |
CES0000000001 | 1-Month Percent Change | 0.1 | 0.2 | 0.1 | 0.3 | 0.1 | 0.2 |
CES0000000001 | 3-Month Percent Change | 0.2 | 0.4 | 0.4 | 0.6 | 0.5 | 0.6 |
The code below will pull all of the values that go across the rows and place them in 1 column called value with the corresponding date in a column called month.
jobs <- jobs %>%
pivot_longer(cols = c(3:200), names_to = "month", values_to = "value")
Now we want each of the series types to be in its own column, but before we do that, it will help to clean up the series names so that they don’t contains spaces or start with numbers. This will ensure they aren’t a pain to work with when they become column names.
jobs <- jobs %>%
mutate(series = case_when(`View Description` == "Original Data Value" ~ "number",
`View Description` == "1-Month Net Change" ~ "change_1mth",
`View Description` == "3-Month Net Change" ~ "change_3mth",
`View Description` == "12-Month Net Change" ~ "change_12mth",
`View Description` == "1-Month Percent Change" ~ "pchange_1mth",
`View Description` == "3-Month Percent Change" ~ "pchange_3mth",
`View Description` == "12-Month Percent Change" ~ "pchange_12mth")) %>%
select(!`Series ID` & !`View Description`) %>%
pivot_wider(names_from = series, values_from = value)
To create the dataframes, we first need to make sure R is recognizing month as a date. Right now, it thinks it’s a character variable. To do this I use the my() function from the lubridate package. my() is for parsing dates that have a month year structure.
jobs <- jobs %>%
mutate(month = my(month))
Now we can create two separate copies of the data, use the interval function from the lubridate package to create our event time variable (months.since) in each, then append the dataframes using bind_rows for plotting. You could also join the dataframes on the months.since rather than bind_rows but ggplot works best with data in long format so this will make for a simpler ggplot call. The first 6 rows and last 6 rows and some key columns of the data are displayed to show the end result of the work we’ve done.
recession08 <- jobs %>%
mutate(recession = "Great Recession",
months.since = interval(mdy(01012008), month) %/% months(1)) %>%
filter(month < mdy(03012020))
recession20 <- jobs %>%
mutate(recession = "Pandemic",
months.since = interval(mdy(03012020), month) %/% months(1)) %>%
filter(month > mdy(01012008))
jobsplotdata <- bind_rows(recession08, recession20)
head(jobsplotdata) %>%
select(month, number, change_1mth, pchange_1mth, recession, months.since) %>%
flextable()
month | number | change_1mth | pchange_1mth | recession | months.since |
2005-01-01 | 132,774 | 150 | 0.1 | Great Recession | -36 |
2005-02-01 | 133,032 | 258 | 0.2 | Great Recession | -35 |
2005-03-01 | 133,156 | 124 | 0.1 | Great Recession | -34 |
2005-04-01 | 133,518 | 362 | 0.3 | Great Recession | -33 |
2005-05-01 | 133,690 | 172 | 0.1 | Great Recession | -32 |
2005-06-01 | 133,942 | 252 | 0.2 | Great Recession | -31 |
tail(jobsplotdata) %>%
select(month, number, change_1mth, pchange_1mth, recession, months.since) %>%
flextable()
month | number | change_1mth | pchange_1mth | recession | months.since |
2021-01-01 | 142,736 | 233 | 0.2 | Pandemic | 10 |
2021-02-01 | 143,272 | 536 | 0.4 | Pandemic | 11 |
2021-03-01 | 144,057 | 785 | 0.5 | Pandemic | 12 |
2021-04-01 | 144,326 | 269 | 0.2 | Pandemic | 13 |
2021-05-01 | 144,909 | 583 | 0.4 | Pandemic | 14 |
2021-06-01 | 145,759 | 850 | 0.6 | Pandemic | 15 |
The code below exports the plotting data to Excel for use in other software using openxlsx
jobswb <- createWorkbook("bls_nonfarm_emp_clean.xlsx")
addWorksheet(jobswb, "jobs report data")
writeDataTable(jobswb,
sheet = "jobs report data",
x = jobsplotdata,
startRow = 3,
tableStyle = "TableStyleMedium9")
saveWorkbook(jobswb,
file = "bls_nonfarm_emp_clean.xlsx",
overwrite = TRUE)
#Get the number of jobs at the start of the Great Recession and store as a number, not dataframe
month1_great <- as.double(jobsplotdata %>%
filter(months.since == 0 & recession == "Great Recession") %>%
select(number))
#Same for pandemic
month1_pand <- as.double(jobsplotdata %>%
filter(months.since == 0 & recession == "Pandemic") %>%
select(number))
#Create the percentage change using the starting values from above and plot
jobsplotdata %>%
filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
mutate(pct_change = case_when(recession == "Great Recession" ~ 100*(number-month1_great)/month1_great,
recession == "Pandemic" ~ 100*(number-month1_pand)/month1_pand)) %>%
ggplot(aes(x = months.since, y = pct_change, color = recession)) +
geom_line(size = 2) +
theme_classic() +
scale_color_manual(values = c("black", "gold")) +
geom_hline(yintercept = 0, color = "gray") +
labs(x = "Months since recession began",
y = "",
title = "Percent change in jobs since beginning of recession")
jobsplotdata %>%
filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
ggplot(aes(x = months.since, y = number, color = recession)) +
geom_line(size = 2) +
theme_classic() +
scale_color_manual(values = c("black", "gold")) +
labs(x = "Months since recession began",
y = "",
title = "Total jobs since beginning of recession")
month1_great <- as.double(jobsplotdata %>%
filter(months.since == 0 & recession == "Great Recession") %>%
select(number))
#Same for pandemic
month1_pand <- as.double(jobsplotdata %>%
filter(months.since == 0 & recession == "Pandemic") %>%
select(number))
#Create the percentage change using the starting values from above and plot
jobsplotdata %>%
filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
mutate(change = case_when(recession == "Great Recession" ~ number-month1_great,
recession == "Pandemic" ~ number-month1_pand)) %>%
ggplot(aes(x = months.since, y = change, color = recession)) +
geom_line(size = 2) +
theme_classic() +
scale_color_manual(values = c("black", "gold")) +
geom_hline(yintercept = 0, color = "gray") +
labs(x = "Months since recession began",
y = "",
title = "Change in jobs since beginning of recession")
jobsplotdata %>%
filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
ggplot(aes(x = months.since, y = change_1mth, color = recession)) +
geom_line(size = 2) +
theme_classic() +
scale_color_manual(values = c("black", "gold")) +
labs(x = "Months since recession began",
y = "",
title = "Month over month change in jobs since beginning of recession")
jobsplotdata %>%
filter((recession == "Great Recession" & months.since >=-24 & months.since < 25)|(recession == "Pandemic" & months.since >= -24)) %>%
ggplot(aes(x = months.since, y = number, color = recession)) +
geom_line(size = 2) +
theme_classic() +
scale_color_manual(values = c("black", "gold")) +
labs(x = "Months since recession began",
y = "",
title = "Total jobs before and after recessions")