Overview & Dependencies

This exercise replicates and modifies plots from a New York Times article published after the June 2021 jobs report.

The jobs report data from BLS are originally in a tough to viz format for most software. They are “wide” (the timeseries runs across the rows) rather than “long” (each series running down the columns). I generally prefer to wrangle data like this in R over any other software. This document demonstrates how to take the raw data extracted from BLS here and reshape them for visualization in R (with examples) or any software you like. At the end, I organize the data in event time format so that you can recreate and modify the visualization from the New York Times.

The chunk below displays the required packages for replicating this code in R.

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

## 
## Attaching package: 'flextable'

## The following object is masked from 'package:purrr':
## 
##     compose

Data Preparation

Read Data

I’ve extracted the original non-farm employment series, with monthly periodicity, and all available metrics (raw numbers, changes, percentage changes) from BLS and saved it as an Excel file. The code below reads in just the section of the Excel workbook that contains data and displays the first 6 rows and 8 columsn to demonstrate the problem we’re trying to fix. In total, the data contain 200 columns.

jobs <- read_excel("bls_nonfarm_emp.xlsx", range = "A4:GR11")
head(jobs) %>% 
  select(1:8) %>%
  flextable()

Series ID	View Description	Jan 2005	Feb 2005	Mar 2005	Apr 2005	May 2005	Jun 2005
CES0000000001	Original Data Value	132,774.0	133,032.0	133,156.0	133,518.0	133,690.0	133,942.0
CES0000000001	1-Month Net Change	150.0	258.0	124.0	362.0	172.0	252.0
CES0000000001	3-Month Net Change	327.0	529.0	532.0	744.0	658.0	786.0
CES0000000001	12-Month Net Change	2,005.0	2,207.0	2,014.0	2,107.0	1,996.0	2,149.0
CES0000000001	1-Month Percent Change	0.1	0.2	0.1	0.3	0.1	0.2
CES0000000001	3-Month Percent Change	0.2	0.4	0.4	0.6	0.5	0.6

Reshape from Wide to Long

The code below will pull all of the values that go across the rows and place them in 1 column called value with the corresponding date in a column called month.

jobs <- jobs %>%
  pivot_longer(cols = c(3:200), names_to = "month", values_to = "value")

Reshape from long to a little bit wider

Now we want each of the series types to be in its own column, but before we do that, it will help to clean up the series names so that they don’t contains spaces or start with numbers. This will ensure they aren’t a pain to work with when they become column names.

jobs <- jobs %>%
  mutate(series = case_when(`View Description` == "Original Data Value" ~ "number",
                            `View Description` == "1-Month Net Change" ~ "change_1mth",
                            `View Description` == "3-Month Net Change" ~ "change_3mth",
                            `View Description` == "12-Month Net Change" ~ "change_12mth",
                            `View Description` == "1-Month Percent Change" ~ "pchange_1mth",
                            `View Description` == "3-Month Percent Change" ~ "pchange_3mth",
                            `View Description` == "12-Month Percent Change" ~ "pchange_12mth")) %>%
  select(!`Series ID` & !`View Description`) %>%
  pivot_wider(names_from = series, values_from = value)

Create Separate Data Frames for each Recession and the Event Time Variable

To create the dataframes, we first need to make sure R is recognizing month as a date. Right now, it thinks it’s a character variable. To do this I use the my() function from the lubridate package. my() is for parsing dates that have a month year structure.

jobs <- jobs %>%
  mutate(month = my(month))

Now we can create two separate copies of the data, use the interval function from the lubridate package to create our event time variable (months.since) in each, then append the dataframes using bind_rows for plotting. You could also join the dataframes on the months.since rather than bind_rows but ggplot works best with data in long format so this will make for a simpler ggplot call. The first 6 rows and last 6 rows and some key columns of the data are displayed to show the end result of the work we’ve done.

recession08 <- jobs %>%
  mutate(recession = "Great Recession",
         months.since = interval(mdy(01012008), month) %/% months(1)) %>%
  filter(month < mdy(03012020))

recession20 <- jobs %>%
  mutate(recession = "Pandemic",
         months.since = interval(mdy(03012020), month) %/% months(1)) %>%
  filter(month > mdy(01012008))

jobsplotdata <- bind_rows(recession08, recession20)

head(jobsplotdata) %>% 
  select(month, number, change_1mth, pchange_1mth, recession, months.since) %>%
  flextable()

month	number	change_1mth	pchange_1mth	recession	months.since
2005-01-01	132,774	150	0.1	Great Recession	-36
2005-02-01	133,032	258	0.2	Great Recession	-35
2005-03-01	133,156	124	0.1	Great Recession	-34
2005-04-01	133,518	362	0.3	Great Recession	-33
2005-05-01	133,690	172	0.1	Great Recession	-32
2005-06-01	133,942	252	0.2	Great Recession	-31

tail(jobsplotdata) %>% 
  select(month, number, change_1mth, pchange_1mth, recession, months.since) %>%
  flextable()

month	number	change_1mth	pchange_1mth	recession	months.since
2021-01-01	142,736	233	0.2	Pandemic	10
2021-02-01	143,272	536	0.4	Pandemic	11
2021-03-01	144,057	785	0.5	Pandemic	12
2021-04-01	144,326	269	0.2	Pandemic	13
2021-05-01	144,909	583	0.4	Pandemic	14
2021-06-01	145,759	850	0.6	Pandemic	15

Write Data

The code below exports the plotting data to Excel for use in other software using openxlsx

jobswb <- createWorkbook("bls_nonfarm_emp_clean.xlsx")
addWorksheet(jobswb, "jobs report data")
writeDataTable(jobswb,
               sheet = "jobs report data",
               x = jobsplotdata,
               startRow = 3,
               tableStyle = "TableStyleMedium9")

saveWorkbook(jobswb, 
             file = "bls_nonfarm_emp_clean.xlsx",
             overwrite = TRUE)

Visualization

Replicate the plot from the article

#Get the number of jobs at the start of the Great Recession and store as a number, not dataframe
month1_great <- as.double(jobsplotdata %>%
  filter(months.since == 0 & recession == "Great Recession") %>%
  select(number))

#Same for pandemic
month1_pand <- as.double(jobsplotdata %>%
  filter(months.since == 0 & recession == "Pandemic") %>%
  select(number))

#Create the percentage change using the starting values from above and plot
jobsplotdata %>%
  filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
  mutate(pct_change = case_when(recession == "Great Recession" ~ 100*(number-month1_great)/month1_great,
                                recession == "Pandemic" ~ 100*(number-month1_pand)/month1_pand)) %>%
  ggplot(aes(x = months.since, y = pct_change, color = recession)) +
  geom_line(size = 2) +
  theme_classic() +
  scale_color_manual(values = c("black", "gold")) +
  geom_hline(yintercept = 0, color = "gray") +
  labs(x = "Months since recession began",
       y = "",
       title = "Percent change in jobs since beginning of recession")

Plot the raw jobs numbers

jobsplotdata %>%
  filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
  ggplot(aes(x = months.since, y = number, color = recession)) +
  geom_line(size = 2) +
  theme_classic() +
  scale_color_manual(values = c("black", "gold")) +
  labs(x = "Months since recession began",
       y = "",
       title = "Total jobs since beginning of recession")

Plot the difference in jobs relative to baseline (not percent difference)

month1_great <- as.double(jobsplotdata %>%
  filter(months.since == 0 & recession == "Great Recession") %>%
  select(number))

#Same for pandemic
month1_pand <- as.double(jobsplotdata %>%
  filter(months.since == 0 & recession == "Pandemic") %>%
  select(number))

#Create the percentage change using the starting values from above and plot
jobsplotdata %>%
  filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
  mutate(change = case_when(recession == "Great Recession" ~ number-month1_great,
                                recession == "Pandemic" ~ number-month1_pand)) %>%
  ggplot(aes(x = months.since, y = change, color = recession)) +
  geom_line(size = 2) +
  theme_classic() +
  scale_color_manual(values = c("black", "gold")) +
  geom_hline(yintercept = 0, color = "gray") +
  labs(x = "Months since recession began",
       y = "",
       title = "Change in jobs since beginning of recession")

Month over Month Changes

jobsplotdata %>%
  filter((recession == "Great Recession" & months.since >=0 & months.since < 25)|(recession == "Pandemic" & months.since >= 0)) %>%
  ggplot(aes(x = months.since, y = change_1mth, color = recession)) +
  geom_line(size = 2) +
  theme_classic() +
  scale_color_manual(values = c("black", "gold")) +
  labs(x = "Months since recession began",
       y = "",
       title = "Month over month change in jobs since beginning of recession")

Plot the raw jobs numbers with 24 month pretrends

jobsplotdata %>%
  filter((recession == "Great Recession" & months.since >=-24 & months.since < 25)|(recession == "Pandemic" & months.since >= -24)) %>%
  ggplot(aes(x = months.since, y = number, color = recession)) +
  geom_line(size = 2) +
  theme_classic() +
  scale_color_manual(values = c("black", "gold")) +
  labs(x = "Months since recession began",
       y = "",
       title = "Total jobs before and after recessions")

Jobs Report Plot Makeovers