Looping through BLS Data

This walkthrough explains how to retrieve BLS data and loop through it to store processed results. This is a common workflow in New York: pre-processing charts for every combination of industry and area in the state, or automatically comparing New York to all other states when state CES data is released. It demonstrates a comprehensive pull of CES data from the BLS, a simple loop on a simple chart, and then some more complex loops.

Setting Parameters

I like to define things I might want to change near the top of the script so they are easy to find. Here, we’ll specify that we are working with data for New York.

state_name_filter <- "New York"
#state_name_list <- c("New York", "Pennsylvania")

Getting the Data

Fortunately, BLSloadR makes it easy to get the full CES data for every state. Let’s do it! I like to store my downloaded data in a separate object in R so that if I accidentally break my data while analyzing it, I don’t have to re-download it later.

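# Packages the code in this walkthrough relies on
library(BLSloadR)  # get_ces()
library(dplyr)     # filter(), group_by(), mutate(), lag(), joins, pull()
library(ggplot2)   # plotting and ggsave()
library(stringr)   # str_remove()
library(scales)    # comma(), percent(), ordinal()

# Set a contact e-mail as the user agent if one is not already set
if (nchar(Sys.getenv("BLS_USER_AGENT")) == 0) {
  Sys.setenv(BLS_USER_AGENT = "kevin.phelps@labor.ny.gov")
}

# Download only when no copy of the data is already in the session
if (length(ls(pattern = "ces_data")) == 0 &&
    length(ls(pattern = "ces_download")) == 0) {
  ces_download <- get_ces()
} else if (length(ls(pattern = "ces_data")) != 0 &&
           length(ls(pattern = "ces_download")) == 0) {
  # Reuse an existing ces_data object instead of downloading again
  ces_download <- ces_data
}

head(ces_download)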
##               series_id   value state_code area_code supersector_code
##                  <char>   <num>     <char>    <char>           <char>
## 1: SMS01000000000000001 1635500         01     00000               00
## 2: SMS01000000000000001 1633500         01     00000               00
## 3: SMS01000000000000001 1632000         01     00000               00
## 4: SMS01000000000000001 1643700         01     00000               00
## 5: SMS01000000000000001 1648900         01     00000               00
## 6: SMS01000000000000001 1650500         01     00000               00
##    industry_code data_type_code seasonal industry_name state_name area_name
##           <char>         <char>   <char>        <char>     <char>    <char>
## 1:      00000000             01        S Total Nonfarm    Alabama Statewide
## 2:      00000000             01        S Total Nonfarm    Alabama Statewide
## 3:      00000000             01        S Total Nonfarm    Alabama Statewide
## 4:      00000000             01        S Total Nonfarm    Alabama Statewide
## 5:      00000000             01        S Total Nonfarm    Alabama Statewide
## 6:      00000000             01        S Total Nonfarm    Alabama Statewide
##    data_type_text supersector_name       date
##            <char>           <char>     <Date>
## 1:  All Employees    Total Nonfarm 1990-01-01
## 2:  All Employees    Total Nonfarm 1990-02-01
## 3:  All Employees    Total Nonfarm 1990-03-01
## 4:  All Employees    Total Nonfarm 1990-04-01
## 5:  All Employees    Total Nonfarm 1990-05-01
## 6:  All Employees    Total Nonfarm 1990-06-01

Preparing the Data

When doing data analysis, we often want to know things like the change and the percentage change over different time periods, such as month and year, and maybe compare to some historical periods. Let’s do that! The key thing to know here is how to group your data. Grouping splits the data into miniature, distinct tables and controls the scope of your comparisons. When working with a large dataset, this is what allows us to calculate comparisons for distinct groups of data within the full data frame, all at once.

When working with groups, you need to pay attention to what makes observations within your data distinct - what are the different aspects of the data you are describing? In CES, we have things like the period (which month/year), the state, the area within a state, the industry, whether the data is seasonally adjusted or not, and the type of data measured (employment, employment diffusion, hourly wage, weekly earnings, hours worked). You will often want to group your data on all but one of these elements, with the one you leave out of the group being the one thing that makes the data within that group vary.

I’m also going to pre-filter the data a little bit, looking at just employment data and removing Puerto Rico and the U.S. Virgin Islands. The lag() function lets you compare to a prior observation in a series, which you can do once you specify the groups. As I mentioned above, I’ll create groups that leave just one variable out (the date) so that when I do a lag of 1 I am looking one month back; with a lag of 12 I am looking 12 months back. In the next chunk, we’ll group the data a little differently to achieve a different effect.
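To see how grouping scopes lag(), here is a minimal sketch on a made-up table. The areas, dates, and values below are invented for illustration and are not CES data. Each area is its own group, so the first row of each area has no prior month to compare against.

```r
library(dplyr)

# Toy data (invented values): two areas, three months each,
# already in date order within each area
toy <- tibble(
  area  = rep(c("A", "B"), each = 3),
  date  = rep(as.Date(c("2024-01-01", "2024-02-01", "2024-03-01")), times = 2),
  value = c(100, 102, 105, 50, 49, 52)
)

toy_changes <- toy |>
  # Group on everything except the date, so lag() looks back in time
  group_by(area) |>
  mutate(
    prior_month = lag(value, 1),
    otm_change  = value - prior_month
  ) |>
  ungroup()

toy_changes
```

Without the group_by(), lag() would simply look one row up, so the first row for area B would be compared to the last row of area A.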

One more thing… the CES download includes a LOT of old data. Since MSA data starts in 1990, let’s drop the old series from 1939-1989. It’s at least a small savings!

bls_ces_all_state <- ces_download |>
  # Drop observations from before 1990
  filter(date >= "1990-01-01") |> 
  #Specify groups - state / state area / industry / seasonality
  group_by(state_code, area_code, industry_code, seasonal) |>  
  
  # Employment only, not Puerto Rico and U.S. Virgin Islands
  filter(data_type_text == "All Employees",
         !(state_name %in% c("Puerto Rico", "Virgin Islands"))) |>
  
  # Calculate comparisons
  mutate(
    prior_month = lag(value, 1),
    prior_year = lag(value, 12),
    otm_change = value - prior_month,
    oty_change = value - prior_year,
    otm_pct = otm_change / prior_month,
    oty_pct = oty_change / prior_year,
  ) |>
  
  # Group cleanup, to avoid weird issues down the line.
  ungroup() |>
  
  # Streamline what's in our table.
  select(date, state_name, area_name, industry_name, seasonal, data_type_text, value, prior_month:oty_pct)

head(bls_ces_all_state |> filter (date == "2023-02-01"))
## # A tibble: 6 × 13
##   date       state_name area_name industry_name   seasonal data_type_text  value
##   <date>     <chr>      <chr>     <chr>           <chr>    <chr>           <dbl>
## 1 2023-02-01 Alabama    Statewide Total Nonfarm   S        All Employees  2.15e6
## 2 2023-02-01 Alabama    Statewide Total Private   S        All Employees  1.75e6
## 3 2023-02-01 Alabama    Statewide Goods Producing S        All Employees  3.89e5
## 4 2023-02-01 Alabama    Statewide Service-Provid… S        All Employees  1.76e6
## 5 2023-02-01 Alabama    Statewide Private Servic… S        All Employees  1.36e6
## 6 2023-02-01 Alabama    Statewide Mining and Log… S        All Employees  9.4 e3
## # ℹ 6 more variables: prior_month <dbl>, prior_year <dbl>, otm_change <dbl>,
## #   oty_change <dbl>, otm_pct <dbl>, oty_pct <dbl>

Calculating Industry Shares

Now, I want to do some more comparison. I’d like to calculate the percent share of each industry relative to the total for that area. There are different ways to do this, but this is my favorite: just create a subset of the data, then join it back to your original data, which will effectively repeat that single element for multiple rows.

I will:

1. Pull a subset of the data for Total Nonfarm employment.
2. Keep only the columns I want to use to identify how to join the data back in (where should the new data be used).
3. Rename the data value column (“value”) to something unique.
4. Join that data back to my original data frame.
5. Do math comparing my original data value column to my newly added column.

Once we have the industry share, let’s also calculate how that share has changed over time by grouping and using lag() again.
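The subset-and-join-back pattern can be sketched on a tiny made-up table first. The areas, industries, and values here are invented for illustration; only the pattern matches the steps above.

```r
library(dplyr)

# Toy data (invented values): one total row and two industry rows per area
toy <- tibble(
  area     = rep(c("A", "B"), each = 3),
  industry = rep(c("Total Nonfarm", "Construction", "Retail Trade"), times = 2),
  value    = c(1000, 100, 200, 500, 25, 100)
)

# Steps 1-3: subset to the total, keep the join key, rename the value column
totals <- toy |>
  filter(industry == "Total Nonfarm") |>
  select(area, total_nonfarm = value)

# Steps 4-5: join the total back onto every row, then compute the share
shares <- toy |>
  left_join(totals, by = "area") |>
  mutate(ind_share = value / total_nonfarm)

shares
```

Each area's total is repeated on every row for that area, which is what makes the row-by-row division possible.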

nonfarm_all <- bls_ces_all_state |>
  filter(industry_name == "Total Nonfarm") |>
  select(date, state_name, area_name, seasonal, data_type_text, value) |>
  rename("total_nonfarm" = "value")

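bls_ces_all_state <- bls_ces_all_state |>
  # Join on every identifying column kept in nonfarm_all
  left_join(nonfarm_all,
            by = c("date", "state_name", "area_name", "seasonal", "data_type_text")) |>
  mutate(ind_share = value/total_nonfarm) |> 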
  group_by(state_name, area_name, industry_name, seasonal) |> 
  arrange(date) |>
  mutate(
    decade_ago_share = lag(ind_share, 120),
    decade_share_change = ind_share - decade_ago_share
  ) |>
  ungroup()

head(bls_ces_all_state |> filter(industry_name == "Construction") |> filter (date == "2023-02-01"))
## # A tibble: 6 × 17
##   date       state_name area_name industry_name seasonal data_type_text  value
##   <date>     <chr>      <chr>     <chr>         <chr>    <chr>           <dbl>
## 1 2023-02-01 Alabama    Statewide Construction  S        All Employees  100100
## 2 2023-02-01 Alaska     Statewide Construction  S        All Employees   16400
## 3 2023-02-01 Arizona    Statewide Construction  S        All Employees  206200
## 4 2023-02-01 Arkansas   Statewide Construction  S        All Employees   61400
## 5 2023-02-01 California Statewide Construction  S        All Employees  910100
## 6 2023-02-01 Colorado   Statewide Construction  S        All Employees  183800
## # ℹ 10 more variables: prior_month <dbl>, prior_year <dbl>, otm_change <dbl>,
## #   oty_change <dbl>, otm_pct <dbl>, oty_pct <dbl>, total_nonfarm <dbl>,
## #   ind_share <dbl>, decade_ago_share <dbl>, decade_share_change <dbl>

Calculating Ranks

People often care a great deal not just about how their state is performing, but about how it is performing relative to other states. We can easily calculate and store those variables as well using groups. Remember that when grouping we want to group by everything that makes an observation unique except one dimension. Earlier, we left out the date, so that we could compare a unique area/industry over time. Now, we want to leave out the individual areas to rank them. We’ll do this twice - once for all the areas within a state, and once for all the states. We’ll subset out the data and then join it back together because we don’t want “Statewide” to always rank #1 within a state.
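As a minimal sketch of ranking within groups, here is the same idea on a made-up table (the states and values are invented for illustration). The state is left out of the grouping, so states are ranked against each other within each industry, and the value is negated so the largest value ranks first.

```r
library(dplyr)

# Toy data (invented values): three states, two industries
toy <- tibble(
  state    = rep(c("X", "Y", "Z"), times = 2),
  industry = rep(c("Construction", "Retail Trade"), each = 3),
  value    = c(30, 10, 20, 5, 15, 10)
)

ranked <- toy |>
  # Group by everything except the dimension being ranked (state)
  group_by(industry) |>
  # rank() orders ascending, so negate to make the largest value rank 1
  mutate(state_rank = floor(rank(-value))) |>
  ungroup()

ranked
```

By default rank() gives tied values their average rank, which can be fractional; floor() turns that into a whole number.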

area_ranks <- bls_ces_all_state |>
  filter(area_name != "Statewide") |>
  select(date, industry_name, seasonal, state_name, area_name, value, otm_pct, oty_pct, ind_share, decade_share_change) |> 
  group_by(date, industry_name, seasonal, state_name) |>
  mutate(
    area_rank_value = floor(rank(-value)),
    area_rank_otm_pct = floor(rank(-otm_pct)),
    area_rank_oty_pct = floor(rank(-oty_pct)),
    area_rank_share = floor(rank(-ind_share)),
    area_rank_decade_share_change = floor(rank(-decade_share_change))
  ) |>
  ungroup() |> 
  select(-c(value, otm_pct, oty_pct, ind_share, decade_share_change))

state_ranks <- bls_ces_all_state |>
  filter(area_name == "Statewide") |>
  select(date, industry_name, seasonal, state_name, value, otm_pct, oty_pct, ind_share, decade_share_change) |> 
  group_by(date, industry_name, seasonal) |>
  mutate(
    state_rank_value = floor(rank(-value)),
    state_rank_otm_pct = floor(rank(-otm_pct)),
    state_rank_oty_pct = floor(rank(-oty_pct)),
    state_rank_share = floor(rank(-ind_share)),
    state_rank_decade_share_change = floor(rank(-decade_share_change))
  ) |>
  ungroup() |> 
  select(-c(value, otm_pct, oty_pct, ind_share, decade_share_change))
  
bls_ces_all_state <- bls_ces_all_state |> 
  left_join(area_ranks) |> 
  left_join(state_ranks) |> 
  suppressMessages()

head(bls_ces_all_state |> filter(industry_name == "Construction") |> filter(date == "2023-02-01", state_name == "New York"))
## # A tibble: 3 × 27
##   date       state_name area_name   industry_name seasonal data_type_text  value
##   <date>     <chr>      <chr>       <chr>         <chr>    <chr>           <dbl>
## 1 2023-02-01 New York   Statewide   Construction  S        All Employees  393900
## 2 2023-02-01 New York   Statewide   Construction  U        All Employees  371400
## 3 2023-02-01 New York   Rochester,… Construction  U        All Employees   21100
## # ℹ 20 more variables: prior_month <dbl>, prior_year <dbl>, otm_change <dbl>,
## #   oty_change <dbl>, otm_pct <dbl>, oty_pct <dbl>, total_nonfarm <dbl>,
## #   ind_share <dbl>, decade_ago_share <dbl>, decade_share_change <dbl>,
## #   area_rank_value <dbl>, area_rank_otm_pct <dbl>, area_rank_oty_pct <dbl>,
## #   area_rank_share <dbl>, area_rank_decade_share_change <dbl>,
## #   state_rank_value <dbl>, state_rank_otm_pct <dbl>, state_rank_oty_pct <dbl>,
## #   state_rank_share <dbl>, state_rank_decade_share_change <dbl>

Charting

Now for the fun stuff! We have a lot of data, in the ballpark of 6 million rows of data as of the time of writing this. We need to put some of it to use. To do so, consider what we want to chart. Do we want to look at the change over time? Do we want to look at the current state of affairs? Depending on what we want to plot, different types of charts might be preferable.

We’ll start off looking at a manually-filtered example, then examine how we can scale that up to iterative looping.

I’m going to look at the Retail Trade industry in New York, starting in the year 2000. This first plot will use a few elements:

- The x axis is the date and the y axis is value, which in this case is “All Employees”.
- Seasonal adjustment is indicated by different line colors.
- Substate areas are split into different panels using facet_wrap().
- The scales of the facets are allowed to vary so that smaller and larger areas show individual change.
- The y axis will be scaled as a pretty number using scales::comma.
- Titles, captions, axes, and the color legend are modified using labs().
- The theme of the plot is set to a simple black-and-white.
- The legend is moved to the bottom for readability.

area_name_filter <- c("New York-Jersey City-White Plains, NY-NJ Metropolitan Division",
                      "New York-Newark-Jersey City, NY-NJ")

ny_retail <- bls_ces_all_state |> 
  filter(date >= "2000-01-01") |> 
  filter(state_name == "New York",
         industry_name == "Retail Trade",
         area_name != area_name_filter[1],
         area_name != area_name_filter[2])

# Shorten area labels: drop the " Metropolitan Division" suffix,
# then everything from ", NY" onward
ny_retail <- ny_retail |>
  mutate(area_name = area_name |>
           str_remove(" Metropolitan Division") |>
           str_remove(", NY.*"))

ggplot(ny_retail, aes(x = date, y = value, color = seasonal)) + 
  geom_line() +
  facet_wrap(~ area_name, ncol = 3, scales = "free_y") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Retail Trade Employment in New York",
       x = "Monthly Estimates",
       y = "Total Employment",
       color = "Seasonal Adjustment",
       caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
  theme_bw(base_size = 18) + 
  theme(
    legend.position = "bottom"
  )

Improving for Iteration

Looking at the chart, there are a few things we are doing which we can make dynamic.

- We are manually filtering our data for a specific state and a specific industry.
- We are then re-typing that state and industry name in our plot labels.
- From looking at this chart, it’s not clear when the chart was created because I did not include a date.

We can make this dynamic by pulling those elements out of our code and inserting them into the plot. Remember, I already set a state_name_filter at the beginning of this code, so we’ll reuse it here. To create text strings, I’ll also start using paste0() and the format.Date() function to make some pretty text strings out of dates.

Finally, let’s save our output as an image using the ggsave() function.

industry_filter <- "Retail Trade"

filtered_data <- bls_ces_all_state |> 
  filter(date >= "2000-01-01") |> 
  filter(state_name == state_name_filter,
         industry_name == industry_filter,
         area_name != area_name_filter[1],
         area_name != area_name_filter[2])

# Shorten area labels: drop the " Metropolitan Division" suffix,
# then everything from ", NY" onward
filtered_data <- filtered_data |>
  mutate(area_name = area_name |>
           str_remove(" Metropolitan Division") |>
           str_remove(", NY.*"))


earliest_date <- min(filtered_data$date) |> format.Date(format = "%B %Y")
latest_date <- max(filtered_data$date) |> format.Date(format = "%B %Y")

ggplot(filtered_data, aes(x = date, y = value, color = seasonal)) + 
  geom_line() +
  facet_wrap(~ area_name, ncol = 2, scales = "free_y") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = paste0(industry_filter, " Employment in ", state_name_filter),
       subtitle = paste0("Data from ",earliest_date," to ",latest_date),
       x = "Monthly Estimates",
       y = "Total Employment",
       color = "Seasonal Adjustment",
       caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
  theme_bw(base_size = 18) + 
  theme(
    legend.position = "bottom"
  )

# ggsave("images/Retail Trade Employment in New York.png", width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)

Dynamic Saving

One last step to unlock iteration is to store our content. Sometimes you will want to iterate and save data frames. Here, we want to be able to save the output without having to type out a distinctive file name each time. R can generate those for us. Let’s define a dynamic filename that will save our output in a unique folder for each month, and that includes the industry and state for uniqueness.

Using create.dir = TRUE will be a big help, because if R needs to create a new folder when you run this next month with new data, it will do so automatically!
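A minimal sketch of such a filename, built with paste0(). The three values at the top are stand-ins so the snippet runs on its own; in the workflow they come from the parameters and the filtered data. The ggsave() call is left commented out, since it needs a plot to save.

```r
# Stand-in values (assumptions for this sketch)
state_name_filter <- "New York"
industry_filter   <- "Retail Trade"
latest_date       <- format(as.Date("2023-02-01"), "%B %Y")  # month name and year

# One folder per month, one file per industry and state
dynamic_filename <- paste0(
  "images/", latest_date, "/",
  industry_filter, " Employment in ", state_name_filter, ".png"
)

dynamic_filename

# With a plot available, create.dir = TRUE lets ggsave() make the month folder:
# ggsave(dynamic_filename, width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
```

Because the month is part of the folder name, next month's run lands in a new folder instead of overwriting this month's images.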

Iteration Unlocked

We now have all the elements we need to make this awesome. We are pulling in our data automatically from the BLS using BLSloadR. We have a plot that just needs an industry and state in order to run. And the output from that plot will automatically get saved in a unique directory and file name.

Now all we need is to be able to instruct R: “Now, do this same thing over and over while changing the input you use as the filter.” We’ll start off by doing this with a for loop. It says, roughly, for a thing in a group of things, do this other thing.

I’ll use a typical convention: a counter named i. i will take each value in a sequence of numbers from 1 to the length of a vector containing a list of industries. It will start with the first item in the list, then the second item, then the third, and so on.

Here’s a simple example:
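A minimal sketch, using a made-up vector of three industry names (typed by hand, not pulled from the CES data). seq_along() produces the sequence 1, 2, 3 for this vector, the same as 1:length() would.

```r
# Made-up vector for illustration
industries <- c("Construction", "Manufacturing", "Retail Trade")

# A place to store one result per pass through the loop
labels <- character(length(industries))

for (i in seq_along(industries)) {
  # i is 1 on the first pass, 2 on the second, and so on
  labels[i] <- paste0("Item ", i, ": ", industries[i])
  print(labels[i])
}
```

Everything inside the braces runs once per value of i, and industries[i] picks out the matching item each time.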

Now, let’s use that logic with a list of industries in the area we are looking at! I will use the exact same code as before, but instead of saying industry_filter <- "Retail Trade" I will set the filter to an item in the list of all industry_names. The only other change I will make is to store the ggplot as a part of a list of plots. When generating a chart inside a for loop, it won’t render nicely in this document, and I want you to see the work!

First, I will filter the data to New York for the period from 2000 onward. Then I will use pull() to pull out the industry name column. Finally, I will use unique() to get only one instance of each industry name in my list.

Let’s start with just the first 5 industries to see if it works…

ny_only <- bls_ces_all_state |> 
  filter(date >= "2000-01-01",
         state_name == state_name_filter,
         area_name != area_name_filter[1],
         area_name != area_name_filter[2])

# Shorten area labels: drop the " Metropolitan Division" suffix,
# then everything from ", NY" onward
ny_only <- ny_only |>
  mutate(area_name = area_name |>
           str_remove(" Metropolitan Division") |>
           str_remove(", NY.*"))

industry_names <- ny_only |> 
  pull(industry_name) |> 
  unique()

p <- list()

for(i in 1:5){
  
  industry_filter <- industry_names[i]
  
  filtered_data <- ny_only |> 
    filter(industry_name == industry_filter)
  
  earliest_date <- min(filtered_data$date) |> format.Date(format = "%B %Y")
  latest_date <- max(filtered_data$date) |> format.Date(format = "%B %Y")
  
  p[[i]] <- ggplot(filtered_data, aes(x = date, y = value, color = seasonal)) + 
    geom_line() +
    facet_wrap(~ area_name, ncol = 2, scales = "free_y") +
    scale_y_continuous(labels = scales::comma) +
    labs(title = paste0(industry_filter, " Employment in ", state_name_filter),
         subtitle = paste0("Data from ",earliest_date," to ",latest_date),
         x = "Monthly Estimates",
         y = "Total Employment",
         color = "Seasonal Adjustment",
         caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
    theme_bw(base_size=18) + 
    theme(
      legend.position = "bottom"
    )
  
#   dynamic_filename <- paste0("images/",latest_date,"/",industry_filter," Employment in ",state_name_filter,".png")
# 
#   ggsave(dynamic_filename, width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
#   
}

print(p[[3]])

print(p[[5]])

Excellent. Now… let’s do them all!

p <- list()

for(i in seq_along(industry_names)){
  
  industry_filter <- industry_names[i]
  
  filtered_data <- ny_only |> 
    filter(industry_name == industry_filter)
  
  earliest_date <- min(filtered_data$date) |> format.Date(format = "%B %Y")
  latest_date <- max(filtered_data$date) |> format.Date(format = "%B %Y")
  
  p[[i]] <- ggplot(filtered_data, aes(x = date, y = value, color = seasonal)) + 
    geom_line() +
    facet_wrap(~ area_name, ncol = 2, scales = "free_y") +
    scale_y_continuous(labels = scales::comma) +
    labs(title = paste0(industry_filter, " Employment in ", state_name_filter),
         subtitle = paste0("Data from ",earliest_date," to ",latest_date),
         x = "Monthly Estimates",
         y = "Total Employment",
         color = "Seasonal Adjustment",
         caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
    theme_bw(base_size=18) + 
    theme(
      legend.position = "bottom"
    )
  
  # dynamic_filename <- paste0("images/",latest_date,"/",industry_filter," Employment in ",state_name_filter,".png")
  # 
  # ggsave(dynamic_filename, width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
  
}

print(p[[10]])

print(p[[15]])

print(p[[25]])

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, 
                      fig.height = 7, fig.width=12, fig.retina = 1)

print(p[[35]])

print(p[[55]])

This might take a minute or two to run, but you now have a chart for every unique industry in the state, dynamically generated and saved, and if you want to change the feel of the charts, you can do so in just ONE place.

But Wait, There’s More!

So far we have not really plumbed the depths of what this can unlock. We’re sticking to a single state, but we went through quite a lot of effort before to calculate a lot of data for a lot of states, and all we’ve done is plot employment levels. We’re now going to start adding a lot more polish, with remarkably little extra work. This is a foundation for a lot of potential variation, using similar concepts: pasting together text strings, iterating through the data, and saving the outputs in a consistent location.

Let’s start by backing away from the detailed industries and areas and refocusing on seasonally adjusted total nonfarm employment at the statewide level in the most recent month. This will follow our typical iteration workflow: start simple, then substitute in dynamic data for hard-coded filters.

If we simply chart this data, it will look like a bit of a mess at first. The state names are all alphabetical, and it tells us what we already know - the biggest states are, well, big. And note - here, I like to use my state names on the y axis so they print horizontally (easier to read!). With R, controlling your x and y axes is as easy as telling it where you want things to print.

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, 
                      fig.asp=1.25, fig.width=12, fig.retina = 1)

national_comps <- bls_ces_all_state |> 
  filter(area_name == "Statewide",
         industry_name == "Total Nonfarm",
         date == max(date),
         seasonal == "S"
  )

ggplot(national_comps, aes(x = value, y = state_name)) +
  geom_col() +
  scale_x_continuous(labels = comma) +
  labs(title = "Total Nonfarm Employment by State",
       x = "Employment",
       y = "State") +
  theme_bw(base_size=18)

Let’s make a couple improvements to this process. First, I want to highlight the state that we specified before in a new color. Second, I want to reorder the labels based on their value.

national_comps <- national_comps |> 
  mutate(is_chosen  = if_else(state_name == state_name_filter, state_name_filter, "Other States"))

ggplot(national_comps, aes(x = value, y = reorder(state_name, value), fill = is_chosen)) +
  geom_col() +
  scale_x_continuous(labels = comma) +
  labs(title = "Total Nonfarm Employment by State",
       x = "Employment",
       y = "State",
       fill = "Selected State",
       caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
  theme_bw(base_size=18)

Better! This is more orderly. But we can add some more information. It’s not clear from this chart what our state’s employment level is or how it ranks. We can pull that information in and include it in the chart, like we did with dates before. We’ll more explicitly use functions from the scales package, which can transform 2105321 into 2,105,321, 0.0034 into 0.34%, and 3 into 3rd.
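As a quick sketch of those three transformations on their own, using the same numbers mentioned above (the variable names are just for illustration):

```r
library(scales)

# Thousands separators for a raw count
emp_text <- comma(2105321)

# A proportion as a percentage, kept to two decimal places
pct_text <- percent(0.0034, accuracy = 0.01)

# A rank number as an ordinal
rank_text <- ordinal(3)

c(emp_text, pct_text, rank_text)
```

Each function returns a character string, which is why the results can be dropped straight into paste0() when building titles and subtitles.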

#Text Generation
state_emp_text <- national_comps |> filter(state_name == state_name_filter) |> pull(value) |> comma()
state_emp_rank_text <- national_comps |> filter(state_name == state_name_filter) |> pull(state_rank_value) |> ordinal()
date_text <- national_comps |> filter(state_name == state_name_filter) |> pull(date) |> format.Date("%B %Y")

title_text <- paste0("Total Nonfarm Employment in ",date_text)
subtitle_text <- paste0(state_name_filter,"'s Employment was ",state_emp_text," which ranked ",state_emp_rank_text )

ggplot(national_comps, aes(x = value, y = reorder(state_name, value), fill = is_chosen)) +
  geom_col() +
  scale_x_continuous(labels = comma) +
  labs(title = title_text,
       subtitle = subtitle_text,
       x = "Employment",
       y = "State",
       fill = "Selected State",
       caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics. Data is seasonally adjusted.") +
  theme_bw(base_size=18)

This is better still, as we have more details embedded in the chart. How about we iterate this too, for kicks? Once again, we’ll take our hard-coded filter (Total Nonfarm Employment) and make it dynamic based on the industries in the state, then loop through the industries. I’m also going to drop the legend. From other context I think it’s clear what we are highlighting, so the legend adds little value while using a lot of space.

# Stage the data frame
national_comps <- bls_ces_all_state |> 
  filter(area_name == "Statewide",
         date == max(date),
         seasonal == "S"
  )|> 
  mutate(is_chosen  = if_else(state_name == state_name_filter, state_name_filter, "Other States"))

# Filter industry names to only industries in our selected state
industry_names <- national_comps |> 
  filter(state_name == state_name_filter) |> 
  pull(industry_name) |> 
  unique()

p <- list()

for(i in seq_along(industry_names)){
  
  industry_filter <- industry_names[i]
  
  filtered_data <- national_comps |> 
    filter(industry_name == industry_filter)
  
  #Text Generation
  state_emp_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(value) |> comma()
  state_emp_rank_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(state_rank_value) |> ordinal()
  date_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(date) |> format.Date("%B %Y")
  
  title_text <- paste0(industry_filter," in ",date_text)
  subtitle_text <- paste0(state_name_filter,"'s Employment was ",state_emp_text," which ranked ",state_emp_rank_text )
  
  p[[i]] <- ggplot(filtered_data, aes(x = value, y = reorder(state_name, value), fill = is_chosen)) +
    geom_col() +
    scale_x_continuous(labels = comma) +
    labs(title = title_text,
         subtitle = subtitle_text,
         x = "Employment",
         y = "State",
         fill = "Selected State",
         caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics. Data is seasonally adjusted.") +
    theme_bw(base_size=18) +
    theme(
      legend.position = "none"
    )
  
  #   dynamic_filename <- paste0("images/",latest_date,"/Employment Comparisons/",industry_filter," Employment Level Comparison.png")
  # 
  # ggsave(filename = dynamic_filename, plot = p[[i]], width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
  
}

print(p[[27]])

We even managed to slip in a new folder for all of these images too. Let’s pivot slightly and look at growth rates. For comparison between areas, this can give us a better idea of relative performance. Fortunately, we have all the rates pre-computed. We just need to tweak our code to look at oty_pct instead of value. Let’s jump right into the full loop.

# Stage the data frame
national_comps <- bls_ces_all_state |> 
  filter(area_name == "Statewide",
         date == max(date),
         seasonal == "S"
  )|> 
  mutate(is_chosen  = if_else(state_name == state_name_filter, state_name_filter, "Other States"))

# Filter industry names to only industries in our selected state
industry_names <- national_comps |> 
  filter(state_name == state_name_filter) |> 
  pull(industry_name) |> 
  unique()

q <- list()

for(i in seq_along(industry_names)){
  
  industry_filter <- industry_names[i]
  
  filtered_data <- national_comps |> 
    filter(industry_name == industry_filter)
  
  #Text Generation
  state_emp_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(oty_pct) |> percent(accuracy = 0.1)
  state_emp_rank_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(state_rank_oty_pct) |> ordinal()
  date_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(date) |> format.Date("%B %Y")
  
  title_text <- paste0(industry_filter," in ",date_text)
  subtitle_text <- paste0(state_name_filter,"'s Annual Employment Change was ",state_emp_text," which ranked ",state_emp_rank_text )
  
  q[[i]] <- ggplot(filtered_data, aes(x = oty_pct, y = reorder(state_name, oty_pct), fill = is_chosen)) +
    geom_col() +
    scale_x_continuous(labels = percent) +
    labs(title = title_text,
         subtitle = subtitle_text,
         x = "Annual Employment Change",
         y = "State",
         fill = "Selected State",
         caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics. Data is seasonally adjusted.") +
    theme_bw(base_size=18) +
    theme(
      legend.position = "none"
    )
  
#     dynamic_filename <- paste0("images/",latest_date,"/Employment Comparisons/",industry_filter," Employment Growth Rate Comparison.png")
# 
#   ggsave(filename = dynamic_filename, plot = q[[i]], width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
#   
}

print(q[[27]])

Combining Plots

One final note… If you want an all-in-one option to show many different metrics at a glance, one way you can do this is by combining plots. You may have noticed I sneakily switched from p to q to store my plots in the last code chunk. This is because I now want to combine p (employment level plots) and q (employment growth rate plots) into a single image.