Preparing the Data
When doing data analysis, we often want to know things like the
change and the percentage change over different time periods, such as
month and year, and maybe compare to some historical periods. Let's do that!
The key thing to know here is how to group your data. Grouping splits
your data into miniature, distinct tables and controls the scope of your
comparisons. When working with a large dataset, this is what allows us
to calculate comparisons for distinct groups of data within the full
data frame, all at once.
When working with groups, you need to pay attention to what makes
observations within your data distinct - what are the different aspects
of the data you are describing? In CES, we have things like the period
(which month/year), the state, the area within a state, the industry,
whether the data is seasonally adjusted or not, and the type of data
measured (employment, employment diffusion, hourly wage, weekly
earnings, hours worked). You will often want to group your data on all
but one of these elements, with the one you do not use in the group
being the one thing that makes the data within that group vary.
I'm also going to pre-filter the data a little bit, looking at just
employment data and removing Puerto Rico and the U.S. Virgin Islands.
The lag() function lets you compare to a prior observation
in a series, which you can do once you specify the groups. As I
mentioned above, I'll create groups that leave just one variable out
(the date) so that when I do a lag of 1 I am looking one month back;
with a lag of 12 I am looking 12 months back. Note that lag() works on
row order, so this assumes the rows within each group are already sorted
by date. In the next chunk, we'll
group the data a little differently to achieve a different effect.
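To make the grouping idea concrete, here is a minimal sketch on made-up data (the `toy` table, its column names, and its values are illustrative assumptions, not CES data). Each state is treated as its own little table, so the lag never reaches across from one state into another:

```r
library(dplyr)

# Toy data, invented for illustration: two states, four months each
toy <- tibble(
  state = rep(c("A", "B"), each = 4),
  date  = rep(seq(as.Date("2023-01-01"), by = "month", length.out = 4), times = 2),
  value = c(100, 102, 101, 105, 50, 51, 53, 52)
)

toy_changes <- toy |>
  group_by(state) |>
  # Sort by date within each group so lag() really means "previous month"
  arrange(date, .by_group = TRUE) |>
  mutate(
    prior_month = lag(value, 1),        # NA for the first month of each state
    otm_change  = value - prior_month
  ) |>
  ungroup()

print(toy_changes)
```

The first row of each state has no prior month, so its comparison columns are NA rather than borrowing a value from the other state.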
One more thing… the CES download includes a LOT of old data. Since
MSA data starts in 1990, let's drop the old series from 1939-1989. It's
at least a small savings!
bls_ces_all_state <- ces_download |>
# Drop data before 1990 (the old series from 1939-1989)
filter(date >= "1990-01-01") |>
#Specify groups - state / state area / industry / seasonality
group_by(state_code, area_code, industry_code, seasonal) |>
# Employment only, not Puerto Rico and U.S. Virgin Islands
filter(data_type_text == "All Employees",
!(state_name %in% c("Puerto Rico", "Virgin Islands"))) |>
# Calculate comparisons
mutate(
prior_month = lag(value, 1),
prior_year = lag(value, 12),
otm_change = value - prior_month,
oty_change = value - prior_year,
otm_pct = otm_change / prior_month,
oty_pct = oty_change / prior_year,
) |>
# Group cleanup, to avoid weird issues down the line.
ungroup() |>
# Streamline what's in our table.
select(date, state_name, area_name, industry_name, seasonal, data_type_text, value, prior_month:oty_pct)
head(bls_ces_all_state |> filter (date == "2023-02-01"))
## # A tibble: 6 × 13
## date state_name area_name industry_name seasonal data_type_text value
## <date> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 2023-02-01 Alabama Statewide Total Nonfarm S All Employees 2.15e6
## 2 2023-02-01 Alabama Statewide Total Private S All Employees 1.75e6
## 3 2023-02-01 Alabama Statewide Goods Producing S All Employees 3.89e5
## 4 2023-02-01 Alabama Statewide Service-Provid… S All Employees 1.76e6
## 5 2023-02-01 Alabama Statewide Private Servic… S All Employees 1.36e6
## 6 2023-02-01 Alabama Statewide Mining and Log⦠S All Employees 9.4 e3
## # ℹ 6 more variables: prior_month <dbl>, prior_year <dbl>, otm_change <dbl>,
## # oty_change <dbl>, otm_pct <dbl>, oty_pct <dbl>
Calculating Industry Shares
Now, I want to do some more comparisons. I'd like to calculate the
percent share of each industry relative to the total for that area. There
are different ways to do this, but this is my favorite - just create a
subset of the data, then join it back to your original data, which will
effectively repeat that single element for multiple rows.
I will:
1. Pull a subset of the data for Total Nonfarm employment.
2. Keep only the columns I want to use to identify how to join the data
back in (where should the new data be used).
3. Rename the data value column ("value") to something unique.
4. Join that data back to my original data frame.
5. Do math comparing my original data value column to my newly added
column.
Once we have the industry share, let's also calculate how that share
has changed over time by grouping and using lag() again.
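The subset-and-join-back steps can be sketched on a tiny made-up table (the `toy` data, the `area` and `industry` columns, and the "Total" label are illustrative assumptions). Naming the join key explicitly with `by =` is one way to avoid the join message instead of suppressing it:

```r
library(dplyr)

# Toy data, invented for illustration: one "Total" row plus two industries per area
toy <- tibble(
  area     = c("X", "X", "X", "Y", "Y", "Y"),
  industry = c("Total", "Retail", "Construction", "Total", "Retail", "Construction"),
  value    = c(200, 50, 20, 80, 10, 8)
)

# Steps 1-3: subset the totals, keep only the join key, rename the value column
totals <- toy |>
  filter(industry == "Total") |>
  select(area, total = value)

# Steps 4-5: join the total back onto every row for that area, then do the math
toy_shares <- toy |>
  left_join(totals, by = "area") |>
  mutate(ind_share = value / total)

print(toy_shares)
```

Every row for area X now carries the same total of 200, which is what makes the row-wise division possible.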
nonfarm_all <- bls_ces_all_state |>
filter(industry_name == "Total Nonfarm") |>
select(date, state_name, area_name, seasonal, data_type_text, value) |>
rename("total_nonfarm" = "value")
bls_ces_all_state <- bls_ces_all_state |>
left_join(nonfarm_all) |>
suppressMessages() |>
mutate(ind_share = value/total_nonfarm) |>
group_by(state_name, area_name, industry_name, seasonal) |>
arrange(date) |>
mutate(
decade_ago_share = lag(ind_share, 120),
decade_share_change = ind_share - decade_ago_share
) |>
ungroup()
head(bls_ces_all_state |> filter(industry_name == "Construction") |> filter (date == "2023-02-01"))
## # A tibble: 6 × 17
## date state_name area_name industry_name seasonal data_type_text value
## <date> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 2023-02-01 Alabama Statewide Construction S All Employees 100100
## 2 2023-02-01 Alaska Statewide Construction S All Employees 16400
## 3 2023-02-01 Arizona Statewide Construction S All Employees 206200
## 4 2023-02-01 Arkansas Statewide Construction S All Employees 61400
## 5 2023-02-01 California Statewide Construction S All Employees 910100
## 6 2023-02-01 Colorado Statewide Construction S All Employees 183800
## # ℹ 10 more variables: prior_month <dbl>, prior_year <dbl>, otm_change <dbl>,
## # oty_change <dbl>, otm_pct <dbl>, oty_pct <dbl>, total_nonfarm <dbl>,
## # ind_share <dbl>, decade_ago_share <dbl>, decade_share_change <dbl>
Calculating Ranks
People often care a great deal not just about how their state is
performing, but how it is performing relative to other states. We can
easily calculate and store those variables as well using groups.
Remember that when grouping we want to group by everything that
makes an observation unique except one dimension. Earlier, we left out
the date, so that we could compare a unique area/industry over time.
Now, we want to leave out the individual areas to rank them. We'll do
this twice - once for all the areas within a state, and once for all the
states. We'll subset out the data and then join it back together because
we don't want "Statewide" to always rank #1 within a state.
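Before ranking real data, it can help to see how base R's rank() treats ties and missing values, since early months have no decade-ago comparison. This is a small sketch on made-up vectors; the alternatives shown (`ties.method = "min"` and `na.last = "keep"`) are options you might consider, not what the code in this document uses:

```r
# Made-up values: a three-way tie for first place
x <- c(5, 5, 5, 1)

# Negating the values ranks the largest first. The default tie method is
# "average", so the three tied values share rank 2, and floor() leaves it at 2.
tied_floor <- floor(rank(-x))
print(tied_floor)

# With ties.method = "min", the tied values all get rank 1 instead
tied_min <- rank(-x, ties.method = "min")
print(tied_min)

# Missing values are ranked last by default; na.last = "keep" leaves them as NA
y <- c(3, NA, 1)
na_default <- rank(-y)
na_kept    <- rank(-y, na.last = "keep")
print(na_default)
print(na_kept)
```

Whether a three-way tie for the top should read as 1st or 2nd is a presentation choice, so it is worth deciding deliberately.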
area_ranks <- bls_ces_all_state |>
filter(area_name != "Statewide") |>
select(date, industry_name, seasonal, state_name, area_name, value, otm_pct, oty_pct, ind_share, decade_share_change) |>
group_by(date, industry_name, seasonal, state_name) |>
mutate(
area_rank_value = floor(rank(-value)),
area_rank_otm_pct = floor(rank(-otm_pct)),
area_rank_oty_pct = floor(rank(-oty_pct)),
area_rank_share = floor(rank(-ind_share)),
area_rank_decade_share_change = floor(rank(-decade_share_change))
) |>
ungroup() |>
select(-c(value, otm_pct, oty_pct, ind_share, decade_share_change))
state_ranks <- bls_ces_all_state |>
filter(area_name == "Statewide") |>
select(date, industry_name, seasonal, state_name, value, otm_pct, oty_pct, ind_share, decade_share_change) |>
group_by(date, industry_name, seasonal) |>
mutate(
state_rank_value = floor(rank(-value)),
state_rank_otm_pct = floor(rank(-otm_pct)),
state_rank_oty_pct = floor(rank(-oty_pct)),
state_rank_share = floor(rank(-ind_share)),
state_rank_decade_share_change = floor(rank(-decade_share_change))
) |>
ungroup() |>
select(-c(value, otm_pct, oty_pct, ind_share, decade_share_change))
bls_ces_all_state <- bls_ces_all_state |>
left_join(area_ranks) |>
left_join(state_ranks) |>
suppressMessages()
head(bls_ces_all_state |> filter(industry_name == "Construction") |> filter (date == "2023-02-01", state_name == "New York"))
## # A tibble: 3 × 27
## date state_name area_name industry_name seasonal data_type_text value
## <date> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 2023-02-01 New York Statewide Construction S All Employees 393900
## 2 2023-02-01 New York Statewide Construction U All Employees 371400
## 3 2023-02-01 New York Rochester,… Construction U All Employees 21100
## # ℹ 20 more variables: prior_month <dbl>, prior_year <dbl>, otm_change <dbl>,
## # oty_change <dbl>, otm_pct <dbl>, oty_pct <dbl>, total_nonfarm <dbl>,
## # ind_share <dbl>, decade_ago_share <dbl>, decade_share_change <dbl>,
## # area_rank_value <dbl>, area_rank_otm_pct <dbl>, area_rank_oty_pct <dbl>,
## # area_rank_share <dbl>, area_rank_decade_share_change <dbl>,
## # state_rank_value <dbl>, state_rank_otm_pct <dbl>, state_rank_oty_pct <dbl>,
## # state_rank_share <dbl>, state_rank_decade_share_change <dbl>
Charting
Now for the fun stuff! We have a lot of data, in the ballpark of 6
million rows as of the time of writing. We need to put some
of it to use. To do so, consider what we want to chart. Do we want to
look at the change over time? Do we want to look at the current state of
affairs? Depending on what we want to plot, different types of charts
might be preferable.
We'll start off looking at a manually filtered example, then examine
how we can scale that up to iterative looping.
I'm going to look at the Retail Trade industry in New York, starting
in the year 2000. This first plot will use a few elements:
- The x axis is the date and the y axis is value, which in this case is
"All Employees".
- Seasonal adjustment is indicated by different line colors.
- Substate areas are split into different panels using facet_wrap().
- The scales of the facets are allowed to vary so that smaller and larger
areas show individual change.
- The y axis will be scaled as a pretty number using scales::comma.
- Titles, captions, axes, and the color legend are modified using labs().
- The theme of the plot is set to a simple black-and-white.
- The legend is moved to the bottom for readability.
area_name_filter <- c("New York-Jersey City-White Plains, NY-NJ Metropolitan Division",
"New York-Newark-Jersey City, NY-NJ")
ny_retail <- bls_ces_all_state |>
filter(date >= "2000-01-01") |>
filter(state_name == "New York",
industry_name == "Retail Trade",
area_name != area_name_filter[1],
area_name != area_name_filter[2])
ny_retail$area_name[grep(".+Metropolitan Division", ny_retail$area_name)] <-
str_remove(ny_retail$area_name[grep(".+Metropolitan Division", ny_retail$area_name)],
" Metropolitan Division")
ny_retail$area_name[grep(", NY.*", ny_retail$area_name)] <-
str_remove(ny_retail$area_name[grep(", NY.*", ny_retail$area_name)], ", NY.*")
ggplot(ny_retail, aes(x = date, y = value, color = seasonal)) +
geom_line() +
facet_wrap(~ area_name, ncol = 3, scales = "free_y") +
scale_y_continuous(labels = scales::comma) +
labs(title = "Retail Trade Employment in New York",
x = "Monthly Estimates",
y = "Total Employment",
color = "Seasonal Adjustment",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
theme_bw(base_size = 18) +
theme(
legend.position = "bottom"
)

Improving for Iteration
Looking at the chart, there are a few things we are doing which we
can make dynamic.
- We are manually filtering our data for a specific state and a specific
industry.
- We are then re-typing that state and industry name in our plot labels.
- From looking at this chart, it's not clear when the chart was created
because I did not include a date.
We can make this dynamic by pulling those elements out of our code
and inserting them into the plot. Remember, I already set a
state_name_filter at the beginning of this code, so we'll reuse it here.
To create text strings, I'll also start using paste0() and
the format.Date() function to make some pretty text strings
out of dates.
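As a minimal sketch of that text-building step (the date and wording here are placeholders, not values from the data), format() on a Date dispatches to format.Date(), and paste0() glues the pieces together with no separator:

```r
# A placeholder date standing in for max(filtered_data$date)
latest <- as.Date("2023-02-01")

# "%B %Y" gives the full month name and four-digit year.
# The month name is printed in your session's locale.
latest_label <- format(latest, "%B %Y")

# paste0() concatenates with no spaces added, so include them yourself
subtitle_text <- paste0("Data through ", latest_label)
print(subtitle_text)
```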
Finally, let's save our output as an image using the
ggsave() function.
industry_filter <- "Retail Trade"
filtered_data <- bls_ces_all_state |>
filter(date >= "2000-01-01") |>
filter(state_name == state_name_filter,
industry_name == industry_filter,
area_name != area_name_filter[1],
area_name != area_name_filter[2])
filtered_data$area_name[grep(".+Metropolitan Division", filtered_data$area_name)] <-
str_remove(
filtered_data$area_name[grep(".+Metropolitan Division", filtered_data$area_name)],
" Metropolitan Division")
filtered_data$area_name[grep(", NY.*", filtered_data$area_name)] <-
str_remove(filtered_data$area_name[grep(", NY.*", filtered_data$area_name)], ", NY.*")
earliest_date <- min(filtered_data$date) |> format.Date(format = "%B %Y")
latest_date <- max(filtered_data$date) |> format.Date(format = "%B %Y")
ggplot(filtered_data, aes(x = date, y = value, color = seasonal)) +
geom_line() +
facet_wrap(~ area_name, ncol = 2, scales = "free_y") +
scale_y_continuous(labels = scales::comma) +
labs(title = paste0(industry_filter, " Employment in ", state_name_filter),
subtitle = paste0("Data from ",earliest_date," to ",latest_date),
x = "Monthly Estimates",
y = "Total Employment",
color = "Seasonal Adjustment",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
theme_bw(base_size = 18) +
theme(
legend.position = "bottom"
)

# ggsave("images/Retail Trade Employment in New York.png", width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
Dynamic Saving
One last step to unlock iteration is to store our content. Sometimes
you will want to iterate and save data frames. Here, we want to be able
to save the output without having to type out a file name that's
distinctive each time. R can generate those for us. Let's define a
dynamic filename that will save our output in a unique folder for each
month, which includes the industry and state for uniqueness.
Using create.dir = TRUE will be a big help, because if R
needs to create a new folder when you run this next month with new data,
it will do so automatically!
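A minimal sketch of such a filename, built the same way as the plot labels. The three values assigned at the top are placeholders standing in for variables set elsewhere in this document, and the ggsave() call is left commented out because it needs a plot to save:

```r
# Placeholder values; in the real workflow these come from the filters and data
industry_filter   <- "Retail Trade"
state_name_filter <- "New York"
latest_date       <- format(as.Date("2023-02-01"), "%B %Y")

# One folder per month of data, one file per industry and state
dynamic_filename <- paste0(
  "images/", latest_date, "/",
  industry_filter, " Employment in ", state_name_filter, ".png"
)
print(dynamic_filename)

# ggsave(dynamic_filename, width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
```

Because the folder name is derived from the latest date in the data, rerunning the same code next month writes to a new folder without overwriting the old images.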
Iteration Unlocked
We now have all the elements we need to make this awesome. We are
pulling in our data automatically from the BLS using BLSloadR. We have a
plot that just needs an industry and state in order to run. And the
output from that plot will automatically get saved in a unique directory
and file name.
Now all we need is to be able to instruct R, "Now, do this same thing
over and over while changing the input you use as the filter." We'll
start off by doing this with a for loop. It says,
roughly, for a thing in a group of things, do this other thing.
I'll use a typical convention using i. i will take each value in a sequence of
numbers from 1 to the length of a vector containing a list of
industries. It will start with the first item in the list, then the
second item, then the third, and so on.
Here's a simple example:
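As a minimal sketch (the industry names below are placeholders), the loop walks i from 1 to the length of the vector and uses it to pick out each item in turn. seq_along() produces the same 1, 2, 3 sequence as 1:length() but is safe if the vector happens to be empty:

```r
# Placeholder vector of industries to loop over
industries <- c("Construction", "Manufacturing", "Retail Trade")

# Pre-allocate somewhere to store one result per item
labels <- character(length(industries))

for (i in seq_along(industries)) {
  # i is 1, then 2, then 3; industries[i] is the matching item
  labels[i] <- paste0(i, ": ", industries[i])
  print(labels[i])
}
```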
Now, let's use that logic with a list of industries in the area we
are looking at! I will use the exact same code as before, but
instead of saying industry_filter <- "Retail Trade" I
will set the filter to an item in the list of all industry_names. The
only other change I will make is to store the ggplot as a part of a list
of plots. When generating a chart inside a for loop, it won't render
nicely in this document, and I want you to see the work!
First, I will filter the data to New York for the period from 2000 onward.
Then I will use pull() to pull out the industry name
column. Finally, I will use unique() to get only one
instance of each industry name in my list.
To begin, let's just start with the first 5 industries to see if it
works…
ny_only <- bls_ces_all_state |>
filter(date >= "2000-01-01",
state_name == state_name_filter,
area_name != area_name_filter[1],
area_name != area_name_filter[2])
ny_only$area_name[grep(".+Metropolitan Division", ny_only$area_name)] <-
str_remove(
ny_only$area_name[grep(".+Metropolitan Division", ny_only$area_name)],
" Metropolitan Division")
ny_only$area_name[grep(", NY.*", ny_only$area_name)] <-
str_remove(ny_only$area_name[grep(", NY.*", ny_only$area_name)], ", NY.*")
industry_names <- ny_only |>
pull(industry_name) |>
unique()
p <- list()
for(i in 1:5){
industry_filter <- industry_names[i]
filtered_data <- ny_only |>
filter(industry_name == industry_filter)
earliest_date <- min(filtered_data$date) |> format.Date(format = "%B %Y")
latest_date <- max(filtered_data$date) |> format.Date(format = "%B %Y")
p[[i]] <- ggplot(filtered_data, aes(x = date, y = value, color = seasonal)) +
geom_line() +
facet_wrap(~ area_name, ncol = 2, scales = "free_y") +
scale_y_continuous(labels = scales::comma) +
labs(title = paste0(industry_filter, " Employment in ", state_name_filter),
subtitle = paste0("Data from ",earliest_date," to ",latest_date),
x = "Monthly Estimates",
y = "Total Employment",
color = "Seasonal Adjustment",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
theme_bw(base_size=18) +
theme(
legend.position = "bottom"
)
# dynamic_filename <- paste0("images/",latest_date,"/",industry_filter," Employment in ",state_name_filter,".png")
#
# ggsave(dynamic_filename, width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
#
}
print(p[[3]])

print(p[[5]])

Excellent. Now… let's do them all!
p <- list()
for(i in 1:length(industry_names)){
industry_filter <- industry_names[i]
filtered_data <- ny_only |>
filter(industry_name == industry_filter)
earliest_date <- min(filtered_data$date) |> format.Date(format = "%B %Y")
latest_date <- max(filtered_data$date) |> format.Date(format = "%B %Y")
p[[i]] <- ggplot(filtered_data, aes(x = date, y = value, color = seasonal)) +
geom_line() +
facet_wrap(~ area_name, ncol = 2, scales = "free_y") +
scale_y_continuous(labels = scales::comma) +
labs(title = paste0(industry_filter, " Employment in ", state_name_filter),
subtitle = paste0("Data from ",earliest_date," to ",latest_date),
x = "Monthly Estimates",
y = "Total Employment",
color = "Seasonal Adjustment",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
theme_bw(base_size=18) +
theme(
legend.position = "bottom"
)
# dynamic_filename <- paste0("images/",latest_date,"/",industry_filter," Employment in ",state_name_filter,".png")
#
# ggsave(dynamic_filename, width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
}
print(p[[10]])

print(p[[15]])

print(p[[25]])

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE,
fig.height = 7, fig.width=12, fig.retina = 1)
print(p[[35]])

print(p[[55]])

This might take a minute or two to run, but you now have a comprehensive
set of charts for every unique industry in the state, dynamically generated and
saved, and if you want to change the feel of the charts, you can do so
in just ONE place.
But Wait, There's More!
So far we have not really plumbed the depths of what this can unlock.
We're sticking to a single state, but we went through quite a lot of
effort before to calculate a lot of data for a lot of states, and all
we've done is plot employment levels. We're now going to start adding a
lot more polish, with remarkably little extra work. This is a foundation
for a lot of potential variation, using similar concepts: pasting
together text strings, iterating through the data, and saving the
outputs in a consistent location.
Let's start by backing away from the detailed industries and areas
and refocusing on seasonally adjusted total nonfarm employment at the
statewide level in the most recent month. This will follow our typical
iteration workflow. Start simple, then substitute in dynamic data for
hard-coded filters.
If we simply chart this data, it will look like a bit of a mess at
first. The state names are all alphabetical, and it tells us what we
already know - the biggest states are, well, big. And note - here, I
like to use my state names on the y axis so they print horizontally
(easier to read!). With R, controlling your x and y axes is as easy as
telling it where you want things to print.
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE,
fig.asp=1.25, fig.width=12, fig.retina = 1)
national_comps <- bls_ces_all_state |>
filter(area_name == "Statewide",
industry_name == "Total Nonfarm",
date == max(date),
seasonal == "S"
)
ggplot(national_comps, aes(x = value, y = state_name)) +
geom_col() +
scale_x_continuous(labels = comma) +
labs(title = "Total Nonfarm Employment by State",
x = "Employment",
y = "State") +
theme_bw(base_size=18)

Let's make a couple of improvements to this process. First, I want to
highlight the state that we specified before in a new color. Second, I
want to reorder the labels based on their value.
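Both improvements rest on small base R ideas, sketched here with made-up names and values (nothing below comes from the CES data). reorder() turns the labels into a factor whose levels follow another variable, and a simple comparison creates the two-category column used for the fill color:

```r
# Made-up labels and values
state <- c("Alpha", "Bravo", "Charlie")
value <- c(30, 10, 20)

# By default, text labels are ordered alphabetically when converted to a factor
alpha_order <- levels(factor(state))
print(alpha_order)

# reorder() sets the level order by value, smallest first
value_order <- levels(reorder(state, value))
print(value_order)

# Flag one chosen label and lump everything else together
chosen <- "Bravo"
is_chosen <- ifelse(state == chosen, chosen, "Other States")
print(is_chosen)
```

ggplot2 draws the first factor level at the bottom of a discrete y axis, so ordering from smallest to largest puts the largest bar at the top.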
national_comps <- national_comps |>
mutate(is_chosen = if_else(state_name == state_name_filter, state_name_filter, "Other States"))
ggplot(national_comps, aes(x = value, y = reorder(state_name, value), fill = is_chosen)) +
geom_col() +
scale_x_continuous(labels = comma) +
labs(title = "Total Nonfarm Employment by State",
x = "Employment",
y = "State",
fill = "Selected State",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics") +
theme_bw(base_size=18)

Better! This is more orderly. But we can add some more information.
It's not clear from this chart what our state's employment level is or
how it ranks. We can pull that information in and include it in the
chart, like we did with dates before. We'll more explicitly use
functions from the scales package, which can transform
2105321 into 2,105,321, 0.0034 into 0.34%, and 3 into 3rd.
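Those three transformations can be tried on their own before wiring them into a chart. This sketch uses the same example numbers as the text; the accuracy argument on percent() is set explicitly here so that two decimal places are kept:

```r
library(scales)

# Thousands separators for large counts
emp_text <- comma(2105321)
print(emp_text)    # "2,105,321"

# Proportion to percentage, rounded to the nearest 0.01 percentage point
pct_text <- percent(0.0034, accuracy = 0.01)
print(pct_text)    # "0.34%"

# Whole number to ordinal text
rank_text <- ordinal(3)
print(rank_text)   # "3rd"
```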
#Text Generation
state_emp_text <- national_comps |> filter(state_name == state_name_filter) |> pull(value) |> comma()
state_emp_rank_text <- national_comps |> filter(state_name == state_name_filter) |> pull(state_rank_value) |> ordinal()
date_text <- national_comps |> filter(state_name == state_name_filter) |> pull(date) |> format.Date("%B %Y")
title_text <- paste0("Total Nonfarm Employment in ",date_text)
subtitle_text <- paste0(state_name_filter,"'s Employment was ",state_emp_text," which ranked ",state_emp_rank_text )
ggplot(national_comps, aes(x = value, y = reorder(state_name, value), fill = is_chosen)) +
geom_col() +
scale_x_continuous(labels = comma) +
labs(title = title_text,
subtitle = subtitle_text,
x = "Employment",
y = "State",
fill = "Selected State",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics. Data is seasonally adjusted.") +
theme_bw(base_size=18)

This is better still, as we have more details embedded in the chart.
How about we iterate this too, for kicks? Once again, we'll take our
hard-coded filter (Total Nonfarm Employment) and make it dynamic based
on the industries in the state, then loop through those industries. I'm also
going to drop the legend - here, from other context I think it's
clear what we are highlighting, so I don't think it adds value but it does
use a lot of space.
# Stage the data frame
national_comps <- bls_ces_all_state |>
filter(area_name == "Statewide",
date == max(date),
seasonal == "S"
)|>
mutate(is_chosen = if_else(state_name == state_name_filter, state_name_filter, "Other States"))
# Filter industry names to only industries in our selected state
industry_names <- national_comps |>
filter(state_name == state_name_filter) |>
pull(industry_name) |>
unique()
p <- list()
for(i in 1:length(industry_names)){
industry_filter <- industry_names[i]
filtered_data <- national_comps |>
filter(industry_name == industry_filter)
#Text Generation
state_emp_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(value) |> comma()
state_emp_rank_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(state_rank_value) |> ordinal()
date_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(date) |> format.Date("%B %Y")
title_text <- paste0(industry_filter," in ",date_text)
subtitle_text <- paste0(state_name_filter,"'s Employment was ",state_emp_text," which ranked ",state_emp_rank_text )
p[[i]] <- ggplot(filtered_data, aes(x = value, y = reorder(state_name, value), fill = is_chosen)) +
geom_col() +
scale_x_continuous(labels = comma) +
labs(title = title_text,
subtitle = subtitle_text,
x = "Employment",
y = "State",
fill = "Selected State",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics. Data is seasonally adjusted.") +
theme_bw(base_size=18) +
theme(
legend.position = "none"
)
# dynamic_filename <- paste0("images/",latest_date,"/Employment Comparisons/",industry_filter," Employment Level Comparison.png")
#
# ggsave(filename = dynamic_filename, plot = p[[i]], width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
}
print(p[[27]])

We even managed to slip in a new folder for all of these
images too. Let's pivot slightly and look at growth rates. For
comparison between areas, this can give us a better idea of relative
performance. Fortunately, we have all the rates pre-computed. We just
need to tweak our code to look at oty_pct instead of value. Let's
jump right in to the full loop.
# Stage the data frame
national_comps <- bls_ces_all_state |>
filter(area_name == "Statewide",
date == max(date),
seasonal == "S"
)|>
mutate(is_chosen = if_else(state_name == state_name_filter, state_name_filter, "Other States"))
# Filter industry names to only industries in our selected state
industry_names <- national_comps |>
filter(state_name == state_name_filter) |>
pull(industry_name) |>
unique()
q <- list()
for(i in 1:length(industry_names)){
industry_filter <- industry_names[i]
filtered_data <- national_comps |>
filter(industry_name == industry_filter)
#Text Generation
state_emp_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(oty_pct) |> percent(accuracy = 0.1)
state_emp_rank_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(state_rank_oty_pct) |> ordinal()
date_text <- filtered_data |> filter(state_name == state_name_filter) |> pull(date) |> format.Date("%B %Y")
title_text <- paste0(industry_filter," in ",date_text)
subtitle_text <- paste0(state_name_filter,"'s Annual Employment Change was ",state_emp_text," which ranked ",state_emp_rank_text )
q[[i]] <- ggplot(filtered_data, aes(x = oty_pct, y = reorder(state_name, oty_pct), fill = is_chosen)) +
geom_col() +
scale_x_continuous(labels = percent) +
labs(title = title_text,
subtitle = subtitle_text,
x = "Annual Employment Change",
y = "State",
fill = "Selected State",
caption = "Data from the U.S. Bureau of Labor Statistics, Current Employment Statistics. Data is seasonally adjusted.") +
theme_bw(base_size=18) +
theme(
legend.position = "none"
)
# dynamic_filename <- paste0("images/",latest_date,"/Employment Comparisons/",industry_filter," Employment Growth Rate Comparison.png")
#
# ggsave(filename = dynamic_filename, plot = q[[i]], width = 12, height = 6.75, dpi = "retina", create.dir = TRUE)
#
}
print(q[[27]])

Combining Plots
One final note… If you want an all-in-one option to show many
different metrics at a glance, one way you can do this is by combining
plots. You may have noticed I sneakily switched from p to q to store my
plots in the last code chunk. This is because I now want to combine p
(employment level plots) and q (employment growth rate plots) into a
single image.