I chose a dataset from the Bureau of Labor Statistics. I copied the webpage table and pasted it into Excel to make it a csv.
Read in the data that is in csv file.
jobs <- read_csv("https://raw.githubusercontent.com/jacshap/Data607/refs/heads/main/Jacob%20Shapiro%20Labor%20Statistics%20Data.csv", col_names = TRUE, show_col_types = FALSE)
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
names(jobs)[1] <- "State_and_area"
names(jobs)[2] <- "Civilian_Labor_Force"
knitr::kable(head(jobs,n=20))
| State_and_area | Civilian_Labor_Force | …3 | …4 | …5 | Unemployed | …7 | …8 | …9 | …10 | …11 | …12 | …13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA | July | NA | August | NA | Number | NA | NA | NA | Percent of labor force | NA | NA | NA |
| NA | 2024 | 2025 | 2024 | 2025(p) | July | NA | August | NA | July | NA | August | NA |
| NA | NA | NA | NA | NA | 2024 | 2025 | 2024 | 2025(p) | 2024 | 2025.0 | 2024 | 2025(p) |
| Alabama | 2,362,412 | 2377420 | 2,359,738 | 2,360,249 | 81,384 | 69419 | 82,758 | 63,784 | 3.4 | 2.9 | 3.5 | 2.7 |
| Anniston-Oxford | 49,862 | 50648 | 50,053 | 50,313 | 1,822 | 1673 | 1,899 | 1,501 | 3.7 | 3.3 | 3.8 | 3 |
| Auburn-Opelika | 96,216 | 98071 | 96,900 | 97,406 | 3,568 | 3006 | 3,436 | 2,496 | 3.7 | 3.1 | 3.5 | 2.6 |
| Birmingham | 577,261 | 578403 | 575,424 | 574,027 | 18,357 | 15570 | 18,971 | 14,568 | 3.2 | 2.7 | 3.3 | 2.5 |
| Daphne-Fairhope-Foley | 119,577 | 118612 | 118,098 | 117,059 | 3,490 | 2865 | 3,686 | 2,742 | 2.9 | 2.4 | 3.1 | 2.3 |
| Decatur | 76,452 | 77558 | 76,288 | 76,879 | 2,251 | 1865 | 2,319 | 1,723 | 2.9 | 2.4 | 3 | 2.2 |
| Dothan | 68,627 | 68570 | 68,313 | 67,924 | 2,287 | 1905 | 2,399 | 1,790 | 3.3 | 2.8 | 3.5 | 2.6 |
| Florence-Muscle Shoals | 69,398 | 70608 | 69,343 | 70,019 | 2,687 | 2325 | 2,591 | 2,119 | 3.9 | 3.3 | 3.7 | 3 |
| Gadsden | 42,402 | 42829 | 42,323 | 42,432 | 1,544 | 1298 | 1,590 | 1,160 | 3.6 | 3.0 | 3.8 | 2.7 |
| Huntsville | 277,426 | 282660 | 278,049 | 281,661 | 7,976 | 6807 | 8,141 | 6,262 | 2.9 | 2.4 | 2.9 | 2.2 |
| Mobile | 187,183 | 189179 | 187,364 | 188,356 | 7,379 | 6470 | 7,636 | 6,299 | 3.9 | 3.4 | 4.1 | 3.3 |
| Montgomery | 179,531 | 181500 | 179,537 | 180,198 | 6,113 | 5213 | 6,233 | 4,842 | 3.4 | 2.9 | 3.5 | 2.7 |
| Tuscaloosa | 126,027 | 128315 | 127,140 | 127,591 | 5,066 | 4631 | 4,894 | 3,906 | 4 | 3.6 | 3.8 | 3.1 |
| Alaska | 368,751 | 373580 | 365,359 | 371,377 | 16,237 | 16728 | 14,861 | 14,448 | 4.4 | 4.5 | 4.1 | 3.9 |
| Anchorage | 199,074 | 201954 | 199,228 | 201,580 | 8,300 | 8318 | 7,533 | 7,117 | 4.2 | 4.1 | 3.8 | 3.5 |
| Fairbanks-College | 45,696 | 46529 | 45,784 | 46,681 | 1,806 | 1763 | 1,609 | 1,467 | 4 | 3.8 | 3.5 | 3.1 |
| Arizona | 3,757,669 | 3816886 | 3,746,265 | 3,807,573 | 163,249 | 177712 | 156,530 | 181,566 | 4.3 | 4.7 | 4.2 | 4.8 |
Need to do a couple things. 1. Move cities/areas to separate column and have states in left column. 2. Fix first three rows to have appropriate titles. Will need to create columns for month and year.
Load list of states and make separate column based on that. Then make columns for Metro and State, then fill blank spaces and reorder.
States <- c(state.name)
non_states <- c("District of Columbia", "Puerto Rico")
States <- append(States, non_states)
States <- sort(States)
jobs <- jobs %>% mutate(Metro = ifelse(State_and_area %in% States, "", State_and_area))
jobs <- jobs %>% mutate(State = ifelse(State_and_area %in% States, State_and_area, NA))
jobs <- jobs %>% fill(State)
#jobs %>% mutate(Metro = ifelse(Metro =="", State, Metro))
jobs <- jobs[,-1]
jobs <- jobs %>% relocate(Metro)
jobs <- jobs %>% relocate(State)
names(jobs)[3] <- "Civilian_Labor_Force,2024,July"
names(jobs)[4] <- "Civilian_Labor_Force,2025,July"
names(jobs)[5] <- "Civilian_Labor_Force,2024,August"
names(jobs)[6] <- "Civilian_Labor_Force,2025,August"
names(jobs)[7] <- "Unemployed_Number,2024,July"
names(jobs)[8] <- "Unemployed_Number,2025,July"
names(jobs)[9] <- "Unemployed_Number,2024,August"
names(jobs)[10] <- "Unemployed_Number,2025,August"
names(jobs)[11] <- "Unemployed_Percentage,2024,July"
names(jobs)[12] <- "Unemployed_Percentage,2025,July"
names(jobs)[13] <- "Unemployed_Percentage,2024,August"
names(jobs)[14] <- "Unemployed_Percentage,2025,August"
jobs <- jobs %>% slice(-c(1:3))
#Need to make all columns same type. Let's go with doubles.
jobs %>% mutate(across('Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August', as.double))
## Warning: There were 6 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(...)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 5 remaining warnings.
## # A tibble: 445 × 14
## State Metro Civilian_Labor_Force…¹ Civilian_Labor_Force…²
## <chr> <chr> <dbl> <dbl>
## 1 Alabama "" NA 2377420
## 2 Alabama "Anniston-Oxford" NA 50648
## 3 Alabama "Auburn-Opelika" NA 98071
## 4 Alabama "Birmingham" NA 578403
## 5 Alabama "Daphne-Fairhope-Foley" NA 118612
## 6 Alabama "Decatur" NA 77558
## 7 Alabama "Dothan" NA 68570
## 8 Alabama "Florence-Muscle Shoal… NA 70608
## 9 Alabama "Gadsden" NA 42829
## 10 Alabama "Huntsville" NA 282660
## # ℹ 435 more rows
## # ℹ abbreviated names: ¹`Civilian_Labor_Force,2024,July`,
## # ²`Civilian_Labor_Force,2025,July`
## # ℹ 10 more variables: `Civilian_Labor_Force,2024,August` <dbl>,
## # `Civilian_Labor_Force,2025,August` <dbl>,
## # `Unemployed_Number,2024,July` <dbl>, `Unemployed_Number,2025,July` <dbl>,
## # `Unemployed_Number,2024,August` <dbl>, …
#Shows that character columns with commas in it won't turn into doubles nicely
#Remove commas and do pivot
jobs <- jobs %>% mutate(across(where(is.character), ~str_remove_all(., ",")))
jobs_long <- jobs %>% mutate(across('Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August', as.double)) %>% pivot_longer(cols = 'Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August',
names_to = 'Type',
values_to = "Amount")
jobs_long <- jobs_long %>% separate(Type, into = c("Type","Year","Month"), sep = ",")
knitr::kable(head(jobs_long, n=20))
| State | Metro | Type | Year | Month | Amount |
|---|---|---|---|---|---|
| Alabama | Civilian_Labor_Force | 2024 | July | 2362412.0 | |
| Alabama | Civilian_Labor_Force | 2025 | July | 2377420.0 | |
| Alabama | Civilian_Labor_Force | 2024 | August | 2359738.0 | |
| Alabama | Civilian_Labor_Force | 2025 | August | 2360249.0 | |
| Alabama | Unemployed_Number | 2024 | July | 81384.0 | |
| Alabama | Unemployed_Number | 2025 | July | 69419.0 | |
| Alabama | Unemployed_Number | 2024 | August | 82758.0 | |
| Alabama | Unemployed_Number | 2025 | August | 63784.0 | |
| Alabama | Unemployed_Percentage | 2024 | July | 3.4 | |
| Alabama | Unemployed_Percentage | 2025 | July | 2.9 | |
| Alabama | Unemployed_Percentage | 2024 | August | 3.5 | |
| Alabama | Unemployed_Percentage | 2025 | August | 2.7 | |
| Alabama | Anniston-Oxford | Civilian_Labor_Force | 2024 | July | 49862.0 |
| Alabama | Anniston-Oxford | Civilian_Labor_Force | 2025 | July | 50648.0 |
| Alabama | Anniston-Oxford | Civilian_Labor_Force | 2024 | August | 50053.0 |
| Alabama | Anniston-Oxford | Civilian_Labor_Force | 2025 | August | 50313.0 |
| Alabama | Anniston-Oxford | Unemployed_Number | 2024 | July | 1822.0 |
| Alabama | Anniston-Oxford | Unemployed_Number | 2025 | July | 1673.0 |
| Alabama | Anniston-Oxford | Unemployed_Number | 2024 | August | 1899.0 |
| Alabama | Anniston-Oxford | Unemployed_Number | 2025 | August | 1501.0 |
To get most recent let’s do analysis for August 2025.
Filter for August 2025, the two types in question, and that metro isn’t blank (those are overall state values).
I’m interpreting “difference” as magnitude of relative difference between the two numbers. Analysis will be for percentage - formula: Unemployed/Civilian Labor Force. The larger the number, the more relative unemployed people there are.
jobs_aug24 <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Number" | Type == "Civilian_Labor_Force", Metro != "")
#jobs_aug24 %>% group_by(Metro) %>% mutate(Unemployment_Rate = Type[Unemployed_Number]/Type[Civilian_Labor_Force])
#Realizing this is kind of dumb because would need to filter for both types and save as separate data frames then left join in order to be able to call each column name. This formula is the unemployment rate, which is already calculated by the bureau. Going to just reframe using that calculation.
jobs_aug24 <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Percentage", Metro != "")
jobs_aug24 <- jobs_aug24 %>% group_by(State) %>% mutate(Highest_Unemployment_Rate = max(Amount)) %>% filter(Amount==Highest_Unemployment_Rate) %>% arrange(desc(Amount))
knitr::kable(jobs_aug24)
| State | Metro | Type | Year | Month | Amount | Highest_Unemployment_Rate |
|---|---|---|---|---|---|---|
| California | El Centro | Unemployed_Percentage | 2025 | August | 21.5 | 21.5 |
| Arizona | Yuma | Unemployed_Percentage | 2025 | August | 18.2 | 18.2 |
| Puerto Rico | Mayaguez | Unemployed_Percentage | 2025 | August | 10.1 | 10.1 |
| Texas | Eagle Pass | Unemployed_Percentage | 2025 | August | 8.9 | 8.9 |
| New Jersey | Vineland | Unemployed_Percentage | 2025 | August | 8.5 | 8.5 |
| Oregon | Grants Pass | Unemployed_Percentage | 2025 | August | 7.8 | 7.8 |
| Florida | Wildwood-The Villages | Unemployed_Percentage | 2025 | August | 7.6 | 7.6 |
| Michigan | Saginaw | Unemployed_Percentage | 2025 | August | 6.9 | 6.9 |
| Illinois | Decatur | Unemployed_Percentage | 2025 | August | 6.2 | 6.2 |
| Indiana | Kokomo | Unemployed_Percentage | 2025 | August | 6.2 | 6.2 |
| Massachusetts | Springfield | Unemployed_Percentage | 2025 | August | 6.0 | 6.0 |
| Ohio | Weirton-Steubenville(2) | Unemployed_Percentage | 2025 | August | 6.0 | 6.0 |
| Delaware | Dover | Unemployed_Percentage | 2025 | August | 5.9 | 5.9 |
| South Carolina | Sumter | Unemployed_Percentage | 2025 | August | 5.9 | 5.9 |
| Pennsylvania | Johnstown | Unemployed_Percentage | 2025 | August | 5.8 | 5.8 |
| Virginia | Blacksburg-Christiansburg-Radford | Unemployed_Percentage | 2025 | August | 5.7 | 5.7 |
| Nevada | Las Vegas-Henderson-North Las Vegas | Unemployed_Percentage | 2025 | August | 5.6 | 5.6 |
| North Carolina | Rocky Mount | Unemployed_Percentage | 2025 | August | 5.5 | 5.5 |
| Washington | Yakima | Unemployed_Percentage | 2025 | August | 5.5 | 5.5 |
| Iowa | Cedar Rapids | Unemployed_Percentage | 2025 | August | 5.4 | 5.4 |
| Colorado | Pueblo | Unemployed_Percentage | 2025 | August | 5.2 | 5.2 |
| New York | New York-Newark-Jersey City | Unemployed_Percentage | 2025 | August | 5.2 | 5.2 |
| West Virginia | Wheeling | Unemployed_Percentage | 2025 | August | 5.0 | 5.0 |
| Louisiana | Hammond | Unemployed_Percentage | 2025 | August | 4.9 | 4.9 |
| Minnesota | Duluth | Unemployed_Percentage | 2025 | August | 4.8 | 4.8 |
| New Mexico | Las Cruces | Unemployed_Percentage | 2025 | August | 4.8 | 4.8 |
| Rhode Island | Providence-Warwick | Unemployed_Percentage | 2025 | August | 4.8 | 4.8 |
| Kentucky | Elizabethtown | Unemployed_Percentage | 2025 | August | 4.6 | 4.6 |
| Tennessee | Memphis | Unemployed_Percentage | 2025 | August | 4.6 | 4.6 |
| Georgia | Augusta-Richmond County | Unemployed_Percentage | 2025 | August | 4.5 | 4.5 |
| Kansas | Wichita | Unemployed_Percentage | 2025 | August | 4.5 | 4.5 |
| Missouri | St. Joseph | Unemployed_Percentage | 2025 | August | 4.5 | 4.5 |
| Missouri | St. Louis(1) | Unemployed_Percentage | 2025 | August | 4.5 | 4.5 |
| Maryland | Salisbury | Unemployed_Percentage | 2025 | August | 4.4 | 4.4 |
| Connecticut | Waterbury-Shelton | Unemployed_Percentage | 2025 | August | 4.3 | 4.3 |
| District of Columbia | Washington-Arlington-Alexandria | Unemployed_Percentage | 2025 | August | 4.3 | 4.3 |
| Arkansas | Hot Springs | Unemployed_Percentage | 2025 | August | 4.2 | 4.2 |
| Idaho | Coeur d’Alene | Unemployed_Percentage | 2025 | August | 4.2 | 4.2 |
| Idaho | Pocatello | Unemployed_Percentage | 2025 | August | 4.2 | 4.2 |
| Mississippi | Hattiesburg | Unemployed_Percentage | 2025 | August | 4.0 | 4.0 |
| Utah | Provo-Orem-Lehi | Unemployed_Percentage | 2025 | August | 3.9 | 3.9 |
| Wisconsin | Racine-Mount Pleasant | Unemployed_Percentage | 2025 | August | 3.9 | 3.9 |
| Oklahoma | Lawton | Unemployed_Percentage | 2025 | August | 3.8 | 3.8 |
| Alaska | Anchorage | Unemployed_Percentage | 2025 | August | 3.5 | 3.5 |
| Alabama | Mobile | Unemployed_Percentage | 2025 | August | 3.3 | 3.3 |
| Nebraska | Omaha | Unemployed_Percentage | 2025 | August | 3.2 | 3.2 |
| New Hampshire | Manchester-Nashua | Unemployed_Percentage | 2025 | August | 3.2 | 3.2 |
| Maine | Bangor | Unemployed_Percentage | 2025 | August | 3.1 | 3.1 |
| Maine | Lewiston-Auburn | Unemployed_Percentage | 2025 | August | 3.1 | 3.1 |
| Montana | Billings | Unemployed_Percentage | 2025 | August | 3.0 | 3.0 |
| Montana | Great Falls | Unemployed_Percentage | 2025 | August | 3.0 | 3.0 |
| Wyoming | Cheyenne | Unemployed_Percentage | 2025 | August | 2.9 | 2.9 |
| Hawaii | Kahului-Wailuku | Unemployed_Percentage | 2025 | August | 2.8 | 2.8 |
| North Dakota | Grand Forks | Unemployed_Percentage | 2025 | August | 2.8 | 2.8 |
| Vermont | Burlington-South Burlington | Unemployed_Percentage | 2025 | August | 2.5 | 2.5 |
| South Dakota | Sioux Falls | Unemployed_Percentage | 2025 | August | 2.2 | 2.2 |
4 states had ties for highest unemployment rate, which is why there’s 56 rows instead of 52.
Filter for whole states (no metros) for August 2025, then graph.
jobs_states <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Percentage", Metro == "")
ggplot(data = jobs_states, aes(x=reorder(State,Amount), y=Amount, fill=State)) +
geom_bar(stat="identity") +
labs(title = "Percent Unemployment by State", x = "State", y = "Percent Unemployment") +
coord_flip()+
theme_light()