Import wide text file

I chose a dataset from the Bureau of Labor Statistics. I copied the webpage table and pasted it into Excel to make it a csv.

Read in the data that is in csv file.

jobs <- read_csv("https://raw.githubusercontent.com/jacshap/Data607/refs/heads/main/Jacob%20Shapiro%20Labor%20Statistics%20Data.csv", col_names = TRUE, show_col_types = FALSE) 
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
names(jobs)[1] <- "State_and_area"
names(jobs)[2] <- "Civilian_Labor_Force"

knitr::kable(head(jobs,n=20))
State_and_area Civilian_Labor_Force …3 …4 …5 Unemployed …7 …8 …9 …10 …11 …12 …13
NA July NA August NA Number NA NA NA Percent of labor force NA NA NA
NA 2024 2025 2024 2025(p) July NA August NA July NA August NA
NA NA NA NA NA 2024 2025 2024 2025(p) 2024 2025.0 2024 2025(p)
Alabama 2,362,412 2377420 2,359,738 2,360,249 81,384 69419 82,758 63,784 3.4 2.9 3.5 2.7
Anniston-Oxford 49,862 50648 50,053 50,313 1,822 1673 1,899 1,501 3.7 3.3 3.8 3
Auburn-Opelika 96,216 98071 96,900 97,406 3,568 3006 3,436 2,496 3.7 3.1 3.5 2.6
Birmingham 577,261 578403 575,424 574,027 18,357 15570 18,971 14,568 3.2 2.7 3.3 2.5
Daphne-Fairhope-Foley 119,577 118612 118,098 117,059 3,490 2865 3,686 2,742 2.9 2.4 3.1 2.3
Decatur 76,452 77558 76,288 76,879 2,251 1865 2,319 1,723 2.9 2.4 3 2.2
Dothan 68,627 68570 68,313 67,924 2,287 1905 2,399 1,790 3.3 2.8 3.5 2.6
Florence-Muscle Shoals 69,398 70608 69,343 70,019 2,687 2325 2,591 2,119 3.9 3.3 3.7 3
Gadsden 42,402 42829 42,323 42,432 1,544 1298 1,590 1,160 3.6 3.0 3.8 2.7
Huntsville 277,426 282660 278,049 281,661 7,976 6807 8,141 6,262 2.9 2.4 2.9 2.2
Mobile 187,183 189179 187,364 188,356 7,379 6470 7,636 6,299 3.9 3.4 4.1 3.3
Montgomery 179,531 181500 179,537 180,198 6,113 5213 6,233 4,842 3.4 2.9 3.5 2.7
Tuscaloosa 126,027 128315 127,140 127,591 5,066 4631 4,894 3,906 4 3.6 3.8 3.1
Alaska 368,751 373580 365,359 371,377 16,237 16728 14,861 14,448 4.4 4.5 4.1 3.9
Anchorage 199,074 201954 199,228 201,580 8,300 8318 7,533 7,117 4.2 4.1 3.8 3.5
Fairbanks-College 45,696 46529 45,784 46,681 1,806 1763 1,609 1,467 4 3.8 3.5 3.1
Arizona 3,757,669 3816886 3,746,265 3,807,573 163,249 177712 156,530 181,566 4.3 4.7 4.2 4.8

Need to do a couple things. 1. Move cities/areas to separate column and have states in left column. 2. Fix first three rows to have appropriate titles. Will need to create columns for month and year.

Split states and areas

Load list of states and make separate column based on that. Then make columns for Metro and State, then fill blank spaces and reorder.

States <- c(state.name)
non_states <- c("District of Columbia", "Puerto Rico")
States <- append(States, non_states)
States <- sort(States)

jobs <- jobs %>% mutate(Metro = ifelse(State_and_area %in% States, "", State_and_area))
jobs <- jobs %>% mutate(State = ifelse(State_and_area %in% States, State_and_area, NA))

jobs <- jobs %>% fill(State)
#jobs %>% mutate(Metro = ifelse(Metro =="", State, Metro))

jobs <- jobs[,-1]
jobs <- jobs %>% relocate(Metro)
jobs <- jobs %>% relocate(State)

Column titles and months/years

  1. Rename columns with “Label,Year,Month” so I can work with them.
  2. Delete rows 1-3, and then pivot longer per section.
  3. Split Year and Month to separate columns.
names(jobs)[3] <- "Civilian_Labor_Force,2024,July"
names(jobs)[4] <- "Civilian_Labor_Force,2025,July"
names(jobs)[5] <- "Civilian_Labor_Force,2024,August"
names(jobs)[6] <- "Civilian_Labor_Force,2025,August"
names(jobs)[7] <- "Unemployed_Number,2024,July"
names(jobs)[8] <- "Unemployed_Number,2025,July"
names(jobs)[9] <- "Unemployed_Number,2024,August"
names(jobs)[10] <- "Unemployed_Number,2025,August"
names(jobs)[11] <- "Unemployed_Percentage,2024,July"
names(jobs)[12] <- "Unemployed_Percentage,2025,July"
names(jobs)[13] <- "Unemployed_Percentage,2024,August"
names(jobs)[14] <- "Unemployed_Percentage,2025,August"

jobs <- jobs %>% slice(-c(1:3))

#Need to make all columns same type. Let's go with doubles.

jobs %>% mutate(across('Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August', as.double))
## Warning: There were 6 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(...)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 5 remaining warnings.
## # A tibble: 445 × 14
##    State   Metro                   Civilian_Labor_Force…¹ Civilian_Labor_Force…²
##    <chr>   <chr>                                    <dbl>                  <dbl>
##  1 Alabama ""                                          NA                2377420
##  2 Alabama "Anniston-Oxford"                           NA                  50648
##  3 Alabama "Auburn-Opelika"                            NA                  98071
##  4 Alabama "Birmingham"                                NA                 578403
##  5 Alabama "Daphne-Fairhope-Foley"                     NA                 118612
##  6 Alabama "Decatur"                                   NA                  77558
##  7 Alabama "Dothan"                                    NA                  68570
##  8 Alabama "Florence-Muscle Shoal…                     NA                  70608
##  9 Alabama "Gadsden"                                   NA                  42829
## 10 Alabama "Huntsville"                                NA                 282660
## # ℹ 435 more rows
## # ℹ abbreviated names: ¹​`Civilian_Labor_Force,2024,July`,
## #   ²​`Civilian_Labor_Force,2025,July`
## # ℹ 10 more variables: `Civilian_Labor_Force,2024,August` <dbl>,
## #   `Civilian_Labor_Force,2025,August` <dbl>,
## #   `Unemployed_Number,2024,July` <dbl>, `Unemployed_Number,2025,July` <dbl>,
## #   `Unemployed_Number,2024,August` <dbl>, …
#Shows that character columns with commas in it won't turn into doubles nicely

#Remove commas and do pivot
jobs <- jobs %>% mutate(across(where(is.character), ~str_remove_all(., ",")))

jobs_long <- jobs %>% mutate(across('Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August', as.double)) %>% pivot_longer(cols = 'Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August',
                      names_to = 'Type',
                      values_to = "Amount")
jobs_long <- jobs_long %>% separate(Type, into = c("Type","Year","Month"), sep = ",")

knitr::kable(head(jobs_long, n=20))
State Metro Type Year Month Amount
Alabama Civilian_Labor_Force 2024 July 2362412.0
Alabama Civilian_Labor_Force 2025 July 2377420.0
Alabama Civilian_Labor_Force 2024 August 2359738.0
Alabama Civilian_Labor_Force 2025 August 2360249.0
Alabama Unemployed_Number 2024 July 81384.0
Alabama Unemployed_Number 2025 July 69419.0
Alabama Unemployed_Number 2024 August 82758.0
Alabama Unemployed_Number 2025 August 63784.0
Alabama Unemployed_Percentage 2024 July 3.4
Alabama Unemployed_Percentage 2025 July 2.9
Alabama Unemployed_Percentage 2024 August 3.5
Alabama Unemployed_Percentage 2025 August 2.7
Alabama Anniston-Oxford Civilian_Labor_Force 2024 July 49862.0
Alabama Anniston-Oxford Civilian_Labor_Force 2025 July 50648.0
Alabama Anniston-Oxford Civilian_Labor_Force 2024 August 50053.0
Alabama Anniston-Oxford Civilian_Labor_Force 2025 August 50313.0
Alabama Anniston-Oxford Unemployed_Number 2024 July 1822.0
Alabama Anniston-Oxford Unemployed_Number 2025 July 1673.0
Alabama Anniston-Oxford Unemployed_Number 2024 August 1899.0
Alabama Anniston-Oxford Unemployed_Number 2025 August 1501.0

Analysis

Which metro in each state has the highest difference between “Civilian labor force” and “Unemployed”

To get most recent let’s do analysis for August 2025.

Filter for August 2025, the two types in question, and that metro isn’t blank (those are overall state values).

I’m interpreting “difference” as magnitude of relative difference between the two numbers. Analysis will be for percentage - formula: Unemployed/Civilian Labor Force. The larger the number, the more relative unemployed people there are.

jobs_aug24 <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Number" | Type == "Civilian_Labor_Force", Metro != "")

#jobs_aug24 %>% group_by(Metro) %>% mutate(Unemployment_Rate = Type[Unemployed_Number]/Type[Civilian_Labor_Force])
#Realizing this is kind of dumb because would need to filter for both types and save as separate data frames then left join in order to be able to call each column name. This formula is the unemployment rate, which is already calculated by the bureau. Going to just reframe using that calculation.

jobs_aug24 <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Percentage", Metro != "")

jobs_aug24 <- jobs_aug24 %>% group_by(State) %>% mutate(Highest_Unemployment_Rate = max(Amount)) %>% filter(Amount==Highest_Unemployment_Rate) %>% arrange(desc(Amount))

knitr::kable(jobs_aug24)
State Metro Type Year Month Amount Highest_Unemployment_Rate
California El Centro Unemployed_Percentage 2025 August 21.5 21.5
Arizona Yuma Unemployed_Percentage 2025 August 18.2 18.2
Puerto Rico Mayaguez Unemployed_Percentage 2025 August 10.1 10.1
Texas Eagle Pass Unemployed_Percentage 2025 August 8.9 8.9
New Jersey Vineland Unemployed_Percentage 2025 August 8.5 8.5
Oregon Grants Pass Unemployed_Percentage 2025 August 7.8 7.8
Florida Wildwood-The Villages Unemployed_Percentage 2025 August 7.6 7.6
Michigan Saginaw Unemployed_Percentage 2025 August 6.9 6.9
Illinois Decatur Unemployed_Percentage 2025 August 6.2 6.2
Indiana Kokomo Unemployed_Percentage 2025 August 6.2 6.2
Massachusetts Springfield Unemployed_Percentage 2025 August 6.0 6.0
Ohio Weirton-Steubenville(2) Unemployed_Percentage 2025 August 6.0 6.0
Delaware Dover Unemployed_Percentage 2025 August 5.9 5.9
South Carolina Sumter Unemployed_Percentage 2025 August 5.9 5.9
Pennsylvania Johnstown Unemployed_Percentage 2025 August 5.8 5.8
Virginia Blacksburg-Christiansburg-Radford Unemployed_Percentage 2025 August 5.7 5.7
Nevada Las Vegas-Henderson-North Las Vegas Unemployed_Percentage 2025 August 5.6 5.6
North Carolina Rocky Mount Unemployed_Percentage 2025 August 5.5 5.5
Washington Yakima Unemployed_Percentage 2025 August 5.5 5.5
Iowa Cedar Rapids Unemployed_Percentage 2025 August 5.4 5.4
Colorado Pueblo Unemployed_Percentage 2025 August 5.2 5.2
New York New York-Newark-Jersey City Unemployed_Percentage 2025 August 5.2 5.2
West Virginia Wheeling Unemployed_Percentage 2025 August 5.0 5.0
Louisiana Hammond Unemployed_Percentage 2025 August 4.9 4.9
Minnesota Duluth Unemployed_Percentage 2025 August 4.8 4.8
New Mexico Las Cruces Unemployed_Percentage 2025 August 4.8 4.8
Rhode Island Providence-Warwick Unemployed_Percentage 2025 August 4.8 4.8
Kentucky Elizabethtown Unemployed_Percentage 2025 August 4.6 4.6
Tennessee Memphis Unemployed_Percentage 2025 August 4.6 4.6
Georgia Augusta-Richmond County Unemployed_Percentage 2025 August 4.5 4.5
Kansas Wichita Unemployed_Percentage 2025 August 4.5 4.5
Missouri St. Joseph Unemployed_Percentage 2025 August 4.5 4.5
Missouri St. Louis(1) Unemployed_Percentage 2025 August 4.5 4.5
Maryland Salisbury Unemployed_Percentage 2025 August 4.4 4.4
Connecticut Waterbury-Shelton Unemployed_Percentage 2025 August 4.3 4.3
District of Columbia Washington-Arlington-Alexandria Unemployed_Percentage 2025 August 4.3 4.3
Arkansas Hot Springs Unemployed_Percentage 2025 August 4.2 4.2
Idaho Coeur d’Alene Unemployed_Percentage 2025 August 4.2 4.2
Idaho Pocatello Unemployed_Percentage 2025 August 4.2 4.2
Mississippi Hattiesburg Unemployed_Percentage 2025 August 4.0 4.0
Utah Provo-Orem-Lehi Unemployed_Percentage 2025 August 3.9 3.9
Wisconsin Racine-Mount Pleasant Unemployed_Percentage 2025 August 3.9 3.9
Oklahoma Lawton Unemployed_Percentage 2025 August 3.8 3.8
Alaska Anchorage Unemployed_Percentage 2025 August 3.5 3.5
Alabama Mobile Unemployed_Percentage 2025 August 3.3 3.3
Nebraska Omaha Unemployed_Percentage 2025 August 3.2 3.2
New Hampshire Manchester-Nashua Unemployed_Percentage 2025 August 3.2 3.2
Maine Bangor Unemployed_Percentage 2025 August 3.1 3.1
Maine Lewiston-Auburn Unemployed_Percentage 2025 August 3.1 3.1
Montana Billings Unemployed_Percentage 2025 August 3.0 3.0
Montana Great Falls Unemployed_Percentage 2025 August 3.0 3.0
Wyoming Cheyenne Unemployed_Percentage 2025 August 2.9 2.9
Hawaii Kahului-Wailuku Unemployed_Percentage 2025 August 2.8 2.8
North Dakota Grand Forks Unemployed_Percentage 2025 August 2.8 2.8
Vermont Burlington-South Burlington Unemployed_Percentage 2025 August 2.5 2.5
South Dakota Sioux Falls Unemployed_Percentage 2025 August 2.2 2.2

4 states had ties for highest unemployment rate, which is why there’s 56 rows instead of 52.

Graph of percent unemployment per state

Filter for whole states (no metros) for August 2025, then graph.

jobs_states <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Percentage", Metro == "")

ggplot(data = jobs_states, aes(x=reorder(State,Amount), y=Amount, fill=State)) + 
  geom_bar(stat="identity") +
  labs(title = "Percent Unemployment by State", x = "State", y = "Percent Unemployment") + 
  coord_flip()+
  theme_light()