Project 2 - Jacob (me) - Labor Statistics

Import wide text file

I chose a dataset from the Bureau of Labor Statistics. I copied the webpage table and pasted it into Excel to make it a csv.

Read in the data that is in csv file.

jobs <- read_csv("https://raw.githubusercontent.com/jacshap/Data607/refs/heads/main/Jacob%20Shapiro%20Labor%20Statistics%20Data.csv", col_names = TRUE, show_col_types = FALSE)

## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`

names(jobs)[1] <- "State_and_area"
names(jobs)[2] <- "Civilian_Labor_Force"

knitr::kable(head(jobs,n=20))

State_and_area	Civilian_Labor_Force	…3	…4	…5	Unemployed	…7	…8	…9	…10	…11	…12	…13
NA	July	NA	August	NA	Number	NA	NA	NA	Percent of labor force	NA	NA	NA
NA	2024	2025	2024	2025(p)	July	NA	August	NA	July	NA	August	NA
NA	NA	NA	NA	NA	2024	2025	2024	2025(p)	2024	2025.0	2024	2025(p)
Alabama	2,362,412	2377420	2,359,738	2,360,249	81,384	69419	82,758	63,784	3.4	2.9	3.5	2.7
Anniston-Oxford	49,862	50648	50,053	50,313	1,822	1673	1,899	1,501	3.7	3.3	3.8	3
Auburn-Opelika	96,216	98071	96,900	97,406	3,568	3006	3,436	2,496	3.7	3.1	3.5	2.6
Birmingham	577,261	578403	575,424	574,027	18,357	15570	18,971	14,568	3.2	2.7	3.3	2.5
Daphne-Fairhope-Foley	119,577	118612	118,098	117,059	3,490	2865	3,686	2,742	2.9	2.4	3.1	2.3
Decatur	76,452	77558	76,288	76,879	2,251	1865	2,319	1,723	2.9	2.4	3	2.2
Dothan	68,627	68570	68,313	67,924	2,287	1905	2,399	1,790	3.3	2.8	3.5	2.6
Florence-Muscle Shoals	69,398	70608	69,343	70,019	2,687	2325	2,591	2,119	3.9	3.3	3.7	3
Gadsden	42,402	42829	42,323	42,432	1,544	1298	1,590	1,160	3.6	3.0	3.8	2.7
Huntsville	277,426	282660	278,049	281,661	7,976	6807	8,141	6,262	2.9	2.4	2.9	2.2
Mobile	187,183	189179	187,364	188,356	7,379	6470	7,636	6,299	3.9	3.4	4.1	3.3
Montgomery	179,531	181500	179,537	180,198	6,113	5213	6,233	4,842	3.4	2.9	3.5	2.7
Tuscaloosa	126,027	128315	127,140	127,591	5,066	4631	4,894	3,906	4	3.6	3.8	3.1
Alaska	368,751	373580	365,359	371,377	16,237	16728	14,861	14,448	4.4	4.5	4.1	3.9
Anchorage	199,074	201954	199,228	201,580	8,300	8318	7,533	7,117	4.2	4.1	3.8	3.5
Fairbanks-College	45,696	46529	45,784	46,681	1,806	1763	1,609	1,467	4	3.8	3.5	3.1
Arizona	3,757,669	3816886	3,746,265	3,807,573	163,249	177712	156,530	181,566	4.3	4.7	4.2	4.8

Need to do a couple things. 1. Move cities/areas to separate column and have states in left column. 2. Fix first three rows to have appropriate titles. Will need to create columns for month and year.

Split states and areas

Load list of states and make separate column based on that. Then make columns for Metro and State, then fill blank spaces and reorder.

States <- c(state.name)
non_states <- c("District of Columbia", "Puerto Rico")
States <- append(States, non_states)
States <- sort(States)

jobs <- jobs %>% mutate(Metro = ifelse(State_and_area %in% States, "", State_and_area))
jobs <- jobs %>% mutate(State = ifelse(State_and_area %in% States, State_and_area, NA))

jobs <- jobs %>% fill(State)
#jobs %>% mutate(Metro = ifelse(Metro =="", State, Metro))

jobs <- jobs[,-1]
jobs <- jobs %>% relocate(Metro)
jobs <- jobs %>% relocate(State)

Column titles and months/years

Rename columns with “Label,Year,Month” so I can work with them.
Delete rows 1-3, and then pivot longer per section.
Split Year and Month to separate columns.

names(jobs)[3] <- "Civilian_Labor_Force,2024,July"
names(jobs)[4] <- "Civilian_Labor_Force,2025,July"
names(jobs)[5] <- "Civilian_Labor_Force,2024,August"
names(jobs)[6] <- "Civilian_Labor_Force,2025,August"
names(jobs)[7] <- "Unemployed_Number,2024,July"
names(jobs)[8] <- "Unemployed_Number,2025,July"
names(jobs)[9] <- "Unemployed_Number,2024,August"
names(jobs)[10] <- "Unemployed_Number,2025,August"
names(jobs)[11] <- "Unemployed_Percentage,2024,July"
names(jobs)[12] <- "Unemployed_Percentage,2025,July"
names(jobs)[13] <- "Unemployed_Percentage,2024,August"
names(jobs)[14] <- "Unemployed_Percentage,2025,August"

jobs <- jobs %>% slice(-c(1:3))

#Need to make all columns same type. Let's go with doubles.

jobs %>% mutate(across('Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August', as.double))

## Warning: There were 6 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(...)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 5 remaining warnings.

## # A tibble: 445 × 14
##    State   Metro                   Civilian_Labor_Force…¹ Civilian_Labor_Force…²
##    <chr>   <chr>                                    <dbl>                  <dbl>
##  1 Alabama ""                                          NA                2377420
##  2 Alabama "Anniston-Oxford"                           NA                  50648
##  3 Alabama "Auburn-Opelika"                            NA                  98071
##  4 Alabama "Birmingham"                                NA                 578403
##  5 Alabama "Daphne-Fairhope-Foley"                     NA                 118612
##  6 Alabama "Decatur"                                   NA                  77558
##  7 Alabama "Dothan"                                    NA                  68570
##  8 Alabama "Florence-Muscle Shoal…                     NA                  70608
##  9 Alabama "Gadsden"                                   NA                  42829
## 10 Alabama "Huntsville"                                NA                 282660
## # ℹ 435 more rows
## # ℹ abbreviated names: ¹`Civilian_Labor_Force,2024,July`,
## #   ²`Civilian_Labor_Force,2025,July`
## # ℹ 10 more variables: `Civilian_Labor_Force,2024,August` <dbl>,
## #   `Civilian_Labor_Force,2025,August` <dbl>,
## #   `Unemployed_Number,2024,July` <dbl>, `Unemployed_Number,2025,July` <dbl>,
## #   `Unemployed_Number,2024,August` <dbl>, …

#Shows that character columns with commas in it won't turn into doubles nicely

#Remove commas and do pivot
jobs <- jobs %>% mutate(across(where(is.character), ~str_remove_all(., ",")))

jobs_long <- jobs %>% mutate(across('Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August', as.double)) %>% pivot_longer(cols = 'Civilian_Labor_Force,2024,July':'Unemployed_Percentage,2025,August',
                      names_to = 'Type',
                      values_to = "Amount")
jobs_long <- jobs_long %>% separate(Type, into = c("Type","Year","Month"), sep = ",")

knitr::kable(head(jobs_long, n=20))

State	Metro	Type	Year	Month	Amount
Alabama		Civilian_Labor_Force	2024	July	2362412.0
Alabama		Civilian_Labor_Force	2025	July	2377420.0
Alabama		Civilian_Labor_Force	2024	August	2359738.0
Alabama		Civilian_Labor_Force	2025	August	2360249.0
Alabama		Unemployed_Number	2024	July	81384.0
Alabama		Unemployed_Number	2025	July	69419.0
Alabama		Unemployed_Number	2024	August	82758.0
Alabama		Unemployed_Number	2025	August	63784.0
Alabama		Unemployed_Percentage	2024	July	3.4
Alabama		Unemployed_Percentage	2025	July	2.9
Alabama		Unemployed_Percentage	2024	August	3.5
Alabama		Unemployed_Percentage	2025	August	2.7
Alabama	Anniston-Oxford	Civilian_Labor_Force	2024	July	49862.0
Alabama	Anniston-Oxford	Civilian_Labor_Force	2025	July	50648.0
Alabama	Anniston-Oxford	Civilian_Labor_Force	2024	August	50053.0
Alabama	Anniston-Oxford	Civilian_Labor_Force	2025	August	50313.0
Alabama	Anniston-Oxford	Unemployed_Number	2024	July	1822.0
Alabama	Anniston-Oxford	Unemployed_Number	2025	July	1673.0
Alabama	Anniston-Oxford	Unemployed_Number	2024	August	1899.0
Alabama	Anniston-Oxford	Unemployed_Number	2025	August	1501.0

Analysis

Which metro in each state has the highest difference between “Civilian labor force” and “Unemployed”

To get most recent let’s do analysis for August 2025.

Filter for August 2025, the two types in question, and that metro isn’t blank (those are overall state values).

I’m interpreting “difference” as magnitude of relative difference between the two numbers. Analysis will be for percentage - formula: Unemployed/Civilian Labor Force. The larger the number, the more relative unemployed people there are.

jobs_aug24 <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Number" | Type == "Civilian_Labor_Force", Metro != "")

#jobs_aug24 %>% group_by(Metro) %>% mutate(Unemployment_Rate = Type[Unemployed_Number]/Type[Civilian_Labor_Force])
#Realizing this is kind of dumb because would need to filter for both types and save as separate data frames then left join in order to be able to call each column name. This formula is the unemployment rate, which is already calculated by the bureau. Going to just reframe using that calculation.

jobs_aug24 <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Percentage", Metro != "")

jobs_aug24 <- jobs_aug24 %>% group_by(State) %>% mutate(Highest_Unemployment_Rate = max(Amount)) %>% filter(Amount==Highest_Unemployment_Rate) %>% arrange(desc(Amount))

knitr::kable(jobs_aug24)

State	Metro	Type	Year	Month	Amount	Highest_Unemployment_Rate
California	El Centro	Unemployed_Percentage	2025	August	21.5	21.5
Arizona	Yuma	Unemployed_Percentage	2025	August	18.2	18.2
Puerto Rico	Mayaguez	Unemployed_Percentage	2025	August	10.1	10.1
Texas	Eagle Pass	Unemployed_Percentage	2025	August	8.9	8.9
New Jersey	Vineland	Unemployed_Percentage	2025	August	8.5	8.5
Oregon	Grants Pass	Unemployed_Percentage	2025	August	7.8	7.8
Florida	Wildwood-The Villages	Unemployed_Percentage	2025	August	7.6	7.6
Michigan	Saginaw	Unemployed_Percentage	2025	August	6.9	6.9
Illinois	Decatur	Unemployed_Percentage	2025	August	6.2	6.2
Indiana	Kokomo	Unemployed_Percentage	2025	August	6.2	6.2
Massachusetts	Springfield	Unemployed_Percentage	2025	August	6.0	6.0
Ohio	Weirton-Steubenville(2)	Unemployed_Percentage	2025	August	6.0	6.0
Delaware	Dover	Unemployed_Percentage	2025	August	5.9	5.9
South Carolina	Sumter	Unemployed_Percentage	2025	August	5.9	5.9
Pennsylvania	Johnstown	Unemployed_Percentage	2025	August	5.8	5.8
Virginia	Blacksburg-Christiansburg-Radford	Unemployed_Percentage	2025	August	5.7	5.7
Nevada	Las Vegas-Henderson-North Las Vegas	Unemployed_Percentage	2025	August	5.6	5.6
North Carolina	Rocky Mount	Unemployed_Percentage	2025	August	5.5	5.5
Washington	Yakima	Unemployed_Percentage	2025	August	5.5	5.5
Iowa	Cedar Rapids	Unemployed_Percentage	2025	August	5.4	5.4
Colorado	Pueblo	Unemployed_Percentage	2025	August	5.2	5.2
New York	New York-Newark-Jersey City	Unemployed_Percentage	2025	August	5.2	5.2
West Virginia	Wheeling	Unemployed_Percentage	2025	August	5.0	5.0
Louisiana	Hammond	Unemployed_Percentage	2025	August	4.9	4.9
Minnesota	Duluth	Unemployed_Percentage	2025	August	4.8	4.8
New Mexico	Las Cruces	Unemployed_Percentage	2025	August	4.8	4.8
Rhode Island	Providence-Warwick	Unemployed_Percentage	2025	August	4.8	4.8
Kentucky	Elizabethtown	Unemployed_Percentage	2025	August	4.6	4.6
Tennessee	Memphis	Unemployed_Percentage	2025	August	4.6	4.6
Georgia	Augusta-Richmond County	Unemployed_Percentage	2025	August	4.5	4.5
Kansas	Wichita	Unemployed_Percentage	2025	August	4.5	4.5
Missouri	St. Joseph	Unemployed_Percentage	2025	August	4.5	4.5
Missouri	St. Louis(1)	Unemployed_Percentage	2025	August	4.5	4.5
Maryland	Salisbury	Unemployed_Percentage	2025	August	4.4	4.4
Connecticut	Waterbury-Shelton	Unemployed_Percentage	2025	August	4.3	4.3
District of Columbia	Washington-Arlington-Alexandria	Unemployed_Percentage	2025	August	4.3	4.3
Arkansas	Hot Springs	Unemployed_Percentage	2025	August	4.2	4.2
Idaho	Coeur d’Alene	Unemployed_Percentage	2025	August	4.2	4.2
Idaho	Pocatello	Unemployed_Percentage	2025	August	4.2	4.2
Mississippi	Hattiesburg	Unemployed_Percentage	2025	August	4.0	4.0
Utah	Provo-Orem-Lehi	Unemployed_Percentage	2025	August	3.9	3.9
Wisconsin	Racine-Mount Pleasant	Unemployed_Percentage	2025	August	3.9	3.9
Oklahoma	Lawton	Unemployed_Percentage	2025	August	3.8	3.8
Alaska	Anchorage	Unemployed_Percentage	2025	August	3.5	3.5
Alabama	Mobile	Unemployed_Percentage	2025	August	3.3	3.3
Nebraska	Omaha	Unemployed_Percentage	2025	August	3.2	3.2
New Hampshire	Manchester-Nashua	Unemployed_Percentage	2025	August	3.2	3.2
Maine	Bangor	Unemployed_Percentage	2025	August	3.1	3.1
Maine	Lewiston-Auburn	Unemployed_Percentage	2025	August	3.1	3.1
Montana	Billings	Unemployed_Percentage	2025	August	3.0	3.0
Montana	Great Falls	Unemployed_Percentage	2025	August	3.0	3.0
Wyoming	Cheyenne	Unemployed_Percentage	2025	August	2.9	2.9
Hawaii	Kahului-Wailuku	Unemployed_Percentage	2025	August	2.8	2.8
North Dakota	Grand Forks	Unemployed_Percentage	2025	August	2.8	2.8
Vermont	Burlington-South Burlington	Unemployed_Percentage	2025	August	2.5	2.5
South Dakota	Sioux Falls	Unemployed_Percentage	2025	August	2.2	2.2

4 states had ties for highest unemployment rate, which is why there’s 56 rows instead of 52.

Graph of percent unemployment per state

Filter for whole states (no metros) for August 2025, then graph.

jobs_states <- jobs_long %>% filter(Year == "2025", Month == "August", Type == "Unemployed_Percentage", Metro == "")

ggplot(data = jobs_states, aes(x=reorder(State,Amount), y=Amount, fill=State)) + 
  geom_bar(stat="identity") +
  labs(title = "Percent Unemployment by State", x = "State", y = "Percent Unemployment") + 
  coord_flip()+
  theme_light()