The accompanying Excel file provides data on the current distribution of Infrastructure Investment and Jobs Act (IIJA) funding across U.S. States and Territories.
This analysis aims to explore the following questions through data visualizations:
U.S. territories such as Puerto Rico, Guam, and American Samoa are excluded from this analysis due to their unique political status. These territories lack voting representation in Congress and do not participate in presidential elections, limiting their influence on federal policy and funding decisions.
This exclusion is grounded in the Insular Cases—early 20th-century Supreme Court rulings that determined the U.S. Constitution does not fully apply to territories. As legal scholar Christina Duffy Burnett notes:
“The Insular Cases installed a doctrine of territorial deannexation… shaping life for more than four million U.S. citizens in Puerto Rico and the other remaining nonstate possessions.”
As a result, territories are often excluded from federal funding formulas, making them unsuitable for inclusion in an analysis focused on population-based or politically influenced funding equity.
Installed libraries:
suppressPackageStartupMessages({
library(dplyr)
library(readr)
library(readxl)
library(tidyverse)
library(ggplot2)
library(scales)
library(tidyr)
})
Load the IIJA funding data into a dataframe and display its contents:
iija_data <- read_excel("IIJA FUNDING AS OF MARCH 2023.xlsx")
print(iija_data)
## # A tibble: 57 Ă— 2
## `State, Teritory or Tribal Nation` `Total (Billions)`
## <chr> <dbl>
## 1 ALABAMA 3
## 2 ALASKA 3.7
## 3 AMERICAN SAMOA 0.0686
## 4 ARIZONA 3.5
## 5 ARKANSAS 2.8
## 6 CALIFORNIA 18.4
## 7 COLORADO 3.2
## 8 CONNECTICUT 2.5
## 9 DELEWARE 0.792
## 10 DISTRICT OF COLUMBIA 1.1
## # ℹ 47 more rows
Clean the IIJA funding data by renaming columns and standardizing state names:
# Clean IIJA funding data
iija_data <- read_excel("IIJA FUNDING AS OF MARCH 2023.xlsx")
colnames(iija_data) <- c("State", "Total")
iija_data$State <- trimws(iija_data$State)
iija_data$State <- tools::toTitleCase(tolower(iija_data$State))
print(iija_data)
## # A tibble: 57 Ă— 2
## State Total
## <chr> <dbl>
## 1 Alabama 3
## 2 Alaska 3.7
## 3 American Samoa 0.0686
## 4 Arizona 3.5
## 5 Arkansas 2.8
## 6 California 18.4
## 7 Colorado 3.2
## 8 Connecticut 2.5
## 9 Deleware 0.792
## 10 District of Columbia 1.1
## # ℹ 47 more rows
Load and clean the 2023 population data from the U.S. Census Bureau. This includes filtering for state-level entries, selecting relevant columns, renaming them, and formatting state names:
# Load and clean population data
population_data <- read_csv("population-2023.csv", show_col_types = FALSE)
## New names:
## • `` -> `...459`
print(population_data)
## # A tibble: 54 Ă— 459
## GEO_ID NAME S0101_C01_001E S0101_C01_001M S0101_C01_002E S0101_C01_002M
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Geography Geog… Estimate!!Tot… Margin of Err… Estimate!!Tot… Margin of Err…
## 2 0100000US Unit… 334914896 ***** 18333697 19410
## 3 0400000US01 Alab… 5108468 ***** 288019 3288
## 4 0400000US02 Alas… 733406 ***** 45211 1366
## 5 0400000US04 Ariz… 7431344 ***** 391142 1573
## 6 0400000US05 Arka… 3067732 ***** 176908 2503
## 7 0400000US06 Cali… 38965193 ***** 2086820 3832
## 8 0400000US08 Colo… 5877610 ***** 303775 2170
## 9 0400000US09 Conn… 3617176 ***** 180561 829
## 10 0400000US10 Dela… 1031890 ***** 54398 312
## # ℹ 44 more rows
## # ℹ 453 more variables: S0101_C01_003E <chr>, S0101_C01_003M <chr>,
## # S0101_C01_004E <chr>, S0101_C01_004M <chr>, S0101_C01_005E <chr>,
## # S0101_C01_005M <chr>, S0101_C01_006E <chr>, S0101_C01_006M <chr>,
## # S0101_C01_007E <chr>, S0101_C01_007M <chr>, S0101_C01_008E <chr>,
## # S0101_C01_008M <chr>, S0101_C01_009E <chr>, S0101_C01_009M <chr>,
## # S0101_C01_010E <chr>, S0101_C01_010M <chr>, S0101_C01_011E <chr>, …
clean_population <- population_data %>%
filter(grepl("^0400000US", GEO_ID)) %>%
select(NAME, S0101_C01_001E) %>%
rename(State = NAME, Population_2023 = S0101_C01_001E) %>%
mutate(State = trimws(State),
State = tools::toTitleCase(tolower(State)))
print(clean_population)
## # A tibble: 52 Ă— 2
## State Population_2023
## <chr> <chr>
## 1 Alabama 5108468
## 2 Alaska 733406
## 3 Arizona 7431344
## 4 Arkansas 3067732
## 5 California 38965193
## 6 Colorado 5877610
## 7 Connecticut 3617176
## 8 Delaware 1031890
## 9 District of Columbia 678972
## 10 Florida 22610726
## # ℹ 42 more rows
Merge the cleaned IIJA funding and population datasets, then compute per capita funding for each state:
# Merge and calculate per capita funding
merged_data <- left_join(iija_data, clean_population, by = "State") %>%
mutate(Funding_Per_Capita = (Total * 1e9) / as.numeric(Population_2023)) %>%
filter(!is.na(Funding_Per_Capita)) # Remove rows with NA values
print(merged_data)
## # A tibble: 51 Ă— 4
## State Total Population_2023 Funding_Per_Capita
## <chr> <dbl> <chr> <dbl>
## 1 Alabama 3 5108468 587.
## 2 Alaska 3.7 733406 5045.
## 3 Arizona 3.5 7431344 471.
## 4 Arkansas 2.8 3067732 913.
## 5 California 18.4 38965193 472.
## 6 Colorado 3.2 5877610 544.
## 7 Connecticut 2.5 3617176 691.
## 8 District of Columbia 1.1 678972 1620.
## 9 Florida 8.2 22610726 363.
## 10 Georgia 5 11029227 453.
## # ℹ 41 more rows
Incorporate political alignment data from the MIT Election Data and Science Lab by identifying the winning party in each state during the 2020 U.S. presidential election, then merge it with the existing dataset
election_data <- read_csv("1976-2020-president.csv", show_col_types = FALSE)
print(election_data)
## # A tibble: 4,287 Ă— 15
## year state state_po state_fips state_cen state_ic office candidate
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 1976 ALABAMA AL 1 63 41 US PRESIDENT "CARTER, J…
## 2 1976 ALABAMA AL 1 63 41 US PRESIDENT "FORD, GER…
## 3 1976 ALABAMA AL 1 63 41 US PRESIDENT "MADDOX, L…
## 4 1976 ALABAMA AL 1 63 41 US PRESIDENT "BUBAR, BE…
## 5 1976 ALABAMA AL 1 63 41 US PRESIDENT "HALL, GUS"
## 6 1976 ALABAMA AL 1 63 41 US PRESIDENT "MACBRIDE,…
## 7 1976 ALABAMA AL 1 63 41 US PRESIDENT <NA>
## 8 1976 ALASKA AK 2 94 81 US PRESIDENT "FORD, GER…
## 9 1976 ALASKA AK 2 94 81 US PRESIDENT "CARTER, J…
## 10 1976 ALASKA AK 2 94 81 US PRESIDENT "MACBRIDE,…
## # ℹ 4,277 more rows
## # ℹ 7 more variables: party_detailed <chr>, writein <lgl>,
## # candidatevotes <dbl>, totalvotes <dbl>, version <dbl>, notes <lgl>,
## # party_simplified <chr>
# Filter for 2020 presidential election
election_2020 <- election_data %>%
filter(year == 2020, office == "US PRESIDENT")
print(election_2020)
## # A tibble: 547 Ă— 15
## year state state_po state_fips state_cen state_ic office candidate
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 2020 ALABAMA AL 1 63 41 US PRESIDENT "BIDEN, JO…
## 2 2020 ALABAMA AL 1 63 41 US PRESIDENT "TRUMP, DO…
## 3 2020 ALABAMA AL 1 63 41 US PRESIDENT "JORGENSEN…
## 4 2020 ALABAMA AL 1 63 41 US PRESIDENT <NA>
## 5 2020 ALASKA AK 2 94 81 US PRESIDENT "BIDEN, JO…
## 6 2020 ALASKA AK 2 94 81 US PRESIDENT "TRUMP, DO…
## 7 2020 ALASKA AK 2 94 81 US PRESIDENT "JORGENSEN…
## 8 2020 ALASKA AK 2 94 81 US PRESIDENT "DE LA FUE…
## 9 2020 ALASKA AK 2 94 81 US PRESIDENT "PIERCE, B…
## 10 2020 ALASKA AK 2 94 81 US PRESIDENT "JANOS, JA…
## # ℹ 537 more rows
## # ℹ 7 more variables: party_detailed <chr>, writein <lgl>,
## # candidatevotes <dbl>, totalvotes <dbl>, version <dbl>, notes <lgl>,
## # party_simplified <chr>
# Determine the winning party per state
state_winners <- election_2020 %>%
group_by(state) %>%
slice_max(order_by = candidatevotes, n = 1, with_ties = FALSE) %>%
mutate(Political_Alignment = case_when(
party_simplified == "DEMOCRAT" ~ "Democratic",
party_simplified == "REPUBLICAN" ~ "Republican",
TRUE ~ "Other"
)) %>%
select(State = state, Political_Alignment)
print(state_winners)
## # A tibble: 51 Ă— 2
## # Groups: State [51]
## State Political_Alignment
## <chr> <chr>
## 1 ALABAMA Republican
## 2 ALASKA Republican
## 3 ARIZONA Democratic
## 4 ARKANSAS Republican
## 5 CALIFORNIA Democratic
## 6 COLORADO Democratic
## 7 CONNECTICUT Democratic
## 8 DELAWARE Democratic
## 9 DISTRICT OF COLUMBIA Democratic
## 10 FLORIDA Republican
## # ℹ 41 more rows
state_winners$State <- tools::toTitleCase(tolower(trimws(state_winners$State)))
print(state_winners)
## # A tibble: 51 Ă— 2
## # Groups: State [51]
## State Political_Alignment
## <chr> <chr>
## 1 Alabama Republican
## 2 Alaska Republican
## 3 Arizona Democratic
## 4 Arkansas Republican
## 5 California Democratic
## 6 Colorado Democratic
## 7 Connecticut Democratic
## 8 Delaware Democratic
## 9 District of Columbia Democratic
## 10 Florida Republican
## # ℹ 41 more rows
# Merge with existing data
merged_data <- left_join(merged_data, state_winners, by = "State")
print(merged_data)
## # A tibble: 51 Ă— 5
## State Total Population_2023 Funding_Per_Capita Political_Alignment
## <chr> <dbl> <chr> <dbl> <chr>
## 1 Alabama 3 5108468 587. Republican
## 2 Alaska 3.7 733406 5045. Republican
## 3 Arizona 3.5 7431344 471. Democratic
## 4 Arkansas 2.8 3067732 913. Republican
## 5 California 18.4 38965193 472. Democratic
## 6 Colorado 3.2 5877610 544. Democratic
## 7 Connecticut 2.5 3617176 691. Democratic
## 8 District of Col… 1.1 678972 1620. Democratic
## 9 Florida 8.2 22610726 363. Republican
## 10 Georgia 5 11029227 453. Democratic
## # ℹ 41 more rows
merged_data <- merged_data %>%
filter(!is.na(Political_Alignment))
print(merged_data)
## # A tibble: 50 Ă— 5
## State Total Population_2023 Funding_Per_Capita Political_Alignment
## <chr> <dbl> <chr> <dbl> <chr>
## 1 Alabama 3 5108468 587. Republican
## 2 Alaska 3.7 733406 5045. Republican
## 3 Arizona 3.5 7431344 471. Democratic
## 4 Arkansas 2.8 3067732 913. Republican
## 5 California 18.4 38965193 472. Democratic
## 6 Colorado 3.2 5877610 544. Democratic
## 7 Connecticut 2.5 3617176 691. Democratic
## 8 District of Col… 1.1 678972 1620. Democratic
## 9 Florida 8.2 22610726 363. Republican
## 10 Georgia 5 11029227 453. Democratic
## # ℹ 40 more rows
Question 1: Is the funding distribution equitable when adjusted for the population of each State and Territory, or does it reveal potential bias?
To answer this question, I would like to do the following:
Used formula:
merged_data$Population_2023 / total_population
—
calculates the proportion of the total population that each state
represents.total_funding
gives the expected
funding if funds were distributed strictly based on population
share.# data is in numeric format for calculations
merged_data$Population_2023 <- as.numeric(merged_data$Population_2023)
# Calculate total population and total funding
merged_data$Total_Funding <- merged_data$Funding_Per_Capita * merged_data$Population_2023
total_population <- sum(merged_data$Population_2023)
total_funding <- sum(merged_data$Total_Funding)
# Calculate expected funding based on population share
merged_data$Expected_Funding <- (merged_data$Population_2023 / total_population) * total_funding
# Calculate discrepancy
merged_data$Funding_Discrepancy <- merged_data$Total_Funding - merged_data$Expected_Funding
# Sort by discrepancy
merged_data <- merged_data[order(-merged_data$Funding_Discrepancy), ]
# View top overfunded and underfunded states
head(merged_data[, c("State", "Funding_Discrepancy")])
## # A tibble: 6 Ă— 2
## State Funding_Discrepancy
## <chr> <dbl>
## 1 Alaska 3280880960.
## 2 Montana 2652632678.
## 3 Wyoming 1966229334.
## 4 Louisiana 1686242692.
## 5 New Mexico 1391701807.
## 6 North Dakota 1352010329.
tail(merged_data[, c("State", "Funding_Discrepancy")])
## # A tibble: 6 Ă— 2
## State Funding_Discrepancy
## <chr> <dbl>
## 1 New York -1084349827.
## 2 Georgia -1302865039.
## 3 North Carolina -1692150855.
## 4 Texas -3231701192.
## 5 California -3867413001.
## 6 Florida -4721336591.
# Format Funding_Discrepancy with commas
merged_data$Funding_Discrepancy_Formatted <- comma(merged_data$Funding_Discrepancy)
head(merged_data[, c("State", "Funding_Discrepancy_Formatted")])
## # A tibble: 6 Ă— 2
## State Funding_Discrepancy_Formatted
## <chr> <chr>
## 1 Alaska 3,280,880,960
## 2 Montana 2,652,632,678
## 3 Wyoming 1,966,229,334
## 4 Louisiana 1,686,242,692
## 5 New Mexico 1,391,701,807
## 6 North Dakota 1,352,010,329
tail(merged_data[, c("State", "Funding_Discrepancy_Formatted")])
## # A tibble: 6 Ă— 2
## State Funding_Discrepancy_Formatted
## <chr> <chr>
## 1 New York -1,084,349,827
## 2 Georgia -1,302,865,039
## 3 North Carolina -1,692,150,855
## 4 Texas -3,231,701,192
## 5 California -3,867,413,001
## 6 Florida -4,721,336,591
# Split into overfunded and underfunded
overfunded <- merged_data[merged_data$Funding_Discrepancy > 0, ]
underfunded <- merged_data[merged_data$Funding_Discrepancy < 0, ]
# Define custom colors
party_colors <- c("Republican" = "red", "Democratic" = "blue")
# Plot overfunded states
ggplot(overfunded, aes(x = reorder(State, Funding_Discrepancy), y = Funding_Discrepancy, fill = Political_Alignment)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = party_colors) +
coord_flip() +
scale_y_continuous(labels = comma) +
labs(title = "Overfunded States by Political Alignment",
x = "State",
y = "Excess Funding (USD)") +
theme_minimal()
# Plot underfunded states
ggplot(underfunded, aes(x = reorder(State, Funding_Discrepancy), y = Funding_Discrepancy, fill = Political_Alignment)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = party_colors) +
coord_flip() +
scale_y_continuous(labels = comma) +
labs(title = "Underfunded States by Political Alignment",
x = "State",
y = "Funding Shortfall (USD)") +
theme_minimal()
Findings
Overfunded States:
Underfunded States:
Is the Distribution Equitable?
No, the funding distribution is not equitable. When adjusted for population, the data reveals a systematic bias:
Possible Sources of Bias:
Question 2: Does the allocation appear to align with or favor the political interests of the Biden administration?
To answer this question, I am going to do the following:
# Create helper columns
merged_data <- merged_data %>%
mutate(
Overfunded = ifelse(Funding_Discrepancy > 0, 1, 0),
Underfunded = ifelse(Funding_Discrepancy < 0, 1, 0)
)
print(merged_data)
## # A tibble: 50 Ă— 11
## State Total Population_2023 Funding_Per_Capita Political_Alignment
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 Alaska 3.7 733406 5045. Republican
## 2 Montana 3.3 1132812 2913. Republican
## 3 Wyoming 2.3 584057 3938. Republican
## 4 Louisiana 4.3 4573749 940. Republican
## 5 New Mexico 2.6 2114371 1230. Democratic
## 6 North Dakota 1.8 783926 2296. Republican
## 7 Kentucky 3.9 4526154 862. Republican
## 8 Illinois 8.4 12549689 669. Democratic
## 9 Arkansas 2.8 3067732 913. Republican
## 10 West Virginia 2 1770071 1130. Republican
## # ℹ 40 more rows
## # ℹ 6 more variables: Total_Funding <dbl>, Expected_Funding <dbl>,
## # Funding_Discrepancy <dbl>, Funding_Discrepancy_Formatted <chr>,
## # Overfunded <dbl>, Underfunded <dbl>
# Summarize by political alignment
funding_summary <- merged_data %>%
group_by(Political_Alignment) %>%
summarise(
Overfunded = sum(Overfunded),
Underfunded = sum(Underfunded)
)
print(funding_summary)
## # A tibble: 2 Ă— 3
## Political_Alignment Overfunded Underfunded
## <chr> <dbl> <dbl>
## 1 Democratic 9 16
## 2 Republican 16 9
# Reshape for stacked bar chart
funding_long <- funding_summary %>%
pivot_longer(cols = c(Overfunded, Underfunded),
names_to = "Funding_Status",
values_to = "State_Count")
print(funding_long)
## # A tibble: 4 Ă— 3
## Political_Alignment Funding_Status State_Count
## <chr> <chr> <dbl>
## 1 Democratic Overfunded 9
## 2 Democratic Underfunded 16
## 3 Republican Overfunded 16
## 4 Republican Underfunded 9
# Plot stacked bar chart
ggplot(funding_long, aes(x = Political_Alignment, y = State_Count, fill = Funding_Status)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("Overfunded" = "darkgreen", "Underfunded" = "darkred")) +
labs(title = "Stacked Bar Chart: Overfunded vs Underfunded States by Political Alignment",
x = "Political Alignment",
y = "Number of States",
fill = "Funding Status") +
theme_minimal()
Observations
This pattern suggests that the funding allocation favors Republican states, not Democratic ones.
If the Biden administration were favoring its political allies, we would expect Democratic states to be more frequently overfunded. Since the opposite is true, the data suggests that the allocation does not reflect partisan favoritism toward the administration’s political interests — and may even reflect a bias in the opposite direction.
Final Answer to Question 2
Does the allocation appear to align with or favor the political interests of the Biden administration?
No, the allocation does not appear to align with or favor the political interests of the Biden administration.