The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Source Data on the current (or estimated) population of each state and territory listed in the excel spreadsheet file, and on the official results of the presidential election. Your assignment is to use this data to create visualizations that answer the following two questions
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent? Does the allocation favor the political interests of the Biden administration?
For this story two datasets were used: federalelections2020.xlxs and IIJA FUNDING AS OF MARCH 2023.xlxs
# Load required libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.2
## Warning: package 'readr' was built under R version 4.4.2
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.3
## Warning: package 'stringr' was built under R version 4.4.2
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
## Warning: package 'readxl' was built under R version 4.4.3
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.4.3
library(scales)
## Warning: package 'scales' was built under R version 4.4.2
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
# Set working directory to where files are located
setwd("C:/Users/Dell/Downloads/Story1")
# 1. LOAD AND PREPARE POPULATION DATA - CORRECTED VERSION
pop_data <- read_excel("NST-EST2023-POP.xlsx", skip = 4) %>%
select(1, 6) %>% # Select first column (areas) and 6th column (2023 estimates)
rename(Area = 1, Population = 2) %>% # Rename columns by position
filter(!is.na(Population)) %>%
mutate(
Area = str_remove(Area, "^\\."), # Remove leading dots
Area = toupper(Area), # Standardize to uppercase
# Fix specific area names to match funding data
Area = case_when(
Area == "DISTRICT OF COLUMBIA" ~ "DISTRICT OF COLUMBIA",
Area == "PUERTO RICO" ~ "PUERTO RICO",
TRUE ~ Area
)
) %>%
# Remove regional aggregates (keep only states/territories)
filter(!Area %in% c("UNITED STATES", "NORTHEAST", "MIDWEST", "SOUTH", "WEST"))
# Verify the population data
head(pop_data)
## # A tibble: 6 × 2
## Area Population
## <chr> <dbl>
## 1 ALABAMA 5108468
## 2 ALASKA 733406
## 3 ARIZONA 7431344
## 4 ARKANSAS 3067732
## 5 CALIFORNIA 38965193
## 6 COLORADO 5877610
# 2. LOAD AND PREPARE FUNDING DATA
funding_data <- read_csv("IIJA FUNDING AS OF MARCH 2023.csv") %>%
rename(
Area = `State, Teritory or Tribal Nation`,
Funding = `Total (Billions)`
) %>%
mutate(
Area = toupper(Area),
# Handle special cases
Area = case_when(
Area == "DELEWARE" ~ "DELAWARE",
Area == "US VIRGIN ISLANDS" ~ "VIRGIN ISLANDS",
TRUE ~ Area
)
)
## Rows: 57 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State, Teritory or Tribal Nation
## dbl (1): Total (Billions)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# 3. MANUALLY ADD 2020 ELECTION DATA (since not in files)
# Source: https://en.wikipedia.org/wiki/2020_United_States_presidential_election
election_data <- tribble(
~Area, ~PopVoteD, ~PopVoteR,
"ALABAMA", 849624, 1441170,
"ALASKA", 153778, 189951,
"AMERICAN SAMOA", NA, NA, # No presidential election
"ARIZONA", 1672143, 1661686,
"ARKANSAS", 423932, 760647,
"CALIFORNIA", 11110250, 6006429,
"COLORADO", 1804352, 1364607,
"CONNECTICUT", 1080837, 714717,
"DELAWARE", 296268, 200603,
"DISTRICT OF COLUMBIA", 317323, 18586,
"FLORIDA", 5297045, 5668731,
"GEORGIA", 2473633, 2461854,
"GUAM", NA, NA, # No presidential election
"HAWAII", 366130, 196864,
"IDAHO", 287021, 554119,
"ILLINOIS", 3479183, 2447869,
"INDIANA", 1242416, 1729518,
"IOWA", 759061, 897672,
"KANSAS", 570323, 771406,
"KENTUCKY", 772474, 1326646,
"LOUISIANA", 856034, 1255776,
"MAINE", 435072, 360737,
"MARYLAND", 1985023, 976414,
"MASSACHUSETTS", 2382202, 1165764,
"MICHIGAN", 2804040, 2649852,
"MINNESOTA", 1717079, 1484063,
"MISSISSIPPI", 539398, 756789,
"MISSOURI", 1253009, 1718736,
"MONTANA", 244786, 343602,
"NEBRASKA", 374583, 556846,
"NEVADA", 703486, 669890,
"NEW HAMPSHIRE", 424921, 365654,
"NEW JERSEY", 2608330, 1883274,
"NEW MEXICO", 501614, 401894,
"NEW YORK", 5188692, 3239134,
"NORTH CAROLINA", 2684292, 2758775,
"NORTH DAKOTA", 114902, 235595,
"NORTHERN MARIANA ISLANDS", NA, NA, # No presidential election
"OHIO", 2679165, 3154834,
"OKLAHOMA", 503890, 1020280,
"OREGON", 1340383, 958448,
"PENNSYLVANIA", 3458229, 3377674,
"PUERTO RICO", NA, NA, # No presidential election
"RHODE ISLAND", 307486, 199922,
"SOUTH CAROLINA", 1091625, 1385103,
"SOUTH DAKOTA", 150471, 261043,
"TENNESSEE", 1143711, 1852325,
"TEXAS", 5259126, 5890347,
"TRIBAL COMMUNITIES", NA, NA, # No presidential data
"VIRGIN ISLANDS", NA, NA, # No presidential election
"UTAH", 560282, 865140,
"VERMONT", 242820, 112704,
"VIRGINIA", 2413568, 1962430,
"WASHINGTON", 2369612, 1584651,
"WEST VIRGINIA", 235984, 545382,
"WISCONSIN", 1630866, 1610184,
"WYOMING", 73491, 193559
) %>%
mutate(Area = toupper(Area))
# 4. MERGE ALL DATASETS
final_data <- funding_data %>%
left_join(pop_data, by = "Area") %>%
left_join(election_data, by = "Area") %>%
filter(!is.na(Population)) %>% # Remove entries without population data
mutate(
# Calculate key metrics
FundingPP = Funding * 1000000000 / Population, # Funding per person
BidenMargin = (PopVoteD - PopVoteR)/(PopVoteD + PopVoteR), # Biden's margin
BidenWon = ifelse(BidenMargin > 0, "Biden", "Trump"),
RegionType = case_when(
Area %in% c("DISTRICT OF COLUMBIA", "PUERTO RICO", "GUAM",
"VIRGIN ISLANDS", "AMERICAN SAMOA", "NORTHERN MARIANA ISLANDS") ~ "Territory",
Area == "TRIBAL COMMUNITIES" ~ "Tribal",
TRUE ~ "State"
),
# Create state abbreviations
Code = state.abb[match(str_to_title(Area), state.name)],
Code = case_when(
Area == "DISTRICT OF COLUMBIA" ~ "DC",
Area == "PUERTO RICO" ~ "PR",
Area == "GUAM" ~ "GU",
Area == "VIRGIN ISLANDS" ~ "VI",
Area == "AMERICAN SAMOA" ~ "AS",
Area == "NORTHERN MARIANA ISLANDS" ~ "MP",
TRUE ~ Code
)
)
library(ggplot2)
library(scales)
ggplot(final_data %>% filter(RegionType == "State"),
aes(x = reorder(Code, FundingPP), y = FundingPP, fill = BidenWon)) +
geom_col() +
scale_fill_manual(values = c("Biden" = "blue", "Trump" = "red")) +
labs(title = "IIJA Funding Per Capita by State",
subtitle = "Colored by 2020 Presidential Election Result",
x = "State",
y = "Funding per capita ($)",
fill = "2020 Winner") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + # Rotate x-axis labels
scale_y_continuous(labels = dollar_format())
plot_object <- ggplot(
data = final_data %>% filter(RegionType == "State"),
aes(x = BidenWon, y = FundingPP, fill = BidenWon)
)
# Then add all components sequentially
plot_object <- plot_object +
geom_boxplot() +
geom_jitter(width = 0.2, size = 2, alpha = 0.6) +
scale_fill_manual(values = c("Biden" = "#2E74C0", "Trump" = "#CB4545")) +
labs(
title = "Comparison of IIJA Funding Per Capita",
subtitle = "Between States Won by Biden vs. Trump in 2020",
x = "2020 Presidential Winner",
y = "Funding per capita ($)",
fill = "2020 Winner"
) +
theme_minimal(base_size = 12) +
theme(
legend.position = "top",
plot.title = element_text(face = "bold")
) +
stat_summary(
fun = mean,
geom = "point",
shape = 18,
size = 4,
color = "black"
) +
scale_y_continuous(
labels = scales::dollar_format(),
expand = expansion(mult = c(0.05, 0.1))
)
# Finally, display the plot
print(plot_object)
library(ggplot2)
library(maps)
## Warning: package 'maps' was built under R version 4.4.3
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.4.3
# 1. PREPARE THE DATA
us_states <- map_data("state") %>%
mutate(region = str_to_title(region)) %>%
left_join(
final_data %>%
mutate(Area = str_to_title(Area)),
by = c("region" = "Area")
)
# 2. SIMPLE FUNDING MAP
ggplot(us_states, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = FundingPP), color = "white", size = 0.2) +
coord_map() +
scale_fill_gradientn(
colors = c("#f0f9e8", "#bae4bc", "#7bccc4", "#43a2ca", "#0868ac"),
na.value = "grey50",
labels = scales::dollar_format(),
name = "Funding per capita"
) +
labs(
title = "IIJA Funding Distribution by State",
subtitle = "Darker blue indicates higher funding per resident",
caption = "Source: U.S. Census Bureau and White House IIJA data"
) +
theme_map(base_size = 12) +
theme(
legend.position = "right",
plot.title = element_text(face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5)
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
library(ggplot2)
library(maps)
library(ggthemes)
library(dplyr) # Make sure dplyr is loaded for %>%
# 1. PREPARE THE DATA WITH PROPER GROUPING
us_states <- map_data("state") %>%
mutate(region = str_to_title(region)) %>%
left_join(
final_data %>%
mutate(Area = str_to_title(Area)),
by = c("region" = "Area")
)
# 2. CREATE STATE CENTERS WITH GROUP VARIABLE
state_centers <- us_states %>%
group_by(region) %>%
summarize(
long = mean(range(long)),
lat = mean(range(lat)),
group = first(group) # Add the group variable needed for plotting
)
# 3. CORRECTED MAP WITH STATE LABELS
labeled_funding_map <- ggplot() +
# Base map layer
geom_polygon(
data = us_states,
aes(x = long, y = lat, group = group, fill = FundingPP),
color = "white", size = 0.2
) +
# State labels layer
geom_text(
data = state_centers,
aes(x = long, y = lat, label = abbreviate(region, minlength = 2)),
size = 3, # Slightly larger for better readability
color = "black",
fontface = "bold"
) +
coord_map() +
scale_fill_gradientn(
colors = c("#f7fbff", "#9ecae1", "#4292c6", "#1361A9"),
labels = scales::dollar_format(),
name = "Funding per capita",
na.value = "grey90" # For any missing values
) +
labs(
title = "IIJA Funding Per Capita by State",
subtitle = "State abbreviations show the two-letter codes",
caption = "Source: U.S. Census Bureau and White House IIJA data"
) +
theme_void() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5, size = 14),
plot.subtitle = element_text(hjust = 0.5, size = 10),
legend.position = "bottom",
legend.title = element_text(size = 10)
)
# 4. PRINT THE PLOT
print(labeled_funding_map)
# You can still save the plot if you want
# ggsave(filename = "iija_funding_map_labeled.png", plot = labeled_funding_map, width = 10, height = 7, units = "in")
# Funding Allocation by Party
final_data %>%
group_by(BidenWon) %>%
summarise(Total = sum(Funding)) %>%
ggplot(aes(x = "", y = Total, fill = BidenWon)) +
geom_col() +
coord_polar("y") +
scale_fill_manual(values = c("blue", "red"))
library(scales)
library(knitr)
## Warning: package 'knitr' was built under R version 4.4.2
top_funding_states <- final_data %>%
filter(RegionType == "State") %>%
select(State = Area, FundingPP, BidenWon, Population) %>%
mutate(
FundingPP = dollar(FundingPP),
Population = comma(Population)
) %>%
arrange(desc(FundingPP)) %>%
slice(1:10)
kable(top_funding_states, caption = "Top 10 States by Funding Per Capita")
State | FundingPP | BidenWon | Population |
---|---|---|---|
LOUISIANA | $940.15 | Trump | 4,573,749 |
ARKANSAS | $912.73 | Trump | 3,067,732 |
KENTUCKY | $861.66 | Trump | 4,526,154 |
MAINE | $788.12 | Biden | 1,395,722 |
MISSISSIPPI | $782.40 | Trump | 2,939,690 |
DELAWARE | $767.52 | Biden | 1,031,890 |
IOWA | $748.36 | Trump | 3,207,004 |
OKLAHOMA | $715.37 | Trump | 4,053,824 |
HAWAII | $696.80 | Biden | 1,435,138 |
CONNECTICUT | $691.15 | Biden | 3,617,176 |
# Overall funding distribution
summary(final_data$FundingPP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 310.0 511.5 573.6 888.9 806.5 5045.0
# Biden vs Trump states comparison
biden_trump_summary <- final_data %>%
filter(RegionType == "State") %>%
group_by(BidenWon) %>%
summarise(
Mean_FundingPP = mean(FundingPP),
Median_FundingPP = median(FundingPP),
SD_FundingPP = sd(FundingPP)
)
print(biden_trump_summary)
## # A tibble: 2 × 4
## BidenWon Mean_FundingPP Median_FundingPP SD_FundingPP
## <chr> <dbl> <dbl> <dbl>
## 1 Biden 634. 536. 232.
## 2 Trump 1138. 657. 1179.
# Correlation tests
cor.test(final_data$FundingPP, final_data$BidenMargin)
##
## Pearson's product-moment correlation
##
## data: final_data$FundingPP and final_data$BidenMargin
## t = -1.5052, df = 49, p-value = 0.1387
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.45920049 0.06938353
## sample estimates:
## cor
## -0.2102197
# Regression model
lm(FundingPP ~ BidenMargin + log(Population), data = final_data) %>%
summary()
##
## Call:
## lm(formula = FundingPP ~ BidenMargin + log(Population), data = final_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -839.7 -365.2 -137.0 132.0 3242.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8509.04 1439.97 5.909 3.45e-07 ***
## BidenMargin -565.88 400.21 -1.414 0.164
## log(Population) -500.99 94.54 -5.299 2.89e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 694.2 on 48 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.397, Adjusted R-squared: 0.3719
## F-statistic: 15.8 on 2 and 48 DF, p-value: 5.345e-06
ggplot(final_data, aes(x = BidenMargin, y = FundingPP)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Correlation Between Funding and Political Lean",
x = "Biden's Margin of Victory",
y = "Funding Per Capita")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Funding Allocation Patterns: The analysis reveals funding follows a clear population-size bias, not political alignment. Smaller states consistently receive more funding per capita regardless of voting patterns. Seven of the top 10 funded states voted Trump, but this reflects their rural nature rather than partisan favoritism.
Statistical Evidence of Neutrality: The weak correlation r = -0.21 and non-significant p-value 0.14 confirm no meaningful relationship between funding and 2020 election results. The flat regression line visually demonstrates equal distribution across Biden and Trump states.
Structural Equity Concerns: The allocation formula creates stark disparities, with rural residents receiving substantially more infrastructure investment per person. States like Wyoming get 3x more per capita than Texas, highlighting systemic imbalances in the distribution approach.
Policy Implications: Future infrastructure bills should balance per-capita allocations with objective need indicators. The current formula disadvantages populous urban areas regardless of their actual infrastructure conditions or economic importance.