library(forcats)
library(ggplot2)
library(tidyr)
library(dplyr)
library(readxl)DATA_608_Story_01_New
Story - 1 : Infrastructure Investment & Jobs Act Funding Allocation
Instructions
The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
- Let’s plot the funding allocation by State and Territory, and sort in ascending order
setwd("C:/Users/month/OneDrive/Documents/Cuny MSDS/Data 608_Knowledge And Visual Analytics")
allocation <- read_excel("IIJAFUNDINGASOFMARCH2023.xlsx")
head(allocation)# A tibble: 6 × 2
`State, Teritory or Tribal Nation` `Total (Billions)`
<chr> <dbl>
1 ALABAMA 3
2 ALASKA 3.7
3 AMERICAN SAMOA 0.0686
4 ARIZONA 3.5
5 ARKANSAS 2.8
6 CALIFORNIA 18.4
allocation |>
ggplot(aes(
y = fct_reorder(`State, Teritory or Tribal Nation`, `Total (Billions)`, .desc = TRUE),
x = `Total (Billions)`)) +
geom_bar(stat = "identity") +
theme(
axis.text.y = element_text(size = 5),
plot.margin = margin(10, 10, 10, 20)) +
labs(title = "IIJA Funding as of March 2023",
x = "Total Funding (Billions)",
y = "State, Territory, or Tribal Nation")- Let’s plot the population by State and Territory, and sort in ascending order
library(forcats)
library(ggplot2)
library(tidyr)
library(dplyr)
library(readxl)
setwd("C:/Users/month/OneDrive/Documents/Cuny MSDS/Data 608_Knowledge And Visual Analytics")
population <- read_excel("EST2025POP.xlsx")
head(population)# A tibble: 6 × 2
`State, Teritory or Tribal Nation` Population
<chr> <dbl>
1 Alabama 5193088
2 Alaska 737270
3 AMERICAN SAMOA 46400
4 Arizona 7623818
5 Arkansas 3114791
6 California 39355309
population |>
ggplot(aes(
y = fct_reorder(`State, Teritory or Tribal Nation`, `Population`, .desc = TRUE),
x = `Population`)) +
geom_bar(stat = "identity") +
theme(
axis.text.y = element_text(size = 5),
plot.margin = margin(10, 10, 10, 20)) +
labs(title = "US Population as of 2025",
x = "Total Population",
y = "State, Territory, or Tribal Nation")Comment:
The allocation could potentially be equitable based on the population of each of the States and Territories, if a random State or Territory has the same position on both graphs. From the two visualizations above, we realize that there are some instances where the funding allocation by State or Territory does not match the population by State or Territory. For example, Wyoming is the 5th smallest State or Territory in terms of population, but it has the 21st smallest funding by State or Territory.
- Let’s estimate the Funding per Capita to assess the impact from the change in the populations
library(stringr)
allocation_norm <- allocation |>
mutate(state_key = str_to_upper(str_squish(str_trim(`State, Teritory or Tribal Nation`))))
population_norm <- population |>
mutate(state_key = str_to_upper(str_squish(str_trim(`State, Teritory or Tribal Nation`))))
joined <- allocation_norm |>
left_join(population_norm, by = "state_key") |>
mutate(
funding_per_capita = (`Total (Billions)` * 1e9) / Population,
`State, Teritory or Tribal Nation`= `State, Teritory or Tribal Nation.x`
) |>
select(-c(1,3,4)) |>
select(`State, Teritory or Tribal Nation`, everything())
head(joined)# A tibble: 6 × 4
State, Teritory or Tribal N…¹ `Total (Billions)` Population funding_per_capita
<chr> <dbl> <dbl> <dbl>
1 ALABAMA 3 5193088 578.
2 ALASKA 3.7 737270 5019.
3 AMERICAN SAMOA 0.0686 46400 1478.
4 ARIZONA 3.5 7623818 459.
5 ARKANSAS 2.8 3114791 899.
6 CALIFORNIA 18.4 39355309 468.
# ℹ abbreviated name: ¹`State, Teritory or Tribal Nation`
- Let’s plot the Funding per Capita
stats <- summary(joined$funding_per_capita)
joined |>
ggplot(aes(y = `State, Teritory or Tribal Nation`,
x = funding_per_capita)) +
geom_bar(stat = "identity") +
theme(
axis.text.y = element_text(size = 5),
plot.margin = margin(10, 10, 10, 20)) +
labs(title = "USA Per Capita Funding as of 2025",
x = "Per Capita Funding (Dollars)",
y = "State, Territory, or Tribal Nation") +
geom_vline(xintercept = stats["Median"], color = "blue", linetype = "dashed") +
geom_text(aes(x = stats["Median"], y = -1, label = paste("Median =", round(stats["Median"], 2))),
vjust = -0.5, hjust = 0.5, size = 4, color = "blue")Comment:
The visualization shows that several states receive significantly higher per‑capita funding than the median. Alaska stands out with more than five thousand dollars per resident; over eight times the median allocation. Other states and territories, including Wyoming, West Virginia, Vermont, the U.S. Virgin Islands, South Dakota, Rhode Island, the Northern Mariana Islands, North Dakota, New Mexico, Montana, the District of Columbia, and American Samoa, also exhibit disproportionately high per‑capita funding relative to their populations.
Conversely, some jurisdictions; most notably Puerto Rico and Florida, fall well below the median.
Taken together, these disparities suggest that IIJA funding is not distributed proportionately to population size, indicating that the allocation may not be equitable if evaluated strictly on a per‑capita basis.
Does the allocation favor the political interests of the Biden administration?
- Let’s recreate a join to only take into consideration the States
setwd("C:/Users/month/OneDrive/Documents/Cuny MSDS/Data 608_Knowledge And Visual Analytics")
results2020 <- read.csv("1976_2020president.csv", na.strings = c("")) |>
filter(c(year == 2020), (party_detailed %in% c("DEMOCRAT","REPUBLICAN"))) |>
select(c(1,2,8,9,11))
head(results2020) year state candidate party_detailed candidatevotes
1 2020 ALABAMA BIDEN, JOSEPH R. JR DEMOCRAT 849624
2 2020 ALABAMA TRUMP, DONALD J. REPUBLICAN 1441170
3 2020 ALASKA BIDEN, JOSEPH R. JR DEMOCRAT 153778
4 2020 ALASKA TRUMP, DONALD J. REPUBLICAN 189951
5 2020 ARIZONA BIDEN, JOSEPH R. JR DEMOCRAT 1672143
6 2020 ARIZONA TRUMP, DONALD J. REPUBLICAN 1661686
joined_new <- joined |>
left_join(results2020, by = c(`State, Teritory or Tribal Nation` = "state"))
#Dropping NA's to only keep States (no Territories)
states_only <- drop_na(joined_new)
head(states_only)# A tibble: 6 × 8
State, Teritory or Tr…¹ `Total (Billions)` Population funding_per_capita year
<chr> <dbl> <dbl> <dbl> <int>
1 ALABAMA 3 5193088 578. 2020
2 ALABAMA 3 5193088 578. 2020
3 ALASKA 3.7 737270 5019. 2020
4 ALASKA 3.7 737270 5019. 2020
5 ARIZONA 3.5 7623818 459. 2020
6 ARIZONA 3.5 7623818 459. 2020
# ℹ abbreviated name: ¹`State, Teritory or Tribal Nation`
# ℹ 3 more variables: candidate <chr>, party_detailed <chr>,
# candidatevotes <int>
- Let’s explore the funding from the political parties standpoint
joined_new |>
group_by(party_detailed) |>
summarize(total = sum(`Total (Billions)`)) |>
mutate(funding_ratio = total / sum(total)) |>
ggplot(aes(x = party_detailed, y = funding_ratio * 100, fill = party_detailed)) +
geom_bar(stat = "identity") +
labs(title = "Funding Per Party",
x = "Political Party",
y = "Percentage of Total Allocations") +
scale_fill_manual(values = c("blue", "red", "grey")) +
geom_text(aes(label = paste0(round(funding_ratio*100, 0),'%'), vjust = -0.2))Comment:
The funding per political parties is approximately the same between Democrats and Republicans.
- Let’s explore whether the winner of the 2020 presidential elections (Biden administration) favored Democrat states
states_only |>
group_by(`State, Teritory or Tribal Nation`) |>
slice_max(order_by = candidatevotes, n = 1, with_ties = FALSE) |>
ungroup() |>
ggplot(aes(y = reorder(`State, Teritory or Tribal Nation`, funding_per_capita), x = funding_per_capita, fill = candidate)) +
geom_bar(stat = "identity")+
theme(
axis.text.y = element_text(size = 5),
plot.margin = margin(10, 10, 10, 20)) +
labs(title = "USA Per Capita Funding as of 2025",
x = "Funding Per Capita (Dollars)",
y = "State",
fill = "Elected Candidate 2020") +
scale_fill_manual(values = c("blue", "red")) Comment:
States receiving higher levels of Funding per Capita, are predominantly those that supported the Republican candidate in the 2020 presidential election. This pattern suggests that the distribution of funds does not simply reflect partisan alignment with the current administration. However, many factors could influence how these allocations were determined. Additional information such as: demographic characteristics, economic conditions, and infrastructure needs, would be necessary to better understand the underlying drivers of these funding decisions.
Conclusion
The visualizations suggest that funding levels are not distributed evenly across states, as there are substantial differences in per‑capita allocations. Although large states like California and Texas received high total amounts due to their populations, their per‑person funding was relatively low compared with states that have much smaller populations, such as Alaska and Wyoming.
At the same time, the distribution does not appear to align with President Biden’s partisan interests. The states receiving the highest per‑capita funding are predominantly Republican‑leaning. To understand what factors may be driving these patterns, further analysis would be needed; particularly incorporating demographic, economic, and infrastructure‑related variables.
Sources of the data used:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX
(For the 2020 Presidential Election results).