DATA 608 - Story-1 : Infrastructure Investment & Jobs Act Funding Allocation :

Bikash Bhowmik —- 7 Sep 2025

Column

Column

Instruction

The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.

Your story (Data Visualization(s) ) should address the following questions:

  • Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

  • Does the allocation favor the political interests of the Biden administration?

Notes:

  1. You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.

  2. You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.

Introduction

The visualization explores the relationship between total population and total funding in 2023, using a dataset that merges population figures with funding allocations across U.S. states and territories. A scatter plot presents this relationship, with each point representing a state: the x-axis shows total funding (in billions), the y-axis shows total population, and colors distinguish individual states. A regression line is included to highlight overall trends. To improve readability, scientific notation was removed from the y-axis labels, ensuring a clearer and more accessible view of the data.

Alongside the Infrastructure Investment and Jobs Act funding allocations, additional data was incorporated from U.S. Census Bureau population estimates and the 2020 election results dataset from Kaggle.

Together, these sources help address two key questions:

  • Is the funding allocation equitable relative to population, or does it reveal bias?

  • Does the allocation align with or favor the political interests of the Biden administration?

Load Packages

I have loaded necessary packages.

” library(readxl)

library(tidyverse)

library(kableExtra)

library(readr)

library(dplyr)

library(ggplot2)

library(plotly)

library(stringr) ”

Load Data

# Set file path
file_path_xls <- "D:/Cuny_sps/DATA_608/Story-1/IIJA-FUNDING-AS-OF-MARCH-2023.xlsx"

# Read Excel file 
df <- read_excel(file_path_xls)

# Save as CSV
write.csv(df, "D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", row.names = FALSE)


# Load the IIIJA Funding data .csv file
IIIJA_Funding <- read.csv("D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

#Load the State Population data 
State_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

#Load the Territory Population data 
Territory_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-territory-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

#Load the 2020 Presidential Election Results data from the provided GITHUB URL into a data frame named Territory_Pop.
Voting_Results <- read.csv("D:/Cuny_sps/DATA_608/Story-1/Presidential-Election-Voting-Results-2020.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

Clean Dataset

Join the State_Pop and Territory_Pop datasets using merge command in R.

# Join the State_Pop and Territory_Pop datasets based on a the State column using the merge function .
State_Territory_Pop <- merge(State_Pop, Territory_Pop, by = "state", all = TRUE)

Rename columns

# Rename the columns in the IIIJA_Funding dataframe
IIIJA_Funding <- IIIJA_Funding %>%
  rename(State = 'State..Teritory.or.Tribal.Nation', Total_Billions = 'Total..Billions.')


State_Territory_Pop <- State_Territory_Pop %>%
  rename(State = 'state', Population_2024 = 'pop2024.x', Population_2023 = 'pop2023.x', Population_2020 = 'pop2020.x',
         Growth_Rate = 'growthRate.x', Growth = 'growth.x', Rank =  'rank.x')

Voting_Results <- Voting_Results%>%
  rename(State = 'state', Biden = biden_vote, Trump = trump_vote)

Remove columns that are not needed

State_Territory_Pop <- State_Territory_Pop %>%
  select(-c(densityMi.x , fips.x, pop2019.x, pop2010.x, percent.x, growthSince2010.x, area.x, fips.y, densityMi.y, pop2024.y, pop2023.y, pop2020.y, pop2019.y, pop2010.y, growthRate.y, growth.y, growthSince2010.y, area.y, rank.y))

Voting_Results <- Voting_Results %>%
  select(-c(trump_pct, biden_pct))

Convert State Column to proper case

# Convert values in the State column to proper case
IIIJA_Funding$State <- str_to_title(IIIJA_Funding$State)

View the Datasets

Display the IIIJA Funding data ( First 6 records)

head(IIIJA_Funding)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State Total_Billions
Alabama 3.0000
Alaska 3.7000
American Samoa 0.0686
Arizona 3.5000
Arkansas 2.8000
California 18.4000




“State and Territory Population Data (2020–2024 Estimates)”

#Display the State and TerritoryPopulation data
head(State_Territory_Pop)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State Population_2024 Population_2023 Population_2020 Growth_Rate Growth Rank percent.y
Alabama 5143033 5108468 5031864 0.00677 34565 24 NA
Alaska 733536 733406 732964 0.00018 130 48 NA
Arizona 7497004 7431344 7186683 0.00884 65660 14 NA
Arkansas 3089060 3067732 3014348 0.00695 21328 33 NA
California 38889770 38965193 39503200 -0.00194 -75423 1 NA
Colorado 5914181 5877610 5785219 0.00622 36571 21 NA




“2020 Presidential Election Results by State”

#Display the 2020 Presidential Election data
head(Voting_Results)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State state_abr Trump Biden trump_win biden_win
Alaska AK 189543 153502 1 0
Hawaii HI 196864 366130 0 1
Washington WA 1584651 2369612 0 1
Oregon OR 958448 1340383 0 1
California CA 5982194 11082293 0 1
Idaho ID 554128 287031 1 0

  • Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
# Create a bar plot of IIJA Funding
ggplot(IIIJA_Funding, aes(x = State, y = Total_Billions)) +
  geom_bar(stat = "identity", fill = "gray50") +
   geom_text(aes(label = case_when(
    State == "New York" ~ "NY",
    State == "California" ~ "CA",
    State == "Texas" ~ "TX",
    State == "Florida" ~ "FL",
    State == "Illinois" ~ "IL",
    TRUE ~ ""
  )), 
  vjust = -0.5, color = "#4F3C5F", fontface = "bold") +
  labs(title = "Allocation of IIJA Funding by State and Territory As Of March 2023",
       x = "State/Territory",
       y = "Total Funding (Billions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))



Distribution of IIJA Funding Across States and Territories (2023)

# Reorder states by descending Total_Billions
IIIJA_Funding <- IIIJA_Funding %>%
  mutate(State = reorder(State, -Total_Billions))

# Create a bar plot of IIJA Funding sorted
ggplot(IIIJA_Funding, aes(x = State, y = Total_Billions)) +
  geom_bar(stat = "identity", fill = "gray50") +
  geom_text(aes(label = case_when(
    State == "New York" ~ "NY",
    State == "California" ~ "CA",
    State == "Texas" ~ "TX",
    State == "Florida" ~ "FL",
    State == "Illinois" ~ "IL",
    TRUE ~ ""
  )),
  vjust = -0.5, color = "#4F3C5F", fontface = "bold") +
  labs(title = "Allocation of IIJA Funding by State and Territory As Of March 2023",
       x = "State/Territory",
       y = "Total Funding (Billions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))



Population Distribution Across States and Territories (2023)

# Filter out rows with missing values in Population_2023
State_Territory_Pop <- State_Territory_Pop[complete.cases(State_Territory_Pop$Population_2023), ]

# Create a bar plot of population distribution by state
ggplot(State_Territory_Pop, aes(x = State, y = Population_2023, fill = State)) +
  geom_bar(stat = "identity" , fill = "gray50") +
     geom_text(aes(label = case_when(
    State == "New York" ~ "NY",
    State == "California" ~ "CA",
    State == "Texas" ~ "TX",
    State == "Florida" ~ "FL",
    State == "Illinois" ~ "IL",
    TRUE ~ ""
  )), 
  vjust = -0.5, color = "#4F3C5F", fontface = "bold") +
  labs(title = "Total Population by State or Territory For 2023",
       x = "State/Territory",
       y = "Total Population (Billions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  guides(fill = "none") +
  scale_y_continuous(labels = scales::comma)




Combine the IIJA Funding data with the 2023 Population data.

Population_Funding <- list(IIIJA_Funding,State_Territory_Pop) 
Population_Funding <- Population_Funding %>% reduce(inner_join, by='State')

Combined <- Population_Funding %>% 
  ggplot(aes(x = Total_Billions, y = Population_2023)) +
  geom_point(color = "gray50", size = 4, alpha = 0.6) +
  geom_smooth(method = "lm", color = "#4F3C5F", se = FALSE) +
  # Highlight outliers
  geom_point(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Puerto Rico")),
             aes(x = Total_Billions, y = Population_2023), 
             color = "darkorange", size = 5) +
  geom_text(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Puerto Rico")),
            aes(label = State), vjust = -1, size = 3.5, color = "black") +
  theme_bw() +
  labs(title= "Total Population vs Total Funding (2023)", 
       x = "Total Funding (Billions)",
       y = "Total Population") +
  theme(plot.title = element_text(size=15, color="#281f30", face="bold"), 
        axis.title.x = element_text(face="bold", color="#281f30"), 
        axis.title.y = element_text(face="bold", color="#281f30"),
        legend.position = "none") +
  scale_y_continuous(labels = scales::comma)

ggplotly(Combined)

CONCLUSION - QUESTION 1

The results demonstrate a strong positive correlation between population size and funding allocation, suggesting that distribution largely scales with demographic magnitude. High-population states such as California, Texas, New York, and Florida emerge as expected outliers, receiving allocations that align proportionally with their population share.

  • Does the allocation favor the political interests of the Biden administration?

VIEW 2020 PRESIDENTIAL ELECTION DATA

# Assuming Voting_Results is your data frame
Voting_Results_Combined <- Voting_Results %>%
  gather(Political_Candidate, Total_Votes, Trump, Biden, factor_key = TRUE) %>%
  mutate(Winner = ifelse(biden_win == 1, "DEMOCRATIC", "REPUBLICAN"))


#Remove the trump_win and biden_win columns because they are no longer needed
Voting_Results_Combined <- Voting_Results_Combined %>%
  select(-c(trump_win, biden_win))

#Display the 2020 Presidential Election data
head(Voting_Results_Combined)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State state_abr Political_Candidate Total_Votes Winner
Alaska AK Trump 189543 REPUBLICAN
Hawaii HI Trump 196864 DEMOCRATIC
Washington WA Trump 1584651 DEMOCRATIC
Oregon OR Trump 958448 DEMOCRATIC
California CA Trump 5982194 DEMOCRATIC
Idaho ID Trump 554128 REPUBLICAN




Compare Trump vs Biden Vote by State

# Create a bar plot of IIJA Funding of votes by candidate
ggplot(Voting_Results_Combined, aes(x = State, y = Total_Votes, fill = Political_Candidate)) +
  geom_bar(stat = "identity") +
  labs(title = "Compare Trump vs Biden Vote by State (2020 Election)",
       x = "State/Territory",
       y = "Total Votes") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("Trump" = "#FF9999",   # light red
                               "Biden" = "#9999FF")) # light blue




Create a new dataframe and combine the election data, the IIJA Funding data with the 2023 Population data.

Political_Funding = list(Population_Funding,Voting_Results_Combined) 
Political_Funding <- Political_Funding %>% reduce(inner_join, by='State')

# Create a bar plot of IIJA Funding for each Political candidate
ggplot(Political_Funding, aes(x = State, y = Total_Billions, fill = Political_Candidate)) +
  geom_bar(stat = "identity") +
  labs(title = "Allocation of Funding by State (Biden & Trump)",
       x = "State/Territory",
       y = "Total Funding (Billions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = c("Trump" = "#FF9999",   # light red
                               "Biden" = "#9999FF")) # light blue




“Percentage of IIJA Funding Allocated to Biden vs. Trump States (2020 Election)”

# Create a bar plot of Percentage Allocation by candidate
Political_Funding %>%
  group_by(Winner) %>%
  summarize(Total_Fund = sum(Total_Billions)) %>%
  mutate(fund_percent = Total_Fund / sum(Total_Fund)) %>%
  ggplot(aes(x = Winner, y = fund_percent * 100, fill = Winner)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(round(fund_percent*100, 0), "%")), 
            vjust = 1.5, color = "white", fontface = "bold", size = 5) +
  labs(title = "Percentage of IIJA Funding Allocated to Biden vs Trump States", 
       x = "Political Party", 
       y = "Percentage of Total Allocation") +
  theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) +
  scale_fill_manual(values = c("DEMOCRATIC" =  "#9999FF", "REPUBLICAN" = "#FF9999")) +
  annotate("text", x = 1, y = 60, label = "Slight Democratic tilt (53% vs 47%)",
           color = "blue", fontface = "bold", size = 3)




“Population Distribution by State or Territory (2023)”

# Create a bar plot of population distribution by state
ggplot(Political_Funding, aes(x = State, y = Population_2023, fill = Political_Candidate)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Population by State or Territory (2023)",
       x = "State/Territory",
       y = "Total Population") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("Trump" = "#FF9999", "Biden" = "#9999FF"))

CONCLUSION - QUESTION 2

The 2020 election comparison indicates that states won by Biden and the Democratic Party accounted for approximately 53% of total funding, while Republican-leaning states received about 47%. This distribution suggests a modest tilt toward Democratic states, implying that the allocation may align more closely with the political base of the Biden administration.

REFERENCES

The US Census Bureau population estimates for US states and territories (State and Territory Population dataset) was taken from the World Population Review website. The data is linked here: https://worldpopulationreview.com/states

The 2020 election results dataset was downloaded from Kaggle and is linked here: https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?select=voting.csv