DATA 608 - Story-1 : Infrastructure Investment & Jobs Act Funding Allocation :

Bikash Bhowmik —- 7 Sep 2025

Column

Column

Instruction

The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.

Your story (Data Visualization(s) ) should address the following questions:

  • Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

  • Does the allocation favor the political interests of the Biden administration?

Notes:

  1. You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.

  2. You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.

Introduction

The visualization explores the relationship between total population and total funding in 2023, using a dataset that merges population figures with funding allocations across U.S. states and territories. A scatter plot presents this relationship, with each point representing a state: the x-axis shows total funding (in billions), the y-axis shows total population, and colors distinguish individual states. A regression line is included to highlight overall trends. To improve readability, scientific notation was removed from the y-axis labels, ensuring a clearer and more accessible view of the data.

Alongside the Infrastructure Investment and Jobs Act funding allocations, additional data was incorporated from U.S. Census Bureau population estimates and the 2020 election results dataset from Kaggle.

Together, these sources help address two key questions:

  • Is the funding allocation equitable relative to population, or does it reveal bias?

  • Does the allocation align with or favor the political interests of the Biden administration?

Load Packages

I have loaded necessary packages.

” library(readxl)

library(tidyverse)

library(kableExtra)

library(readr)

library(dplyr)

library(ggplot2)

library(plotly)

library(stringr) ”

Load Data

# Set file path
file_path_xls <- "D:/Cuny_sps/DATA_608/Story-1/IIJA-FUNDING-AS-OF-MARCH-2023.xlsx"

# Read Excel file 
df <- read_excel(file_path_xls)

# Save as CSV
write.csv(df, "D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", row.names = FALSE)


# Load the IIIJA Funding data .csv file
IIIJA_Funding <- read.csv("D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

#Load the State Population data 
State_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

#Load the Territory Population data 
Territory_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-territory-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

#Load the 2020 Presidential Election Results data from the provided GITHUB URL into a data frame named Territory_Pop.
Voting_Results <- read.csv("D:/Cuny_sps/DATA_608/Story-1/Presidential-Election-Voting-Results-2020.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))

Clean Dataset

Join the State_Pop and Territory_Pop datasets using merge command in R.

# Join the State_Pop and Territory_Pop datasets based on a the State column using the merge function .
State_Territory_Pop <- merge(State_Pop, Territory_Pop, by = "state", all = TRUE)

Rename columns

# Rename the columns in the IIIJA_Funding dataframe
IIIJA_Funding <- IIIJA_Funding %>%
  rename(State = 'State..Teritory.or.Tribal.Nation', Total_Billions = 'Total..Billions.')


State_Territory_Pop <- State_Territory_Pop %>%
  rename(State = 'state', Population_2024 = 'pop2024.x', Population_2023 = 'pop2023.x', Population_2020 = 'pop2020.x',
         Growth_Rate = 'growthRate.x', Growth = 'growth.x', Rank =  'rank.x')

Voting_Results <- Voting_Results%>%
  rename(State = 'state', Biden = biden_vote, Trump = trump_vote)

Remove columns that are not needed

State_Territory_Pop <- State_Territory_Pop %>%
  select(-c(densityMi.x , fips.x, pop2019.x, pop2010.x, percent.x, growthSince2010.x, area.x, fips.y, densityMi.y, pop2024.y, pop2023.y, pop2020.y, pop2019.y, pop2010.y, growthRate.y, growth.y, growthSince2010.y, area.y, rank.y))

Voting_Results <- Voting_Results %>%
  select(-c(trump_pct, biden_pct))

Convert State Column to proper case

# Convert values in the State column to proper case
IIIJA_Funding$State <- str_to_title(IIIJA_Funding$State)

View the Datasets

Display the IIIJA Funding data ( First 6 records)

head(IIIJA_Funding)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State Total_Billions
Alabama 3.0000
Alaska 3.7000
American Samoa 0.0686
Arizona 3.5000
Arkansas 2.8000
California 18.4000




“State and Territory Population Data (2020–2024 Estimates)”

#Display the State and TerritoryPopulation data
head(State_Territory_Pop)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State Population_2024 Population_2023 Population_2020 Growth_Rate Growth Rank percent.y
Alabama 5143033 5108468 5031864 0.00677 34565 24 NA
Alaska 733536 733406 732964 0.00018 130 48 NA
Arizona 7497004 7431344 7186683 0.00884 65660 14 NA
Arkansas 3089060 3067732 3014348 0.00695 21328 33 NA
California 38889770 38965193 39503200 -0.00194 -75423 1 NA
Colorado 5914181 5877610 5785219 0.00622 36571 21 NA




“2020 Presidential Election Results by State”

#Display the 2020 Presidential Election data
head(Voting_Results)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State state_abr Trump Biden trump_win biden_win
Alaska AK 189543 153502 1 0
Hawaii HI 196864 366130 0 1
Washington WA 1584651 2369612 0 1
Oregon OR 958448 1340383 0 1
California CA 5982194 11082293 0 1
Idaho ID 554128 287031 1 0

  • Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?



Combine the IIJA Funding data with the 2023 Population data.

Population_Funding <- list(IIIJA_Funding,State_Territory_Pop) 
Population_Funding <- Population_Funding %>% reduce(inner_join, by='State')

Combined <- Population_Funding %>% 
  ggplot(aes(x = Total_Billions, y = Population_2023)) +
  geom_point(color = "gray50", size = 4, alpha = 0.6) +
  geom_smooth(method = "lm", color = "#4F3C5F", se = FALSE) +
  # Highlight outliers
  geom_point(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Pennsylvania","Georgia", "Puerto Rico")),
             aes(x = Total_Billions, y = Population_2023), 
             color = "darkorange", size = 5) +
  geom_text(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Pennsylvania","Georgia","Puerto Rico")),
            aes(label = State), vjust = -1, size = 3.5, color = "black") +
  theme_bw() +
  labs(title= "Total Population vs Total Funding (2023)", 
       x = "Total Funding (Billions)",
       y = "Total Population") +
  theme(plot.title = element_text(size=15, color="#281f30", face="bold"), 
        axis.title.x = element_text(face="bold", color="#281f30"), 
        axis.title.y = element_text(face="bold", color="#281f30"),
        legend.position = "none") +
  scale_y_continuous(labels = scales::comma)

ggplotly(Combined)

CONCLUSION - QUESTION 1

The results demonstrate a strong positive correlation between population size and funding allocation, suggesting that distribution largely scales with demographic magnitude. High-population states such as California, Texas, New York, and Florida emerge as expected outliers, receiving allocations that align proportionally with their population share.Therefore, funding is largely equitable relative to population.




  • Does the allocation favor the political interests of the Biden administration?

VIEW 2020 PRESIDENTIAL ELECTION DATA

# Assuming Voting_Results is your data frame
Voting_Results_Combined <- Voting_Results %>%
  gather(Political_Candidate, Total_Votes, Trump, Biden, factor_key = TRUE) %>%
  mutate(Winner = ifelse(biden_win == 1, "DEMOCRATIC", "REPUBLICAN"))


#Remove the trump_win and biden_win columns because they are no longer needed
Voting_Results_Combined <- Voting_Results_Combined %>%
  select(-c(trump_win, biden_win))

#Display the 2020 Presidential Election data
head(Voting_Results_Combined)  %>% kable() %>% 
kable_styling(bootstrap_options = "striped", font_size = 12) %>% 
scroll_box(height = "300px", width = "100%")
State state_abr Political_Candidate Total_Votes Winner
Alaska AK Trump 189543 REPUBLICAN
Hawaii HI Trump 196864 DEMOCRATIC
Washington WA Trump 1584651 DEMOCRATIC
Oregon OR Trump 958448 DEMOCRATIC
California CA Trump 5982194 DEMOCRATIC
Idaho ID Trump 554128 REPUBLICAN




“Percentage of IIJA Funding Allocated to Biden vs. Trump States (2020 Election)”

Political_Funding = list(Population_Funding,Voting_Results_Combined) 
Political_Funding <- Political_Funding %>% reduce(inner_join, by='State')

# Summarize funding by political alignment
Funding_By_Party <- Political_Funding %>%
  group_by(Winner) %>%
  summarize(Total_Fund = sum(Total_Billions)) %>%
  mutate(Percent = round(100 * Total_Fund / sum(Total_Fund), 1))


# Create pie chart
ggplot(Funding_By_Party, aes(x = "", y = Percent, fill = Winner)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = paste0(Percent, "%")), 
            position = position_stack(vjust = 0.5), 
            color = "white", size = 6, fontface = "bold") +
  labs(title = "IIJA Funding Allocation by 2020 Election Outcome" ,
        subtitle = "Slight Democratic tilt (53% vs 47%)")+
  theme_void() +
  theme(plot.title = element_text(size=16, face="bold", hjust=0.5),
        plot.subtitle = element_text(size=12, face="bold", color="black", hjust=0.5)) +
  scale_fill_manual(values = c("DEMOCRATIC"="#9999FF","REPUBLICAN"="#FF9999"))

CONCLUSION - QUESTION 2

The 2020 election comparison indicates that states won by Biden and the Democratic Party accounted for approximately 53% of total funding, while Republican-leaning states received about 47%.Therefore, while slightly tilted toward Democrats, the allocation does not show extreme bias.

REFERENCES

The US Census Bureau population estimates for US states and territories (State and Territory Population dataset) was taken from the World Population Review website. The data is linked here: https://worldpopulationreview.com/states

The 2020 election results dataset was downloaded from Kaggle and is linked here: https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?select=voting.csv