Infrastructure Investment and Jobs Act Funding Allocation

The purpose of this project is to show if the the allocation equitable based on the population of each of the States and Territories and present supportive evidence to show the allocation favor the political interest of Biden administration. We preprocess the data set in order to add the following addition to the data :

IJA funding data (already provided in your file).

State/Territory population estimates (can be sourced from official census data or recent population estimates).

2020 Presidential election results (available from official election sources, which indicate whether a state supported Joe Biden or Donald Trump).

library(readxl)
library(stringr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Load Dataset and Transformation

The data preprocessing involved renaming some of the column names across all files to ensure consistency for merging. Additionally, we converted the string values to title case (capitalizing the first letter of each word) instead of using all capital letters, making the text format uniform across the datasets.

## # A tibble: 6 × 2
##   State          Total_Funding
##   <chr>                  <dbl>
## 1 Alabama               3     
## 2 Alaska                3.7   
## 3 American Samoa        0.0686
## 4 Arizona               3.5   
## 5 Arkansas              2.8   
## 6 California           18.4

The population data set was downloaded from the United Census Bureau in their archive file load. It cointains information about yearly population changes by States and Other US Regions

# load Population 
population <- read.csv('NST-EST2023-POPCHG2020_2023.csv', header = T)

pop_state <- population %>% filter(STATE >= 1 ) %>% select(NAME,POPESTIMATE2023)  %>% rename(
                                                                                                  State = NAME
                                                                                                  )
head(pop_state)
##        State POPESTIMATE2023
## 1    Alabama         5108468
## 2     Alaska          733406
## 3    Arizona         7431344
## 4   Arkansas         3067732
## 5 California        38965193
## 6   Colorado         5877610
# Big population 
pop_cat <- population %>% select(STATE,NAME,POPESTIMATE2023) %>% filter(STATE == 0 )  %>% rename(State_Number = STATE,State = NAME)

head(pop_cat)  
##   State_Number              State POPESTIMATE2023
## 1            0      United States       334914895
## 2            0   Northeast Region        56983517
## 3            0        New England        15159777
## 4            0    Middle Atlantic        41823740
## 5            0     Midwest Region        68909283
## 6            0 East North Central        47146039

Voting Data set was dowloaded from Havard Dataverse. It contains constituency (state-level) returns for elections to the U.S. presidency from 1976 to 2020.

voting <- read.csv('1976-2020-president.csv',header = T)
voting <- voting %>%  filter(year == 2020)
voting <- voting %>% select(year, state, state_cen,office, candidate,candidatevotes ,totalvotes,party_simplified)
head(voting)
##   year   state state_cen       office           candidate candidatevotes
## 1 2020 ALABAMA        63 US PRESIDENT BIDEN, JOSEPH R. JR         849624
## 2 2020 ALABAMA        63 US PRESIDENT    TRUMP, DONALD J.        1441170
## 3 2020 ALABAMA        63 US PRESIDENT       JORGENSEN, JO          25176
## 4 2020 ALABAMA        63 US PRESIDENT                               7312
## 5 2020  ALASKA        94 US PRESIDENT BIDEN, JOSEPH R. JR         153778
## 6 2020  ALASKA        94 US PRESIDENT    TRUMP, DONALD J.         189951
##   totalvotes party_simplified
## 1    2323282         DEMOCRAT
## 2    2323282       REPUBLICAN
## 3    2323282      LIBERTARIAN
## 4    2323282            OTHER
## 5     359530         DEMOCRAT
## 6     359530       REPUBLICAN
# transformation 

# Change capital letter to title form 
library(stringr)
voting$State <- str_to_title(voting$state)  
head(voting)
##   year   state state_cen       office           candidate candidatevotes
## 1 2020 ALABAMA        63 US PRESIDENT BIDEN, JOSEPH R. JR         849624
## 2 2020 ALABAMA        63 US PRESIDENT    TRUMP, DONALD J.        1441170
## 3 2020 ALABAMA        63 US PRESIDENT       JORGENSEN, JO          25176
## 4 2020 ALABAMA        63 US PRESIDENT                               7312
## 5 2020  ALASKA        94 US PRESIDENT BIDEN, JOSEPH R. JR         153778
## 6 2020  ALASKA        94 US PRESIDENT    TRUMP, DONALD J.         189951
##   totalvotes party_simplified   State
## 1    2323282         DEMOCRAT Alabama
## 2    2323282       REPUBLICAN Alabama
## 3    2323282      LIBERTARIAN Alabama
## 4    2323282            OTHER Alabama
## 5     359530         DEMOCRAT  Alaska
## 6     359530       REPUBLICAN  Alaska

We merged all the datasets together to create the final data frame. During data cleaning, we checked for typos in the string values and found a few that resulted in missing values (NA). For instance, “Delaware” was misspelled as “Deleware” in the funding dataset, and we also corrected “District of Columbia.”

# Change typo in the dataset 
# Change typo to delaware
funding<- funding %>%
  mutate(State = ifelse(State == "Deleware", "Delaware", State))

funding<- funding %>%
  mutate(State = ifelse(State == "District of Columbia", "District Of Columbia", State))

funding<- funding %>% 
  mutate(State = ifelse(State == "District of Columbia", "District Of Columbia", State))

pop_state <- pop_state %>%  mutate(State = ifelse(State == "District of Columbia", "District Of Columbia", State))

# Create first merger table 
table1_merger <- left_join(funding,pop_state, by='State')
table2 <- left_join(table1_merger,voting, by='State')
df <- table2 %>% select(State,Total_Funding,POPESTIMATE2023,year, candidate, candidatevotes, totalvotes, party_simplified) %>% filter(party_simplified == 'DEMOCRAT' | 
                                                                                                        party_simplified == 'REPUBLICAN') %>% rename(Population_Estimate = POPESTIMATE2023)

df$Population_millions = as.numeric(as.numeric(df$Population_Estimate/1e6))
df$candidatevotes_millions <- as.numeric(df$candidatevotes/1e6)
df$totalvotes_millions <- as.numeric(df$totalvotes/1e6)

head(df)
## # A tibble: 6 × 11
##   State   Total_Funding Population_Estimate  year candidate       candidatevotes
##   <chr>           <dbl>               <int> <int> <chr>                    <int>
## 1 Alabama           3               5108468  2020 BIDEN, JOSEPH …         849624
## 2 Alabama           3               5108468  2020 TRUMP, DONALD …        1441170
## 3 Alaska            3.7              733406  2020 BIDEN, JOSEPH …         153778
## 4 Alaska            3.7              733406  2020 TRUMP, DONALD …         189951
## 5 Arizona           3.5             7431344  2020 BIDEN, JOSEPH …        1672143
## 6 Arizona           3.5             7431344  2020 TRUMP, DONALD …        1661686
## # ℹ 5 more variables: totalvotes <int>, party_simplified <chr>,
## #   Population_millions <dbl>, candidatevotes_millions <dbl>,
## #   totalvotes_millions <dbl>
attach(df)

Is the allocation equitable based on the population of each of the States and Territories or is bias apparent?

Based on the scatter plot, smaller states consistently receive higher funding per capita, suggesting that the allocation may not be fully equitable based on population. Smaller states appear to receive a disproportionate amount of funding, while larger states, despite having more voters, are receiving less funding per capita. The points in the plot are colored by the number of votes the winning candidate received in the 2020 Presidential election for each state. This reveals that the majority of voters are in states with lower funding. To ensure fairness, allocations should be more consistent across all states to achieve equity.

ggplot(df, aes(x= Population_millions, y= Total_Funding/Population_millions)) +
  geom_point(aes(color=candidatevotes_millions)) + scale_x_log10() + scale_y_log10() +
  labs(title = "IIJA Funding Per Capita vs. Population",
       x = "Population (In Millions)",
       y = "Funding Per Capita (in Billions)") +
  theme_minimal()

df<- df %>%
  mutate(Winner = ifelse(candidate == "BIDEN, JOSEPH R. JR", "1",'0'))

df$fundingpercapita <- df$Total_Funding/df$Population_millions
# Adding another graph to support what we discuss in graph 1 
fs <- df %>% arrange(candidatevotes_millions) %>% tail(20) %>%  select(State,fundingpercapita,Total_Funding) %>% distinct()
ggplot(fs, aes(x = reorder(State, -fundingpercapita), y = fundingpercapita)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  labs(title = "Highest Funding Per Capita by State",
       x = "State",
       y = "Funding Per Capita (Billions per Capita)") +
  theme_minimal() +
  coord_flip()  #

ls <- df %>% arrange(candidatevotes_millions) %>% head(20) %>%  select(State,fundingpercapita,Total_Funding) %>% distinct()
ggplot(ls, aes(x = reorder(State, -fundingpercapita), y = fundingpercapita)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  labs(title = "Lowest Funding Per Capita by State",
       x = "State",
       y = "Funding Per Capita (Billions per Capita)") +
  theme_minimal() +
  coord_flip() 

** Does the allocation favor the political interests of the Biden administration?

The visualization of the 2020 Election suggests that states where Biden received more votes also received higher funding allocations. However, the dataset has limitations because it doesn’t differentiate total votes between candidates for each state. The Highest Funding by Candidate plot effectively compares the funding per capita between states that supported Biden and those that supported Trump, highlighting any potential bias in funding based on political support. Notably, in the 20 states with the most votes, there is a clear trend showing higher funding per capita in states with more voters. The 2020 election results also demonstrate how close the race for the presidential seat was. The third plot, which should display funding amounts instead of votes (with the y-axis representing funding in billions), visually shows whether states that supported Biden or Trump received more IIJA funding. The color coding of the candidates helps in analyzing whether funding distribution favors certain political groups.

t <- df %>% arrange(candidatevotes_millions) %>% head(20) %>%  select(candidate,fundingpercapita,Total_Funding, candidatevotes) %>% distinct()
as <- t %>% select(candidate,fundingpercapita) %>% group_by(candidate) %>% reframe(candidate,                                                                                  percapita = sum(fundingpercapita)) %>% distinct()

ggplot(as, aes(x = candidate, y = percapita)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  labs(title = "Highest Funding Per Capita by Candidate",
       x = "Candidate",
       y = "Funding Per Capita (Billions per Capita)") +
  theme_minimal() 

ggplot(df, aes(x = State, y = totalvotes_millions, fill = candidate))+
  geom_bar(stat = "identity") +
  labs(title = "Compare Trump vs Biden Vote by State for 2020 Election",
       x = "State/Territory",
       y = "Total Vote by Candidate") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = scales::comma)

# Create a bar plot of IIJA Funding for each Political candidate
ggplot(df, aes(x = State, y = totalvotes_millions, fill = candidate)) +
  geom_bar(stat = "identity") +
  labs(title = "Allocation of Funding by State (Biden & Trump)",
       x = "State/Territory",
       y = "Total Funding (Billions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))