The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
Notes:
You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.
You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.
The visualization explores the relationship between total population and total funding in 2023, using a dataset that merges population figures with funding allocations across U.S. states and territories. A scatter plot presents this relationship, with each point representing a state: the x-axis shows total funding (in billions), the y-axis shows total population, and colors distinguish individual states. A regression line is included to highlight overall trends. To improve readability, scientific notation was removed from the y-axis labels, ensuring a clearer and more accessible view of the data.
Alongside the Infrastructure Investment and Jobs Act funding allocations, additional data was incorporated from U.S. Census Bureau population estimates and the 2020 election results dataset from Kaggle.
Together, these sources help address two key questions:
Is the funding allocation equitable relative to population, or does it reveal bias?
Does the allocation align with or favor the political interests of the Biden administration?
I have loaded necessary packages.
” library(readxl)
library(tidyverse)
library(kableExtra)
library(readr)
library(dplyr)
library(ggplot2)
library(plotly)
library(stringr) ”
# Set file path
file_path_xls <- "D:/Cuny_sps/DATA_608/Story-1/IIJA-FUNDING-AS-OF-MARCH-2023.xlsx"
# Read Excel file
df <- read_excel(file_path_xls)
# Save as CSV
write.csv(df, "D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", row.names = FALSE)
# Load the IIIJA Funding data .csv file
IIIJA_Funding <- read.csv("D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the State Population data
State_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the Territory Population data
Territory_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-territory-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the 2020 Presidential Election Results data from the provided GITHUB URL into a data frame named Territory_Pop.
Voting_Results <- read.csv("D:/Cuny_sps/DATA_608/Story-1/Presidential-Election-Voting-Results-2020.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
Join the State_Pop and Territory_Pop datasets using merge command in R.
# Join the State_Pop and Territory_Pop datasets based on a the State column using the merge function .
State_Territory_Pop <- merge(State_Pop, Territory_Pop, by = "state", all = TRUE)
Rename columns
# Rename the columns in the IIIJA_Funding dataframe
IIIJA_Funding <- IIIJA_Funding %>%
rename(State = 'State..Teritory.or.Tribal.Nation', Total_Billions = 'Total..Billions.')
State_Territory_Pop <- State_Territory_Pop %>%
rename(State = 'state', Population_2024 = 'pop2024.x', Population_2023 = 'pop2023.x', Population_2020 = 'pop2020.x',
Growth_Rate = 'growthRate.x', Growth = 'growth.x', Rank = 'rank.x')
Voting_Results <- Voting_Results%>%
rename(State = 'state', Biden = biden_vote, Trump = trump_vote)
Remove columns that are not needed
State_Territory_Pop <- State_Territory_Pop %>%
select(-c(densityMi.x , fips.x, pop2019.x, pop2010.x, percent.x, growthSince2010.x, area.x, fips.y, densityMi.y, pop2024.y, pop2023.y, pop2020.y, pop2019.y, pop2010.y, growthRate.y, growth.y, growthSince2010.y, area.y, rank.y))
Voting_Results <- Voting_Results %>%
select(-c(trump_pct, biden_pct))
Convert State Column to proper case
# Convert values in the State column to proper case
IIIJA_Funding$State <- str_to_title(IIIJA_Funding$State)
Display the IIIJA Funding data ( First 6 records)
head(IIIJA_Funding) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | Total_Billions |
---|---|
Alabama | 3.0000 |
Alaska | 3.7000 |
American Samoa | 0.0686 |
Arizona | 3.5000 |
Arkansas | 2.8000 |
California | 18.4000 |
“State and Territory Population Data (2020–2024
Estimates)”
#Display the State and TerritoryPopulation data
head(State_Territory_Pop) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | Population_2024 | Population_2023 | Population_2020 | Growth_Rate | Growth | Rank | percent.y |
---|---|---|---|---|---|---|---|
Alabama | 5143033 | 5108468 | 5031864 | 0.00677 | 34565 | 24 | NA |
Alaska | 733536 | 733406 | 732964 | 0.00018 | 130 | 48 | NA |
Arizona | 7497004 | 7431344 | 7186683 | 0.00884 | 65660 | 14 | NA |
Arkansas | 3089060 | 3067732 | 3014348 | 0.00695 | 21328 | 33 | NA |
California | 38889770 | 38965193 | 39503200 | -0.00194 | -75423 | 1 | NA |
Colorado | 5914181 | 5877610 | 5785219 | 0.00622 | 36571 | 21 | NA |
“2020 Presidential Election Results by State”
#Display the 2020 Presidential Election data
head(Voting_Results) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | state_abr | Trump | Biden | trump_win | biden_win |
---|---|---|---|---|---|
Alaska | AK | 189543 | 153502 | 1 | 0 |
Hawaii | HI | 196864 | 366130 | 0 | 1 |
Washington | WA | 1584651 | 2369612 | 0 | 1 |
Oregon | OR | 958448 | 1340383 | 0 | 1 |
California | CA | 5982194 | 11082293 | 0 | 1 |
Idaho | ID | 554128 | 287031 | 1 | 0 |
Combine the IIJA Funding data with the 2023 Population
data.
Population_Funding <- list(IIIJA_Funding,State_Territory_Pop)
Population_Funding <- Population_Funding %>% reduce(inner_join, by='State')
Combined <- Population_Funding %>%
ggplot(aes(x = Total_Billions, y = Population_2023)) +
geom_point(color = "gray50", size = 4, alpha = 0.6) +
geom_smooth(method = "lm", color = "#4F3C5F", se = FALSE) +
# Highlight outliers
geom_point(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Pennsylvania","Georgia", "Puerto Rico")),
aes(x = Total_Billions, y = Population_2023),
color = "darkorange", size = 5) +
geom_text(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Pennsylvania","Georgia","Puerto Rico")),
aes(label = State), vjust = -1, size = 3.5, color = "black") +
theme_bw() +
labs(title= "Total Population vs Total Funding (2023)",
x = "Total Funding (Billions)",
y = "Total Population") +
theme(plot.title = element_text(size=15, color="#281f30", face="bold"),
axis.title.x = element_text(face="bold", color="#281f30"),
axis.title.y = element_text(face="bold", color="#281f30"),
legend.position = "none") +
scale_y_continuous(labels = scales::comma)
ggplotly(Combined)
The results demonstrate a strong positive correlation between population size and funding allocation, suggesting that distribution largely scales with demographic magnitude. High-population states such as California, Texas, New York, and Florida emerge as expected outliers, receiving allocations that align proportionally with their population share.Therefore, funding is largely equitable relative to population.
VIEW 2020 PRESIDENTIAL ELECTION DATA
# Assuming Voting_Results is your data frame
Voting_Results_Combined <- Voting_Results %>%
gather(Political_Candidate, Total_Votes, Trump, Biden, factor_key = TRUE) %>%
mutate(Winner = ifelse(biden_win == 1, "DEMOCRATIC", "REPUBLICAN"))
#Remove the trump_win and biden_win columns because they are no longer needed
Voting_Results_Combined <- Voting_Results_Combined %>%
select(-c(trump_win, biden_win))
#Display the 2020 Presidential Election data
head(Voting_Results_Combined) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | state_abr | Political_Candidate | Total_Votes | Winner |
---|---|---|---|---|
Alaska | AK | Trump | 189543 | REPUBLICAN |
Hawaii | HI | Trump | 196864 | DEMOCRATIC |
Washington | WA | Trump | 1584651 | DEMOCRATIC |
Oregon | OR | Trump | 958448 | DEMOCRATIC |
California | CA | Trump | 5982194 | DEMOCRATIC |
Idaho | ID | Trump | 554128 | REPUBLICAN |
“Percentage of IIJA Funding Allocated to Biden
vs. Trump States (2020 Election)”
Political_Funding = list(Population_Funding,Voting_Results_Combined)
Political_Funding <- Political_Funding %>% reduce(inner_join, by='State')
# Summarize funding by political alignment
Funding_By_Party <- Political_Funding %>%
group_by(Winner) %>%
summarize(Total_Fund = sum(Total_Billions)) %>%
mutate(Percent = round(100 * Total_Fund / sum(Total_Fund), 1))
# Create pie chart
ggplot(Funding_By_Party, aes(x = "", y = Percent, fill = Winner)) +
geom_col(width = 1, color = "white") +
coord_polar(theta = "y") +
geom_text(aes(label = paste0(Percent, "%")),
position = position_stack(vjust = 0.5),
color = "white", size = 6, fontface = "bold") +
labs(title = "IIJA Funding Allocation by 2020 Election Outcome" ,
subtitle = "Slight Democratic tilt (53% vs 47%)")+
theme_void() +
theme(plot.title = element_text(size=16, face="bold", hjust=0.5),
plot.subtitle = element_text(size=12, face="bold", color="black", hjust=0.5)) +
scale_fill_manual(values = c("DEMOCRATIC"="#9999FF","REPUBLICAN"="#FF9999"))
The 2020 election comparison indicates that states won by Biden and the Democratic Party accounted for approximately 53% of total funding, while Republican-leaning states received about 47%.Therefore, while slightly tilted toward Democrats, the allocation does not show extreme bias.
The US Census Bureau population estimates for US states and territories (State and Territory Population dataset) was taken from the World Population Review website. The data is linked here: https://worldpopulationreview.com/states
The 2020 election results dataset was downloaded from Kaggle and is linked here: https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?select=voting.csv