The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
Notes:
You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.
You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.
The visualization explores the relationship between total population and total funding in 2023, using a dataset that merges population figures with funding allocations across U.S. states and territories. A scatter plot presents this relationship, with each point representing a state: the x-axis shows total funding (in billions), the y-axis shows total population, and colors distinguish individual states. A regression line is included to highlight overall trends. To improve readability, scientific notation was removed from the y-axis labels, ensuring a clearer and more accessible view of the data.
Alongside the Infrastructure Investment and Jobs Act funding allocations, additional data was incorporated from U.S. Census Bureau population estimates and the 2020 election results dataset from Kaggle.
Together, these sources help address two key questions:
Is the funding allocation equitable relative to population, or does it reveal bias?
Does the allocation align with or favor the political interests of the Biden administration?
I have loaded necessary packages.
” library(readxl)
library(tidyverse)
library(kableExtra)
library(readr)
library(dplyr)
library(ggplot2)
library(plotly)
library(stringr) ”
# Set file path
file_path_xls <- "D:/Cuny_sps/DATA_608/Story-1/IIJA-FUNDING-AS-OF-MARCH-2023.xlsx"
# Read Excel file
df <- read_excel(file_path_xls)
# Save as CSV
write.csv(df, "D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", row.names = FALSE)
# Load the IIIJA Funding data .csv file
IIIJA_Funding <- read.csv("D:/Cuny_sps/DATA_608/Story-1/IIJA_FUNDING_MARCH_2023.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the State Population data
State_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the Territory Population data
Territory_Pop <- read.csv("D:/Cuny_sps/DATA_608/Story-1/state-population-territory-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the 2020 Presidential Election Results data from the provided GITHUB URL into a data frame named Territory_Pop.
Voting_Results <- read.csv("D:/Cuny_sps/DATA_608/Story-1/Presidential-Election-Voting-Results-2020.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
Join the State_Pop and Territory_Pop datasets using merge command in R.
# Join the State_Pop and Territory_Pop datasets based on a the State column using the merge function .
State_Territory_Pop <- merge(State_Pop, Territory_Pop, by = "state", all = TRUE)
Rename columns
# Rename the columns in the IIIJA_Funding dataframe
IIIJA_Funding <- IIIJA_Funding %>%
rename(State = 'State..Teritory.or.Tribal.Nation', Total_Billions = 'Total..Billions.')
State_Territory_Pop <- State_Territory_Pop %>%
rename(State = 'state', Population_2024 = 'pop2024.x', Population_2023 = 'pop2023.x', Population_2020 = 'pop2020.x',
Growth_Rate = 'growthRate.x', Growth = 'growth.x', Rank = 'rank.x')
Voting_Results <- Voting_Results%>%
rename(State = 'state', Biden = biden_vote, Trump = trump_vote)
Remove columns that are not needed
State_Territory_Pop <- State_Territory_Pop %>%
select(-c(densityMi.x , fips.x, pop2019.x, pop2010.x, percent.x, growthSince2010.x, area.x, fips.y, densityMi.y, pop2024.y, pop2023.y, pop2020.y, pop2019.y, pop2010.y, growthRate.y, growth.y, growthSince2010.y, area.y, rank.y))
Voting_Results <- Voting_Results %>%
select(-c(trump_pct, biden_pct))
Convert State Column to proper case
# Convert values in the State column to proper case
IIIJA_Funding$State <- str_to_title(IIIJA_Funding$State)
Display the IIIJA Funding data ( First 6 records)
head(IIIJA_Funding) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | Total_Billions |
---|---|
Alabama | 3.0000 |
Alaska | 3.7000 |
American Samoa | 0.0686 |
Arizona | 3.5000 |
Arkansas | 2.8000 |
California | 18.4000 |
“State and Territory Population Data (2020–2024
Estimates)”
#Display the State and TerritoryPopulation data
head(State_Territory_Pop) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | Population_2024 | Population_2023 | Population_2020 | Growth_Rate | Growth | Rank | percent.y |
---|---|---|---|---|---|---|---|
Alabama | 5143033 | 5108468 | 5031864 | 0.00677 | 34565 | 24 | NA |
Alaska | 733536 | 733406 | 732964 | 0.00018 | 130 | 48 | NA |
Arizona | 7497004 | 7431344 | 7186683 | 0.00884 | 65660 | 14 | NA |
Arkansas | 3089060 | 3067732 | 3014348 | 0.00695 | 21328 | 33 | NA |
California | 38889770 | 38965193 | 39503200 | -0.00194 | -75423 | 1 | NA |
Colorado | 5914181 | 5877610 | 5785219 | 0.00622 | 36571 | 21 | NA |
“2020 Presidential Election Results by State”
#Display the 2020 Presidential Election data
head(Voting_Results) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | state_abr | Trump | Biden | trump_win | biden_win |
---|---|---|---|---|---|
Alaska | AK | 189543 | 153502 | 1 | 0 |
Hawaii | HI | 196864 | 366130 | 0 | 1 |
Washington | WA | 1584651 | 2369612 | 0 | 1 |
Oregon | OR | 958448 | 1340383 | 0 | 1 |
California | CA | 5982194 | 11082293 | 0 | 1 |
Idaho | ID | 554128 | 287031 | 1 | 0 |
# Create a bar plot of IIJA Funding
ggplot(IIIJA_Funding, aes(x = State, y = Total_Billions)) +
geom_bar(stat = "identity", fill = "gray50") +
geom_text(aes(label = case_when(
State == "New York" ~ "NY",
State == "California" ~ "CA",
State == "Texas" ~ "TX",
State == "Florida" ~ "FL",
State == "Illinois" ~ "IL",
TRUE ~ ""
)),
vjust = -0.5, color = "#4F3C5F", fontface = "bold") +
labs(title = "Allocation of IIJA Funding by State and Territory As Of March 2023",
x = "State/Territory",
y = "Total Funding (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Distribution of IIJA Funding Across States and Territories (2023)
# Reorder states by descending Total_Billions
IIIJA_Funding <- IIIJA_Funding %>%
mutate(State = reorder(State, -Total_Billions))
# Create a bar plot of IIJA Funding sorted
ggplot(IIIJA_Funding, aes(x = State, y = Total_Billions)) +
geom_bar(stat = "identity", fill = "gray50") +
geom_text(aes(label = case_when(
State == "New York" ~ "NY",
State == "California" ~ "CA",
State == "Texas" ~ "TX",
State == "Florida" ~ "FL",
State == "Illinois" ~ "IL",
TRUE ~ ""
)),
vjust = -0.5, color = "#4F3C5F", fontface = "bold") +
labs(title = "Allocation of IIJA Funding by State and Territory As Of March 2023",
x = "State/Territory",
y = "Total Funding (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Population Distribution Across States and Territories (2023)
# Filter out rows with missing values in Population_2023
State_Territory_Pop <- State_Territory_Pop[complete.cases(State_Territory_Pop$Population_2023), ]
# Create a bar plot of population distribution by state
ggplot(State_Territory_Pop, aes(x = State, y = Population_2023, fill = State)) +
geom_bar(stat = "identity" , fill = "gray50") +
geom_text(aes(label = case_when(
State == "New York" ~ "NY",
State == "California" ~ "CA",
State == "Texas" ~ "TX",
State == "Florida" ~ "FL",
State == "Illinois" ~ "IL",
TRUE ~ ""
)),
vjust = -0.5, color = "#4F3C5F", fontface = "bold") +
labs(title = "Total Population by State or Territory For 2023",
x = "State/Territory",
y = "Total Population (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
guides(fill = "none") +
scale_y_continuous(labels = scales::comma)
Combine the IIJA Funding data with the 2023
Population data.
Population_Funding <- list(IIIJA_Funding,State_Territory_Pop)
Population_Funding <- Population_Funding %>% reduce(inner_join, by='State')
Combined <- Population_Funding %>%
ggplot(aes(x = Total_Billions, y = Population_2023)) +
geom_point(color = "gray50", size = 4, alpha = 0.6) +
geom_smooth(method = "lm", color = "#4F3C5F", se = FALSE) +
# Highlight outliers
geom_point(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Puerto Rico")),
aes(x = Total_Billions, y = Population_2023),
color = "darkorange", size = 5) +
geom_text(data = subset(Population_Funding, State %in% c("California","Texas","New York","Florida","Puerto Rico")),
aes(label = State), vjust = -1, size = 3.5, color = "black") +
theme_bw() +
labs(title= "Total Population vs Total Funding (2023)",
x = "Total Funding (Billions)",
y = "Total Population") +
theme(plot.title = element_text(size=15, color="#281f30", face="bold"),
axis.title.x = element_text(face="bold", color="#281f30"),
axis.title.y = element_text(face="bold", color="#281f30"),
legend.position = "none") +
scale_y_continuous(labels = scales::comma)
ggplotly(Combined)
The results demonstrate a strong positive correlation between population size and funding allocation, suggesting that distribution largely scales with demographic magnitude. High-population states such as California, Texas, New York, and Florida emerge as expected outliers, receiving allocations that align proportionally with their population share.
VIEW 2020 PRESIDENTIAL ELECTION DATA
# Assuming Voting_Results is your data frame
Voting_Results_Combined <- Voting_Results %>%
gather(Political_Candidate, Total_Votes, Trump, Biden, factor_key = TRUE) %>%
mutate(Winner = ifelse(biden_win == 1, "DEMOCRATIC", "REPUBLICAN"))
#Remove the trump_win and biden_win columns because they are no longer needed
Voting_Results_Combined <- Voting_Results_Combined %>%
select(-c(trump_win, biden_win))
#Display the 2020 Presidential Election data
head(Voting_Results_Combined) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")
State | state_abr | Political_Candidate | Total_Votes | Winner |
---|---|---|---|---|
Alaska | AK | Trump | 189543 | REPUBLICAN |
Hawaii | HI | Trump | 196864 | DEMOCRATIC |
Washington | WA | Trump | 1584651 | DEMOCRATIC |
Oregon | OR | Trump | 958448 | DEMOCRATIC |
California | CA | Trump | 5982194 | DEMOCRATIC |
Idaho | ID | Trump | 554128 | REPUBLICAN |
Compare Trump vs Biden Vote by State
# Create a bar plot of IIJA Funding of votes by candidate
ggplot(Voting_Results_Combined, aes(x = State, y = Total_Votes, fill = Political_Candidate)) +
geom_bar(stat = "identity") +
labs(title = "Compare Trump vs Biden Vote by State (2020 Election)",
x = "State/Territory",
y = "Total Votes") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = scales::comma) +
scale_fill_manual(values = c("Trump" = "#FF9999", # light red
"Biden" = "#9999FF")) # light blue
Create a new dataframe and combine the election data,
the IIJA Funding data with the 2023 Population data.
Political_Funding = list(Population_Funding,Voting_Results_Combined)
Political_Funding <- Political_Funding %>% reduce(inner_join, by='State')
# Create a bar plot of IIJA Funding for each Political candidate
ggplot(Political_Funding, aes(x = State, y = Total_Billions, fill = Political_Candidate)) +
geom_bar(stat = "identity") +
labs(title = "Allocation of Funding by State (Biden & Trump)",
x = "State/Territory",
y = "Total Funding (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_manual(values = c("Trump" = "#FF9999", # light red
"Biden" = "#9999FF")) # light blue
“Percentage of IIJA Funding Allocated to Biden
vs. Trump States (2020 Election)”
# Create a bar plot of Percentage Allocation by candidate
Political_Funding %>%
group_by(Winner) %>%
summarize(Total_Fund = sum(Total_Billions)) %>%
mutate(fund_percent = Total_Fund / sum(Total_Fund)) %>%
ggplot(aes(x = Winner, y = fund_percent * 100, fill = Winner)) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste0(round(fund_percent*100, 0), "%")),
vjust = 1.5, color = "white", fontface = "bold", size = 5) +
labs(title = "Percentage of IIJA Funding Allocated to Biden vs Trump States",
x = "Political Party",
y = "Percentage of Total Allocation") +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) +
scale_fill_manual(values = c("DEMOCRATIC" = "#9999FF", "REPUBLICAN" = "#FF9999")) +
annotate("text", x = 1, y = 60, label = "Slight Democratic tilt (53% vs 47%)",
color = "blue", fontface = "bold", size = 3)
“Population Distribution by State or Territory
(2023)”
# Create a bar plot of population distribution by state
ggplot(Political_Funding, aes(x = State, y = Population_2023, fill = Political_Candidate)) +
geom_bar(stat = "identity") +
labs(title = "Total Population by State or Territory (2023)",
x = "State/Territory",
y = "Total Population") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = scales::comma) +
scale_fill_manual(values = c("Trump" = "#FF9999", "Biden" = "#9999FF"))
The 2020 election comparison indicates that states won by Biden and the Democratic Party accounted for approximately 53% of total funding, while Republican-leaning states received about 47%. This distribution suggests a modest tilt toward Democratic states, implying that the allocation may align more closely with the political base of the Biden administration.
The US Census Bureau population estimates for US states and territories (State and Territory Population dataset) was taken from the World Population Review website. The data is linked here: https://worldpopulationreview.com/states
The 2020 election results dataset was downloaded from Kaggle and is linked here: https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?select=voting.csv