DATA 608 - Infrastructure Investment Jobs Act Funding Allocation (Story 1)
INSTRUCTIONS
The providedExcel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
Notes:
You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.
You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.
ABSTRACT
The following data visualization investigates the relationship between total population and total funding in the year 2023, focusing on a dataset that combines information on population and funding across different states. The data is visualized using a scatter plot, where each point represents a state, with the x-axis depicting total funding in billions and the y-axis representing the total population. The color of each point indicates the respective state. The analysis includes a regression line to discern trends in the data. Notably, efforts have been made to enhance the clarity of the plot by preventing scientific notation in the y-axis labels, contributing to a more accessible and informative representation of the relationship between population and funding.
In addition to the provided data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory, additional population estimates for US states and territories data was procured from the US Census Bureau website and the 2020 election results dataset from Kaggle.
All four dataset will assist in providing answers to the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
LOAD PACKAGES
The following code below loops through the list of necessary packages and checks to determine if each is installed. If the package is not found it is installed and loaded.
pkges <- c("readxl", "tidyverse", "kableExtra", "readr", "dplyr", "ggplot2", "plotly", "stringr")
# Loop through the packages
for (p in pkges) {
# Check if package is installed
if (!requireNamespace(p, quietly = TRUE)) {
install.packages(p) #If the package is not installed, install the package
library(p, character.only = TRUE) #Load the package
} else {
library(p, character.only = TRUE) #If the package is already installed, load the package
}
}LOAD DATA
# Load the IIIJA Funding data from the provided GITHUB URL into a data frame named IIIJA_Funding.
IIIJA_Funding <- read.csv("https://raw.githubusercontent.com/BeshkiaKvarnstrom/DATA-608-Knowledge-and-Visual-Analytics/main/IIJA-FUNDING-AS-OF-MARCH-2023-1.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the State Population data from the provided GITHUB URL into a data frame named State_Pop.
State_Pop <- read.csv("https://raw.githubusercontent.com/BeshkiaKvarnstrom/DATA-608-Knowledge-and-Visual-Analytics/main/state-population-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))
#Load the Territory Population data from the provided GITHUB URL into a data frame named Territory_Pop.
Territory_Pop <- read.csv("https://raw.githubusercontent.com/BeshkiaKvarnstrom/DATA-608-Knowledge-and-Visual-Analytics/main/state-population-territory-table.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on
## 'https://raw.githubusercontent.com/BeshkiaKvarnstrom/DATA-608-Knowledge-and-Visual-Analytics/main/state-population-territory-table.csv'
#Load the 2020 Presidential Election Results data from the provided GITHUB URL into a data frame named Territory_Pop.
Voting_Results <- read.csv("https://raw.githubusercontent.com/BeshkiaKvarnstrom/DATA-608-Knowledge-and-Visual-Analytics/main/Presidential-Election-Voting-Results-2020.csv", header=T, stringsAsFactors = F, na.strings=c("","NA"))PREPROCESS/CLEAN EACH DATASET
Join the State_Pop and Territory_Pop datasets
# Join the State_Pop and Territory_Pop datasets based on a the State column using the merge function in R.
State_Territory_Pop <- merge(State_Pop, Territory_Pop, by = "state", all = TRUE)Rename columns
# Rename the columns in the IIIJA_Funding dataframe
IIIJA_Funding <- IIIJA_Funding %>%
rename(State = 'State..Teritory.or.Tribal.Nation', Total_Billions = 'Total..Billions.')
State_Territory_Pop <- State_Territory_Pop %>%
rename(State = 'state', Population_2024 = 'pop2024.x', Population_2023 = 'pop2023.x', Population_2020 = 'pop2020.x',
Growth_Rate = 'growthRate.x', Growth = 'growth.x', Rank = 'rank.x')
Voting_Results <- Voting_Results%>%
rename(State = 'state', Biden = biden_vote, Trump = trump_vote)Remove columns that are not needed
State_Territory_Pop <- State_Territory_Pop %>%
select(-c(densityMi.x , fips.x, pop2019.x, pop2010.x, percent.x, growthSince2010.x, area.x, fips.y, densityMi.y, pop2024.y, pop2023.y, pop2020.y, pop2019.y, pop2010.y, growthRate.y, growth.y, growthSince2010.y, area.y, rank.y))
Voting_Results <- Voting_Results %>%
select(-c(trump_pct, biden_pct))Convert State Column to proper case
VIEW THE DATASETS
#Display the IIIJA Funding data
head(IIIJA_Funding) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")| State | Total_Billions |
|---|---|
| Alabama | 3.0000 |
| Alaska | 3.7000 |
| American Samoa | 0.0686 |
| Arizona | 3.5000 |
| Arkansas | 2.8000 |
| California | 18.4000 |
#Display the State and TerritoryPopulation data
head(State_Territory_Pop) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")| State | Population_2024 | Population_2023 | Population_2020 | Growth_Rate | Growth | Rank | percent.y |
|---|---|---|---|---|---|---|---|
| Alabama | 5143033 | 5108468 | 5031864 | 0.00677 | 34565 | 24 | NA |
| Alaska | 733536 | 733406 | 732964 | 0.00018 | 130 | 48 | NA |
| Arizona | 7497004 | 7431344 | 7186683 | 0.00884 | 65660 | 14 | NA |
| Arkansas | 3089060 | 3067732 | 3014348 | 0.00695 | 21328 | 33 | NA |
| California | 38889770 | 38965193 | 39503200 | -0.00194 | -75423 | 1 | NA |
| Colorado | 5914181 | 5877610 | 5785219 | 0.00622 | 36571 | 21 | NA |
#Display the 2020 Presidential Election data
head(Voting_Results) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")| State | state_abr | Trump | Biden | trump_win | biden_win |
|---|---|---|---|---|---|
| Alaska | AK | 189543 | 153502 | 1 | 0 |
| Hawaii | HI | 196864 | 366130 | 0 | 1 |
| Washington | WA | 1584651 | 2369612 | 0 | 1 |
| Oregon | OR | 958448 | 1340383 | 0 | 1 |
| California | CA | 5982194 | 11082293 | 0 | 1 |
| Idaho | ID | 554128 | 287031 | 1 | 0 |
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
# Create a bar plot of IIJA Funding
ggplot(IIIJA_Funding, aes(x = State, y = Total_Billions, fill = State)) +
geom_bar(stat = "identity") +
labs(title = "Allocation of IIJA Funding by State and Territory As Of March 2023",
x = "State/Territory",
y = "Total Funding (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
guides(fill = "none") # Filter out rows with missing values in Population_2023
State_Territory_Pop <- State_Territory_Pop[complete.cases(State_Territory_Pop$Population_2023), ]
# Create a bar plot of population distribution by state
ggplot(State_Territory_Pop, aes(x = State, y = Population_2023, fill = State)) +
geom_bar(stat = "identity") +
labs(title = "Total Population by State or Territory For 2023",
x = "State/Territory",
y = "Total Population (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
guides(fill = "none") +
scale_y_continuous(labels = scales::comma)Combine the IIJA Funding data with the 2023 Population data
Popultion_Funding = list(IIIJA_Funding,State_Territory_Pop)
Popultion_Funding <- Popultion_Funding %>% reduce(inner_join, by='State')
# Create a plot of Total Population vs Total Funding
Combined <- Popultion_Funding %>%
ggplot(aes(x = Total_Billions, y = Population_2023)) +
geom_jitter(aes(color = State), size = 4, alpha = 0.6) +
theme_bw() +
labs(title= "Total Population vs Total Funding (For 2023)",
x = "Total Funding (Billions)",
y = "Total Population (Billions)") +
geom_smooth(method = "lm", color = "#4F3C5F", se = FALSE) +
theme(plot.title = element_text(size= 15, color = "#281f30",face = "bold"),
axis.title.x = element_text(face = "bold", color = "#281f30"),
axis.title.y = element_text(face = "bold", color = "#281f30"),
legend.position = "none")+
scale_y_continuous(labels = scales::comma)
ggplotly(Combined)CONCLUSION - QUESTION 1
The above data visualizations shows that there is a positive correlation between population size and the allocation of funds. This is evident where States with larger population sizes, such as California, Texas, New York and Florida received a larger allocation of funding.
Does the allocation favor the political interests of the Biden administration?
VIEW 2020 PRESIDENTIAL ELECTION DATA
# Assuming Voting_Results is your data frame
Voting_Results_Combined <- Voting_Results %>%
gather(Political_Candidate, Total_Votes, Trump, Biden, factor_key = TRUE) %>%
mutate(Winner = ifelse(biden_win == 1, "DEMOCRATIC", "REPUBLICAN"))
#Remove the trump_win and biden_win columns because they are no longer needed
Voting_Results_Combined <- Voting_Results_Combined %>%
select(-c(trump_win, biden_win))
#Display the 2020 Presidential Election data
head(Voting_Results_Combined) %>% kable() %>%
kable_styling(bootstrap_options = "striped", font_size = 12) %>%
scroll_box(height = "300px", width = "100%")| State | state_abr | Political_Candidate | Total_Votes | Winner |
|---|---|---|---|---|
| Alaska | AK | Trump | 189543 | REPUBLICAN |
| Hawaii | HI | Trump | 196864 | DEMOCRATIC |
| Washington | WA | Trump | 1584651 | DEMOCRATIC |
| Oregon | OR | Trump | 958448 | DEMOCRATIC |
| California | CA | Trump | 5982194 | DEMOCRATIC |
| Idaho | ID | Trump | 554128 | REPUBLICAN |
Compare Trump vs Biden Vote by State
# Create a bar plot of IIJA Funding of votes by candidate
ggplot(Voting_Results_Combined, aes(x = State, y = Total_Votes, fill = Political_Candidate))+
geom_bar(stat = "identity") +
labs(title = "Compare Trump vs Biden Vote by State for 2020 Election",
x = "State/Territory",
y = "Total Vote by Candidate") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = scales::comma)Create a new dataframe and combine the election data, the IIJA Funding data with the 2023 Population data.
Political_Funding = list(Popultion_Funding,Voting_Results_Combined)
Political_Funding <- Political_Funding %>% reduce(inner_join, by='State')
# Create a bar plot of IIJA Funding for each Political candidate
ggplot(Political_Funding, aes(x = State, y = Total_Billions, fill = Political_Candidate)) +
geom_bar(stat = "identity") +
labs(title = "Allocation of Funding by State (Biden & Trump)",
x = "State/Territory",
y = "Total Funding (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))# Create a bar plot of Percentage Allocation by candidate
Political_Funding %>%
group_by(Winner) %>%
summarize(Total_Fund = sum(Total_Billions)) %>%
mutate(fund_percent = Total_Fund / sum(Total_Fund )) %>%
ggplot(aes(x = Winner, y = fund_percent * 100, fill = Winner)) +
geom_bar(stat = "identity") +
labs(title = "Compare Trump vs Biden Percentage Allocation in the 2020 Election",
x = "Political Party",
y = "Percentage of Total Allocations") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_text(aes(label = paste0(round(fund_percent*100, 0),'%'), vjust = -0.2))# Create a bar plot of population distribution by state
ggplot(Political_Funding, aes(x = State, y = Population_2023, fill = Political_Candidate)) +
geom_bar(stat = "identity") +
labs(title = "Total Population by State or Territory For 2023",
x = "State/Territory",
y = "Total Population (Billions)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = scales::comma)CONCLUSION - QUESTION 2
The visualization of the 2020 Election data shows that the Biden/Democratic party received higher funding (53%) while the Trump/Republican party received 47% funding. The data visualizations for question 2 shows that the allocation favored the political interests of the Biden administration because more funding went to the states that voted for Biden.
REFERENCES
The US Census Bureau population estimates for US states and territories (State and Territory Population dataset) was taken from the World Population Review website. The data is linked here: https://worldpopulationreview.com/states
The 2020 election results dataset was downloaded from Kaggle and is linked here: https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?select=voting.csv