This assignment is about to finding story (Data Visualization(s)) that covering the following questions: Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent? Does the allocation favor the political interests of the Biden administration?
For the conducting the analyses, the data regarding the Infrastructure Investment and Jobs Act funding (IIJA Funding-2023) by State and Territory has been provided. Additionally, I have collected two more datasets. First One is the population data based on US Census-2020 results available on US Census Bureau official website (https://www.census.gov/) and the second one is the US Presidential Election-2020 result data from US Federal Election Commission official website (https://www.fec.gov/). All the datasets are collected in Excel file format. Important to note that, US Presidential Election-2020 data are not applicable for US territories and Tribal communities. Therefore, for conducting political interests or party bias analysis, these areas have been excluded.
library(readxl)
library(dplyr)
library(stringr)
library(ggplot2)
df <- as.data.frame(read_excel("F:\\CUNY masters\\data608\\major assignment 01\\datasets\\IIJA FUNDING AS OF MARCH 2023.xlsx"))
head(df)
## State, Teritory or Tribal Nation Total (Billions)
## 1 ALABAMA 3.0000
## 2 ALASKA 3.7000
## 3 AMERICAN SAMOA 0.0686
## 4 ARIZONA 3.5000
## 5 ARKANSAS 2.8000
## 6 CALIFORNIA 18.4000
df1<-read_excel("F:\\CUNY masters\\data608\\major assignment 01\\datasets\\NST-EST2022-POP.xlsx")
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
head(df1)
## # A tibble: 6 × 5
## table with row headers in column A and column he…¹ ...2 ...3 ...4 ...5
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Annual Estimates of the Resident Population for t… <NA> <NA> NA NA
## 2 Geographic Area Apri… Popu… NA NA
## 3 <NA> <NA> 2020 2.02e3 2.02e3
## 4 United States 3314… 3315… 3.32e8 3.33e8
## 5 Northeast 5760… 5744… 5.73e7 5.70e7
## 6 Midwest 6898… 6896… 6.88e7 6.88e7
## # ℹ abbreviated name:
## # ¹​`table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts)`
df5<-read_excel("F:\\CUNY masters\\data608\\major assignment 01\\datasets\\federalelections-2020.xlsx")
head(df5)
## # A tibble: 6 × 5
## State Party `Genaral %` `Winner Indicator` `Electoral Votes`
## <chr> <chr> <dbl> <chr> <chr>
## 1 Alabama R 0.620 W 9
## 2 Alaska R 0.528 W 3
## 3 Arizona D 0.494 W 11
## 4 Arkansas R 0.624 W 6
## 5 California D 0.635 W 55
## 6 Colorado D 0.554 W 9
# Change column names
colnames(df) <- c("State_Teritory_or_Tribal_Nation", "Total_Billions")
# Correct misspell name in column
df[9,"State_Teritory_or_Tribal_Nation"] <-"DELAWARE"
# Lowering letter from uppercase to lowercase while keeping only first letter uppercase of word
df$State_Teritory_or_Tribal_Nation <- str_to_title(tolower(df$State_Teritory_or_Tribal_Nation))
# Remove unnecessary columns 2,3,4
df2 <- select(df1, -2,-4,-5)
# Remove unnecessary rows 1,2,3,4,5,6,7,8,66,65,64,63,62
df3<-df2 <- slice(df2, -1,-2,-3,-4,-5,-6,-7,-8,-62,-63,-64,-65,-66)
# Remove the dots and renumber rows
df3[] <- lapply(df3, function(x) sub("^\\.", "", x))
df3<-data.frame(df3)
# Change column names
colnames(df3) <- c("State_Teritory_or_Tribal_Nation", "Population")
df3<-slice(df3,-52)
# Add a teritory name and population in two columns. Data source US Census Bureau 2020
df3[53, c("State_Teritory_or_Tribal_Nation", "Population")] <- c("American Samoa", 49710)
df3[54, c("State_Teritory_or_Tribal_Nation", "Population")] <- c("Northern Mariana Islands", 47329)
df3[55, c("State_Teritory_or_Tribal_Nation", "Population")] <- c("Guam", 153836)
df3[56, c("State_Teritory_or_Tribal_Nation", "Population")] <- c("US Virgin Islands", 87146)
df3[57, c("State_Teritory_or_Tribal_Nation", "Population")] <- c("Tribal Communities", 3700000)
# Convert the population column values to numeric
df3$Population <- as.numeric(df3$Population)
# Lowering letter from uppercase to lowercase while keeping only first letter uppercase of word
df3$State_Teritory_or_Tribal_Nation <- str_to_title(tolower(df3$State_Teritory_or_Tribal_Nation))
# order the values in alphabetically
df3[] <- df3[order(df3$State_Teritory_or_Tribal_Nation ), ]
# Join two data frames based on the "State_Teritory_or_Tribal_Nation" column
df4 <- inner_join(df, df3, by = "State_Teritory_or_Tribal_Nation")
# Add Per Capita Funding column to df4 by dividing the allocated funding by the population
df4 <- df4 %>% mutate(Per_Capita_Funding_Dollar = Total_Billions*1000000000 /Population)
df6<-select(df5,c("State","Party"))
df9<- df4 %>% rename(State = State_Teritory_or_Tribal_Nation)
# Join df7 and df6 dataframe based on State column
df8 <- inner_join(df9, df6, by = "State")
df4 %>% ggplot(aes(x= State_Teritory_or_Tribal_Nation, y= Per_Capita_Funding_Dollar)) + geom_bar(stat = "identity") + labs(title = "Unequitable Allocation of Per Capita Funding by State,Territory or Tribal Nation", x = "State,Territory or Tribal Nation", y = "Per Capita Funding in Dollar") + geom_hline(yintercept = median(df4$Per_Capita_Funding_Dollar), color = "blue", linetype = "dashed") + theme(axis.text.x = element_text(angle = 90, hjust =1))
# Create the bar plot
ggplot(df8, aes(x = State, y =Per_Capita_Funding_Dollar, fill = Party)) + geom_bar(stat = "identity") + labs(title = "Unbiased Per Capita Funding Allocation For States By Biden Administration", x = "State", y = "Per_Capita_Funding_Dollar") + scale_fill_manual(values = c("blue", "red")) + theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust =1))
From the two visualizations above, it is seen that the per capita funding allocation for US States, territories or Tribal Communities is not equitable. Republican led states seem getting more per capita funding allocation than democrat led states.Therefore, an unbiased per capita funding allocation for states under the Biden administration has been observed in this analysis.