Story - 1 : Infrastructure Investment & Jobs Act Funding Allocation
The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory. Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
library("tidyverse")
library("dplyr")
library("plotly")
The file named IIJA FUNDING AS OF MARCH 2023.xlsx was converted to a csv file and uploaded to GitHub. The population of each of the States and Territories from the official election results of the 2020 Presidential election was accessed using the following link. The table - 2020 PRESIDENTIAL ELECTORAL AND POPULAR VOTE from Page 12 was extracted with Excel. The ELECTORAL VOTE column was omitted because of low volume. The Popular Vote column was the focus of the analysis.
https://www.fec.gov/resources/cms-content/documents/federalelections2020.pdf
funding <- read_csv("https://raw.githubusercontent.com/tonyCUNY/tonyCUNY/main/IIJA%20FUNDING%20AS%20OF%20MARCH%202023.csv")
voter <- read_csv("https://raw.githubusercontent.com/tonyCUNY/tonyCUNY/main/2020_PRESIDENTIAL_POPULAR%20VOTE.csv")
pop <- read_csv("https://www2.census.gov/programs-surveys/popest/datasets/2020-2022/state/totals/NST-EST2022-POPCHG2020_2022.csv")
head(funding)
## # A tibble: 6 × 2
## `State, Teritory or Tribal Nation` `Total (Billions)`
## <chr> <dbl>
## 1 ALABAMA 3
## 2 ALASKA 3.7
## 3 AMERICAN SAMOA 0.0686
## 4 ARIZONA 3.5
## 5 ARKANSAS 2.8
## 6 CALIFORNIA 18.4
head(voter)
## # A tibble: 6 × 5
## STATE Biden Trump `All Other` `Total Vote`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AL 849624 1441170 32488 2323282
## 2 AK 153778 189951 15801 359530
## 3 AZ 1672143 1661686 53497 3387326
## 4 AR 423932 760647 34490 1219069
## 5 CA 11110639 6006518 384223 17501380
## 6 CO 1804352 1364607 88021 3256980
head(pop)
## # A tibble: 6 × 25
## SUMLEV REGION DIVISION STATE NAME ESTIMATESBASE2020 POPESTIMATE2020
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 010 0 0 00 United States 331449520 331511512
## 2 020 1 0 00 Northeast Regi… 57609156 57448898
## 3 030 1 1 00 New England 15116206 15074473
## 4 030 1 2 00 Middle Atlantic 42492950 42374425
## 5 020 2 0 00 Midwest Region 68985537 68961043
## 6 030 2 3 00 East North Cen… 47368637 47338744
## # ℹ 18 more variables: POPESTIMATE2021 <dbl>, POPESTIMATE2022 <dbl>,
## # NPOPCHG_2020 <dbl>, NPOPCHG_2021 <dbl>, NPOPCHG_2022 <dbl>,
## # PPOPCHG_2020 <dbl>, PPOPCHG_2021 <dbl>, PPOPCHG_2022 <dbl>,
## # NRANK_ESTBASE2020 <chr>, NRANK_POPEST2020 <chr>, NRANK_POPEST2021 <chr>,
## # NRANK_POPEST2022 <chr>, NRANK_NPCHG2020 <chr>, NRANK_NPCHG2021 <chr>,
## # NRANK_NPCHG2022 <chr>, NRANK_PPCHG2020 <chr>, NRANK_PPCHG2021 <chr>,
## # NRANK_PPCHG2022 <chr>
funding_2 <- funding |>
rename(State = `State, Teritory or Tribal Nation`) |>
rename(Total = "Total (Billions)")
voter_2 <- voter |>
mutate(State_full = toupper(state.name[match(STATE, state.abb)])) |>
relocate(State_full) |>
select(-STATE) |>
rename(State = State_full)
pop_2 <- pop |>
select(NAME, POPESTIMATE2020) |>
rename(State = NAME) |>
mutate(State = toupper(State)) |>
rename(Population = POPESTIMATE2020)
head(funding_2)
## # A tibble: 6 × 2
## State Total
## <chr> <dbl>
## 1 ALABAMA 3
## 2 ALASKA 3.7
## 3 AMERICAN SAMOA 0.0686
## 4 ARIZONA 3.5
## 5 ARKANSAS 2.8
## 6 CALIFORNIA 18.4
head(voter_2)
## # A tibble: 6 × 5
## State Biden Trump `All Other` `Total Vote`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ALABAMA 849624 1441170 32488 2323282
## 2 ALASKA 153778 189951 15801 359530
## 3 ARIZONA 1672143 1661686 53497 3387326
## 4 ARKANSAS 423932 760647 34490 1219069
## 5 CALIFORNIA 11110639 6006518 384223 17501380
## 6 COLORADO 1804352 1364607 88021 3256980
head(pop_2)
## # A tibble: 6 × 2
## State Population
## <chr> <dbl>
## 1 UNITED STATES 331511512
## 2 NORTHEAST REGION 57448898
## 3 NEW ENGLAND 15074473
## 4 MIDDLE ATLANTIC 42374425
## 5 MIDWEST REGION 68961043
## 6 EAST NORTH CENTRAL 47338744
merged_data <- merge(funding_2, voter_2, by = "State")
merged_data_2 <- merge(merged_data, pop_2, by = "State")
head(merged_data_2)
## State Total Biden Trump All Other Total Vote Population
## 1 ALABAMA 3.0 849624 1441170 32488 2323282 5031362
## 2 ALASKA 3.7 153778 189951 15801 359530 732923
## 3 ARIZONA 3.5 1672143 1661686 53497 3387326 7179943
## 4 ARKANSAS 2.8 423932 760647 34490 1219069 3014195
## 5 CALIFORNIA 18.4 11110639 6006518 384223 17501380 39501653
## 6 COLORADO 3.2 1804352 1364607 88021 3256980 5784865
write.csv(merged_data_2, "merged_data_2.csv", row.names = FALSE)
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
From the following plot, we can say there is a positive linear relationship between population and funding. Higher population will receive more funding
p<- ggplot(data = merged_data_2, mapping = aes(x = Population / 1e6, y = Total, color = State)) +
geom_point() +
geom_smooth(method = "lm", color = "red")+
theme_bw()+
labs(title= "Population vs Funding", x="Population (Millions)", y= "Funding ($ Billions)") +
theme(plot.title = element_text(size = 20, hjust = 0.5))+
guides(color = FALSE)
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
ggplotly(p, tooltip = c("x", "y", "State"))
## `geom_smooth()` using formula = 'y ~ x'
Does the allocation favor the political interests of the Biden administration?
From the following chart, we can see that states favoring Biden receive more funding. The total funding in ‘Biden’s State’ is larger than in ‘Trump’s State’.
my_data_sum <- merged_data_2 |>
mutate(Biden_gt_Trump = ifelse(Biden > Trump, "Biden", "Trump")) %>%
group_by(Biden_gt_Trump) |>
summarise(Sum_Total = sum(Total))
# Create a bar plot
ggplot(my_data_sum, aes(x = Biden_gt_Trump, y = Sum_Total, fill = Biden_gt_Trump)) +
geom_bar(stat = "identity") +
labs(x = "Biden vs. Trump", y = "Total Funding", title = "Total Funding by Biden vs. Trump") +
theme_minimal()