data608_story

Introduction

Story - 1 : Infrastructure Investment & Jobs Act Funding Allocation

The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory. Your story (Data Visualization(s) ) should address the following questions:

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

Does the allocation favor the political interests of the Biden administration?

Library

library("tidyverse")
library("dplyr")
library("plotly")

Loading the dataset

The file named IIJA FUNDING AS OF MARCH 2023.xlsx was converted to a csv file and uploaded to GitHub. The population of each of the States and Territories from the official election results of the 2020 Presidential election was accessed using the following link. The table - 2020 PRESIDENTIAL ELECTORAL AND POPULAR VOTE from Page 12 was extracted with Excel. The ELECTORAL VOTE column was omitted because of low volume. The Popular Vote column was the focus of the analysis.

https://www.fec.gov/resources/cms-content/documents/federalelections2020.pdf

funding <- read_csv("https://raw.githubusercontent.com/tonyCUNY/tonyCUNY/main/IIJA%20FUNDING%20AS%20OF%20MARCH%202023.csv")
voter <- read_csv("https://raw.githubusercontent.com/tonyCUNY/tonyCUNY/main/2020_PRESIDENTIAL_POPULAR%20VOTE.csv")
pop <- read_csv("https://www2.census.gov/programs-surveys/popest/datasets/2020-2022/state/totals/NST-EST2022-POPCHG2020_2022.csv")

head(funding)

## # A tibble: 6 × 2
##   `State, Teritory or Tribal Nation` `Total (Billions)`
##   <chr>                                           <dbl>
## 1 ALABAMA                                        3     
## 2 ALASKA                                         3.7   
## 3 AMERICAN SAMOA                                 0.0686
## 4 ARIZONA                                        3.5   
## 5 ARKANSAS                                       2.8   
## 6 CALIFORNIA                                    18.4

head(voter)

## # A tibble: 6 × 5
##   STATE    Biden   Trump `All Other` `Total Vote`
##   <chr>    <dbl>   <dbl>       <dbl>        <dbl>
## 1 AL      849624 1441170       32488      2323282
## 2 AK      153778  189951       15801       359530
## 3 AZ     1672143 1661686       53497      3387326
## 4 AR      423932  760647       34490      1219069
## 5 CA    11110639 6006518      384223     17501380
## 6 CO     1804352 1364607       88021      3256980

head(pop)

## # A tibble: 6 × 25
##   SUMLEV REGION DIVISION STATE NAME            ESTIMATESBASE2020 POPESTIMATE2020
##   <chr>  <chr>  <chr>    <chr> <chr>                       <dbl>           <dbl>
## 1 010    0      0        00    United States           331449520       331511512
## 2 020    1      0        00    Northeast Regi…          57609156        57448898
## 3 030    1      1        00    New England              15116206        15074473
## 4 030    1      2        00    Middle Atlantic          42492950        42374425
## 5 020    2      0        00    Midwest Region           68985537        68961043
## 6 030    2      3        00    East North Cen…          47368637        47338744
## # ℹ 18 more variables: POPESTIMATE2021 <dbl>, POPESTIMATE2022 <dbl>,
## #   NPOPCHG_2020 <dbl>, NPOPCHG_2021 <dbl>, NPOPCHG_2022 <dbl>,
## #   PPOPCHG_2020 <dbl>, PPOPCHG_2021 <dbl>, PPOPCHG_2022 <dbl>,
## #   NRANK_ESTBASE2020 <chr>, NRANK_POPEST2020 <chr>, NRANK_POPEST2021 <chr>,
## #   NRANK_POPEST2022 <chr>, NRANK_NPCHG2020 <chr>, NRANK_NPCHG2021 <chr>,
## #   NRANK_NPCHG2022 <chr>, NRANK_PPCHG2020 <chr>, NRANK_PPCHG2021 <chr>,
## #   NRANK_PPCHG2022 <chr>

Tidy the Data

funding_2 <- funding |> 
                rename(State = `State, Teritory or Tribal Nation`) |> 
                rename(Total = "Total (Billions)")

voter_2 <- voter |> 
              mutate(State_full = toupper(state.name[match(STATE, state.abb)])) |> 
              relocate(State_full) |> 
              select(-STATE) |> 
              rename(State = State_full) 

pop_2 <- pop |> 
              select(NAME, POPESTIMATE2020) |> 
              rename(State = NAME) |> 
              mutate(State = toupper(State)) |> 
              rename(Population = POPESTIMATE2020)

head(funding_2)

## # A tibble: 6 × 2
##   State            Total
##   <chr>            <dbl>
## 1 ALABAMA         3     
## 2 ALASKA          3.7   
## 3 AMERICAN SAMOA  0.0686
## 4 ARIZONA         3.5   
## 5 ARKANSAS        2.8   
## 6 CALIFORNIA     18.4

head(voter_2)

## # A tibble: 6 × 5
##   State         Biden   Trump `All Other` `Total Vote`
##   <chr>         <dbl>   <dbl>       <dbl>        <dbl>
## 1 ALABAMA      849624 1441170       32488      2323282
## 2 ALASKA       153778  189951       15801       359530
## 3 ARIZONA     1672143 1661686       53497      3387326
## 4 ARKANSAS     423932  760647       34490      1219069
## 5 CALIFORNIA 11110639 6006518      384223     17501380
## 6 COLORADO    1804352 1364607       88021      3256980

head(pop_2)

## # A tibble: 6 × 2
##   State              Population
##   <chr>                   <dbl>
## 1 UNITED STATES       331511512
## 2 NORTHEAST REGION     57448898
## 3 NEW ENGLAND          15074473
## 4 MIDDLE ATLANTIC      42374425
## 5 MIDWEST REGION       68961043
## 6 EAST NORTH CENTRAL   47338744

merged_data <- merge(funding_2, voter_2, by = "State")
merged_data_2 <- merge(merged_data, pop_2, by = "State")

head(merged_data_2)

##        State Total    Biden   Trump All Other Total Vote Population
## 1    ALABAMA   3.0   849624 1441170     32488    2323282    5031362
## 2     ALASKA   3.7   153778  189951     15801     359530     732923
## 3    ARIZONA   3.5  1672143 1661686     53497    3387326    7179943
## 4   ARKANSAS   2.8   423932  760647     34490    1219069    3014195
## 5 CALIFORNIA  18.4 11110639 6006518    384223   17501380   39501653
## 6   COLORADO   3.2  1804352 1364607     88021    3256980    5784865

write.csv(merged_data_2, "merged_data_2.csv", row.names = FALSE)

Visualization

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

From the following plot, we can say there is a positive linear relationship between population and funding. Higher population will receive more funding

p<- ggplot(data = merged_data_2, mapping = aes(x = Population / 1e6, y = Total, color = State)) +
    geom_point() + 
    geom_smooth(method = "lm", color = "red")+
    theme_bw()+
    labs(title= "Population vs Funding", x="Population (Millions)", y= "Funding ($ Billions)") +
    theme(plot.title = element_text(size = 20, hjust = 0.5))+
    guides(color = FALSE)

## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

ggplotly(p, tooltip = c("x", "y", "State"))

## `geom_smooth()` using formula = 'y ~ x'

Does the allocation favor the political interests of the Biden administration?

From the following chart, we can see that states favoring Biden receive more funding. The total funding in ‘Biden’s State’ is larger than in ‘Trump’s State’.

my_data_sum <- merged_data_2 |> 
  mutate(Biden_gt_Trump = ifelse(Biden > Trump, "Biden", "Trump")) %>%
  group_by(Biden_gt_Trump) |> 
  summarise(Sum_Total = sum(Total))

# Create a bar plot
ggplot(my_data_sum, aes(x = Biden_gt_Trump, y = Sum_Total, fill = Biden_gt_Trump)) +
  geom_bar(stat = "identity") +
  labs(x = "Biden vs. Trump", y = "Total Funding", title = "Total Funding by Biden vs. Trump") +
  theme_minimal()

data608_story_1

CHUN SHING LEUNG

2024-02-04

Introduction

Library

Loading the dataset

Tidy the Data

Visualization