DATA 608 Assignment 1

Overview

The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.

Your story (Data Visualization(s) ) should address the following questions:

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

Does the allocation favor the political interests of the Biden administration?

Notes:

You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.
You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.
This assignment is due by the end of week two of the semester.

Data Sources

Current (estimated) population of each of the States and Territories:

This data is from the US Census https://www.census.gov/data/tables/time-series/demo/popest/2020s-state-total.html

Official election results of the 2020 Presidential election:

This data comes from the Associated Press https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?resource=download

Load Libraries

library(statebins)
library(hrbrthemes)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(httr)
library(dplyr)
library(stringr)
library(rvest)

## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding

library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(stringr)
library(tidytext)
library(tibble)
library(textdata)

## 
## Attaching package: 'textdata'
## 
## The following object is masked from 'package:httr':
## 
##     cache_info

library(tidyr)
library(readr)
library(purrr)
library(forcats)
library(ggplot2)
library(colorspace)

Read in Data

# Read in funding data from 2023
fund_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/IIJA%20FUNDING%20AS%20OF%20MARCH%202023.csv', show_col_types = FALSE)
head(fund_data)

## # A tibble: 6 × 2
##   `State, Teritory or Tribal Nation` `Total (Billions)`
##   <chr>                                           <dbl>
## 1 ALABAMA                                        3     
## 2 ALASKA                                         3.7   
## 3 AMERICAN SAMOA                                 0.0686
## 4 ARIZONA                                        3.5   
## 5 ARKANSAS                                       2.8   
## 6 CALIFORNIA                                    18.4

# Read in population data from 2023
pop_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/NST-EST2024-POP.csv', show_col_types = FALSE)

## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`

head(pop_data)

## # A tibble: 6 × 36
##   table with row heade…¹ ...2  ...3     ...4    ...5    ...6    ...7 ...8  ...9 
##   <chr>                  <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl> <lgl> <lgl>
## 1 Annual Estimates of t… <NA>  <NA>  NA      NA      NA      NA      NA    NA   
## 2 Geographic Area        Apri… Popu… NA      NA      NA      NA      NA    NA   
## 3 <NA>                   <NA>  2020   2.02e3  2.02e3  2.02e3  2.02e3 NA    NA   
## 4 United States          331,… 331,…  3.32e8  3.34e8  3.37e8  3.40e8 NA    NA   
## 5 Northeast              57,6… 57,4…  5.73e7  5.72e7  5.74e7  5.78e7 NA    NA   
## 6 Midwest                68,9… 68,9…  6.89e7  6.89e7  6.92e7  6.96e7 NA    NA   
## # ℹ abbreviated name:
## #   ¹`table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts)`
## # ℹ 27 more variables: ...10 <lgl>, ...11 <lgl>, ...12 <lgl>, ...13 <lgl>,
## #   ...14 <lgl>, ...15 <lgl>, ...16 <lgl>, ...17 <lgl>, ...18 <lgl>,
## #   ...19 <lgl>, ...20 <lgl>, ...21 <lgl>, ...22 <lgl>, ...23 <lgl>,
## #   ...24 <lgl>, ...25 <lgl>, ...26 <lgl>, ...27 <lgl>, ...28 <lgl>,
## #   ...29 <lgl>, ...30 <lgl>, ...31 <lgl>, ...32 <lgl>, ...33 <lgl>, …

# Read in 2020 presidential election results data
election_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/voting.csv', show_col_types = FALSE)
head(election_data)

## # A tibble: 6 × 8
##   state  state_abr trump_pct biden_pct trump_vote biden_vote trump_win biden_win
##   <chr>  <chr>         <dbl>     <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
## 1 Alaska AK             53.1      43       189543     153502         1         0
## 2 Hawaii HI             34.3      63.7     196864     366130         0         1
## 3 Washi… WA             39        58.4    1584651    2369612         0         1
## 4 Oregon OR             40.7      56.9     958448    1340383         0         1
## 5 Calif… CA             34.3      63.5    5982194   11082293         0         1
## 6 Idaho  ID             63.9      33.1     554128     287031         1         0

Clean and Prepare Data

Cleaning up the data:

# Cleaning

# Rename columns
colnames(fund_data)[colnames(fund_data) == 'State, Teritory or Tribal Nation'] <- 'state_territory_tribal_nation'
colnames(fund_data)[colnames(fund_data) == 'Total (Billions)'] <- 'total_funding_billions'

# Make states lowercase
fund_data$state_territory_tribal_nation <- tolower(str_to_title(fund_data$state_territory_tribal_nation))
head(fund_data, 20)

## # A tibble: 20 × 2
##    state_territory_tribal_nation total_funding_billions
##    <chr>                                          <dbl>
##  1 alabama                                       3     
##  2 alaska                                        3.7   
##  3 american samoa                                0.0686
##  4 arizona                                       3.5   
##  5 arkansas                                      2.8   
##  6 california                                   18.4   
##  7 colorado                                      3.2   
##  8 connecticut                                   2.5   
##  9 deleware                                      0.792 
## 10 district of columbia                          1.1   
## 11 florida                                       8.2   
## 12 georgia                                       5     
## 13 guam                                          0.143 
## 14 hawaii                                        1     
## 15 idaho                                         1.2   
## 16 illinois                                      8.4   
## 17 indiana                                       3.4   
## 18 iowa                                          2.4   
## 19 kansas                                        1.5   
## 20 kentucky                                      3.9

# Get rid of useless rows and columns
colnames(pop_data) <- pop_data[3, ]
names(pop_data)[1:2] <- c("state_territory_tribal_nation", "baseline")
new_pop_data <- pop_data %>%
  select("state_territory_tribal_nation", "2023")
new_pop_data <- new_pop_data[-(1:8), ]
new_pop_data <- new_pop_data[-(52:59), ]
# Get rid of characters in file
new_pop_data$state_territory_tribal_nation <- gsub("\\.", "", new_pop_data$state_territory_tribal_nation)
# Rename columns
colnames(new_pop_data) <- c("state_territory_tribal_nation", "total_population_2023")
# Make states lowercase (to merge later)
new_pop_data$state_territory_tribal_nation <- tolower(new_pop_data$state_territory_tribal_nation)
head(new_pop_data)

## # A tibble: 6 × 2
##   state_territory_tribal_nation total_population_2023
##   <chr>                                         <dbl>
## 1 alabama                                     5117673
## 2 alaska                                       736510
## 3 arizona                                     7473027
## 4 arkansas                                    3069463
## 5 california                                 39198693
## 6 colorado                                    5901339

Merge the data frames:

# Merge the data on state column
# Fix Delaware spelling in funding data
fund_data$state_territory_tribal_nation[fund_data$state_territory_tribal_nation == "deleware"] <- "delaware"

# Use states/territories from the US Census dataset (trusted source)
df_merged <- merge(new_pop_data, fund_data, by = "state_territory_tribal_nation", all.x = TRUE) 
df_merged

##    state_territory_tribal_nation total_population_2023 total_funding_billions
## 1                        alabama               5117673                 3.0000
## 2                         alaska                736510                 3.7000
## 3                        arizona               7473027                 3.5000
## 4                       arkansas               3069463                 2.8000
## 5                     california              39198693                18.4000
## 6                       colorado               5901339                 3.2000
## 7                    connecticut               3643023                 2.5000
## 8                       delaware               1036423                 0.7920
## 9           district of columbia                687324                 1.1000
## 10                       florida              22904868                 8.2000
## 11                       georgia              11064432                 5.0000
## 12                        hawaii               1441387                 1.0000
## 13                         idaho               1971122                 1.2000
## 14                      illinois              12642259                 8.4000
## 15                       indiana               6880131                 3.4000
## 16                          iowa               3218414                 2.4000
## 17                        kansas               2951500                 1.5000
## 18                      kentucky               4550595                 3.9000
## 19                     louisiana               4588071                 4.3000
## 20                         maine               1399646                 1.1000
## 21                      maryland               6217062                 2.7000
## 22                 massachusetts               7066568                 3.6000
## 23                      michigan              10083356                 5.2000
## 24                     minnesota               5753048                 2.7000
## 25                   mississippi               2943172                 2.3000
## 26                      missouri               6208038                 3.8000
## 27                       montana               1131302                 3.3000
## 28                      nebraska               1987864                 1.3000
## 29                        nevada               3214363                 1.7000
## 30                 new hampshire               1402199                 0.7518
## 31                    new jersey               9379642                 5.1000
## 32                    new mexico               2121164                 2.6000
## 33                      new york              19737367                10.1000
## 34                north carolina              10881189                 4.5000
## 35                  north dakota                789047                 1.8000
## 36                          ohio              11824034                 6.6000
## 37                      oklahoma               4063882                 2.9000
## 38                        oregon               4253653                 2.3000
## 39                  pennsylvania              13017721                 8.1000
## 40                  rhode island               1103429                 1.1000
## 41                south carolina               5387830                 2.3000
## 42                  south dakota                918305                 1.3000
## 43                     tennessee               7148304                 3.7000
## 44                         texas              30727890                14.2000
## 45                          utah               3443222                 1.8000
## 46                       vermont                648708                 0.8521
## 47                      virginia               8734685                 4.5000
## 48                    washington               7857320                 4.0000
## 49                 west virginia               1770495                 2.0000
## 50                     wisconsin               5930405                 2.8000
## 51                       wyoming                585067                 2.3000

We now have the data for the first question in a tidy format where each row represent a unique state/territory. Population size plays a big role in the amount of funds though, so it should be taken into consideration. Therefore let’s normalize the data and create a per capita feature.

# Normalize the data

# Add a per capita column - total funds / total population
df_merged <- df_merged |>
  mutate(
    funding_per_capita = (total_funding_billions * 1000000000) / total_population_2023,
  )
head(df_merged)

##   state_territory_tribal_nation total_population_2023 total_funding_billions
## 1                       alabama               5117673                    3.0
## 2                        alaska                736510                    3.7
## 3                       arizona               7473027                    3.5
## 4                      arkansas               3069463                    2.8
## 5                    california              39198693                   18.4
## 6                      colorado               5901339                    3.2
##   funding_per_capita
## 1           586.2039
## 2          5023.6928
## 3           468.3510
## 4           912.2117
## 5           469.4034
## 6           542.2498

We now have data that considers the funds given the population. Yet, for the story of this data, seeing the exact funds numbers won’t be needed. We just need a general idea of the allocation of funds. So let’s set up some bins to simplify the visualization:

# Create bins for simplification

df_merged_bins <- mutate(
  df_merged,
  funding_per_capita_bins = cut(
      funding_per_capita,
      breaks = c(0, 400, 600, 800, 1000, 6000),
      labels = c("< $400", "$400 to $600", "$600 to $800", "$800 to $1000", "> $1000"),
      right = FALSE
    )
  )
head(df_merged_bins)

##   state_territory_tribal_nation total_population_2023 total_funding_billions
## 1                       alabama               5117673                    3.0
## 2                        alaska                736510                    3.7
## 3                       arizona               7473027                    3.5
## 4                      arkansas               3069463                    2.8
## 5                    california              39198693                   18.4
## 6                      colorado               5901339                    3.2
##   funding_per_capita funding_per_capita_bins
## 1           586.2039            $400 to $600
## 2          5023.6928                 > $1000
## 3           468.3510            $400 to $600
## 4           912.2117           $800 to $1000
## 5           469.4034            $400 to $600
## 6           542.2498            $400 to $600

Now let’s look at the election data for the second part:

# Clean data

# Keep states/territory from the Associated Press
colnames(election_data)[colnames(election_data) == 'state'] <- 'state_territory_tribal_nation'
election_data$state_territory_tribal_nation <- tolower(election_data$state_territory_tribal_nation)
election_data_merged <- merge(election_data, df_merged_bins, by = "state_territory_tribal_nation", all.x = TRUE) 
election_data_merged

##    state_territory_tribal_nation state_abr trump_pct biden_pct trump_vote
## 1                        alabama        AL      62.2      36.7    1441170
## 2                         alaska        AK      53.1      43.0     189543
## 3                        arizona        AZ      49.1      49.4    1661686
## 4                       arkansas        AR      62.4      34.8     760647
## 5                     california        CA      34.3      63.5    5982194
## 6                       colorado        CO      41.9      55.4    1364471
## 7                    connecticut        CT      39.2      59.3     715291
## 8                       delaware        DE      39.8      58.8     200603
## 9           district of columbia        DC       5.4      93.0      18586
## 10                       florida        FL      51.2      47.9    5668731
## 11                       georgia        GA      49.3      49.5    2461837
## 12                        hawaii        HI      34.3      63.7     196864
## 13                         idaho        ID      63.9      33.1     554128
## 14                      illinois        IL      40.5      57.6    2438943
## 15                       indiana        IN      57.1      41.0    1729516
## 16                          iowa        IA      53.2      45.0     897672
## 17                        kansas        KS      56.5      41.3     752933
## 18                      kentucky        KY      62.1      36.2    1326646
## 19                     louisiana        LA      58.5      39.9    1255776
## 20                         maine        ME      44.0      53.1     360480
## 21                      maryland        MD      32.4      65.8     976414
## 22                 massachusetts        MA      32.3      65.9    1167202
## 23                      michigan        MI      47.8      50.6    2649842
## 24                     minnesota        MN      45.4      52.6    1484056
## 25                   mississippi        MS      57.6      41.1     756789
## 26                      missouri        MO      56.8      41.4    1718282
## 27                       montana        MT      56.9      40.6     343647
## 28                      nebraska        NE      58.5      39.4     556846
## 29                        nevada        NV      47.7      50.1     669890
## 30                 new hampshire        NH      45.5      52.9     365660
## 31                    new jersey        NJ      41.4      57.3    1883260
## 32                    new mexico        NM      43.5      54.3     401894
## 33                      new york        NY      41.5      57.1    3005636
## 34                north carolina        NC      50.1      48.7    2758775
## 35                  north dakota        ND      65.5      31.9     235595
## 36                          ohio        OH      53.3      45.3    3154834
## 37                      oklahoma        OK      65.4      32.3    1020280
## 38                        oregon        OR      40.7      56.9     958448
## 39                  pennsylvania        PA      48.8      50.0    3378263
## 40                  rhode island        RI      38.8      59.7     199922
## 41                south carolina        SC      55.1      43.4    1385103
## 42                  south dakota        SD      61.8      35.6     261043
## 43                     tennessee        TN      60.7      37.4    1849556
## 44                         texas        TX      52.1      46.5    5890347
## 45                          utah        UT      58.2      37.7     865140
## 46                       vermont        VT      30.8      66.4     112704
## 47                      virginia        VA      44.2      54.4    1962430
## 48                    washington        WA      39.0      58.4    1584651
## 49                 west virginia        WV      68.6      29.7     545382
## 50                     wisconsin        WI      48.9      49.6    1610065
## 51                       wyoming        WY      70.4      26.7     193559
##    biden_vote trump_win biden_win total_population_2023 total_funding_billions
## 1      849624         1         0               5117673                 3.0000
## 2      153502         1         0                736510                 3.7000
## 3     1672143         0         1               7473027                 3.5000
## 4      423932         1         0               3069463                 2.8000
## 5    11082293         0         1              39198693                18.4000
## 6     1804196         0         1               5901339                 3.2000
## 7     1080680         0         1               3643023                 2.5000
## 8      296268         0         1               1036423                 0.7920
## 9      317323         0         1                687324                 1.1000
## 10    5297045         1         0              22904868                 8.2000
## 11    2474507         0         1              11064432                 5.0000
## 12     366130         0         1               1441387                 1.0000
## 13     287031         1         0               1971122                 1.2000
## 14    3463260         0         1              12642259                 8.4000
## 15    1242413         1         0               6880131                 3.4000
## 16     759061         1         0               3218414                 2.4000
## 17     551144         1         0               2951500                 1.5000
## 18     772474         1         0               4550595                 3.9000
## 19     856034         1         0               4588071                 4.3000
## 20     434966         0         1               1399646                 1.1000
## 21    1985023         0         1               6217062                 2.7000
## 22    2382202         0         1               7066568                 3.6000
## 23    2804040         0         1              10083356                 5.2000
## 24    1717077         0         1               5753048                 2.7000
## 25     539508         1         0               2943172                 2.3000
## 26    1252902         1         0               6208038                 3.8000
## 27     244836         1         0               1131302                 3.3000
## 28     374583         1         0               1987864                 1.3000
## 29     703486         0         1               3214363                 1.7000
## 30     424935         0         1               1402199                 0.7518
## 31    2608327         0         1               9379642                 5.1000
## 32     501614         0         1               2121164                 2.6000
## 33    4137641         0         1              19737367                10.1000
## 34    2684292         1         0              10881189                 4.5000
## 35     114902         1         0                789047                 1.8000
## 36    2679165         1         0              11824034                 6.6000
## 37     503890         1         0               4063882                 2.9000
## 38    1340383         0         1               4253653                 2.3000
## 39    3459923         0         1              13017721                 8.1000
## 40     307486         0         1               1103429                 1.1000
## 41    1091541         1         0               5387830                 2.3000
## 42     150471         1         0                918305                 1.3000
## 43    1139332         1         0               7148304                 3.7000
## 44    5259126         1         0              30727890                14.2000
## 45     560282         1         0               3443222                 1.8000
## 46     242820         0         1                648708                 0.8521
## 47    2413568         0         1               8734685                 4.5000
## 48    2369612         0         1               7857320                 4.0000
## 49     235984         1         0               1770495                 2.0000
## 50    1630673         0         1               5930405                 2.8000
## 51      73491         1         0                585067                 2.3000
##    funding_per_capita funding_per_capita_bins
## 1            586.2039            $400 to $600
## 2           5023.6928                 > $1000
## 3            468.3510            $400 to $600
## 4            912.2117           $800 to $1000
## 5            469.4034            $400 to $600
## 6            542.2498            $400 to $600
## 7            686.2433            $600 to $800
## 8            764.1668            $600 to $800
## 9           1600.4097                 > $1000
## 10           358.0025                  < $400
## 11           451.8985            $400 to $600
## 12           693.7762            $600 to $800
## 13           608.7903            $600 to $800
## 14           664.4382            $600 to $800
## 15           494.1766            $400 to $600
## 16           745.7089            $600 to $800
## 17           508.2162            $400 to $600
## 18           857.0308           $800 to $1000
## 19           937.2130           $800 to $1000
## 20           785.9130            $600 to $800
## 21           434.2887            $400 to $600
## 22           509.4411            $400 to $600
## 23           515.7013            $400 to $600
## 24           469.3164            $400 to $600
## 25           781.4698            $600 to $800
## 26           612.1097            $600 to $800
## 27          2916.9930                 > $1000
## 28           653.9683            $600 to $800
## 29           528.8762            $400 to $600
## 30           536.1578            $400 to $600
## 31           543.7308            $400 to $600
## 32          1225.7421                 > $1000
## 33           511.7197            $400 to $600
## 34           413.5577            $400 to $600
## 35          2281.2329                 > $1000
## 36           558.1851            $400 to $600
## 37           713.6034            $600 to $800
## 38           540.7117            $400 to $600
## 39           622.2287            $600 to $800
## 40           996.8924           $800 to $1000
## 41           426.8880            $400 to $600
## 42          1415.6517                 > $1000
## 43           517.6053            $400 to $600
## 44           462.1209            $400 to $600
## 45           522.7662            $400 to $600
## 46          1313.5340                 > $1000
## 47           515.1874            $400 to $600
## 48           509.0794            $400 to $600
## 49          1129.6276                 > $1000
## 50           472.1431            $400 to $600
## 51          3931.1737                 > $1000

# Merge in the data
election_data_merged <- election_data_merged |>
  select(state_territory_tribal_nation, biden_win, total_funding_billions, funding_per_capita, funding_per_capita_bins)

election_data_merged <- election_data_merged |>
  mutate(
    results = ifelse(biden_win == 1, "Biden", "Trump")
  )
# Final dataset
head(election_data_merged)

##   state_territory_tribal_nation biden_win total_funding_billions
## 1                       alabama         0                    3.0
## 2                        alaska         0                    3.7
## 3                       arizona         1                    3.5
## 4                      arkansas         0                    2.8
## 5                    california         1                   18.4
## 6                      colorado         1                    3.2
##   funding_per_capita funding_per_capita_bins results
## 1           586.2039            $400 to $600   Trump
## 2          5023.6928                 > $1000   Trump
## 3           468.3510            $400 to $600   Biden
## 4           912.2117           $800 to $1000   Trump
## 5           469.4034            $400 to $600   Biden
## 6           542.2498            $400 to $600   Biden

EDA

For this let’s just do a quick EDA (given the main point of this assignment are the visualizations).

summary(election_data_merged)

##  state_territory_tribal_nation   biden_win      total_funding_billions
##  Length:51                     Min.   :0.0000   Min.   : 0.7518       
##  Class :character              1st Qu.:0.0000   1st Qu.: 1.8000       
##  Mode  :character              Median :1.0000   Median : 2.8000       
##                                Mean   :0.5098   Mean   : 3.7568       
##                                3rd Qu.:1.0000   3rd Qu.: 4.1500       
##                                Max.   :1.0000   Max.   :18.4000       
##  funding_per_capita  funding_per_capita_bins   results         
##  Min.   : 358.0     < $400       : 1         Length:51         
##  1st Qu.: 509.3     $400 to $600 :25         Class :character  
##  Median : 586.2     $600 to $800 :12         Mode  :character  
##  Mean   : 896.9     $800 to $1000: 4                           
##  3rd Qu.: 821.5     > $1000      : 9                           
##  Max.   :5023.7

summary(df_merged_bins)

##  state_territory_tribal_nation total_population_2023 total_funding_billions
##  Length:51                     Min.   :  585067      Min.   : 0.7518       
##  Class :character              1st Qu.: 1870808      1st Qu.: 1.8000       
##  Mode  :character              Median : 4550595      Median : 2.8000       
##                                Mean   : 6604044      Mean   : 3.7568       
##                                3rd Qu.: 7665174      3rd Qu.: 4.1500       
##                                Max.   :39198693      Max.   :18.4000       
##  funding_per_capita  funding_per_capita_bins
##  Min.   : 358.0     < $400       : 1        
##  1st Qu.: 509.3     $400 to $600 :25        
##  Median : 586.2     $600 to $800 :12        
##  Mean   : 896.9     $800 to $1000: 4        
##  3rd Qu.: 821.5     > $1000      : 9        
##  Max.   :5023.7

Problem Statement

df_merged_total_sorted <- df_merged_bins |>
  arrange(desc(total_funding_billions)) |>
  slice(1:10)

ggplot(df_merged_total_sorted, aes(x = reorder(state_territory_tribal_nation, total_funding_billions), y = total_funding_billions)) + 
  geom_col(fill = lighten("maroon", 0.2)) + 
  scale_y_continuous(
    name = "Total Funding in Billions",
  ) +
  coord_flip() + 
  theme(
    text = element_text(color = "gray30"),
    axis.text.x = element_text(color = "gray30"),
    axis.title.y = element_blank(),
    axis.line.y = element_blank(),
    axis.ticks.length = unit(0, "pt"),
    axis.text.y = element_text(
      size = 8,
    ),
  ) +
  labs(title = "Top 10 Highest IIJA Funded States in the United States")

Fig. 1 shows us that it appears the states with the largest cities are the most funded. Why is this?

Analysis

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

# This graph is inspired by the Fundamentals of Data Visualization textbook

# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")

df_merged_bins_updated <- df_merged_bins

colnames(df_merged_bins_updated)[colnames(df_merged_bins_updated) == 'state_territory_tribal_nation'] <- 'state'

filtered_df_merged_bins_updated <- df_merged_bins_updated %>%
  filter(state %in% tolower(state_names))

filtered_df_merged_bins_updated$state <- str_to_title(filtered_df_merged_bins_updated$state)

filter(filtered_df_merged_bins_updated) %>%
  ggplot(aes(state = state, fill = funding_per_capita_bins)) +
  geom_statebins(lbl_size = 14/.pt) +
  expand_limits(x = -1.3) + # make space for legend
  coord_equal(expand = FALSE) +
   scale_fill_discrete_sequential(
    h1 = -83, h2 = 20, c1 = 30, cmax = 40, c2 = 0, l1 = 20, l2 = 100, p1 = 1, p2 = 1.2, rev = TRUE, 
    name = "IIJA Funding Per Capita",
    nmax = 7,
    order = 2:6,
    guide = guide_legend(
      override.aes = list(colour = "white", size = 1),
      reverse = TRUE
    )
  ) +
  theme(
    panel.grid.major = element_blank(), # Removes grid lines
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.title.x = element_blank(), # Remove x-axis title
    axis.text.x = element_blank(),  # Remove x-axis text (labels)
    axis.title.y = element_blank(), # Remove y-axis title
    axis.text.y = element_blank(),   # Remove y-axis text (labels)
    axis.ticks.x = element_blank(), # Remove x-axis tick marks
    axis.ticks.y = element_blank(),  # Remove y-axis tick marks
    legend.background = element_blank(),
    legend.position = c(0, 1),
    legend.justification = c(0, 1),
    legend.spacing.x = grid::unit(2, "pt"),
    legend.spacing.y = grid::unit(2, "pt"),
    legend.title = element_text(hjust = 0.5),
    legend.key.width = grid::unit(18, "pt"),
    legend.key.height = grid::unit(15, "pt")
  ) +
  labs(title = "Allocation of IIJA Funding (Per Capita) in the United States")

## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Fig. 2 shows all the rural states (the Midwest and Alaska) are the darkest colored states indicating the states with the highest funds per capita. There is a bias towards the least populated states in the US.

Does the allocation favor the political interests of the Biden administration?

# This graph is inspired by the Fundamentals of Data Visualization textbook

# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")

red <- lighten("red", amount = 0.2)
blue <- lighten("blue", amount = 0.2)

election_data_merged_updated <- election_data_merged

colnames(election_data_merged_updated)[colnames(election_data_merged_updated) == 'state_territory_tribal_nation'] <- 'state'

filtered_election_data_merged_updated <- election_data_merged_updated %>%
  filter(state %in% tolower(state_names))

filtered_election_data_merged_updated$state <- str_to_title(filtered_election_data_merged_updated$state)

filter(filtered_election_data_merged_updated) %>%
  ggplot(aes(state = state, fill = results)) +
  geom_statebins(lbl_size = 14/.pt) +
  expand_limits(x = -1.3) + # make space for legend
  coord_equal(expand = FALSE) +
  scale_fill_manual(values = c("Biden" = blue, "Trump" = red)) +
  labs(title = "2020 Presidential Election Results", fill = "Results") +
  theme(
    panel.grid.major = element_blank(), # Removes grid lines
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.title.x = element_blank(), # Remove x-axis title
    axis.text.x = element_blank(),  # Remove x-axis text (labels)
    axis.title.y = element_blank(), # Remove y-axis title
    axis.text.y = element_blank(),   # Remove y-axis text (labels)
    axis.ticks.x = element_blank(), # Remove x-axis tick marks
    axis.ticks.y = element_blank(),  # Remove y-axis tick marks
    legend.background = element_blank(),
    legend.position = c(0, 1),
    legend.justification = c(0, 1),
    legend.spacing.x = grid::unit(2, "pt"),
    legend.spacing.y = grid::unit(2, "pt"),
    legend.title = element_text(hjust = 0.5),
    legend.key.width = grid::unit(18, "pt"),
    legend.key.height = grid::unit(15, "pt")
  )

Fig. 3 shows all the rural states (the Midwest and Alaska), the highest funded states from Fig. 2, are red indicating the states that voted for Trump. With a couple exceptions such as Vermont and New Mexico, there is a bias towards states of the opposite party.

Conclusion

# This graph is inspired by the Fundamentals of Data Visualization textbook

# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")

election_data_merged_updated <- election_data_merged

colnames(election_data_merged_updated)[colnames(election_data_merged_updated) == 'state_territory_tribal_nation'] <- 'state'

filtered_election_data_merged_updated <- election_data_merged_updated %>%
  filter(state %in% tolower(state_names))

filtered_election_data_merged_updated$state <- str_to_title(filtered_election_data_merged_updated$state)

filter(filtered_election_data_merged_updated) %>%
  ggplot(aes(state = state, fill = funding_per_capita_bins)) +
  geom_statebins(lbl_size = 14/.pt) +
  geom_statebins(lbl_size = 14/.pt,
                data = subset(filtered_election_data_merged_updated, results == "Trump"),
                dark_lbl = lighten("red", amount = 0.2),
                light_lbl = lighten("red", amount = 0.2)) +
   geom_statebins(lbl_size = 14/.pt,
                data = subset(filtered_election_data_merged_updated, results == "Biden"),
                dark_lbl = lighten("blue", amount = 0.2),
                light_lbl = lighten("blue", amount = 0.2)) +
  expand_limits(x = -1.3) + # make space for legend
  coord_equal(expand = FALSE) +
   scale_fill_discrete_sequential(
    h1 = -83, h2 = 20, c1 = 30, cmax = 40, c2 = 0, l1 = 20, l2 = 100, p1 = 1, p2 = 1.2, rev = TRUE, 
    name = "IIJA Funding Per Capita",
    nmax = 7,
    order = 2:6,
    guide = guide_legend(
      override.aes = list(colour = "white", size = 1),
      reverse = TRUE
    )
  ) +
  labs(title = "IIJA Funding Per Capita for the United States Given the Political Party", fill = "Results") +
  theme(
    panel.grid.major = element_blank(), # Removes grid lines
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.title.x = element_blank(), # Remove x-axis title
    axis.text.x = element_blank(),  # Remove x-axis text (labels)
    axis.title.y = element_blank(), # Remove y-axis title
    axis.text.y = element_blank(),   # Remove y-axis text (labels)
    axis.ticks.x = element_blank(), # Remove x-axis tick marks
    axis.ticks.y = element_blank(),  # Remove y-axis tick marks
    legend.background = element_blank(),
    legend.position = c(0, 1),
    legend.justification = c(0, 1),
    legend.spacing.x = grid::unit(2, "pt"),
    legend.spacing.y = grid::unit(2, "pt"),
    legend.title = element_text(hjust = 0.5),
    legend.key.width = grid::unit(18, "pt"),
    legend.key.height = grid::unit(15, "pt")
  )

Fig. 4 highlights the political party of each state while showing how IIJA is being allocated. The least populated states are getting higher funds per capita. Although it may seem the states with the highest populations are getting the most funds (as shown in Fig. 1), if you take into account the amount of people in each state, the funds are a lot less for the states with bigger cities. One reason for this could be due to infrastructure issues in rural areas that need the funds. Another reason could be the Biden administration wanting to win over states that did not vote for him in 2020. This data comes from 2023, so it’s possible this allocation was heavily related to the 2024 election year.

DATA 608 Assignment 1

Gillian McGovern

2025-09-07

Overview

Data Sources

Load Libraries

Read in Data

Clean and Prepare Data

EDA

Problem Statement

Analysis

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?

Does the allocation favor the political interests of the Biden administration?

Conclusion