The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
Notes:
You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.
You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.
This assignment is due by the end of week two of the semester.
Current (estimated) population of each of the States and Territories:
This data is from the US Census https://www.census.gov/data/tables/time-series/demo/popest/2020s-state-total.html
Official election results of the 2020 Presidential election:
This data comes from the Associated Press https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?resource=download
library(statebins)
library(hrbrthemes)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr)
library(dplyr)
library(stringr)
library(rvest)
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:readr':
##
## guess_encoding
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(stringr)
library(tidytext)
library(tibble)
library(textdata)
##
## Attaching package: 'textdata'
##
## The following object is masked from 'package:httr':
##
## cache_info
library(tidyr)
library(readr)
library(purrr)
library(forcats)
library(ggplot2)
library(colorspace)
# Read in funding data from 2023
fund_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/IIJA%20FUNDING%20AS%20OF%20MARCH%202023.csv', show_col_types = FALSE)
head(fund_data)
## # A tibble: 6 Ă— 2
## `State, Teritory or Tribal Nation` `Total (Billions)`
## <chr> <dbl>
## 1 ALABAMA 3
## 2 ALASKA 3.7
## 3 AMERICAN SAMOA 0.0686
## 4 ARIZONA 3.5
## 5 ARKANSAS 2.8
## 6 CALIFORNIA 18.4
# Read in population data from 2023
pop_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/NST-EST2024-POP.csv', show_col_types = FALSE)
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
head(pop_data)
## # A tibble: 6 Ă— 36
## table with row heade…¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 Annual Estimates of t… <NA> <NA> NA NA NA NA NA NA
## 2 Geographic Area Apri… Popu… NA NA NA NA NA NA
## 3 <NA> <NA> 2020 2.02e3 2.02e3 2.02e3 2.02e3 NA NA
## 4 United States 331,… 331,… 3.32e8 3.34e8 3.37e8 3.40e8 NA NA
## 5 Northeast 57,6… 57,4… 5.73e7 5.72e7 5.74e7 5.78e7 NA NA
## 6 Midwest 68,9… 68,9… 6.89e7 6.89e7 6.92e7 6.96e7 NA NA
## # ℹ abbreviated name:
## # ¹​`table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts)`
## # ℹ 27 more variables: ...10 <lgl>, ...11 <lgl>, ...12 <lgl>, ...13 <lgl>,
## # ...14 <lgl>, ...15 <lgl>, ...16 <lgl>, ...17 <lgl>, ...18 <lgl>,
## # ...19 <lgl>, ...20 <lgl>, ...21 <lgl>, ...22 <lgl>, ...23 <lgl>,
## # ...24 <lgl>, ...25 <lgl>, ...26 <lgl>, ...27 <lgl>, ...28 <lgl>,
## # ...29 <lgl>, ...30 <lgl>, ...31 <lgl>, ...32 <lgl>, ...33 <lgl>, …
# Read in 2020 presidential election results data
election_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/voting.csv', show_col_types = FALSE)
head(election_data)
## # A tibble: 6 Ă— 8
## state state_abr trump_pct biden_pct trump_vote biden_vote trump_win biden_win
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Alaska AK 53.1 43 189543 153502 1 0
## 2 Hawaii HI 34.3 63.7 196864 366130 0 1
## 3 Washi… WA 39 58.4 1584651 2369612 0 1
## 4 Oregon OR 40.7 56.9 958448 1340383 0 1
## 5 Calif… CA 34.3 63.5 5982194 11082293 0 1
## 6 Idaho ID 63.9 33.1 554128 287031 1 0
Cleaning up the data:
# Cleaning
# Rename columns
colnames(fund_data)[colnames(fund_data) == 'State, Teritory or Tribal Nation'] <- 'state_territory_tribal_nation'
colnames(fund_data)[colnames(fund_data) == 'Total (Billions)'] <- 'total_funding_billions'
# Make states lowercase
fund_data$state_territory_tribal_nation <- tolower(str_to_title(fund_data$state_territory_tribal_nation))
head(fund_data, 20)
## # A tibble: 20 Ă— 2
## state_territory_tribal_nation total_funding_billions
## <chr> <dbl>
## 1 alabama 3
## 2 alaska 3.7
## 3 american samoa 0.0686
## 4 arizona 3.5
## 5 arkansas 2.8
## 6 california 18.4
## 7 colorado 3.2
## 8 connecticut 2.5
## 9 deleware 0.792
## 10 district of columbia 1.1
## 11 florida 8.2
## 12 georgia 5
## 13 guam 0.143
## 14 hawaii 1
## 15 idaho 1.2
## 16 illinois 8.4
## 17 indiana 3.4
## 18 iowa 2.4
## 19 kansas 1.5
## 20 kentucky 3.9
# Get rid of useless rows and columns
colnames(pop_data) <- pop_data[3, ]
names(pop_data)[1:2] <- c("state_territory_tribal_nation", "baseline")
new_pop_data <- pop_data %>%
select("state_territory_tribal_nation", "2023")
new_pop_data <- new_pop_data[-(1:8), ]
new_pop_data <- new_pop_data[-(52:59), ]
# Get rid of characters in file
new_pop_data$state_territory_tribal_nation <- gsub("\\.", "", new_pop_data$state_territory_tribal_nation)
# Rename columns
colnames(new_pop_data) <- c("state_territory_tribal_nation", "total_population_2023")
# Make states lowercase (to merge later)
new_pop_data$state_territory_tribal_nation <- tolower(new_pop_data$state_territory_tribal_nation)
head(new_pop_data)
## # A tibble: 6 Ă— 2
## state_territory_tribal_nation total_population_2023
## <chr> <dbl>
## 1 alabama 5117673
## 2 alaska 736510
## 3 arizona 7473027
## 4 arkansas 3069463
## 5 california 39198693
## 6 colorado 5901339
Merge the data frames:
# Merge the data on state column
# Fix Delaware spelling in funding data
fund_data$state_territory_tribal_nation[fund_data$state_territory_tribal_nation == "deleware"] <- "delaware"
# Use states/territories from the US Census dataset (trusted source)
df_merged <- merge(new_pop_data, fund_data, by = "state_territory_tribal_nation", all.x = TRUE)
df_merged
## state_territory_tribal_nation total_population_2023 total_funding_billions
## 1 alabama 5117673 3.0000
## 2 alaska 736510 3.7000
## 3 arizona 7473027 3.5000
## 4 arkansas 3069463 2.8000
## 5 california 39198693 18.4000
## 6 colorado 5901339 3.2000
## 7 connecticut 3643023 2.5000
## 8 delaware 1036423 0.7920
## 9 district of columbia 687324 1.1000
## 10 florida 22904868 8.2000
## 11 georgia 11064432 5.0000
## 12 hawaii 1441387 1.0000
## 13 idaho 1971122 1.2000
## 14 illinois 12642259 8.4000
## 15 indiana 6880131 3.4000
## 16 iowa 3218414 2.4000
## 17 kansas 2951500 1.5000
## 18 kentucky 4550595 3.9000
## 19 louisiana 4588071 4.3000
## 20 maine 1399646 1.1000
## 21 maryland 6217062 2.7000
## 22 massachusetts 7066568 3.6000
## 23 michigan 10083356 5.2000
## 24 minnesota 5753048 2.7000
## 25 mississippi 2943172 2.3000
## 26 missouri 6208038 3.8000
## 27 montana 1131302 3.3000
## 28 nebraska 1987864 1.3000
## 29 nevada 3214363 1.7000
## 30 new hampshire 1402199 0.7518
## 31 new jersey 9379642 5.1000
## 32 new mexico 2121164 2.6000
## 33 new york 19737367 10.1000
## 34 north carolina 10881189 4.5000
## 35 north dakota 789047 1.8000
## 36 ohio 11824034 6.6000
## 37 oklahoma 4063882 2.9000
## 38 oregon 4253653 2.3000
## 39 pennsylvania 13017721 8.1000
## 40 rhode island 1103429 1.1000
## 41 south carolina 5387830 2.3000
## 42 south dakota 918305 1.3000
## 43 tennessee 7148304 3.7000
## 44 texas 30727890 14.2000
## 45 utah 3443222 1.8000
## 46 vermont 648708 0.8521
## 47 virginia 8734685 4.5000
## 48 washington 7857320 4.0000
## 49 west virginia 1770495 2.0000
## 50 wisconsin 5930405 2.8000
## 51 wyoming 585067 2.3000
We now have the data for the first question in a tidy format where each row represent a unique state/territory. Population size plays a big role in the amount of funds though, so it should be taken into consideration. Therefore let’s normalize the data and create a per capita feature.
# Normalize the data
# Add a per capita column - total funds / total population
df_merged <- df_merged |>
mutate(
funding_per_capita = (total_funding_billions * 1000000000) / total_population_2023,
)
head(df_merged)
## state_territory_tribal_nation total_population_2023 total_funding_billions
## 1 alabama 5117673 3.0
## 2 alaska 736510 3.7
## 3 arizona 7473027 3.5
## 4 arkansas 3069463 2.8
## 5 california 39198693 18.4
## 6 colorado 5901339 3.2
## funding_per_capita
## 1 586.2039
## 2 5023.6928
## 3 468.3510
## 4 912.2117
## 5 469.4034
## 6 542.2498
We now have data that considers the funds given the population. Yet, for the story of this data, seeing the exact funds numbers won’t be needed. We just need a general idea of the allocation of funds. So let’s set up some bins to simplify the visualization:
# Create bins for simplification
df_merged_bins <- mutate(
df_merged,
funding_per_capita_bins = cut(
funding_per_capita,
breaks = c(0, 400, 600, 800, 1000, 6000),
labels = c("< $400", "$400 to $600", "$600 to $800", "$800 to $1000", "> $1000"),
right = FALSE
)
)
head(df_merged_bins)
## state_territory_tribal_nation total_population_2023 total_funding_billions
## 1 alabama 5117673 3.0
## 2 alaska 736510 3.7
## 3 arizona 7473027 3.5
## 4 arkansas 3069463 2.8
## 5 california 39198693 18.4
## 6 colorado 5901339 3.2
## funding_per_capita funding_per_capita_bins
## 1 586.2039 $400 to $600
## 2 5023.6928 > $1000
## 3 468.3510 $400 to $600
## 4 912.2117 $800 to $1000
## 5 469.4034 $400 to $600
## 6 542.2498 $400 to $600
Now let’s look at the election data for the second part:
# Clean data
# Keep states/territory from the Associated Press
colnames(election_data)[colnames(election_data) == 'state'] <- 'state_territory_tribal_nation'
election_data$state_territory_tribal_nation <- tolower(election_data$state_territory_tribal_nation)
election_data_merged <- merge(election_data, df_merged_bins, by = "state_territory_tribal_nation", all.x = TRUE)
election_data_merged
## state_territory_tribal_nation state_abr trump_pct biden_pct trump_vote
## 1 alabama AL 62.2 36.7 1441170
## 2 alaska AK 53.1 43.0 189543
## 3 arizona AZ 49.1 49.4 1661686
## 4 arkansas AR 62.4 34.8 760647
## 5 california CA 34.3 63.5 5982194
## 6 colorado CO 41.9 55.4 1364471
## 7 connecticut CT 39.2 59.3 715291
## 8 delaware DE 39.8 58.8 200603
## 9 district of columbia DC 5.4 93.0 18586
## 10 florida FL 51.2 47.9 5668731
## 11 georgia GA 49.3 49.5 2461837
## 12 hawaii HI 34.3 63.7 196864
## 13 idaho ID 63.9 33.1 554128
## 14 illinois IL 40.5 57.6 2438943
## 15 indiana IN 57.1 41.0 1729516
## 16 iowa IA 53.2 45.0 897672
## 17 kansas KS 56.5 41.3 752933
## 18 kentucky KY 62.1 36.2 1326646
## 19 louisiana LA 58.5 39.9 1255776
## 20 maine ME 44.0 53.1 360480
## 21 maryland MD 32.4 65.8 976414
## 22 massachusetts MA 32.3 65.9 1167202
## 23 michigan MI 47.8 50.6 2649842
## 24 minnesota MN 45.4 52.6 1484056
## 25 mississippi MS 57.6 41.1 756789
## 26 missouri MO 56.8 41.4 1718282
## 27 montana MT 56.9 40.6 343647
## 28 nebraska NE 58.5 39.4 556846
## 29 nevada NV 47.7 50.1 669890
## 30 new hampshire NH 45.5 52.9 365660
## 31 new jersey NJ 41.4 57.3 1883260
## 32 new mexico NM 43.5 54.3 401894
## 33 new york NY 41.5 57.1 3005636
## 34 north carolina NC 50.1 48.7 2758775
## 35 north dakota ND 65.5 31.9 235595
## 36 ohio OH 53.3 45.3 3154834
## 37 oklahoma OK 65.4 32.3 1020280
## 38 oregon OR 40.7 56.9 958448
## 39 pennsylvania PA 48.8 50.0 3378263
## 40 rhode island RI 38.8 59.7 199922
## 41 south carolina SC 55.1 43.4 1385103
## 42 south dakota SD 61.8 35.6 261043
## 43 tennessee TN 60.7 37.4 1849556
## 44 texas TX 52.1 46.5 5890347
## 45 utah UT 58.2 37.7 865140
## 46 vermont VT 30.8 66.4 112704
## 47 virginia VA 44.2 54.4 1962430
## 48 washington WA 39.0 58.4 1584651
## 49 west virginia WV 68.6 29.7 545382
## 50 wisconsin WI 48.9 49.6 1610065
## 51 wyoming WY 70.4 26.7 193559
## biden_vote trump_win biden_win total_population_2023 total_funding_billions
## 1 849624 1 0 5117673 3.0000
## 2 153502 1 0 736510 3.7000
## 3 1672143 0 1 7473027 3.5000
## 4 423932 1 0 3069463 2.8000
## 5 11082293 0 1 39198693 18.4000
## 6 1804196 0 1 5901339 3.2000
## 7 1080680 0 1 3643023 2.5000
## 8 296268 0 1 1036423 0.7920
## 9 317323 0 1 687324 1.1000
## 10 5297045 1 0 22904868 8.2000
## 11 2474507 0 1 11064432 5.0000
## 12 366130 0 1 1441387 1.0000
## 13 287031 1 0 1971122 1.2000
## 14 3463260 0 1 12642259 8.4000
## 15 1242413 1 0 6880131 3.4000
## 16 759061 1 0 3218414 2.4000
## 17 551144 1 0 2951500 1.5000
## 18 772474 1 0 4550595 3.9000
## 19 856034 1 0 4588071 4.3000
## 20 434966 0 1 1399646 1.1000
## 21 1985023 0 1 6217062 2.7000
## 22 2382202 0 1 7066568 3.6000
## 23 2804040 0 1 10083356 5.2000
## 24 1717077 0 1 5753048 2.7000
## 25 539508 1 0 2943172 2.3000
## 26 1252902 1 0 6208038 3.8000
## 27 244836 1 0 1131302 3.3000
## 28 374583 1 0 1987864 1.3000
## 29 703486 0 1 3214363 1.7000
## 30 424935 0 1 1402199 0.7518
## 31 2608327 0 1 9379642 5.1000
## 32 501614 0 1 2121164 2.6000
## 33 4137641 0 1 19737367 10.1000
## 34 2684292 1 0 10881189 4.5000
## 35 114902 1 0 789047 1.8000
## 36 2679165 1 0 11824034 6.6000
## 37 503890 1 0 4063882 2.9000
## 38 1340383 0 1 4253653 2.3000
## 39 3459923 0 1 13017721 8.1000
## 40 307486 0 1 1103429 1.1000
## 41 1091541 1 0 5387830 2.3000
## 42 150471 1 0 918305 1.3000
## 43 1139332 1 0 7148304 3.7000
## 44 5259126 1 0 30727890 14.2000
## 45 560282 1 0 3443222 1.8000
## 46 242820 0 1 648708 0.8521
## 47 2413568 0 1 8734685 4.5000
## 48 2369612 0 1 7857320 4.0000
## 49 235984 1 0 1770495 2.0000
## 50 1630673 0 1 5930405 2.8000
## 51 73491 1 0 585067 2.3000
## funding_per_capita funding_per_capita_bins
## 1 586.2039 $400 to $600
## 2 5023.6928 > $1000
## 3 468.3510 $400 to $600
## 4 912.2117 $800 to $1000
## 5 469.4034 $400 to $600
## 6 542.2498 $400 to $600
## 7 686.2433 $600 to $800
## 8 764.1668 $600 to $800
## 9 1600.4097 > $1000
## 10 358.0025 < $400
## 11 451.8985 $400 to $600
## 12 693.7762 $600 to $800
## 13 608.7903 $600 to $800
## 14 664.4382 $600 to $800
## 15 494.1766 $400 to $600
## 16 745.7089 $600 to $800
## 17 508.2162 $400 to $600
## 18 857.0308 $800 to $1000
## 19 937.2130 $800 to $1000
## 20 785.9130 $600 to $800
## 21 434.2887 $400 to $600
## 22 509.4411 $400 to $600
## 23 515.7013 $400 to $600
## 24 469.3164 $400 to $600
## 25 781.4698 $600 to $800
## 26 612.1097 $600 to $800
## 27 2916.9930 > $1000
## 28 653.9683 $600 to $800
## 29 528.8762 $400 to $600
## 30 536.1578 $400 to $600
## 31 543.7308 $400 to $600
## 32 1225.7421 > $1000
## 33 511.7197 $400 to $600
## 34 413.5577 $400 to $600
## 35 2281.2329 > $1000
## 36 558.1851 $400 to $600
## 37 713.6034 $600 to $800
## 38 540.7117 $400 to $600
## 39 622.2287 $600 to $800
## 40 996.8924 $800 to $1000
## 41 426.8880 $400 to $600
## 42 1415.6517 > $1000
## 43 517.6053 $400 to $600
## 44 462.1209 $400 to $600
## 45 522.7662 $400 to $600
## 46 1313.5340 > $1000
## 47 515.1874 $400 to $600
## 48 509.0794 $400 to $600
## 49 1129.6276 > $1000
## 50 472.1431 $400 to $600
## 51 3931.1737 > $1000
# Merge in the data
election_data_merged <- election_data_merged |>
select(state_territory_tribal_nation, biden_win, total_funding_billions, funding_per_capita, funding_per_capita_bins)
election_data_merged <- election_data_merged |>
mutate(
results = ifelse(biden_win == 1, "Biden", "Trump")
)
# Final dataset
head(election_data_merged)
## state_territory_tribal_nation biden_win total_funding_billions
## 1 alabama 0 3.0
## 2 alaska 0 3.7
## 3 arizona 1 3.5
## 4 arkansas 0 2.8
## 5 california 1 18.4
## 6 colorado 1 3.2
## funding_per_capita funding_per_capita_bins results
## 1 586.2039 $400 to $600 Trump
## 2 5023.6928 > $1000 Trump
## 3 468.3510 $400 to $600 Biden
## 4 912.2117 $800 to $1000 Trump
## 5 469.4034 $400 to $600 Biden
## 6 542.2498 $400 to $600 Biden
For this let’s just do a quick EDA (given the main point of this assignment are the visualizations).
summary(election_data_merged)
## state_territory_tribal_nation biden_win total_funding_billions
## Length:51 Min. :0.0000 Min. : 0.7518
## Class :character 1st Qu.:0.0000 1st Qu.: 1.8000
## Mode :character Median :1.0000 Median : 2.8000
## Mean :0.5098 Mean : 3.7568
## 3rd Qu.:1.0000 3rd Qu.: 4.1500
## Max. :1.0000 Max. :18.4000
## funding_per_capita funding_per_capita_bins results
## Min. : 358.0 < $400 : 1 Length:51
## 1st Qu.: 509.3 $400 to $600 :25 Class :character
## Median : 586.2 $600 to $800 :12 Mode :character
## Mean : 896.9 $800 to $1000: 4
## 3rd Qu.: 821.5 > $1000 : 9
## Max. :5023.7
summary(df_merged_bins)
## state_territory_tribal_nation total_population_2023 total_funding_billions
## Length:51 Min. : 585067 Min. : 0.7518
## Class :character 1st Qu.: 1870808 1st Qu.: 1.8000
## Mode :character Median : 4550595 Median : 2.8000
## Mean : 6604044 Mean : 3.7568
## 3rd Qu.: 7665174 3rd Qu.: 4.1500
## Max. :39198693 Max. :18.4000
## funding_per_capita funding_per_capita_bins
## Min. : 358.0 < $400 : 1
## 1st Qu.: 509.3 $400 to $600 :25
## Median : 586.2 $600 to $800 :12
## Mean : 896.9 $800 to $1000: 4
## 3rd Qu.: 821.5 > $1000 : 9
## Max. :5023.7
df_merged_total_sorted <- df_merged_bins |>
arrange(desc(total_funding_billions)) |>
slice(1:10)
ggplot(df_merged_total_sorted, aes(x = reorder(state_territory_tribal_nation, total_funding_billions), y = total_funding_billions)) +
geom_col(fill = lighten("maroon", 0.2)) +
scale_y_continuous(
name = "Total Funding in Billions",
) +
coord_flip() +
theme(
text = element_text(color = "gray30"),
axis.text.x = element_text(color = "gray30"),
axis.title.y = element_blank(),
axis.line.y = element_blank(),
axis.ticks.length = unit(0, "pt"),
axis.text.y = element_text(
size = 8,
),
) +
labs(title = "Top 10 Highest IIJA Funded States in the United States")
Fig. 1 shows us that it appears the states with the largest cities are the most funded. Why is this?
# This graph is inspired by the Fundamentals of Data Visualization textbook
# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")
df_merged_bins_updated <- df_merged_bins
colnames(df_merged_bins_updated)[colnames(df_merged_bins_updated) == 'state_territory_tribal_nation'] <- 'state'
filtered_df_merged_bins_updated <- df_merged_bins_updated %>%
filter(state %in% tolower(state_names))
filtered_df_merged_bins_updated$state <- str_to_title(filtered_df_merged_bins_updated$state)
filter(filtered_df_merged_bins_updated) %>%
ggplot(aes(state = state, fill = funding_per_capita_bins)) +
geom_statebins(lbl_size = 14/.pt) +
expand_limits(x = -1.3) + # make space for legend
coord_equal(expand = FALSE) +
scale_fill_discrete_sequential(
h1 = -83, h2 = 20, c1 = 30, cmax = 40, c2 = 0, l1 = 20, l2 = 100, p1 = 1, p2 = 1.2, rev = TRUE,
name = "IIJA Funding Per Capita",
nmax = 7,
order = 2:6,
guide = guide_legend(
override.aes = list(colour = "white", size = 1),
reverse = TRUE
)
) +
theme(
panel.grid.major = element_blank(), # Removes grid lines
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.title.x = element_blank(), # Remove x-axis title
axis.text.x = element_blank(), # Remove x-axis text (labels)
axis.title.y = element_blank(), # Remove y-axis title
axis.text.y = element_blank(), # Remove y-axis text (labels)
axis.ticks.x = element_blank(), # Remove x-axis tick marks
axis.ticks.y = element_blank(), # Remove y-axis tick marks
legend.background = element_blank(),
legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.spacing.x = grid::unit(2, "pt"),
legend.spacing.y = grid::unit(2, "pt"),
legend.title = element_text(hjust = 0.5),
legend.key.width = grid::unit(18, "pt"),
legend.key.height = grid::unit(15, "pt")
) +
labs(title = "Allocation of IIJA Funding (Per Capita) in the United States")
## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Fig. 2 shows all the rural states (the Midwest and Alaska) are the darkest colored states indicating the states with the highest funds per capita. There is a bias towards the least populated states in the US.
# This graph is inspired by the Fundamentals of Data Visualization textbook
# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")
red <- lighten("red", amount = 0.2)
blue <- lighten("blue", amount = 0.2)
election_data_merged_updated <- election_data_merged
colnames(election_data_merged_updated)[colnames(election_data_merged_updated) == 'state_territory_tribal_nation'] <- 'state'
filtered_election_data_merged_updated <- election_data_merged_updated %>%
filter(state %in% tolower(state_names))
filtered_election_data_merged_updated$state <- str_to_title(filtered_election_data_merged_updated$state)
filter(filtered_election_data_merged_updated) %>%
ggplot(aes(state = state, fill = results)) +
geom_statebins(lbl_size = 14/.pt) +
expand_limits(x = -1.3) + # make space for legend
coord_equal(expand = FALSE) +
scale_fill_manual(values = c("Biden" = blue, "Trump" = red)) +
labs(title = "2020 Presidential Election Results", fill = "Results") +
theme(
panel.grid.major = element_blank(), # Removes grid lines
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.title.x = element_blank(), # Remove x-axis title
axis.text.x = element_blank(), # Remove x-axis text (labels)
axis.title.y = element_blank(), # Remove y-axis title
axis.text.y = element_blank(), # Remove y-axis text (labels)
axis.ticks.x = element_blank(), # Remove x-axis tick marks
axis.ticks.y = element_blank(), # Remove y-axis tick marks
legend.background = element_blank(),
legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.spacing.x = grid::unit(2, "pt"),
legend.spacing.y = grid::unit(2, "pt"),
legend.title = element_text(hjust = 0.5),
legend.key.width = grid::unit(18, "pt"),
legend.key.height = grid::unit(15, "pt")
)
Fig. 3 shows all the rural states (the Midwest and Alaska), the highest funded states from Fig. 2, are red indicating the states that voted for Trump. With a couple exceptions such as Vermont and New Mexico, there is a bias towards states of the opposite party.
# This graph is inspired by the Fundamentals of Data Visualization textbook
# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")
election_data_merged_updated <- election_data_merged
colnames(election_data_merged_updated)[colnames(election_data_merged_updated) == 'state_territory_tribal_nation'] <- 'state'
filtered_election_data_merged_updated <- election_data_merged_updated %>%
filter(state %in% tolower(state_names))
filtered_election_data_merged_updated$state <- str_to_title(filtered_election_data_merged_updated$state)
filter(filtered_election_data_merged_updated) %>%
ggplot(aes(state = state, fill = funding_per_capita_bins)) +
geom_statebins(lbl_size = 14/.pt) +
geom_statebins(lbl_size = 14/.pt,
data = subset(filtered_election_data_merged_updated, results == "Trump"),
dark_lbl = lighten("red", amount = 0.2),
light_lbl = lighten("red", amount = 0.2)) +
geom_statebins(lbl_size = 14/.pt,
data = subset(filtered_election_data_merged_updated, results == "Biden"),
dark_lbl = lighten("blue", amount = 0.2),
light_lbl = lighten("blue", amount = 0.2)) +
expand_limits(x = -1.3) + # make space for legend
coord_equal(expand = FALSE) +
scale_fill_discrete_sequential(
h1 = -83, h2 = 20, c1 = 30, cmax = 40, c2 = 0, l1 = 20, l2 = 100, p1 = 1, p2 = 1.2, rev = TRUE,
name = "IIJA Funding Per Capita",
nmax = 7,
order = 2:6,
guide = guide_legend(
override.aes = list(colour = "white", size = 1),
reverse = TRUE
)
) +
labs(title = "IIJA Funding Per Capita for the United States Given the Political Party", fill = "Results") +
theme(
panel.grid.major = element_blank(), # Removes grid lines
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.title.x = element_blank(), # Remove x-axis title
axis.text.x = element_blank(), # Remove x-axis text (labels)
axis.title.y = element_blank(), # Remove y-axis title
axis.text.y = element_blank(), # Remove y-axis text (labels)
axis.ticks.x = element_blank(), # Remove x-axis tick marks
axis.ticks.y = element_blank(), # Remove y-axis tick marks
legend.background = element_blank(),
legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.spacing.x = grid::unit(2, "pt"),
legend.spacing.y = grid::unit(2, "pt"),
legend.title = element_text(hjust = 0.5),
legend.key.width = grid::unit(18, "pt"),
legend.key.height = grid::unit(15, "pt")
)
Fig. 4 highlights the political party of each state while showing how IIJA is being allocated. The least populated states are getting higher funds per capita. Although it may seem the states with the highest populations are getting the most funds (as shown in Fig. 1), if you take into account the amount of people in each state, the funds are a lot less for the states with bigger cities. One reason for this could be due to infrastructure issues in rural areas that need the funds. Another reason could be the Biden administration wanting to win over states that did not vote for him in 2020. This data comes from 2023, so it’s possible this allocation was heavily related to the 2024 election year.