The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.
Your story (Data Visualization(s) ) should address the following questions:
Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent?
Does the allocation favor the political interests of the Biden administration?
Notes:
You will need to source data on the current (estimated) population of each of the States and Territories (accuracy is more important than precision) and on the official election results of the 2020 Presidential election.
You may choose to develop you visualizations using a desktop application or a code library. Your submittal should be in the form of a report (document) or a presentation.
This assignment is due by the end of week two of the semester.
Current (estimated) population of each of the States and Territories:
This data is from the US Census https://www.census.gov/data/tables/time-series/demo/popest/2020s-state-total.html
Official election results of the 2020 Presidential election:
This data comes from the Associated Press https://www.kaggle.com/datasets/callummacpherson14/2020-us-presidential-election-results-by-state?resource=download
library(statebins)
library(hrbrthemes)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr)
library(dplyr)
library(stringr)
library(rvest)
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:readr':
##
## guess_encoding
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(stringr)
library(tidytext)
library(tibble)
library(textdata)
##
## Attaching package: 'textdata'
##
## The following object is masked from 'package:httr':
##
## cache_info
library(tidyr)
library(readr)
library(purrr)
library(forcats)
library(ggplot2)
library(colorspace)
# Read in funding data from 2023
fund_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/IIJA%20FUNDING%20AS%20OF%20MARCH%202023.csv', show_col_types = FALSE)
head(fund_data)
## # A tibble: 6 Ă— 2
## `State, Teritory or Tribal Nation` `Total (Billions)`
## <chr> <dbl>
## 1 ALABAMA 3
## 2 ALASKA 3.7
## 3 AMERICAN SAMOA 0.0686
## 4 ARIZONA 3.5
## 5 ARKANSAS 2.8
## 6 CALIFORNIA 18.4
# Read in population data from 2023
pop_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/NST-EST2024-POP.csv', show_col_types = FALSE)
## New names:
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
head(pop_data)
## # A tibble: 6 Ă— 36
## table with row heade…¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 Annual Estimates of t… <NA> <NA> NA NA NA NA NA NA
## 2 Geographic Area Apri… Popu… NA NA NA NA NA NA
## 3 <NA> <NA> 2020 2.02e3 2.02e3 2.02e3 2.02e3 NA NA
## 4 United States 331,… 331,… 3.32e8 3.34e8 3.37e8 3.40e8 NA NA
## 5 Northeast 57,6… 57,4… 5.73e7 5.72e7 5.74e7 5.78e7 NA NA
## 6 Midwest 68,9… 68,9… 6.89e7 6.89e7 6.92e7 6.96e7 NA NA
## # ℹ abbreviated name:
## # ¹​`table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts)`
## # ℹ 27 more variables: ...10 <lgl>, ...11 <lgl>, ...12 <lgl>, ...13 <lgl>,
## # ...14 <lgl>, ...15 <lgl>, ...16 <lgl>, ...17 <lgl>, ...18 <lgl>,
## # ...19 <lgl>, ...20 <lgl>, ...21 <lgl>, ...22 <lgl>, ...23 <lgl>,
## # ...24 <lgl>, ...25 <lgl>, ...26 <lgl>, ...27 <lgl>, ...28 <lgl>,
## # ...29 <lgl>, ...30 <lgl>, ...31 <lgl>, ...32 <lgl>, ...33 <lgl>, …
# Read in 2020 presidential election results data
election_data <- read_csv('https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-608/refs/heads/main/voting.csv', show_col_types = FALSE)
head(election_data)
## # A tibble: 6 Ă— 8
## state state_abr trump_pct biden_pct trump_vote biden_vote trump_win biden_win
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Alaska AK 53.1 43 189543 153502 1 0
## 2 Hawaii HI 34.3 63.7 196864 366130 0 1
## 3 Washi… WA 39 58.4 1584651 2369612 0 1
## 4 Oregon OR 40.7 56.9 958448 1340383 0 1
## 5 Calif… CA 34.3 63.5 5982194 11082293 0 1
## 6 Idaho ID 63.9 33.1 554128 287031 1 0
Cleaning up the data:
# Cleaning
# Rename columns
colnames(fund_data)[colnames(fund_data) == 'State, Teritory or Tribal Nation'] <- 'state_territory_tribal_nation'
colnames(fund_data)[colnames(fund_data) == 'Total (Billions)'] <- 'total_funding_billions'
# Make states lowercase
fund_data$state_territory_tribal_nation <- tolower(str_to_title(fund_data$state_territory_tribal_nation))
head(fund_data, 20)
## # A tibble: 20 Ă— 2
## state_territory_tribal_nation total_funding_billions
## <chr> <dbl>
## 1 alabama 3
## 2 alaska 3.7
## 3 american samoa 0.0686
## 4 arizona 3.5
## 5 arkansas 2.8
## 6 california 18.4
## 7 colorado 3.2
## 8 connecticut 2.5
## 9 deleware 0.792
## 10 district of columbia 1.1
## 11 florida 8.2
## 12 georgia 5
## 13 guam 0.143
## 14 hawaii 1
## 15 idaho 1.2
## 16 illinois 8.4
## 17 indiana 3.4
## 18 iowa 2.4
## 19 kansas 1.5
## 20 kentucky 3.9
# Get rid of useless rows and columns
colnames(pop_data) <- pop_data[3, ]
names(pop_data)[1:2] <- c("state_territory_tribal_nation", "baseline")
new_pop_data <- pop_data %>%
select("state_territory_tribal_nation", "2023")
new_pop_data <- new_pop_data[-(1:8), ]
new_pop_data <- new_pop_data[-(52:59), ]
# Get rid of characters in file
new_pop_data$state_territory_tribal_nation <- gsub("\\.", "", new_pop_data$state_territory_tribal_nation)
# Rename columns
colnames(new_pop_data) <- c("state_territory_tribal_nation", "total_population_2023")
# Make states lowercase (to merge later)
new_pop_data$state_territory_tribal_nation <- tolower(new_pop_data$state_territory_tribal_nation)
head(new_pop_data)
## # A tibble: 6 Ă— 2
## state_territory_tribal_nation total_population_2023
## <chr> <dbl>
## 1 alabama 5117673
## 2 alaska 736510
## 3 arizona 7473027
## 4 arkansas 3069463
## 5 california 39198693
## 6 colorado 5901339
Merge the data frames:
# Merge the data on state column
# Fix Delaware spelling in funding data
fund_data$state_territory_tribal_nation[fund_data$state_territory_tribal_nation == "deleware"] <- "delaware"
# Use states/territories from the US Census dataset (trusted source)
df_merged <- merge(new_pop_data, fund_data, by = "state_territory_tribal_nation", all.x = TRUE)
df_merged
## state_territory_tribal_nation total_population_2023 total_funding_billions
## 1 alabama 5117673 3.0000
## 2 alaska 736510 3.7000
## 3 arizona 7473027 3.5000
## 4 arkansas 3069463 2.8000
## 5 california 39198693 18.4000
## 6 colorado 5901339 3.2000
## 7 connecticut 3643023 2.5000
## 8 delaware 1036423 0.7920
## 9 district of columbia 687324 1.1000
## 10 florida 22904868 8.2000
## 11 georgia 11064432 5.0000
## 12 hawaii 1441387 1.0000
## 13 idaho 1971122 1.2000
## 14 illinois 12642259 8.4000
## 15 indiana 6880131 3.4000
## 16 iowa 3218414 2.4000
## 17 kansas 2951500 1.5000
## 18 kentucky 4550595 3.9000
## 19 louisiana 4588071 4.3000
## 20 maine 1399646 1.1000
## 21 maryland 6217062 2.7000
## 22 massachusetts 7066568 3.6000
## 23 michigan 10083356 5.2000
## 24 minnesota 5753048 2.7000
## 25 mississippi 2943172 2.3000
## 26 missouri 6208038 3.8000
## 27 montana 1131302 3.3000
## 28 nebraska 1987864 1.3000
## 29 nevada 3214363 1.7000
## 30 new hampshire 1402199 0.7518
## 31 new jersey 9379642 5.1000
## 32 new mexico 2121164 2.6000
## 33 new york 19737367 10.1000
## 34 north carolina 10881189 4.5000
## 35 north dakota 789047 1.8000
## 36 ohio 11824034 6.6000
## 37 oklahoma 4063882 2.9000
## 38 oregon 4253653 2.3000
## 39 pennsylvania 13017721 8.1000
## 40 rhode island 1103429 1.1000
## 41 south carolina 5387830 2.3000
## 42 south dakota 918305 1.3000
## 43 tennessee 7148304 3.7000
## 44 texas 30727890 14.2000
## 45 utah 3443222 1.8000
## 46 vermont 648708 0.8521
## 47 virginia 8734685 4.5000
## 48 washington 7857320 4.0000
## 49 west virginia 1770495 2.0000
## 50 wisconsin 5930405 2.8000
## 51 wyoming 585067 2.3000
We now have the data for the first question in a tidy format where each row represent a unique state/territory. Population size plays a big role in the amount of funds though, so it should be taken into consideration. Therefore let’s normalize the data and create a per capita feature.
# Normalize the data
# Add a per capita column - total funds / total population
df_merged <- df_merged |>
mutate(
funding_per_capita = (total_funding_billions * 1000000000) / total_population_2023,
)
head(df_merged)
## state_territory_tribal_nation total_population_2023 total_funding_billions
## 1 alabama 5117673 3.0
## 2 alaska 736510 3.7
## 3 arizona 7473027 3.5
## 4 arkansas 3069463 2.8
## 5 california 39198693 18.4
## 6 colorado 5901339 3.2
## funding_per_capita
## 1 586.2039
## 2 5023.6928
## 3 468.3510
## 4 912.2117
## 5 469.4034
## 6 542.2498
We now have data that considers the funds given the population. Yet, for the story of this data, seeing the exact funds numbers won’t be needed. We just need a general idea of the allocation of funds. So let’s set up some bins to simplify the visualization:
# Create bins for simplification
df_merged_bins <- mutate(
df_merged,
funding_per_capita_bins = cut(
funding_per_capita,
breaks = c(0, 400, 600, 800, 1000, 6000),
labels = c("< $400", "$400 to $600", "$600 to $800", "$800 to $1000", "> $1000"),
right = FALSE
)
)
head(df_merged_bins)
## state_territory_tribal_nation total_population_2023 total_funding_billions
## 1 alabama 5117673 3.0
## 2 alaska 736510 3.7
## 3 arizona 7473027 3.5
## 4 arkansas 3069463 2.8
## 5 california 39198693 18.4
## 6 colorado 5901339 3.2
## funding_per_capita funding_per_capita_bins
## 1 586.2039 $400 to $600
## 2 5023.6928 > $1000
## 3 468.3510 $400 to $600
## 4 912.2117 $800 to $1000
## 5 469.4034 $400 to $600
## 6 542.2498 $400 to $600
Now let’s look at the election data for the second part:
# Clean data
# Keep states/territory from the Associated Press
colnames(election_data)[colnames(election_data) == 'state'] <- 'state_territory_tribal_nation'
election_data$state_territory_tribal_nation <- tolower(election_data$state_territory_tribal_nation)
election_data_merged <- merge(election_data, df_merged_bins, by = "state_territory_tribal_nation", all.x = TRUE)
election_data_merged
## state_territory_tribal_nation state_abr trump_pct biden_pct trump_vote
## 1 alabama AL 62.2 36.7 1441170
## 2 alaska AK 53.1 43.0 189543
## 3 arizona AZ 49.1 49.4 1661686
## 4 arkansas AR 62.4 34.8 760647
## 5 california CA 34.3 63.5 5982194
## 6 colorado CO 41.9 55.4 1364471
## 7 connecticut CT 39.2 59.3 715291
## 8 delaware DE 39.8 58.8 200603
## 9 district of columbia DC 5.4 93.0 18586
## 10 florida FL 51.2 47.9 5668731
## 11 georgia GA 49.3 49.5 2461837
## 12 hawaii HI 34.3 63.7 196864
## 13 idaho ID 63.9 33.1 554128
## 14 illinois IL 40.5 57.6 2438943
## 15 indiana IN 57.1 41.0 1729516
## 16 iowa IA 53.2 45.0 897672
## 17 kansas KS 56.5 41.3 752933
## 18 kentucky KY 62.1 36.2 1326646
## 19 louisiana LA 58.5 39.9 1255776
## 20 maine ME 44.0 53.1 360480
## 21 maryland MD 32.4 65.8 976414
## 22 massachusetts MA 32.3 65.9 1167202
## 23 michigan MI 47.8 50.6 2649842
## 24 minnesota MN 45.4 52.6 1484056
## 25 mississippi MS 57.6 41.1 756789
## 26 missouri MO 56.8 41.4 1718282
## 27 montana MT 56.9 40.6 343647
## 28 nebraska NE 58.5 39.4 556846
## 29 nevada NV 47.7 50.1 669890
## 30 new hampshire NH 45.5 52.9 365660
## 31 new jersey NJ 41.4 57.3 1883260
## 32 new mexico NM 43.5 54.3 401894
## 33 new york NY 41.5 57.1 3005636
## 34 north carolina NC 50.1 48.7 2758775
## 35 north dakota ND 65.5 31.9 235595
## 36 ohio OH 53.3 45.3 3154834
## 37 oklahoma OK 65.4 32.3 1020280
## 38 oregon OR 40.7 56.9 958448
## 39 pennsylvania PA 48.8 50.0 3378263
## 40 rhode island RI 38.8 59.7 199922
## 41 south carolina SC 55.1 43.4 1385103
## 42 south dakota SD 61.8 35.6 261043
## 43 tennessee TN 60.7 37.4 1849556
## 44 texas TX 52.1 46.5 5890347
## 45 utah UT 58.2 37.7 865140
## 46 vermont VT 30.8 66.4 112704
## 47 virginia VA 44.2 54.4 1962430
## 48 washington WA 39.0 58.4 1584651
## 49 west virginia WV 68.6 29.7 545382
## 50 wisconsin WI 48.9 49.6 1610065
## 51 wyoming WY 70.4 26.7 193559
## biden_vote trump_win biden_win total_population_2023 total_funding_billions
## 1 849624 1 0 5117673 3.0000
## 2 153502 1 0 736510 3.7000
## 3 1672143 0 1 7473027 3.5000
## 4 423932 1 0 3069463 2.8000
## 5 11082293 0 1 39198693 18.4000
## 6 1804196 0 1 5901339 3.2000
## 7 1080680 0 1 3643023 2.5000
## 8 296268 0 1 1036423 0.7920
## 9 317323 0 1 687324 1.1000
## 10 5297045 1 0 22904868 8.2000
## 11 2474507 0 1 11064432 5.0000
## 12 366130 0 1 1441387 1.0000
## 13 287031 1 0 1971122 1.2000
## 14 3463260 0 1 12642259 8.4000
## 15 1242413 1 0 6880131 3.4000
## 16 759061 1 0 3218414 2.4000
## 17 551144 1 0 2951500 1.5000
## 18 772474 1 0 4550595 3.9000
## 19 856034 1 0 4588071 4.3000
## 20 434966 0 1 1399646 1.1000
## 21 1985023 0 1 6217062 2.7000
## 22 2382202 0 1 7066568 3.6000
## 23 2804040 0 1 10083356 5.2000
## 24 1717077 0 1 5753048 2.7000
## 25 539508 1 0 2943172 2.3000
## 26 1252902 1 0 6208038 3.8000
## 27 244836 1 0 1131302 3.3000
## 28 374583 1 0 1987864 1.3000
## 29 703486 0 1 3214363 1.7000
## 30 424935 0 1 1402199 0.7518
## 31 2608327 0 1 9379642 5.1000
## 32 501614 0 1 2121164 2.6000
## 33 4137641 0 1 19737367 10.1000
## 34 2684292 1 0 10881189 4.5000
## 35 114902 1 0 789047 1.8000
## 36 2679165 1 0 11824034 6.6000
## 37 503890 1 0 4063882 2.9000
## 38 1340383 0 1 4253653 2.3000
## 39 3459923 0 1 13017721 8.1000
## 40 307486 0 1 1103429 1.1000
## 41 1091541 1 0 5387830 2.3000
## 42 150471 1 0 918305 1.3000
## 43 1139332 1 0 7148304 3.7000
## 44 5259126 1 0 30727890 14.2000
## 45 560282 1 0 3443222 1.8000
## 46 242820 0 1 648708 0.8521
## 47 2413568 0 1 8734685 4.5000
## 48 2369612 0 1 7857320 4.0000
## 49 235984 1 0 1770495 2.0000
## 50 1630673 0 1 5930405 2.8000
## 51 73491 1 0 585067 2.3000
## funding_per_capita funding_per_capita_bins
## 1 586.2039 $400 to $600
## 2 5023.6928 > $1000
## 3 468.3510 $400 to $600
## 4 912.2117 $800 to $1000
## 5 469.4034 $400 to $600
## 6 542.2498 $400 to $600
## 7 686.2433 $600 to $800
## 8 764.1668 $600 to $800
## 9 1600.4097 > $1000
## 10 358.0025 < $400
## 11 451.8985 $400 to $600
## 12 693.7762 $600 to $800
## 13 608.7903 $600 to $800
## 14 664.4382 $600 to $800
## 15 494.1766 $400 to $600
## 16 745.7089 $600 to $800
## 17 508.2162 $400 to $600
## 18 857.0308 $800 to $1000
## 19 937.2130 $800 to $1000
## 20 785.9130 $600 to $800
## 21 434.2887 $400 to $600
## 22 509.4411 $400 to $600
## 23 515.7013 $400 to $600
## 24 469.3164 $400 to $600
## 25 781.4698 $600 to $800
## 26 612.1097 $600 to $800
## 27 2916.9930 > $1000
## 28 653.9683 $600 to $800
## 29 528.8762 $400 to $600
## 30 536.1578 $400 to $600
## 31 543.7308 $400 to $600
## 32 1225.7421 > $1000
## 33 511.7197 $400 to $600
## 34 413.5577 $400 to $600
## 35 2281.2329 > $1000
## 36 558.1851 $400 to $600
## 37 713.6034 $600 to $800
## 38 540.7117 $400 to $600
## 39 622.2287 $600 to $800
## 40 996.8924 $800 to $1000
## 41 426.8880 $400 to $600
## 42 1415.6517 > $1000
## 43 517.6053 $400 to $600
## 44 462.1209 $400 to $600
## 45 522.7662 $400 to $600
## 46 1313.5340 > $1000
## 47 515.1874 $400 to $600
## 48 509.0794 $400 to $600
## 49 1129.6276 > $1000
## 50 472.1431 $400 to $600
## 51 3931.1737 > $1000
# Merge in the data
election_data_merged <- election_data_merged |>
select(state_territory_tribal_nation, biden_win, total_funding_billions, funding_per_capita)
election_data_merged <- election_data_merged |>
mutate(
results = ifelse(biden_win == 1, "Biden", "Trump")
)
# Final dataset
head(election_data_merged)
## state_territory_tribal_nation biden_win total_funding_billions
## 1 alabama 0 3.0
## 2 alaska 0 3.7
## 3 arizona 1 3.5
## 4 arkansas 0 2.8
## 5 california 1 18.4
## 6 colorado 1 3.2
## funding_per_capita results
## 1 586.2039 Trump
## 2 5023.6928 Trump
## 3 468.3510 Biden
## 4 912.2117 Trump
## 5 469.4034 Biden
## 6 542.2498 Biden
For this let’s just do a quick EDA (given the main point of this assignment are the visualizations).
summary(election_data_merged)
## state_territory_tribal_nation biden_win total_funding_billions
## Length:51 Min. :0.0000 Min. : 0.7518
## Class :character 1st Qu.:0.0000 1st Qu.: 1.8000
## Mode :character Median :1.0000 Median : 2.8000
## Mean :0.5098 Mean : 3.7568
## 3rd Qu.:1.0000 3rd Qu.: 4.1500
## Max. :1.0000 Max. :18.4000
## funding_per_capita results
## Min. : 358.0 Length:51
## 1st Qu.: 509.3 Class :character
## Median : 586.2 Mode :character
## Mean : 896.9
## 3rd Qu.: 821.5
## Max. :5023.7
summary(df_merged_bins)
## state_territory_tribal_nation total_population_2023 total_funding_billions
## Length:51 Min. : 585067 Min. : 0.7518
## Class :character 1st Qu.: 1870808 1st Qu.: 1.8000
## Mode :character Median : 4550595 Median : 2.8000
## Mean : 6604044 Mean : 3.7568
## 3rd Qu.: 7665174 3rd Qu.: 4.1500
## Max. :39198693 Max. :18.4000
## funding_per_capita funding_per_capita_bins
## Min. : 358.0 < $400 : 1
## 1st Qu.: 509.3 $400 to $600 :25
## Median : 586.2 $600 to $800 :12
## Mean : 896.9 $800 to $1000: 4
## 3rd Qu.: 821.5 > $1000 : 9
## Max. :5023.7
Since this is comparing US states, territories and tribal nations, my first data visualization will be a bar graph. For this graph, I want to emphasize the population. A cartogram would be cool to show this feature, but I do not have geometry data (which I believe I would need). Instead I am going to show population by the size of the bars themselves. To do this, I chose the estimated population on the y axis, with the x axis being the states. I then chose to sort by descending population so one can instantly see the largest and smallest populated states. I will also flip the axes, to make it easier to read the states.
To show the funding, I chose to use my bins. The reason I did this is because for this story, it doesn’t seem as relevant to know the exact numbers for funding. We just want to know the general trend in allocation. Additionally, I would like to use a gradient, and using a color gradient for continuous data can make the visualization hard to read. I chose different gradients of blue, and desaturated/lightened them to make the colors easier to look at. Since we really want to know which states are getting the most money, I chose to darken the highest bin of funding. That way the eye is drawn to the highest, and the lowest bin has the lightest color. This makes the coloring more intuitive.
I chose to use the funding per capita since population is important for this data.
# This graph is inspired by the Fundamentals of Data Visualization textbook
df_merged_bins <- df_merged_bins |>
arrange(desc(total_population_2023))
# Lighten the colors so they're not too harsh to look at
# Darken the highest bin so the eye focuses on it
funding_colors_bars <- c(desaturate(lighten(c("#ADD8E6", "#87CEEB", "#6495ED", "#4169E1"), .4), .8), darken("#56B4E9", .3))
funding_colors_axis <- c(rep("gray30", 4), darken("#56B4E9", .4))
funding_fontface <- c(rep("plain", 4), "bold")
ggplot(df_merged_bins, aes(x = reorder(state_territory_tribal_nation, total_population_2023), y = total_population_2023, fill = funding_per_capita_bins)) +
geom_col() +
scale_y_continuous(
labels = c(0, 1, 2, 3, 4),
name = "Estimated Population in 2023 in Millions",
) +
scale_fill_manual(
values = funding_colors_bars,
breaks = c("< $400", "$400 to $600", "$600 to $800", "$800 to $1000", "> $1000"),
name = "IIJA Funding Per Capita"
) +
coord_flip() +
theme(
text = element_text(color = "gray30"),
axis.text.x = element_text(color = "gray30"),
axis.title.y = element_blank(),
axis.line.y = element_blank(),
axis.ticks.length = unit(0, "pt"),
axis.text.y = element_text(
size = 8,
),
legend.position.inside = c(.56, .68),
legend.background = element_rect(fill = "#ffffffb0")
) +
labs(title = "Allocation of IIJA Funding (Per Capita) for the United States")
Story visualization breakdown:
The story of this data is that there is a bias towards the least populated states in the US in terms of allocation of IIJA funds. Why might this be? Possibly the more rural states actually need more funds for their infrastructure compared to the states with larger cities.
Drawbacks/possible future improvements (with more time):
The reason why I chose to use R for this assignment was to try my own cartogram heatmap which I learned from the Fundamentals of Data Visualization textbook. I personally really like this style of visualization as it easily shows a visual of the US. The downside of this graph is that it doesn’t visually show population size, but for this type of graph, I’m assuming the audience is somewhat familiar with the US. Additionally, the data I’m using still incorporates the population size.
To make the cartogram heatmap, I used geom_statebins which automatically places the states in the correct position which is very convenient. I don’t need to add on geometry data. For this graph I used the same bins for funding to easily show the difference in funds.
# Bonus graph
# This graph is inspired by the Fundamentals of Data Visualization textbook
# geom_statebins only supports the 50 states, to my knowledge
state_names <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")
df_merged_bins_updated <- df_merged_bins
colnames(df_merged_bins_updated)[colnames(df_merged_bins_updated) == 'state_territory_tribal_nation'] <- 'state'
filtered_df_merged_bins_updated <- df_merged_bins_updated %>%
filter(state %in% tolower(state_names))
head(filtered_df_merged_bins_updated)
## state total_population_2023 total_funding_billions funding_per_capita
## 1 california 39198693 18.4 469.4034
## 2 texas 30727890 14.2 462.1209
## 3 florida 22904868 8.2 358.0025
## 4 new york 19737367 10.1 511.7197
## 5 pennsylvania 13017721 8.1 622.2287
## 6 illinois 12642259 8.4 664.4382
## funding_per_capita_bins
## 1 $400 to $600
## 2 $400 to $600
## 3 < $400
## 4 $400 to $600
## 5 $600 to $800
## 6 $600 to $800
filtered_df_merged_bins_updated$state <- str_to_title(filtered_df_merged_bins_updated$state)
filter(filtered_df_merged_bins_updated) %>%
ggplot(aes(state = state, fill = funding_per_capita_bins)) +
geom_statebins(lbl_size = 14/.pt) +
expand_limits(x = -1.3) + # make space for legend
coord_equal(expand = FALSE) +
scale_fill_discrete_sequential(
h1 = -83, h2 = 20, c1 = 30, cmax = 40, c2 = 0, l1 = 20, l2 = 100, p1 = 1, p2 = 1.2, rev = TRUE,
name = "IIJA Funding Per Capita",
nmax = 7,
order = 2:6,
guide = guide_legend(
override.aes = list(colour = "white", size = 1),
reverse = TRUE
)
) +
theme(
panel.grid.major = element_blank(), # Removes grid lines
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.title.x = element_blank(), # Remove x-axis title
axis.text.x = element_blank(), # Remove x-axis text (labels)
axis.title.y = element_blank(), # Remove y-axis title
axis.text.y = element_blank(), # Remove y-axis text (labels)
axis.ticks.x = element_blank(), # Remove x-axis tick marks
axis.ticks.y = element_blank(), # Remove y-axis tick marks
legend.background = element_blank(),
legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.spacing.x = grid::unit(2, "pt"),
legend.spacing.y = grid::unit(2, "pt"),
legend.title = element_text(hjust = 0.5),
legend.key.width = grid::unit(18, "pt"),
legend.key.height = grid::unit(15, "pt")
)
## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Story visualization breakdown:
Story of this data is that there is a bias towards the least populated states in the US in terms of allocation of IIJA funds.
Drawbacks of this graph:
Overall this graph is very pleasing and clear to me, but not might be to all audiences.
Since this part of the analysis also looks at comparing states, I will choose to use another bar graph. For this question, I want to focus on the allocation of funds, so I will make the total funds my y axis, with the fill for my bars representing the 2020 presidential election results. For the election results, I will choose blue for Biden which is the usual color for the democratic party and red for Trump for the republican party. I will also lighten the colors for this graph as well to make the colors less harsh for the eyes to look at.
For the first graph, let’s just look at total funding (not taking into account population size).
# Election bar graph
colors_bars <- c(lighten("blue", amount = 0.2), lighten("red", amount = 0.2))
ggplot(election_data_merged, aes(x = reorder(state_territory_tribal_nation, total_funding_billions), y = total_funding_billions, fill = results)) +
geom_col() +
scale_y_continuous(
name = "IIJA Funding (in Billions)",
) +
scale_fill_manual(
values = colors_bars,
name = "Election Results"
) +
coord_flip() +
theme(
text = element_text(color = "gray30"),
axis.text.x = element_text(color = "gray30"),
axis.title.y = element_blank(),
axis.line.y = element_blank(),
axis.ticks.length = unit(0, "pt"),
axis.text.y = element_text(
size = 8,
),
legend.position.inside = c(.56, .68),
legend.background = element_rect(fill = "#ffffffb0")
) +
labs(title = "Allocation of IIJA Funding (in Billions) for the United States")
Story visualization breakdown:
Story of this data is that the highest funded states are Biden supporters. Yet, this data does not take into account population size which does matter.
Drawbacks/future improvements:
Now let’s take into consideration population by looking at per capita:
# Per capita election bar graph
# Final visualization
ggplot(election_data_merged, aes(x = reorder(state_territory_tribal_nation, funding_per_capita), y = funding_per_capita, fill = results)) +
geom_col() +
scale_y_continuous(
name = "IIJA Funding (US Dollars Per Capita)",
) +
scale_fill_manual(
values = colors_bars,
name = "Election Results"
) +
coord_flip() +
theme(
text = element_text(color = "gray30"),
axis.text.x = element_text(color = "gray30"),
axis.title.y = element_blank(),
axis.line.y = element_blank(),
axis.ticks.length = unit(0, "pt"),
axis.text.y = element_text(
size = 8,
),
legend.position.inside = c(.56, .68),
legend.background = element_rect(fill = "#ffffffb0")
) +
labs(title = "Allocation of IIJA Funding (US Dollars Per Capita) for the United States")
Story visualization breakdown:
Story of this data is that the highest funded states (per capita) are Trump supporter states. This tells a different story than the previous graph. This tells us Biden is giving more money to states that did not support him back in 2020. This could indicate that Biden was trying to win these states by giving them more resources, despite their lower population size.
Drawbacks/future improvements:
Since this takes population into consideration, this would be the final visualization I would use.
Overall this assignment gave very good practice in creating detailed graphs. What these graphs showed us is that the least populated states are getting higher funds per capita. Although it may seem the states with the highest populations are getting the most funds, if you take into account the amount of people in each state, the funds are a lot less for the states with bigger cities. One reason for this could be due to infrastructure issues in rural areas that need the funds. Another reason could be the story from our 2nd set of graphs related to the election results. Possibly the Biden administration wanted to win over states that did not vote for him in 2020. The data comes from 2023, so it’s possible this allocation was heavily related to the 2024 election year.
Future Directions: