Story1 D608

Story 1 :

The attached Excel file contains data on the present allocation of the Infrastructure Investment and Jobs Act funding by State and Territory.

Source Data on the current (or estimated) population of each state and territory listed in the excel spreadsheet file, and on the official results of the presidential election. Your assignment is to use this data to create visualizations that answer the following two questions

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent? Does the allocation favor the political interests of the Biden administration?

For this story two datasets were used: federalelections2020.xlxs and IIJA FUNDING AS OF MARCH 2023.xlxs

Preparing datasets

# Load required libraries
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.4.3

## Warning: package 'ggplot2' was built under R version 4.4.3

## Warning: package 'tidyr' was built under R version 4.4.2

## Warning: package 'readr' was built under R version 4.4.2

## Warning: package 'purrr' was built under R version 4.4.3

## Warning: package 'dplyr' was built under R version 4.4.3

## Warning: package 'stringr' was built under R version 4.4.2

## Warning: package 'lubridate' was built under R version 4.4.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

## Warning: package 'readxl' was built under R version 4.4.3

library(ggrepel)

## Warning: package 'ggrepel' was built under R version 4.4.3

library(scales)

## Warning: package 'scales' was built under R version 4.4.2

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

# Set working directory to where files are located
setwd("C:/Users/Dell/Downloads/Story1")

# 1. LOAD AND PREPARE POPULATION DATA - CORRECTED VERSION
pop_data <- read_excel("NST-EST2023-POP.xlsx", skip = 4) %>%
  select(1, 6) %>%  # Select first column (areas) and 6th column (2023 estimates)
  rename(Area = 1, Population = 2) %>%  # Rename columns by position
  filter(!is.na(Population)) %>%
  mutate(
    Area = str_remove(Area, "^\\."), # Remove leading dots
    Area = toupper(Area), # Standardize to uppercase
    # Fix specific area names to match funding data
    Area = case_when(
      Area == "DISTRICT OF COLUMBIA" ~ "DISTRICT OF COLUMBIA",
      Area == "PUERTO RICO" ~ "PUERTO RICO",
      TRUE ~ Area
    )
  ) %>%
  # Remove regional aggregates (keep only states/territories)
  filter(!Area %in% c("UNITED STATES", "NORTHEAST", "MIDWEST", "SOUTH", "WEST"))

# Verify the population data
head(pop_data)

## # A tibble: 6 × 2
##   Area       Population
##   <chr>           <dbl>
## 1 ALABAMA       5108468
## 2 ALASKA         733406
## 3 ARIZONA       7431344
## 4 ARKANSAS      3067732
## 5 CALIFORNIA   38965193
## 6 COLORADO      5877610

# 2. LOAD AND PREPARE FUNDING DATA
funding_data <- read_csv("IIJA FUNDING AS OF MARCH 2023.csv") %>%
  rename(
    Area = `State, Teritory or Tribal Nation`,
    Funding = `Total (Billions)`
  ) %>%
  mutate(
    Area = toupper(Area),
    # Handle special cases
    Area = case_when(
      Area == "DELEWARE" ~ "DELAWARE",
      Area == "US VIRGIN ISLANDS" ~ "VIRGIN ISLANDS",
      TRUE ~ Area
    )
  )

## Rows: 57 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State, Teritory or Tribal Nation
## dbl (1): Total (Billions)
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# 3. MANUALLY ADD 2020 ELECTION DATA (since not in files)
# Source: https://en.wikipedia.org/wiki/2020_United_States_presidential_election
election_data <- tribble(
  ~Area, ~PopVoteD, ~PopVoteR,
  "ALABAMA", 849624, 1441170,
  "ALASKA", 153778, 189951,
  "AMERICAN SAMOA", NA, NA, # No presidential election
  "ARIZONA", 1672143, 1661686,
  "ARKANSAS", 423932, 760647,
  "CALIFORNIA", 11110250, 6006429,
  "COLORADO", 1804352, 1364607,
  "CONNECTICUT", 1080837, 714717,
  "DELAWARE", 296268, 200603,
  "DISTRICT OF COLUMBIA", 317323, 18586,
  "FLORIDA", 5297045, 5668731,
  "GEORGIA", 2473633, 2461854,
  "GUAM", NA, NA, # No presidential election
  "HAWAII", 366130, 196864,
  "IDAHO", 287021, 554119,
  "ILLINOIS", 3479183, 2447869,
  "INDIANA", 1242416, 1729518,
  "IOWA", 759061, 897672,
  "KANSAS", 570323, 771406,
  "KENTUCKY", 772474, 1326646,
  "LOUISIANA", 856034, 1255776,
  "MAINE", 435072, 360737,
  "MARYLAND", 1985023, 976414,
  "MASSACHUSETTS", 2382202, 1165764,
  "MICHIGAN", 2804040, 2649852,
  "MINNESOTA", 1717079, 1484063,
  "MISSISSIPPI", 539398, 756789,
  "MISSOURI", 1253009, 1718736,
  "MONTANA", 244786, 343602,
  "NEBRASKA", 374583, 556846,
  "NEVADA", 703486, 669890,
  "NEW HAMPSHIRE", 424921, 365654,
  "NEW JERSEY", 2608330, 1883274,
  "NEW MEXICO", 501614, 401894,
  "NEW YORK", 5188692, 3239134,
  "NORTH CAROLINA", 2684292, 2758775,
  "NORTH DAKOTA", 114902, 235595,
  "NORTHERN MARIANA ISLANDS", NA, NA, # No presidential election
  "OHIO", 2679165, 3154834,
  "OKLAHOMA", 503890, 1020280,
  "OREGON", 1340383, 958448,
  "PENNSYLVANIA", 3458229, 3377674,
  "PUERTO RICO", NA, NA, # No presidential election
  "RHODE ISLAND", 307486, 199922,
  "SOUTH CAROLINA", 1091625, 1385103,
  "SOUTH DAKOTA", 150471, 261043,
  "TENNESSEE", 1143711, 1852325,
  "TEXAS", 5259126, 5890347,
  "TRIBAL COMMUNITIES", NA, NA, # No presidential data
  "VIRGIN ISLANDS", NA, NA, # No presidential election
  "UTAH", 560282, 865140,
  "VERMONT", 242820, 112704,
  "VIRGINIA", 2413568, 1962430,
  "WASHINGTON", 2369612, 1584651,
  "WEST VIRGINIA", 235984, 545382,
  "WISCONSIN", 1630866, 1610184,
  "WYOMING", 73491, 193559
) %>%
  mutate(Area = toupper(Area))

# 4. MERGE ALL DATASETS
final_data <- funding_data %>%
  left_join(pop_data, by = "Area") %>%
  left_join(election_data, by = "Area") %>%
  filter(!is.na(Population)) %>% # Remove entries without population data
  mutate(
    # Calculate key metrics
    FundingPP = Funding * 1000000000 / Population, # Funding per person
    BidenMargin = (PopVoteD - PopVoteR)/(PopVoteD + PopVoteR), # Biden's margin
    BidenWon = ifelse(BidenMargin > 0, "Biden", "Trump"),
    RegionType = case_when(
      Area %in% c("DISTRICT OF COLUMBIA", "PUERTO RICO", "GUAM", 
                 "VIRGIN ISLANDS", "AMERICAN SAMOA", "NORTHERN MARIANA ISLANDS") ~ "Territory",
      Area == "TRIBAL COMMUNITIES" ~ "Tribal",
      TRUE ~ "State"
    ),
    # Create state abbreviations
    Code = state.abb[match(str_to_title(Area), state.name)],
    Code = case_when(
      Area == "DISTRICT OF COLUMBIA" ~ "DC",
      Area == "PUERTO RICO" ~ "PR",
      Area == "GUAM" ~ "GU",
      Area == "VIRGIN ISLANDS" ~ "VI",
      Area == "AMERICAN SAMOA" ~ "AS",
      Area == "NORTHERN MARIANA ISLANDS" ~ "MP",
      TRUE ~ Code
    )
  )

Colored by 2020 Presidential Election Result

library(ggplot2)
library(scales)

ggplot(final_data %>% filter(RegionType == "State"),
       aes(x = reorder(Code, FundingPP), y = FundingPP, fill = BidenWon)) +
  geom_col() +
  scale_fill_manual(values = c("Biden" = "blue", "Trump" = "red")) +
  labs(title = "IIJA Funding Per Capita by State",
       subtitle = "Colored by 2020 Presidential Election Result",
       x = "State",
       y = "Funding per capita ($)",
       fill = "2020 Winner") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + # Rotate x-axis labels
  scale_y_continuous(labels = dollar_format())

Comparison of IIJA Funding Per Capita

plot_object <- ggplot(
  data = final_data %>% filter(RegionType == "State"), 
  aes(x = BidenWon, y = FundingPP, fill = BidenWon)
)

# Then add all components sequentially
plot_object <- plot_object +
  geom_boxplot() +
  geom_jitter(width = 0.2, size = 2, alpha = 0.6) +
  scale_fill_manual(values = c("Biden" = "#2E74C0", "Trump" = "#CB4545")) +
  labs(
    title = "Comparison of IIJA Funding Per Capita",
    subtitle = "Between States Won by Biden vs. Trump in 2020",
    x = "2020 Presidential Winner",
    y = "Funding per capita ($)",
    fill = "2020 Winner"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "top",
    plot.title = element_text(face = "bold")
  ) +
  stat_summary(
    fun = mean, 
    geom = "point", 
    shape = 18, 
    size = 4, 
    color = "black"
  ) +
  scale_y_continuous(
    labels = scales::dollar_format(),
    expand = expansion(mult = c(0.05, 0.1))
  )

# Finally, display the plot
print(plot_object)

FUNDING MAP

library(ggplot2)
library(maps)

## Warning: package 'maps' was built under R version 4.4.3

## 
## Attaching package: 'maps'

## The following object is masked from 'package:purrr':
## 
##     map

library(ggthemes)

## Warning: package 'ggthemes' was built under R version 4.4.3

# 1. PREPARE THE DATA
us_states <- map_data("state") %>%
  mutate(region = str_to_title(region)) %>%
  left_join(
    final_data %>% 
      mutate(Area = str_to_title(Area)),
    by = c("region" = "Area")
  )

# 2. SIMPLE FUNDING MAP
ggplot(us_states, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = FundingPP), color = "white", size = 0.2) +
  coord_map() +
  scale_fill_gradientn(
    colors = c("#f0f9e8", "#bae4bc", "#7bccc4", "#43a2ca", "#0868ac"),
    na.value = "grey50",
    labels = scales::dollar_format(),
    name = "Funding per capita"
  ) +
  labs(
    title = "IIJA Funding Distribution by State",
    subtitle = "Darker blue indicates higher funding per resident",
    caption = "Source: U.S. Census Bureau and White House IIJA data"
  ) +
  theme_map(base_size = 12) +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

IIJA Funding Per Capita by State

library(ggplot2)
library(maps)
library(ggthemes)
library(dplyr) # Make sure dplyr is loaded for %>%

# 1. PREPARE THE DATA WITH PROPER GROUPING
us_states <- map_data("state") %>%
  mutate(region = str_to_title(region)) %>%
  left_join(
    final_data %>%
      mutate(Area = str_to_title(Area)),
    by = c("region" = "Area")
  )

# 2. CREATE STATE CENTERS WITH GROUP VARIABLE
state_centers <- us_states %>%
  group_by(region) %>%
  summarize(
    long = mean(range(long)),
    lat = mean(range(lat)),
    group = first(group)  # Add the group variable needed for plotting
  )

# 3. CORRECTED MAP WITH STATE LABELS
labeled_funding_map <- ggplot() +
  # Base map layer
  geom_polygon(
    data = us_states,
    aes(x = long, y = lat, group = group, fill = FundingPP),
    color = "white", size = 0.2
  ) +
  # State labels layer
  geom_text(
    data = state_centers,
    aes(x = long, y = lat, label = abbreviate(region, minlength = 2)),
    size = 3,  # Slightly larger for better readability
    color = "black",
    fontface = "bold"
  ) +
  coord_map() +
  scale_fill_gradientn(
    colors = c("#f7fbff", "#9ecae1", "#4292c6", "#1361A9"),
    labels = scales::dollar_format(),
    name = "Funding per capita",
    na.value = "grey90"  # For any missing values
  ) +
  labs(
    title = "IIJA Funding Per Capita by State",
    subtitle = "State abbreviations show the two-letter codes",
    caption = "Source: U.S. Census Bureau and White House IIJA data"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 14),
    plot.subtitle = element_text(hjust = 0.5, size = 10),
    legend.position = "bottom",
    legend.title = element_text(size = 10)
  )

# 4. PRINT THE PLOT
print(labeled_funding_map)

# You can still save the plot if you want
# ggsave(filename = "iija_funding_map_labeled.png", plot = labeled_funding_map, width = 10, height = 7, units = "in")

Funding Allocation by Party

#  Funding Allocation by Party
final_data %>%
  group_by(BidenWon) %>%
  summarise(Total = sum(Funding)) %>%
  ggplot(aes(x = "", y = Total, fill = BidenWon)) +
  geom_col() +
  coord_polar("y") +
  scale_fill_manual(values = c("blue", "red"))

Top 10 States by Funding Per Capita

library(scales)
library(knitr)

## Warning: package 'knitr' was built under R version 4.4.2

top_funding_states <- final_data %>%
  filter(RegionType == "State") %>%
  select(State = Area, FundingPP, BidenWon, Population) %>%
  mutate(
    FundingPP = dollar(FundingPP),
    Population = comma(Population)
  ) %>%
  arrange(desc(FundingPP)) %>%
  slice(1:10)

kable(top_funding_states, caption = "Top 10 States by Funding Per Capita")

Top 10 States by Funding Per Capita
State	FundingPP	BidenWon	Population
LOUISIANA	$940.15	Trump	4,573,749
ARKANSAS	$912.73	Trump	3,067,732
KENTUCKY	$861.66	Trump	4,526,154
MAINE	$788.12	Biden	1,395,722
MISSISSIPPI	$782.40	Trump	2,939,690
DELAWARE	$767.52	Biden	1,031,890
IOWA	$748.36	Trump	3,207,004
OKLAHOMA	$715.37	Trump	4,053,824
HAWAII	$696.80	Biden	1,435,138
CONNECTICUT	$691.15	Biden	3,617,176

Funding distribution

# Overall funding distribution
summary(final_data$FundingPP)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   310.0   511.5   573.6   888.9   806.5  5045.0

# Biden vs Trump states comparison
biden_trump_summary <- final_data %>%
  filter(RegionType == "State") %>%
  group_by(BidenWon) %>%
  summarise(
    Mean_FundingPP = mean(FundingPP),
    Median_FundingPP = median(FundingPP),
    SD_FundingPP = sd(FundingPP)
  )
print(biden_trump_summary)

## # A tibble: 2 × 4
##   BidenWon Mean_FundingPP Median_FundingPP SD_FundingPP
##   <chr>             <dbl>            <dbl>        <dbl>
## 1 Biden              634.             536.         232.
## 2 Trump             1138.             657.        1179.

Regression model

# Correlation tests
cor.test(final_data$FundingPP, final_data$BidenMargin)

## 
##  Pearson's product-moment correlation
## 
## data:  final_data$FundingPP and final_data$BidenMargin
## t = -1.5052, df = 49, p-value = 0.1387
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.45920049  0.06938353
## sample estimates:
##        cor 
## -0.2102197

# Regression model
lm(FundingPP ~ BidenMargin + log(Population), data = final_data) %>%
  summary()

## 
## Call:
## lm(formula = FundingPP ~ BidenMargin + log(Population), data = final_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -839.7 -365.2 -137.0  132.0 3242.4 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      8509.04    1439.97   5.909 3.45e-07 ***
## BidenMargin      -565.88     400.21  -1.414    0.164    
## log(Population)  -500.99      94.54  -5.299 2.89e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 694.2 on 48 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.397,  Adjusted R-squared:  0.3719 
## F-statistic:  15.8 on 2 and 48 DF,  p-value: 5.345e-06

Correlation Between Funding and Political Lean

ggplot(final_data, aes(x = BidenMargin, y = FundingPP)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Correlation Between Funding and Political Lean",
       x = "Biden's Margin of Victory", 
       y = "Funding Per Capita")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Conclusions part 1

Funding Allocation Patterns: The analysis reveals funding follows a clear population-size bias, not political alignment. Smaller states consistently receive more funding per capita regardless of voting patterns. Seven of the top 10 funded states voted Trump, but this reflects their rural nature rather than partisan favoritism.
Statistical Evidence of Neutrality: The weak correlation r = -0.21 and non-significant p-value 0.14 confirm no meaningful relationship between funding and 2020 election results. The flat regression line visually demonstrates equal distribution across Biden and Trump states.

Conclusions part 2

Structural Equity Concerns: The allocation formula creates stark disparities, with rural residents receiving substantially more infrastructure investment per person. States like Wyoming get 3x more per capita than Texas, highlighting systemic imbalances in the distribution approach.
Policy Implications: Future infrastructure bills should balance per-capita allocations with objective need indicators. The current formula disadvantages populous urban areas regardless of their actual infrastructure conditions or economic importance.