DATA 608: Story 1

Problem Statement

The accompanying Excel file provides data on the current distribution of Infrastructure Investment and Jobs Act (IIJA) funding across U.S. States and Territories.

This analysis aims to explore the following questions through data visualizations:

Is the funding distribution equitable when adjusted for the population of each State and Territory, or does it reveal potential bias?
Does the allocation appear to align with or favor the political interests of the Biden administration?

Note:

Rationale for Excluding U.S. Territories from This Analysis

U.S. territories such as Puerto Rico, Guam, and American Samoa are excluded from this analysis due to their unique political status. These territories lack voting representation in Congress and do not participate in presidential elections, limiting their influence on federal policy and funding decisions.

This exclusion is grounded in the Insular Cases—early 20th-century Supreme Court rulings that determined the U.S. Constitution does not fully apply to territories. As legal scholar Christina Duffy Burnett notes:

“The Insular Cases installed a doctrine of territorial deannexation… shaping life for more than four million U.S. citizens in Puerto Rico and the other remaining nonstate possessions.”

As a result, territories are often excluded from federal funding formulas, making them unsuitable for inclusion in an analysis focused on population-based or politically influenced funding equity.

Data Collection and Preparation

Installed libraries:

suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(readxl)
  library(tidyverse)
  library(ggplot2)
  library(scales)
  library(tidyr)
})

Load the IIJA funding data into a dataframe and display its contents:

iija_data <- read_excel("IIJA FUNDING AS OF MARCH 2023.xlsx")
print(iija_data)

## # A tibble: 57 × 2
##    `State, Teritory or Tribal Nation` `Total (Billions)`
##    <chr>                                           <dbl>
##  1 ALABAMA                                        3     
##  2 ALASKA                                         3.7   
##  3 AMERICAN SAMOA                                 0.0686
##  4 ARIZONA                                        3.5   
##  5 ARKANSAS                                       2.8   
##  6 CALIFORNIA                                    18.4   
##  7 COLORADO                                       3.2   
##  8 CONNECTICUT                                    2.5   
##  9 DELEWARE                                       0.792 
## 10 DISTRICT OF COLUMBIA                           1.1   
## # ℹ 47 more rows

Clean the IIJA funding data by renaming columns and standardizing state names:

# Clean IIJA funding data 
iija_data <- read_excel("IIJA FUNDING AS OF MARCH 2023.xlsx")
colnames(iija_data) <- c("State", "Total")
iija_data$State <- trimws(iija_data$State)
iija_data$State <- tools::toTitleCase(tolower(iija_data$State))

print(iija_data)

## # A tibble: 57 × 2
##    State                  Total
##    <chr>                  <dbl>
##  1 Alabama               3     
##  2 Alaska                3.7   
##  3 American Samoa        0.0686
##  4 Arizona               3.5   
##  5 Arkansas              2.8   
##  6 California           18.4   
##  7 Colorado              3.2   
##  8 Connecticut           2.5   
##  9 Deleware              0.792 
## 10 District of Columbia  1.1   
## # ℹ 47 more rows

Load and clean the 2023 population data from the U.S. Census Bureau. This includes filtering for state-level entries, selecting relevant columns, renaming them, and formatting state names:

# Load and clean population data
population_data <- read_csv("population-2023.csv", show_col_types = FALSE)

## New names:
## • `` -> `...459`

print(population_data)

## # A tibble: 54 × 459
##    GEO_ID      NAME  S0101_C01_001E S0101_C01_001M S0101_C01_002E S0101_C01_002M
##    <chr>       <chr> <chr>          <chr>          <chr>          <chr>         
##  1 Geography   Geog… Estimate!!Tot… Margin of Err… Estimate!!Tot… Margin of Err…
##  2 0100000US   Unit… 334914896      *****          18333697       19410         
##  3 0400000US01 Alab… 5108468        *****          288019         3288          
##  4 0400000US02 Alas… 733406         *****          45211          1366          
##  5 0400000US04 Ariz… 7431344        *****          391142         1573          
##  6 0400000US05 Arka… 3067732        *****          176908         2503          
##  7 0400000US06 Cali… 38965193       *****          2086820        3832          
##  8 0400000US08 Colo… 5877610        *****          303775         2170          
##  9 0400000US09 Conn… 3617176        *****          180561         829           
## 10 0400000US10 Dela… 1031890        *****          54398          312           
## # ℹ 44 more rows
## # ℹ 453 more variables: S0101_C01_003E <chr>, S0101_C01_003M <chr>,
## #   S0101_C01_004E <chr>, S0101_C01_004M <chr>, S0101_C01_005E <chr>,
## #   S0101_C01_005M <chr>, S0101_C01_006E <chr>, S0101_C01_006M <chr>,
## #   S0101_C01_007E <chr>, S0101_C01_007M <chr>, S0101_C01_008E <chr>,
## #   S0101_C01_008M <chr>, S0101_C01_009E <chr>, S0101_C01_009M <chr>,
## #   S0101_C01_010E <chr>, S0101_C01_010M <chr>, S0101_C01_011E <chr>, …

clean_population <- population_data %>%
  filter(grepl("^0400000US", GEO_ID)) %>%
  select(NAME, S0101_C01_001E) %>%
  rename(State = NAME, Population_2023 = S0101_C01_001E) %>%
  mutate(State = trimws(State),
         State = tools::toTitleCase(tolower(State)))

print(clean_population)

## # A tibble: 52 × 2
##    State                Population_2023
##    <chr>                <chr>          
##  1 Alabama              5108468        
##  2 Alaska               733406         
##  3 Arizona              7431344        
##  4 Arkansas             3067732        
##  5 California           38965193       
##  6 Colorado             5877610        
##  7 Connecticut          3617176        
##  8 Delaware             1031890        
##  9 District of Columbia 678972         
## 10 Florida              22610726       
## # ℹ 42 more rows

Merge the cleaned IIJA funding and population datasets, then compute per capita funding for each state:

# Merge and calculate per capita funding
merged_data <- left_join(iija_data, clean_population, by = "State") %>%
  mutate(Funding_Per_Capita = (Total * 1e9) / as.numeric(Population_2023)) %>%
  filter(!is.na(Funding_Per_Capita))  # Remove rows with NA values

print(merged_data)

## # A tibble: 51 × 4
##    State                Total Population_2023 Funding_Per_Capita
##    <chr>                <dbl> <chr>                        <dbl>
##  1 Alabama                3   5108468                       587.
##  2 Alaska                 3.7 733406                       5045.
##  3 Arizona                3.5 7431344                       471.
##  4 Arkansas               2.8 3067732                       913.
##  5 California            18.4 38965193                      472.
##  6 Colorado               3.2 5877610                       544.
##  7 Connecticut            2.5 3617176                       691.
##  8 District of Columbia   1.1 678972                       1620.
##  9 Florida                8.2 22610726                      363.
## 10 Georgia                5   11029227                      453.
## # ℹ 41 more rows

Incorporate political alignment data from the MIT Election Data and Science Lab by identifying the winning party in each state during the 2020 U.S. presidential election, then merge it with the existing dataset

election_data <- read_csv("1976-2020-president.csv", show_col_types = FALSE)
print(election_data)

## # A tibble: 4,287 × 15
##     year state   state_po state_fips state_cen state_ic office       candidate  
##    <dbl> <chr>   <chr>         <dbl>     <dbl>    <dbl> <chr>        <chr>      
##  1  1976 ALABAMA AL                1        63       41 US PRESIDENT "CARTER, J…
##  2  1976 ALABAMA AL                1        63       41 US PRESIDENT "FORD, GER…
##  3  1976 ALABAMA AL                1        63       41 US PRESIDENT "MADDOX, L…
##  4  1976 ALABAMA AL                1        63       41 US PRESIDENT "BUBAR, BE…
##  5  1976 ALABAMA AL                1        63       41 US PRESIDENT "HALL, GUS"
##  6  1976 ALABAMA AL                1        63       41 US PRESIDENT "MACBRIDE,…
##  7  1976 ALABAMA AL                1        63       41 US PRESIDENT  <NA>      
##  8  1976 ALASKA  AK                2        94       81 US PRESIDENT "FORD, GER…
##  9  1976 ALASKA  AK                2        94       81 US PRESIDENT "CARTER, J…
## 10  1976 ALASKA  AK                2        94       81 US PRESIDENT "MACBRIDE,…
## # ℹ 4,277 more rows
## # ℹ 7 more variables: party_detailed <chr>, writein <lgl>,
## #   candidatevotes <dbl>, totalvotes <dbl>, version <dbl>, notes <lgl>,
## #   party_simplified <chr>

# Filter for 2020 presidential election
election_2020 <- election_data %>%
  filter(year == 2020, office == "US PRESIDENT")
print(election_2020)

## # A tibble: 547 × 15
##     year state   state_po state_fips state_cen state_ic office       candidate  
##    <dbl> <chr>   <chr>         <dbl>     <dbl>    <dbl> <chr>        <chr>      
##  1  2020 ALABAMA AL                1        63       41 US PRESIDENT "BIDEN, JO…
##  2  2020 ALABAMA AL                1        63       41 US PRESIDENT "TRUMP, DO…
##  3  2020 ALABAMA AL                1        63       41 US PRESIDENT "JORGENSEN…
##  4  2020 ALABAMA AL                1        63       41 US PRESIDENT  <NA>      
##  5  2020 ALASKA  AK                2        94       81 US PRESIDENT "BIDEN, JO…
##  6  2020 ALASKA  AK                2        94       81 US PRESIDENT "TRUMP, DO…
##  7  2020 ALASKA  AK                2        94       81 US PRESIDENT "JORGENSEN…
##  8  2020 ALASKA  AK                2        94       81 US PRESIDENT "DE LA FUE…
##  9  2020 ALASKA  AK                2        94       81 US PRESIDENT "PIERCE, B…
## 10  2020 ALASKA  AK                2        94       81 US PRESIDENT "JANOS, JA…
## # ℹ 537 more rows
## # ℹ 7 more variables: party_detailed <chr>, writein <lgl>,
## #   candidatevotes <dbl>, totalvotes <dbl>, version <dbl>, notes <lgl>,
## #   party_simplified <chr>

# Determine the winning party per state
state_winners <- election_2020 %>%
  group_by(state) %>%
  slice_max(order_by = candidatevotes, n = 1, with_ties = FALSE) %>%
  mutate(Political_Alignment = case_when(
    party_simplified == "DEMOCRAT" ~ "Democratic",
    party_simplified == "REPUBLICAN" ~ "Republican",
    TRUE ~ "Other"
  )) %>%
  select(State = state, Political_Alignment)
print(state_winners)

## # A tibble: 51 × 2
## # Groups:   State [51]
##    State                Political_Alignment
##    <chr>                <chr>              
##  1 ALABAMA              Republican         
##  2 ALASKA               Republican         
##  3 ARIZONA              Democratic         
##  4 ARKANSAS             Republican         
##  5 CALIFORNIA           Democratic         
##  6 COLORADO             Democratic         
##  7 CONNECTICUT          Democratic         
##  8 DELAWARE             Democratic         
##  9 DISTRICT OF COLUMBIA Democratic         
## 10 FLORIDA              Republican         
## # ℹ 41 more rows

state_winners$State <- tools::toTitleCase(tolower(trimws(state_winners$State)))
print(state_winners)

## # A tibble: 51 × 2
## # Groups:   State [51]
##    State                Political_Alignment
##    <chr>                <chr>              
##  1 Alabama              Republican         
##  2 Alaska               Republican         
##  3 Arizona              Democratic         
##  4 Arkansas             Republican         
##  5 California           Democratic         
##  6 Colorado             Democratic         
##  7 Connecticut          Democratic         
##  8 Delaware             Democratic         
##  9 District of Columbia Democratic         
## 10 Florida              Republican         
## # ℹ 41 more rows

# Merge with existing data
merged_data <- left_join(merged_data, state_winners, by = "State")
print(merged_data)

## # A tibble: 51 × 5
##    State            Total Population_2023 Funding_Per_Capita Political_Alignment
##    <chr>            <dbl> <chr>                        <dbl> <chr>              
##  1 Alabama            3   5108468                       587. Republican         
##  2 Alaska             3.7 733406                       5045. Republican         
##  3 Arizona            3.5 7431344                       471. Democratic         
##  4 Arkansas           2.8 3067732                       913. Republican         
##  5 California        18.4 38965193                      472. Democratic         
##  6 Colorado           3.2 5877610                       544. Democratic         
##  7 Connecticut        2.5 3617176                       691. Democratic         
##  8 District of Col…   1.1 678972                       1620. Democratic         
##  9 Florida            8.2 22610726                      363. Republican         
## 10 Georgia            5   11029227                      453. Democratic         
## # ℹ 41 more rows

merged_data <- merged_data %>%
  filter(!is.na(Political_Alignment))

print(merged_data)

## # A tibble: 50 × 5
##    State            Total Population_2023 Funding_Per_Capita Political_Alignment
##    <chr>            <dbl> <chr>                        <dbl> <chr>              
##  1 Alabama            3   5108468                       587. Republican         
##  2 Alaska             3.7 733406                       5045. Republican         
##  3 Arizona            3.5 7431344                       471. Democratic         
##  4 Arkansas           2.8 3067732                       913. Republican         
##  5 California        18.4 38965193                      472. Democratic         
##  6 Colorado           3.2 5877610                       544. Democratic         
##  7 Connecticut        2.5 3617176                       691. Democratic         
##  8 District of Col…   1.1 678972                       1620. Democratic         
##  9 Florida            8.2 22610726                      363. Republican         
## 10 Georgia            5   11029227                      453. Democratic         
## # ℹ 40 more rows

Data Analysis

Question 1: Is the funding distribution equitable when adjusted for the population of each State and Territory, or does it reveal potential bias?

To answer this question, I would like to do the following:

Calculate how much funding each state should receive based on its population.
Compare it to what it actually receives.
Highlight discrepancies that may suggest bias or inequity in allocation.

Used formula:

merged_data$Population_2023 / total_population — calculates the proportion of the total population that each state represents.
Multiplying that by total_funding gives the expected funding if funds were distributed strictly based on population share.

# data is in numeric format for calculations
merged_data$Population_2023 <- as.numeric(merged_data$Population_2023)

# Calculate total population and total funding
merged_data$Total_Funding <- merged_data$Funding_Per_Capita * merged_data$Population_2023
total_population <- sum(merged_data$Population_2023)
total_funding <- sum(merged_data$Total_Funding)

# Calculate expected funding based on population share
merged_data$Expected_Funding <- (merged_data$Population_2023 / total_population) * total_funding

# Calculate discrepancy
merged_data$Funding_Discrepancy <- merged_data$Total_Funding - merged_data$Expected_Funding

# Sort by discrepancy
merged_data <- merged_data[order(-merged_data$Funding_Discrepancy), ]

# View top overfunded and underfunded states
head(merged_data[, c("State", "Funding_Discrepancy")])

## # A tibble: 6 × 2
##   State        Funding_Discrepancy
##   <chr>                      <dbl>
## 1 Alaska               3280880960.
## 2 Montana              2652632678.
## 3 Wyoming              1966229334.
## 4 Louisiana            1686242692.
## 5 New Mexico           1391701807.
## 6 North Dakota         1352010329.

tail(merged_data[, c("State", "Funding_Discrepancy")])

## # A tibble: 6 × 2
##   State          Funding_Discrepancy
##   <chr>                        <dbl>
## 1 New York              -1084349827.
## 2 Georgia               -1302865039.
## 3 North Carolina        -1692150855.
## 4 Texas                 -3231701192.
## 5 California            -3867413001.
## 6 Florida               -4721336591.

# Format Funding_Discrepancy with commas
merged_data$Funding_Discrepancy_Formatted <- comma(merged_data$Funding_Discrepancy)
head(merged_data[, c("State", "Funding_Discrepancy_Formatted")])

## # A tibble: 6 × 2
##   State        Funding_Discrepancy_Formatted
##   <chr>        <chr>                        
## 1 Alaska       3,280,880,960                
## 2 Montana      2,652,632,678                
## 3 Wyoming      1,966,229,334                
## 4 Louisiana    1,686,242,692                
## 5 New Mexico   1,391,701,807                
## 6 North Dakota 1,352,010,329

tail(merged_data[, c("State", "Funding_Discrepancy_Formatted")])

## # A tibble: 6 × 2
##   State          Funding_Discrepancy_Formatted
##   <chr>          <chr>                        
## 1 New York       -1,084,349,827               
## 2 Georgia        -1,302,865,039               
## 3 North Carolina -1,692,150,855               
## 4 Texas          -3,231,701,192               
## 5 California     -3,867,413,001               
## 6 Florida        -4,721,336,591

# Split into overfunded and underfunded
overfunded <- merged_data[merged_data$Funding_Discrepancy > 0, ]
underfunded <- merged_data[merged_data$Funding_Discrepancy < 0, ]

# Define custom colors
party_colors <- c("Republican" = "red", "Democratic" = "blue")

# Plot overfunded states
ggplot(overfunded, aes(x = reorder(State, Funding_Discrepancy), y = Funding_Discrepancy, fill = Political_Alignment)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = party_colors) +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(title = "Overfunded States by Political Alignment",
       x = "State",
       y = "Excess Funding (USD)") +
  theme_minimal()

# Plot underfunded states
ggplot(underfunded, aes(x = reorder(State, Funding_Discrepancy), y = Funding_Discrepancy, fill = Political_Alignment)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = party_colors) +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(title = "Underfunded States by Political Alignment",
       x = "State",
       y = "Funding Shortfall (USD)") +
  theme_minimal()

Findings

Overfunded States:

The most overfunded states—Alaska, Montana, and Wyoming—are Republican-leaning and have small populations.
These states receive significantly more funding per capita than expected based on their population share.

Underfunded States:

The most underfunded states include California, Florida, Texas, New York, and North Carolina—all of which have large populations.
A majority of these underfunded states are Democratic-leaning, as shown by the prevalence of blue bars.
The funding shortfalls reach up to $4 billion, indicating substantial under-allocation.

Is the Distribution Equitable?

No, the funding distribution is not equitable. When adjusted for population, the data reveals a systematic bias:

Smaller, Republican-leaning states are consistently overfunded.
Larger, Democratic-leaning states are consistently underfunded.

Possible Sources of Bias:

Fixed-cost federal programs: These may disproportionately benefit smaller states regardless of population.
Political representation: Smaller states have equal Senate representation, which may influence funding formulas.
Historical funding structures: Legacy systems may not reflect current population dynamics.
Strategic allocation: Funding may be influenced by political priorities or efforts to maintain support in swing or rural states.

Question 2: Does the allocation appear to align with or favor the political interests of the Biden administration?

To answer this question, I am going to do the following:

Compute the total overfunding and underfunding by political alignment.
Visualize the results using a bar chart.

# Create helper columns
merged_data <- merged_data %>%
  mutate(
    Overfunded = ifelse(Funding_Discrepancy > 0, 1, 0),
    Underfunded = ifelse(Funding_Discrepancy < 0, 1, 0)
  )
print(merged_data)

## # A tibble: 50 × 11
##    State         Total Population_2023 Funding_Per_Capita Political_Alignment
##    <chr>         <dbl>           <dbl>              <dbl> <chr>              
##  1 Alaska          3.7          733406              5045. Republican         
##  2 Montana         3.3         1132812              2913. Republican         
##  3 Wyoming         2.3          584057              3938. Republican         
##  4 Louisiana       4.3         4573749               940. Republican         
##  5 New Mexico      2.6         2114371              1230. Democratic         
##  6 North Dakota    1.8          783926              2296. Republican         
##  7 Kentucky        3.9         4526154               862. Republican         
##  8 Illinois        8.4        12549689               669. Democratic         
##  9 Arkansas        2.8         3067732               913. Republican         
## 10 West Virginia   2           1770071              1130. Republican         
## # ℹ 40 more rows
## # ℹ 6 more variables: Total_Funding <dbl>, Expected_Funding <dbl>,
## #   Funding_Discrepancy <dbl>, Funding_Discrepancy_Formatted <chr>,
## #   Overfunded <dbl>, Underfunded <dbl>

# Summarize by political alignment
funding_summary <- merged_data %>%
  group_by(Political_Alignment) %>%
  summarise(
    Overfunded = sum(Overfunded),
    Underfunded = sum(Underfunded)
  )
print(funding_summary)

## # A tibble: 2 × 3
##   Political_Alignment Overfunded Underfunded
##   <chr>                    <dbl>       <dbl>
## 1 Democratic                   9          16
## 2 Republican                  16           9

# Reshape for stacked bar chart
funding_long <- funding_summary %>%
  pivot_longer(cols = c(Overfunded, Underfunded),
               names_to = "Funding_Status",
               values_to = "State_Count")
print(funding_long)

## # A tibble: 4 × 3
##   Political_Alignment Funding_Status State_Count
##   <chr>               <chr>                <dbl>
## 1 Democratic          Overfunded               9
## 2 Democratic          Underfunded             16
## 3 Republican          Overfunded              16
## 4 Republican          Underfunded              9

# Plot stacked bar chart
ggplot(funding_long, aes(x = Political_Alignment, y = State_Count, fill = Funding_Status)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("Overfunded" = "darkgreen", "Underfunded" = "darkred")) +
  labs(title = "Stacked Bar Chart: Overfunded vs Underfunded States by Political Alignment",
       x = "Political Alignment",
       y = "Number of States",
       fill = "Funding Status") +
  theme_minimal()

Observations

Democratic-aligned states, which represent the political base of the Biden administration, are more likely to be underfunded (16 states) than overfunded (9 states).
Republican-aligned states are more likely to be overfunded (16 states) than underfunded (9 states).

This pattern suggests that the funding allocation favors Republican states, not Democratic ones.

If the Biden administration were favoring its political allies, we would expect Democratic states to be more frequently overfunded. Since the opposite is true, the data suggests that the allocation does not reflect partisan favoritism toward the administration’s political interests — and may even reflect a bias in the opposite direction.

Final Answer to Question 2

Does the allocation appear to align with or favor the political interests of the Biden administration?

No, the allocation does not appear to align with or favor the political interests of the Biden administration.