Story 1 Assignment

This story is based on data on the present allocation of the Infrastructure Investment and Jobs Act (IIJA) funding by State and Territory. The goal is to use data visualizations to address the following questions:

Is the allocation equitable based on the population of each of the States and Territories, or is bias apparent? Does the allocation favor the political interests of the Biden administration? In addition to the provided data set for IIJA funding, this assignment also requires data on the current population estimates for US states and territories and 2020 election results.

Let’s start by loading the provided IIJA funding data into a data frame.

Load Libraries

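## Load the tidyverse (dplyr, ggplot2, tidyr, stringr), whose functions are used throughout
library(tidyverse)

## Read the IIJA data from my GitHub link (read.csv already returns a data frame)
urlfile <- 'https://raw.githubusercontent.com/baruab/baruab/main/IIJA_FUNDING_03_2023.csv'
fund_data <- read.csv(urlfile)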

Let’s look at the first few rows of the data frame.

head(fund_data)

The first column is the name of the state, territory, or tribal nation, and the second column is the IIJA funding allocated to it, in billions of dollars.

Now let’s load the population dataset, which was gathered from the data.census.gov website and uploaded to my GitHub account.

## Read the population data from my GitHub link
population_estimate <- read.csv("https://raw.githubusercontent.com/baruab/baruab/main/US_State_Boundaries.csv")

Select relevant columns from the state geo dataset

subset_population_data <- subset(population_estimate, select= c('NAME', 'STATE_ABBR', 'POP'))

head(subset_population_data)

Plot State Funding and Population

fund_data |>
  ggplot(aes(y = State_Teritory_Tribal_Nation, x = Total_Billions)) +
  geom_bar(stat = "identity") +
  labs(title = "IIJA Funding", 
       x = "Total Funding (Billions)", 
       y = "State, Territory, or Tribal Nation")

subset_population_data |>
  ggplot(aes(y = NAME, x = POP)) +
  geom_bar(stat = "identity") +
  labs(title = "State Population Estimates", 
       x = "Population Estimate", 
       y = "State, Territory, or Tribal Nation") +
  scale_x_continuous(labels = scales::comma)

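Rename the column to match the join key in the Funding dataset

## Rename NAME so it matches the join column in the funding data
population_data <- subset_population_data %>%
  rename(State_Teritory_Tribal_Nation = NAME)

## Upper-case every character column, presumably to match the casing of the funding data
population_data <- population_data %>%
  mutate(across(where(is.character), str_to_upper))
head(population_data)

Calculate Funding By Population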

Let’s join the two datasets by state name and compute the funding per person for each state, so that the allocations can be compared on an equal footing.

joined_fund_data <- fund_data |>
    left_join(population_data, by = "State_Teritory_Tribal_Nation") |>
    ## Total_Billions is in billions of dollars, so scale to dollars before dividing
    mutate(per_capita_funding = (Total_Billions * 1e9) / POP)
  
head(joined_fund_data)

Plot a bar graph to visualize the differences in the funding by population size

## Median per capita funding, ignoring rows that have no population figure
median_pc <- median(joined_fund_data$per_capita_funding, na.rm = TRUE)

joined_fund_data |>
  ggplot(aes(y = State_Teritory_Tribal_Nation, x = per_capita_funding)) +
  geom_bar(stat = "identity") +
  labs(title = "Funding By Population Size", 
       x = "Funding By Population (Dollars)", 
       y = "State, Territory, or Tribal Nation") +
  geom_vline(xintercept = median_pc, color = "red", linetype = "dashed") +
  ## annotate() draws the label once; geom_text(aes(...)) would redraw it for every row
  annotate("text", x = median_pc, y = -1,
           label = paste("Median =", round(median_pc, 2)),
           vjust = -0.5, hjust = 0.5, size = 4, color = "red")
## Warning: Removed 6 rows containing missing values (position_stack).
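
The warning above reports rows dropped for missing values, which most likely means that some funding rows found no match in the population table. Below is a minimal sketch of how such rows could be listed. It uses made-up toy data rather than the real files: the names fund_toy and pop_toy and all figures are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, NOT the real IIJA or census figures
fund_toy <- data.frame(
  State_Teritory_Tribal_Nation = c("ALPHA", "BETA", "TRIBAL NATIONS"),
  Total_Billions = c(2.0, 1.0, 3.0)
)
pop_toy <- data.frame(
  State_Teritory_Tribal_Nation = c("ALPHA", "BETA"),
  POP = c(1000000, 4000000)
)

# Funding rows with no partner in the population table
unmatched <- anti_join(fund_toy, pop_toy, by = "State_Teritory_Tribal_Nation")
print(unmatched)

# After a left join, the same rows carry NA for POP and for per capita funding
joined_toy <- fund_toy |>
  left_join(pop_toy, by = "State_Teritory_Tribal_Nation") |>
  mutate(per_capita_funding = Total_Billions * 1e9 / POP)
print(sum(is.na(joined_toy$per_capita_funding)))
```

Running the same anti_join() on fund_data and population_data should list the entries that lack a population figure, which helps decide whether a name needs fixing or the row genuinely has no population estimate.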

The visualization shows that Alaska receives more than five thousand dollars per capita. Wyoming, Montana, Vermont, North Dakota, and the District of Columbia also receive high allocations relative to the number of people living there. On the other hand, Puerto Rico and Florida receive considerably less per capita. This indicates that IIJA funding is not distributed in proportion to population.
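
One way to put a number on proportionality is to compare each state's share of total funding with its share of total population. A ratio of 1 means the two match exactly. The sketch below is illustrative only: the toy data frame and every figure in it are hypothetical, not taken from the IIJA or census files.

```r
library(dplyr)

# Hypothetical toy data, NOT the real IIJA or census figures
toy <- data.frame(
  state = c("ALPHA", "BETA", "GAMMA"),
  Total_Billions = c(2, 3, 5),
  POP = c(1e6, 3e6, 6e6)
)

shares <- toy |>
  mutate(
    funding_share    = Total_Billions / sum(Total_Billions),
    population_share = POP / sum(POP),
    # ratio > 1: more funding than population alone would predict
    ratio = funding_share / population_share
  ) |>
  arrange(desc(ratio))

print(shares)
```

Applied to joined_fund_data (after dropping rows without a population figure), the same columns would rank states by how far their allocation departs from a purely population-based split.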

Get the Election 2020 data

## Read the election 2020 data from my GitHub link
election_data <- read.csv("https://raw.githubusercontent.com/baruab/baruab/main/Election_2020_Results.csv")

election_result_BY_STATE  <- election_data %>% 
  rename(
    STATE_ABBR = STATE
    )

head(election_result_BY_STATE)

Join the 2020 election results with the funding data

joined_fund_data <- joined_fund_data |>
    left_join(election_result_BY_STATE, by = "STATE_ABBR")
    
head(joined_fund_data)

joined_fund_data |>
  group_by(WON) |>
  summarize(total = sum(Total_Billions)) |>
  mutate(perc_funding = total / sum(total)) |>
  ggplot(aes(x = WON, y = perc_funding * 100, fill = WON)) +
  geom_bar(stat = "identity") +
  labs(title = "Funding Per Party", 
       x = "Political Party", 
       y = "Percentage of Total Allocations") +
  scale_fill_manual(values = c("blue", "red", "grey")) +
  geom_text(aes(label = paste0(round(perc_funding*100, 0),'%')), vjust = -0.2)

By share of total funding, there seems to be a bias towards Democratic states. Let’s take a closer look at per capita funding for each state, grouped by political affiliation.

# just states
joined_fund_data <- drop_na(joined_fund_data)

joined_fund_data |>
  ggplot(aes(y = reorder(State_Teritory_Tribal_Nation, per_capita_funding), x = per_capita_funding, fill = WON)) +
  geom_bar(stat = "identity") +
  labs(title = "Per Capita Funding", 
       x = "Funding (Dollars)", 
       y = "State",
       fill = "Elected Candidate 2020") +
  scale_fill_manual(values = c("blue", "red")) 

Based on the plot, the majority of states with higher per capita funding seem to have voted for Trump in the 2020 election. This would seem to indicate that Biden is not favoring his own political party in these allocations, unless we are missing data needed for the analysis.
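
The visual impression can be backed by a simple per-group summary. The following is a minimal sketch on made-up data: the state names, the WON labels ("Biden", "Trump"), and the dollar figures are all assumptions for illustration, since the actual values in the election file are not shown here.

```r
library(dplyr)

# Hypothetical toy data; WON labels and amounts are assumed, not real results
toy <- data.frame(
  State = c("A", "B", "C", "D"),
  WON = c("Biden", "Biden", "Trump", "Trump"),
  per_capita_funding = c(500, 700, 600, 1000)
)

by_winner <- toy |>
  group_by(WON) |>
  summarise(
    n_states          = n(),
    median_per_capita = median(per_capita_funding),
    mean_per_capita   = mean(per_capita_funding)
  )

print(by_winner)
```

The median is less sensitive than the mean to a single outlier such as Alaska, so reporting both gives a fairer comparison between the two groups.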

Conclusion

Based on the above graphs, there does not seem to be a bias based on political affiliation, but per capita allocations vary widely across states. States like California and Texas received the largest total funding due to their large populations, but their per capita allocations are lower than those of Alaska and Wyoming.

However, the four states with the greatest total allocations seem to be influenced by President Biden’s political interests. More exploration is needed to determine other factors that may play a role in the allocation of these funds.