Mini Assignment 5

Import Libraries

library(tidycensus)
library(sf)
library(tmap)
library(tidyverse)
library(here)
library(knitr)
library(kableExtra)
library(glue)
library(tigris)
library(skimr)
library(broom)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
data <- read.csv("Fatal.csv")

Racial Disparities in Fatal Police Encounters Across the United States

States where police–civilian interactions result in significantly higher or lower fatality rates for specific racial groups relative to population proportions

Check Data First

skim(data)
Data summary
Name data
Number of rows 31498
Number of columns 35
_______________________
Column type frequency:
character 27
logical 1
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Name 0 1.00 4 82 0 29859 0
Age 0 1.00 0 5 1221 112 0
Gender 0 1.00 0 11 144 4 0
Race 0 1.00 0 57 1 12 0
Race.with.imputations 862 0.97 0 23 6 10 0
Imputation.probability 881 0.97 0 19 3 6614 0
URL.of.image..PLS.NO.HOTLINKS. 0 1.00 0 373 16773 14668 0
Date.of.injury.resulting.in.death..month.day.year. 0 1.00 10 10 0 7736 0
Location.of.injury..address. 0 1.00 0 74 556 28893 0
Location.of.death..city. 0 1.00 0 30 36 6340 0
State 0 1.00 0 2 1 52 0
Location.of.death..county. 0 1.00 0 33 15 1536 0
Full.Address 0 1.00 0 103 1 29709 0
Latitude 0 1.00 0 17 1 29515 0
Agency.or.agencies.involved 0 1.00 0 266 78 6829 0
Highest.level.of.force 0 1.00 0 33 4 19 0
Name.Temporary 0 1.00 0 58 25969 5284 0
Armed.Unarmed 0 1.00 0 19 14419 10 0
Alleged.weapon 0 1.00 0 35 14421 269 0
Aggressive.physical.movement 0 1.00 0 42 14418 32 0
Fleeing.Not.fleeing 0 1.00 0 42 14419 26 0
Description.Temp 0 1.00 0 2239 27431 3870 0
URL.Temp 0 1.00 0 723 28281 3066 0
Brief.description 0 1.00 0 2239 2 29883 0
Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS 0 1.00 0 89 3 156 0
Intended.use.of.force..Developing. 0 1.00 0 22 3 9 0
Supporting.document.link 0 1.00 0 438 2 29269 0

Variable type: logical

skim_variable n_missing complete_rate mean count
X 31498 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Unique.ID 1 1.00 15749.00 9092.55 1.00 7875 15749.00 23623.00 31497.00 ▇▇▇▇▇
Location.of.death..zip.code. 182 0.99 58352.53 27966.03 1013.00 33147 60649.00 85033.00 99921.00 ▃▇▃▆▇
Longitude 1 1.00 -95.40 16.30 -165.59 -111 -90.56 -82.57 -67.27 ▁▁▅▇▇
UID.Temporary 25969 0.18 15464.08 6559.72 9759.00 11156 12549.00 19240.00 30340.00 ▇▁▁▁▂
X.1 31497 0.00 10895.00 NA 10895.00 10895 10895.00 10895.00 10895.00 ▁▁▇▁▁
Unique.ID.formula 31496 0.00 29497.00 2828.43 27497.00 28497 29497.00 30497.00 31497.00 ▇▁▁▁▇
Unique.identifier..redundant. 1 1.00 15749.00 9092.55 1.00 7875 15749.00 23623.00 31497.00 ▇▇▇▇▇

Research Question1. State-Level Racial Disparities in Fatal Police Encounters

Comparison of Black fatality proportions against the national average across U.S. states

The first analysis investigates whether certain U.S. states exhibit statistically significant racial disparities in fatal police encounters. Specifically, I examine whether the proportion of Black fatalities in police interactions differs from the national average. This question arises from the concern that, even when controlling for exposure to police interactions, some states may show disproportionately high fatality rates among Black individuals.

Clean Data

Check unique values of “Race” column.

unique(data$Race)
##  [1] "African-American/Black"                                   
##  [2] "Race unspecified"                                         
##  [3] "European-American/White"                                  
##  [4] "Hispanic/Latino"                                          
##  [5] "Christopher Anthony Alexander"                            
##  [6] "Asian/Pacific Islander"                                   
##  [7] "Native American/Alaskan"                                  
##  [8] "European-American/European-American/White"                
##  [9] "Middle Eastern"                                           
## [10] "African-American/Black African-American/Black Not imputed"
## [11] "european-American/White"                                  
## [12] ""

There were a few inconsistencies and typographical errors in the Race variable (e.g., “European-American/European-American/White”). I corrected them as shown below.

data_cleaned <- data %>%
  mutate(Race = case_when(
    Race == "European-American/European-American/White" ~ "European-American/White",
    Race == "African-American/Black African-American/Black Not imputed" ~ "African-American/Black",
    Race == "european-American/White" ~ "European-American/White",
    TRUE ~ Race
  ))

Check unique values of “State” column.

unique(data$State)
##  [1] "SC" "MS" "GA" "CA" "VA" "FL" "MD" "NJ" "MI" "PA" "LA" "CO" "WA" "TX" "AZ"
## [16] "KS" "NY" "OH" "MN" "MO" "NC" "IL" "UT" "TN" "IN" "NE" "NM" "AR" "KY" "WI"
## [31] "IA" "OR" "WY" "OK" "NV" "MT" "AL" "MA" "RI" "WV" "ID" "SD" "ME" "DC" "AK"
## [46] "NH" "ND" "HI" "VT" "DE" "CT" ""

There are 50 states plus Washington, D.C. (DC), along with some empty entries. For this analysis, the missing state values are not a major concern.

Compute the proportion of deaths by race across states

by_state <- data_cleaned %>%
  filter(!is.na(State), !is.na(Race), State !="", Race !="Race unspecified") %>%
  group_by(State) %>%
  summarize(
    n_state = n(),
    black_state = sum(Race == "African-American/Black"),
    white_state = sum(Race == "European-American/White"),
    hispanic_state = sum(Race == "Hispanic/Latino"),
    asian_state = sum(Race == "Asian/Pacific Islander"),
    NativeA_state = sum(Race == "Native American/Alaskan"),
    ME_state = sum(Race == "Middle Eastern"),
    prop_B = black_state / n_state,
    prop_W = white_state / n_state,
    prop_H = hispanic_state / n_state,
    prop_A = asian_state / n_state,
    prop_N = NativeA_state / n_state,
    prop_M = ME_state / n_state,
    .groups = "drop"
  )

Compute mean of proportion of death by race for whole states

race_means <- by_state %>%
  summarise(across(starts_with("prop_"), ~ mean(.x, na.rm = TRUE)))

Binomial Test: Comparing State Fatality Proportions to National Averages

For each state, we test whether the fatality proportion for a specific race (Black or White) differs significantly from the national average proportion. A binomial test is used to assess whether each state’s observed share of deaths for that race is statistically higher or lower than expected based on the national mean.

races <- c("B", "W")
for (r in races) {
  prop_col <- paste0("prop_", r)
  count_col <- switch(r,
                      B = "black_state",
                      W = "white_state")
  # p-value
  by_state[[paste0("pval_", r)]] <- mapply(function(x, n)
    binom.test(x, n, p = race_means[[prop_col]])$p.value,
    x = by_state[[count_col]], n = by_state$n_state)
  
  # Difference of Proportion
  by_state[[paste0("diff_", r)]] <- by_state[[prop_col]] - race_means[[prop_col]]
}
tmap_mode("view")

states_sf <- states(cb = TRUE) %>%
  st_transform(crs = 4326) %>%
  select(STUSPS, NAME, geometry)

map_data_sf <- states_sf %>%
  left_join(by_state, by = c("STUSPS" = "State"))

map_data_sf <- map_data_sf %>%
  mutate(
    category_B = case_when(
      pval_B < 0.05 & diff_B > 0 ~ "Higher (p<0.05)",
      pval_B < 0.05 & diff_B < 0 ~ "Lower (p<0.05)",
      TRUE ~ "Not significant"
    ),
    category_W = case_when(
      pval_W < 0.05 & diff_W > 0 ~ "Higher (p<0.05)",
      pval_W < 0.05 & diff_W < 0 ~ "Lower (p<0.05)",
      TRUE ~ "Not significant"
    )
  )

Plot map

tm_shape(map_data_sf) +
  tm_polygons(
    fill = "category_B",
    fill.scale = tm_scale(
      values = c(
        "Higher (p<0.05)" = "pink",
        "Lower (p<0.05)" = "skyblue",
        "Not significant" = "orange"
      )
    ),
    fill.legend = tm_legend(title = "Black Fatalities vs National Avg"),
    col = "black",
    lwd = 1.5
  ) +
  tm_title("Black Fatalities vs National Average (binom.test, p < 0.05)") +
  tm_layout(
    legend.position = c("right", "bottom"),
  )

The map illustrates regional disparities in African-American fatality rates compared to the national average. States in the eastern and southeastern United States show significantly higher proportions of Black fatalities, suggesting potential demographic risk factors in these regions. In contrast, most Midwestern and Western states, including those along the West Coast, exhibit lower-than-average fatality rates for African-Americans. This geographic divide highlights a possible east–west gradient in racial disparities in fatal police encounters across the United States.

tm_shape(map_data_sf) +
  tm_polygons(
    fill = "category_W",
    fill.scale = tm_scale(
      values = c(
        "Higher (p<0.05)" = "pink",
        "Lower (p<0.05)" = "skyblue",
        "Not significant" = "orange"
      )
    ),
    fill.legend = tm_legend(title = "White Fatalities vs National Avg"),
    col = "black",
    lwd = 1.5
  ) +
  tm_title("White Fatalities vs National Average (binom.test, p < 0.05)") +
  tm_layout(
    legend.position = c("right", "bottom"),
  )

This map illustrates the distribution of White fatalities relative to the national average. In contrast to the pattern observed for African-American fatalities, higher-than-average White fatality rates are concentrated in the western and central parts of the United States, while lower-than-average rates are more common along the eastern part and parts of the South. This inverse spatial pattern suggests that racial disparities in fatal police encounters may vary regionally.

Research Question 2. Is the Race Population Ratio Correlated with the Observed Fatality Patterns?

It is possible that the higher rate of African-American fatalities is associated with the proportion of Black residents in a given area. If this relationship holds statistically, the elevated fatality rates in certain regions may not necessarily indicate racial bias but could instead reflect demographic composition—regions with larger Black populations may naturally experience more incidents involving Black individuals. To examine this hypothesis, county-level demographic data from the U.S. Census will be utilized.

Download Census Data & Boundaries

tidycensus::census_api_key(Sys.getenv("CENSUS_API_KEY"))

#County Level
county_vars <- suppressMessages(
  get_acs(geography = "county",
          variables = c(race_ethnic_total = "B03002_001E", #Ethnicity
                        non_hispanic_white = "B03002_003E", #Ethnicity_White
                        african_american = "B03002_004E" #Ethnicity_Black
                        ),
          year = 2023,
          survey = "acs5", 
          geometry = TRUE,
          output = "wide"))

#County Boundary
county_boundary <- counties(progress_bar = F)

#State Boundary for map
state_boundary <- states(progress_bar = F)

Compute Racial Population Proportion by State and County

# Create new variables
county_race <- county_vars %>%
  transmute(
    GEOID,
    NAME,
    geometry,
    white_prop = non_hispanic_white/race_ethnic_total,
    black_prop = african_american/race_ethnic_total
    )

Merge Fatality Records with County Boundaries

# Cleaning Data
data_clean_coords <- data_cleaned %>%
  mutate(
    Latitude  = suppressWarnings(as.numeric(Latitude)),
    Longitude = suppressWarnings(as.numeric(Longitude))
  ) %>%
  filter(!is.na(Latitude), !is.na(Longitude))

# Points
pts <- st_as_sf(data_clean_coords,
                coords = c("Longitude", "Latitude"),
                crs = 4326,
                remove = FALSE)

# Matchcoord
pts  <- st_transform(pts,  st_crs(county_boundary))

# Spatial Join
pts_county <- st_join(pts, county_boundary[, c("GEOID", "NAME", "STATEFP")], join = st_within)

Compute County-Level Death Proportion and Merge with Census Data

# Compute deaths proportion by race by county
by_county <- pts_county %>%
  filter(!is.na(GEOID), !is.na(Race), Race !="Race unspecified") %>%
  group_by(GEOID) %>%
  summarize(
    n_county = n(),
    black_county = sum(Race == "African-American/Black"),
    white_county = sum(Race == "European-American/White"),
    prop_B = black_county / n_county,
    prop_W = white_county / n_county,
    .groups = "drop"
  )

# Merge with ACS data
merged_county <- county_race %>%
  left_join(st_drop_geometry(by_county), by = "GEOID")

# Drop rows have NA data
merged_county <- merged_county %>%
  filter(!is.na(n_county))

Correlation Between Population Share and Fatality Proportion by Race

merged_county %>% 
  ggplot(aes(x = white_prop, y = prop_W)) +
  geom_point(alpha = 0.5, size = 1.2) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "White Population Proportion", 
       y = "White death proportion", 
       title = "Pop proportion vs. White Fatality") +
  ggdark::dark_theme_gray() +
  ggpubr::stat_cor(method = "pearson", label.x = 0.1, label.y = 0.1)
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().
## `geom_smooth()` using formula = 'y ~ x'

The scatter plot above shows a strong positive correlation between the proportion of White population and the proportion of White fatalities across counties. This indicates that areas with a higher percentage of White residents tend to have proportionally higher White fatality rates. In other words, the prevalence of White fatalities appears to largely reflect the demographic composition.

merged_county %>% 
  ggplot(aes(x = black_prop, y = prop_B)) +
  geom_point(alpha = 0.5, size = 1.2) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Black Population Proportion", 
       y = "Black death proportion", 
       title = "Pop proportion vs. Black Fatality") +
  ggdark::dark_theme_gray() +
  ggpubr::stat_cor(method = "pearson", label.x = 0.1, label.y = 0.1)
## `geom_smooth()` using formula = 'y ~ x'

From the results above, we can observe that counties with a higher proportion of Black residents tend to have a higher proportion of Black fatalities. Overall, this suggests that racial disparities in fatalities may primarily reflect population composition rather than systemic bias. However, the correlation coefficient (r = 0.62) also indicates that some counties deviate from this general pattern, implying that localized factors may still exist such as discrimination.

Binomial Test: Identifying Statistically Unusual Counties

To identify counties where the proportion of Black fatalities is significantly different from what would be expected given the local Black population share, a binomial test was conducted. Here, the expected probability (p0) is defined as the proportion of Black residents in each county. If the observed number of Black deaths (black_county) significantly deviates from this expected proportion, the county is classified as “Higher (p < 0.05)” or “Lower (p < 0.05)” accordingly. Non-significant results were excluded for clearer visualization.

binom_county <- merged_county %>%
  mutate(
    p0 = pmin(pmax(black_prop, 1e-6), 1 - 1e-6), # Stabilize model (exclude 0 or 1)
    pval = mapply(function(x, n, p) binom.test(x, n, p = p)$p.value,
                  black_county, n_county, p0),
    p_adj = p.adjust(pval, method = "BH"),
    diff  = prop_B - p0,
    sig   = case_when(
      p_adj < 0.05 & diff > 0 ~ "Higher (p<0.05)",
      p_adj < 0.05 & diff < 0 ~ "Lower (p<0.05)",
      TRUE                    ~ "Not significant"
    )
  )

# Drop not significant
binom_county <- binom_county %>%
  filter(sig != "Not significant")
# Map
tm_shape(binom_county) +
  tm_polygons(
    fill = "sig",
    fill.scale = tm_scale_categorical(
      values = c(
        "Higher (p<0.05)" = "red",
        "Lower (p<0.05)" = "blue"
      )
    ),
    fill.legend = tm_legend(title = "Black Fatalities vs Black Population Ratio"),
    col = "black",
    lwd = 1
  ) +
  # State boundary overlay
  tm_shape(states_sf) +
  tm_borders(
    col = "black",
    lwd = 1.2
  ) +
  tm_title("Black Fatalities vs Black Population Ratio (binom.test, p < 0.05)") +
  tm_layout(
    legend.position = c("right", "bottom"),
  )

The map highlights counties where the proportion of Black fatalities is significantly higher than expected based on the local Black population share. Notably, clusters of such counties appear around New York City, Detroit, Chicago, and several parts of Florida. Interestingly, a large concentration is also observed across California’s major urban areas, particularly around the San Francisco Bay Area. This is remarkable given California’s reputation as a politically liberal state which is sensitive to racism, suggesting that elevated Black fatality rates can persist even in regions generally considered progressive.

Conclusion

Overall, the analysis indicates that as the proportion of Black residents increases, the proportion of Black fatalities in similar incidents also tends to rise. This suggests that, at a national level, there is no clear evidence of systematic racial bias in fatal encounters.

However, certain counties show statistically higher Black fatality rates, deviating from expected demographic patterns. It is important to note that the dataset includes cases where the cause of death may have been due to the individual’s own actions, meaning these regional differences cannot be conclusively interpreted as evidence of racial discrimination.