Import Libraries
library(tidycensus)
library(sf)
library(tmap)
library(tidyverse)
library(here)
library(knitr)
library(kableExtra)
library(glue)
library(tigris)
library(skimr)
library(broom)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
data <- read.csv("Fatal.csv")
States where police–civilian interactions result in significantly higher or lower fatality rates for specific racial groups relative to population proportions
skim(data)
| Name | data |
| Number of rows | 31498 |
| Number of columns | 35 |
| _______________________ | |
| Column type frequency: | |
| character | 27 |
| logical | 1 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Name | 0 | 1.00 | 4 | 82 | 0 | 29859 | 0 |
| Age | 0 | 1.00 | 0 | 5 | 1221 | 112 | 0 |
| Gender | 0 | 1.00 | 0 | 11 | 144 | 4 | 0 |
| Race | 0 | 1.00 | 0 | 57 | 1 | 12 | 0 |
| Race.with.imputations | 862 | 0.97 | 0 | 23 | 6 | 10 | 0 |
| Imputation.probability | 881 | 0.97 | 0 | 19 | 3 | 6614 | 0 |
| URL.of.image..PLS.NO.HOTLINKS. | 0 | 1.00 | 0 | 373 | 16773 | 14668 | 0 |
| Date.of.injury.resulting.in.death..month.day.year. | 0 | 1.00 | 10 | 10 | 0 | 7736 | 0 |
| Location.of.injury..address. | 0 | 1.00 | 0 | 74 | 556 | 28893 | 0 |
| Location.of.death..city. | 0 | 1.00 | 0 | 30 | 36 | 6340 | 0 |
| State | 0 | 1.00 | 0 | 2 | 1 | 52 | 0 |
| Location.of.death..county. | 0 | 1.00 | 0 | 33 | 15 | 1536 | 0 |
| Full.Address | 0 | 1.00 | 0 | 103 | 1 | 29709 | 0 |
| Latitude | 0 | 1.00 | 0 | 17 | 1 | 29515 | 0 |
| Agency.or.agencies.involved | 0 | 1.00 | 0 | 266 | 78 | 6829 | 0 |
| Highest.level.of.force | 0 | 1.00 | 0 | 33 | 4 | 19 | 0 |
| Name.Temporary | 0 | 1.00 | 0 | 58 | 25969 | 5284 | 0 |
| Armed.Unarmed | 0 | 1.00 | 0 | 19 | 14419 | 10 | 0 |
| Alleged.weapon | 0 | 1.00 | 0 | 35 | 14421 | 269 | 0 |
| Aggressive.physical.movement | 0 | 1.00 | 0 | 42 | 14418 | 32 | 0 |
| Fleeing.Not.fleeing | 0 | 1.00 | 0 | 42 | 14419 | 26 | 0 |
| Description.Temp | 0 | 1.00 | 0 | 2239 | 27431 | 3870 | 0 |
| URL.Temp | 0 | 1.00 | 0 | 723 | 28281 | 3066 | 0 |
| Brief.description | 0 | 1.00 | 0 | 2239 | 2 | 29883 | 0 |
| Dispositions.Exclusions.INTERNAL.USE..NOT.FOR.ANALYSIS | 0 | 1.00 | 0 | 89 | 3 | 156 | 0 |
| Intended.use.of.force..Developing. | 0 | 1.00 | 0 | 22 | 3 | 9 | 0 |
| Supporting.document.link | 0 | 1.00 | 0 | 438 | 2 | 29269 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| X | 31498 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Unique.ID | 1 | 1.00 | 15749.00 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
| Location.of.death..zip.code. | 182 | 0.99 | 58352.53 | 27966.03 | 1013.00 | 33147 | 60649.00 | 85033.00 | 99921.00 | ▃▇▃▆▇ |
| Longitude | 1 | 1.00 | -95.40 | 16.30 | -165.59 | -111 | -90.56 | -82.57 | -67.27 | ▁▁▅▇▇ |
| UID.Temporary | 25969 | 0.18 | 15464.08 | 6559.72 | 9759.00 | 11156 | 12549.00 | 19240.00 | 30340.00 | ▇▁▁▁▂ |
| X.1 | 31497 | 0.00 | 10895.00 | NA | 10895.00 | 10895 | 10895.00 | 10895.00 | 10895.00 | ▁▁▇▁▁ |
| Unique.ID.formula | 31496 | 0.00 | 29497.00 | 2828.43 | 27497.00 | 28497 | 29497.00 | 30497.00 | 31497.00 | ▇▁▁▁▇ |
| Unique.identifier..redundant. | 1 | 1.00 | 15749.00 | 9092.55 | 1.00 | 7875 | 15749.00 | 23623.00 | 31497.00 | ▇▇▇▇▇ |
The first analysis investigates whether certain U.S. states show statistically significant racial disparities in fatal police encounters. Specifically, I examine whether each state’s proportion of Black fatalities differs from the national average. The question is motivated by the concern that, even after accounting for exposure to police interactions, some states may have disproportionately high fatality rates among Black individuals.
Check the unique values of the “Race” column.
unique(data$Race)
## [1] "African-American/Black"
## [2] "Race unspecified"
## [3] "European-American/White"
## [4] "Hispanic/Latino"
## [5] "Christopher Anthony Alexander"
## [6] "Asian/Pacific Islander"
## [7] "Native American/Alaskan"
## [8] "European-American/European-American/White"
## [9] "Middle Eastern"
## [10] "African-American/Black African-American/Black Not imputed"
## [11] "european-American/White"
## [12] ""
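There were a few inconsistencies and typographical errors in the Race variable (e.g., “European-American/European-American/White”), along with a person’s name entered in the Race column and a blank entry. I corrected the mislabeled values and treated the two unusable values as unspecified, as shown below.
data_cleaned <- data %>%
  mutate(Race = case_when(
    Race == "European-American/European-American/White" ~ "European-American/White",
    Race == "African-American/Black African-American/Black Not imputed" ~ "African-American/Black",
    Race == "european-American/White" ~ "European-American/White",
    # A name entered in the Race column and a blank entry carry no usable race label
    Race %in% c("Christopher Anthony Alexander", "") ~ "Race unspecified",
    TRUE ~ Race
  ))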
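A quick way to confirm that no stray labels remain is to compare the observed values against an allow-list. The sketch below is a minimal illustration on a toy vector that copies the labels printed by unique(data$Race); the names race_raw and valid_races are my own, and the allow-list is an assumption about which labels the analysis expects.

```r
# Toy vector reproducing the labels printed by unique(data$Race) above.
race_raw <- c(
  "African-American/Black", "Race unspecified", "European-American/White",
  "Hispanic/Latino", "Christopher Anthony Alexander", "Asian/Pacific Islander",
  "Native American/Alaskan", "European-American/European-American/White",
  "Middle Eastern",
  "African-American/Black African-American/Black Not imputed",
  "european-American/White", ""
)

# Allow-list of labels the analysis expects (an assumption for this sketch).
valid_races <- c(
  "African-American/Black", "European-American/White", "Hispanic/Latino",
  "Asian/Pacific Islander", "Native American/Alaskan", "Middle Eastern",
  "Race unspecified"
)

# Labels that still need a recode or an explicit exclusion.
unexpected <- setdiff(race_raw, valid_races)
print(unexpected)
```

Running the same setdiff() on the cleaned column should return an empty vector once every stray label has been handled.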
Check the unique values of the “State” column.
unique(data$State)
## [1] "SC" "MS" "GA" "CA" "VA" "FL" "MD" "NJ" "MI" "PA" "LA" "CO" "WA" "TX" "AZ"
## [16] "KS" "NY" "OH" "MN" "MO" "NC" "IL" "UT" "TN" "IN" "NE" "NM" "AR" "KY" "WI"
## [31] "IA" "OR" "WY" "OK" "NV" "MT" "AL" "MA" "RI" "WV" "ID" "SD" "ME" "DC" "AK"
## [46] "NH" "ND" "HI" "VT" "DE" "CT" ""
The column contains all 50 states plus Washington, D.C. (DC), along with one empty entry. Rows with an empty state are excluded from the analysis below, so the missing value is not a major concern.
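Because read.csv() keeps blank text fields as empty strings, an is.na() check alone does not catch them. One option, sketched here on a small inline CSV rather than on Fatal.csv, is to convert blanks to NA at import with the na.strings argument. The toy rows and the name csv_text are assumptions made for illustration.

```r
# A tiny inline CSV standing in for the real file (hypothetical rows).
csv_text <- "State,Race
SC,African-American/Black
,European-American/White
CA,
TX,Hispanic/Latino"

# Treat both empty fields and the literal string "NA" as missing values.
toy <- read.csv(text = csv_text, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Count missing values per column after import.
print(colSums(is.na(toy)))
```

With blanks stored as NA, a single is.na() filter covers both kinds of missing value.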
by_state <- data_cleaned %>%
  filter(!is.na(State), !is.na(Race), State != "", Race != "Race unspecified") %>%
  group_by(State) %>%
  summarize(
    n_state = n(),
    black_state = sum(Race == "African-American/Black"),
    white_state = sum(Race == "European-American/White"),
    hispanic_state = sum(Race == "Hispanic/Latino"),
    asian_state = sum(Race == "Asian/Pacific Islander"),
    NativeA_state = sum(Race == "Native American/Alaskan"),
    ME_state = sum(Race == "Middle Eastern"),
    prop_B = black_state / n_state,
    prop_W = white_state / n_state,
    prop_H = hispanic_state / n_state,
    prop_A = asian_state / n_state,
    prop_N = NativeA_state / n_state,
    prop_M = ME_state / n_state,
    .groups = "drop"
  )
race_means <- by_state %>%
  summarise(across(starts_with("prop_"), ~ mean(.x, na.rm = TRUE)))
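Note that race_means averages the state-level proportions, so every state counts equally regardless of how many deaths it recorded. A pooled proportion (total deaths for a race divided by total deaths) weights each death equally instead, and the two can differ noticeably. The sketch below uses three hypothetical states with made-up counts to show the gap; it is an illustration of the distinction, not a change to the analysis.

```r
# Hypothetical state-level counts (not taken from Fatal.csv).
toy <- data.frame(
  State = c("AA", "BB", "CC"),      # made-up state codes
  n_state = c(1000, 100, 50),       # total fatalities with known race
  black_state = c(200, 50, 5)       # Black fatalities
)
toy$prop_B <- toy$black_state / toy$n_state

# Mean of state proportions: each state counts equally.
unweighted_mean <- mean(toy$prop_B)

# Pooled proportion: each death counts equally.
pooled_share <- sum(toy$black_state) / sum(toy$n_state)

print(c(unweighted_mean = unweighted_mean, pooled_share = pooled_share))
```

Which reference value is appropriate depends on whether the comparison is meant to be against a typical state or against the country as a whole.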
For each state, I test whether the share of fatalities for a given race (Black or White) differs significantly from the national average, computed here as the unweighted mean of the state-level proportions. An exact binomial test assesses whether each state’s observed share of deaths for that race is statistically higher or lower than this reference value.
races <- c("B", "W")
for (r in races) {
  prop_col <- paste0("prop_", r)
  count_col <- switch(r,
    B = "black_state",
    W = "white_state"
  )
  # Two-sided exact binomial p-value for each state against the mean proportion
  by_state[[paste0("pval_", r)]] <- mapply(
    function(x, n) binom.test(x, n, p = race_means[[prop_col]])$p.value,
    x = by_state[[count_col]], n = by_state$n_state
  )
  # Difference between the state proportion and the mean proportion
  by_state[[paste0("diff_", r)]] <- by_state[[prop_col]] - race_means[[prop_col]]
}
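Since one test is run per state, roughly fifty p-values are compared against 0.05 at once, and some will fall below that threshold by chance alone. One common safeguard, which the analysis above does not apply, is to adjust the p-values for multiple comparisons. The sketch below applies the Benjamini-Hochberg adjustment with p.adjust() to four hypothetical states; the counts and the reference proportion p0 are invented for illustration.

```r
# Hypothetical counts for four states (not taken from Fatal.csv).
toy <- data.frame(
  State = c("AA", "BB", "CC", "DD"),
  n_state = c(400, 250, 120, 60),
  black_state = c(140, 60, 30, 22)
)
p0 <- 0.25  # hypothetical reference proportion

# Raw two-sided exact binomial p-value per state.
toy$pval <- mapply(
  function(x, n) binom.test(x, n, p = p0)$p.value,
  x = toy$black_state, n = toy$n_state
)

# Benjamini-Hochberg adjustment controls the false discovery rate.
toy$pval_bh <- p.adjust(toy$pval, method = "BH")
toy$significant_bh <- toy$pval_bh < 0.05

print(toy)
```

In the real data, the same call could be applied to the pval_B and pval_W columns before assigning map categories.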
tmap_mode("view")
states_sf <- states(cb = TRUE) %>%
  st_transform(crs = 4326) %>%
  select(STUSPS, NAME, geometry)
map_data_sf <- states_sf %>%
  left_join(by_state, by = c("STUSPS" = "State"))
map_data_sf <- map_data_sf %>%
  mutate(
    category_B = case_when(
      pval_B < 0.05 & diff_B > 0 ~ "Higher (p<0.05)",
      pval_B < 0.05 & diff_B < 0 ~ "Lower (p<0.05)",
      TRUE ~ "Not significant"
    ),
    category_W = case_when(
      pval_W < 0.05 & diff_W > 0 ~ "Higher (p<0.05)",
      pval_W < 0.05 & diff_W < 0 ~ "Lower (p<0.05)",
      TRUE ~ "Not significant"
    )
  )
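One subtlety in the classification above: any polygon with no matching row in by_state has a missing p-value, and the catch-all TRUE branch labels it “Not significant”. A possible refinement, sketched here in base R on a toy data frame, is to give such rows their own “No data” label and fix the category order with a factor. The column values are hypothetical, and “PR” merely stands in for an area without joined data.

```r
# Toy join result; the last row mimics a polygon with no fatality data.
toy <- data.frame(
  STUSPS = c("AA", "BB", "CC", "PR"),
  pval_B = c(0.01, 0.20, 0.001, NA),
  diff_B = c(0.10, -0.02, -0.08, NA)
)

# Label missing p-values first, then split significant rows by direction.
toy$category_B <- ifelse(
  is.na(toy$pval_B), "No data",
  ifelse(toy$pval_B >= 0.05, "Not significant",
    ifelse(toy$diff_B > 0, "Higher (p<0.05)", "Lower (p<0.05)")
  )
)

# A factor with explicit levels keeps the legend order stable.
toy$category_B <- factor(
  toy$category_B,
  levels = c("Higher (p<0.05)", "Lower (p<0.05)", "Not significant", "No data")
)

print(toy)
```

If this labeling were adopted, the color mapping passed to the map would also need an entry for the extra category.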
tm_shape(map_data_sf) +
  tm_polygons(
    fill = "category_B",
    fill.scale = tm_scale(
      values = c(
        "Higher (p<0.05)" = "pink",
        "Lower (p<0.05)" = "skyblue",
        "Not significant" = "orange"
      )
    ),
    fill.legend = tm_legend(title = "Black Fatalities vs National Avg"),
    col = "black",
    lwd = 1.5
  ) +
  tm_title("Black Fatalities vs National Average (binom.test, p < 0.05)") +
  tm_layout(
    legend.position = c("right", "bottom")
  )
The map shows how each state’s share of Black fatalities compares with the national average. States in the eastern and southeastern United States show significantly higher proportions of Black fatalities, which may reflect demographic differences in these regions. In contrast, most Midwestern and Western states, including those along the West Coast, have lower-than-average shares of Black fatalities. This divide points to a possible east–west gradient in the racial composition of fatal police encounters across the United States.
tm_shape(map_data_sf) +
  tm_polygons(
    fill = "category_W",
    fill.scale = tm_scale(
      values = c(
        "Higher (p<0.05)" = "pink",
        "Lower (p<0.05)" = "skyblue",
        "Not significant" = "orange"
      )
    ),
    fill.legend = tm_legend(title = "White Fatalities vs National Avg"),
    col = "black",
    lwd = 1.5
  ) +
  tm_title("White Fatalities vs National Average (binom.test, p < 0.05)") +
  tm_layout(
    legend.position = c("right", "bottom")
  )
This map shows the distribution of White fatalities relative to the national average. In contrast to the pattern for Black fatalities, higher-than-average shares of White fatalities are concentrated in the western and central United States, while lower-than-average shares are more common in the East and parts of the South. This inverse spatial pattern suggests that the racial composition of fatal police encounters varies by region.
Overall, the analysis indicates that the proportion of Black fatalities tends to rise with the proportion of Black residents. This suggests that, at a national level, there is no clear evidence of systematic racial bias in fatal encounters.
However, certain states show statistically higher shares of Black fatalities, deviating from expected demographic patterns. Note that the dataset includes cases in which the death may have resulted from the individual’s own actions, so these regional differences cannot be conclusively interpreted as evidence of racial discrimination.
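To tie a state’s fatality share directly to its demographics, the same binomial test can use the state’s population share as the reference value instead of the national average. The sketch below shows the idea for a single state. Every number in it is hypothetical; in practice the population share would need to come from Census data (for example via tidycensus, which is loaded above), and the variable names here are my own.

```r
# Hypothetical inputs for one state; none of these numbers come from
# Fatal.csv or from the Census.
n_state <- 500            # total fatalities with known race
black_state <- 150        # Black fatalities
pop_share_black <- 0.20   # Black share of the state's resident population

# Exact binomial test of the fatality share against the population share.
test <- binom.test(black_state, n_state, p = pop_share_black)

# Ratio above 1 means the fatality share exceeds the population share.
observed_share <- black_state / n_state
disparity_ratio <- observed_share / pop_share_black

print(test$p.value)
print(disparity_ratio)
```

A ratio near 1 would indicate that fatalities track the population share, while values well above or below 1 would flag states worth a closer look.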