library(tidyverse)  # attaches ggplot2, dplyr, tidyr, readr, purrr, stringr
library(janitor)
library(sf)
library(maps)
library(tmap)
In this assignment, I aim to analyze gender differences in fatal police encounters in the United States, examine their spatial distribution, and explore how gender intersects with race in shaping these patterns.
path_raw <- "FATAL ENCOUNTERS DOT ORG SPREADSHEET (See Read me tab) - Form Responses.csv"
fe_raw <- readr::read_csv(
  path_raw,
  na = c("", "NA", "N/A", "None", "NULL"),
  guess_max = 200000,
  progress = FALSE
) %>%
  clean_names() %>%
  remove_empty(c("rows", "cols"))
## New names:
## • `` -> `...32`
## • `` -> `...33`
## Rows: 31498 Columns: 35
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (28): Name, Age, Gender, Race, Race with imputations, Imputation probabi...
## dbl  (6): Unique ID, Longitude, UID Temporary, ...33, Unique ID formula, Uni...
## lgl  (1): ...32
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
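Before filtering, I inspected the missing values and the categories of the key variables. I then applied the following filtering rules: drop records with missing gender, race, or highest level of force; drop the 23 records coded Transgender; drop records whose race is coded "Race unspecified" or contains a misplaced name; and keep only records whose latitude and longitude convert to numbers within valid ranges.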
# Count the number of NAs in each column
fe_raw %>% map_dbl(., ~sum(is.na(.x)))
## unique_id
## 1
## name
## 0
## age
## 1221
## gender
## 144
## race
## 1
## race_with_imputations
## 868
## imputation_probability
## 884
## url_of_image_pls_no_hotlinks
## 16773
## date_of_injury_resulting_in_death_month_day_year
## 0
## location_of_injury_address
## 557
## location_of_death_city
## 36
## state
## 1
## location_of_death_zip_code
## 182
## location_of_death_county
## 15
## full_address
## 1
## latitude
## 1
## longitude
## 1
## agency_or_agencies_involved
## 78
## highest_level_of_force
## 4
## uid_temporary
## 25969
## name_temporary
## 25969
## armed_unarmed
## 14427
## alleged_weapon
## 20546
## aggressive_physical_movement
## 18847
## fleeing_not_fleeing
## 14420
## description_temp
## 27431
## url_temp
## 28281
## brief_description
## 2
## dispositions_exclusions_internal_use_not_for_analysis
## 3
## intended_use_of_force_developing
## 3
## supporting_document_link
## 2
## x33
## 31497
## unique_id_formula
## 31496
## unique_identifier_redundant
## 1
# Check the unique values in gender
fe_raw %>% count(gender, sort = TRUE)
## # A tibble: 4 × 2
## gender n
## <chr> <int>
## 1 Male 28300
## 2 Female 3031
## 3 <NA> 144
## 4 Transgender 23
# Check the unique values in race
fe_raw %>% count(race, sort = TRUE)
## # A tibble: 12 × 2
## race n
## <chr> <int>
## 1 European-American/White 10614
## 2 Race unspecified 8779
## 3 African-American/Black 7008
## 4 Hispanic/Latino 4192
## 5 Asian/Pacific Islander 485
## 6 Native American/Alaskan 323
## 7 Middle Eastern 53
## 8 European-American/European-American/White 37
## 9 African-American/Black African-American/Black Not imputed 4
## 10 Christopher Anthony Alexander 1
## 11 european-American/White 1
## 12 <NA> 1
# Check the unique values in highest_level_of_force
fe_raw %>% count(highest_level_of_force, sort = TRUE)
## # A tibble: 19 × 2
## highest_level_of_force n
## <chr> <int>
## 1 Gunshot 22238
## 2 Vehicle 6624
## 3 Tasered 936
## 4 Medical emergency 397
## 5 Asphyxiated/Restrained 347
## 6 Drowned 203
## 7 Beaten/Bludgeoned with instrument 182
## 8 Drug overdose 182
## 9 Undetermined 101
## 10 Fell from a height 82
## 11 Other 65
## 12 Stabbed 52
## 13 Burned/Smoke inhalation 45
## 14 Chemical agent/Pepper spray 35
## 15 <NA> 4
## 16 Asphyxiation/Restrained 2
## 17 Asphyxiation/Restrain 1
## 18 Less-than-lethal force 1
## 19 Restrain/Asphyxiation 1
# Examine latitude and longitude
str(fe_raw$latitude)
## chr [1:31498] "34.7452955" "32.3793294" "32.3793294" "31.5307934" ...
str(fe_raw$longitude)
## num [1:31498] -80.4 -88.7 -88.7 -82.6 -117 ...
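Because latitude was read as character while longitude was read as numeric, at least one latitude entry presumably does not look like a clean number. A minimal sketch for locating such entries before coercion, using a made-up vector (`lat_chr` and its values are illustrative and not taken from the data):

```r
# Hypothetical latitude strings; the third has a stray trailing comma
lat_chr <- c("34.7452955", "32.3793294", "38.2975,", NA)

# Coerce quietly, then flag entries that were present but failed to parse
lat_num <- suppressWarnings(as.numeric(lat_chr))
unparsed <- !is.na(lat_chr) & is.na(lat_num)

lat_chr[unparsed]  # strings that as.numeric() could not convert
sum(unparsed)      # rows that a later !is.na(latitude) filter would drop
```

Applied to `fe_raw$latitude`, the same two lines would show which records are lost to coercion rather than to genuinely missing coordinates.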
# Filter and clean
fe_filtered <- fe_raw %>%
  filter(
    !is.na(gender),
    !str_detect(gender, "Transgender"),
    !is.na(race),
    !str_detect(race, "Race unspecified|Christopher Anthony Alexander"),
    !is.na(highest_level_of_force)
  ) %>%
  mutate(
    latitude = as.numeric(latitude),
    longitude = as.numeric(longitude)
  ) %>%
  filter(
    !is.na(latitude), !is.na(longitude),
    between(latitude, -90, 90),
    between(longitude, -180, 180)
  )
cat("Rows kept after filtering:", nrow(fe_filtered), "/", nrow(fe_raw), "\n")
## Rows kept after filtering: 22673 / 31498
# Check the number of NAs in each column
fe_filtered %>% map_dbl(., ~sum(is.na(.x)))
## unique_id
## 0
## name
## 0
## age
## 211
## gender
## 0
## race
## 0
## race_with_imputations
## 7
## imputation_probability
## 22
## url_of_image_pls_no_hotlinks
## 7967
## date_of_injury_resulting_in_death_month_day_year
## 0
## location_of_injury_address
## 282
## location_of_death_city
## 18
## state
## 0
## location_of_death_zip_code
## 96
## location_of_death_county
## 10
## full_address
## 0
## latitude
## 0
## longitude
## 0
## agency_or_agencies_involved
## 43
## highest_level_of_force
## 0
## uid_temporary
## 18208
## name_temporary
## 18208
## armed_unarmed
## 8461
## alleged_weapon
## 13628
## aggressive_physical_movement
## 12034
## fleeing_not_fleeing
## 8455
## description_temp
## 19372
## url_temp
## 20101
## brief_description
## 1
## dispositions_exclusions_internal_use_not_for_analysis
## 2
## intended_use_of_force_developing
## 0
## supporting_document_link
## 1
## x33
## 22672
## unique_id_formula
## 22672
## unique_identifier_redundant
## 0
After filtering, the variables gender, race, latitude, longitude, and highest_level_of_force contain no missing values, so I proceed with the analysis.
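The no-missing-values condition can also be written as an explicit check, so that the script stops if a key variable still contains NA. A minimal sketch on a toy data frame (`toy` and `key_vars` are illustrative names; in the analysis the same check would be run on `fe_filtered`):

```r
# Toy stand-in for the filtered data (values are made up)
toy <- data.frame(
  gender = c("Male", "Female", "Male"),
  race = c("Hispanic/Latino", "African-American/Black", "European-American/White"),
  latitude = c(34.7, 32.4, 31.5),
  longitude = c(-80.4, -88.7, -82.6),
  highest_level_of_force = c("Gunshot", "Vehicle", "Gunshot")
)

key_vars <- c("gender", "race", "latitude", "longitude", "highest_level_of_force")

# Count NAs per key column and halt if any remain
na_counts <- colSums(is.na(toy[key_vars]))
na_counts
stopifnot(all(na_counts == 0))
```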
First, I examine the overall gender distribution.
gender_plot <- fe_filtered %>%
  count(gender, name = "n") %>%
  mutate(pct = n / sum(n) * 100)
gender_plot
## # A tibble: 2 × 3
## gender n pct
## <chr> <int> <dbl>
## 1 Female 2242 9.89
## 2 Male 20431 90.1
ggplot(gender_plot, aes(x = gender, y = pct, fill = gender)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            vjust = -0.3, size = 4) +
  scale_y_continuous(limits = c(0, 100), expand = c(0, 0)) +
  labs(
    title = "Gender Differences",
    x = "Gender",
    y = "Percentage"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5)
  )
To focus on the most severe use of force, this section separates cases where the highest level of force was recorded as Gunshot from all other cases. The goal is to compare the share of fatal shootings within each gender.
fe_force <- fe_filtered %>%
  mutate(
    force_group = ifelse(highest_level_of_force == "Gunshot", "Gunshot", "Other")
  ) %>%
  count(gender, force_group, name = "n") %>%
  group_by(gender) %>%
  mutate(pct = n / sum(n) * 100)
fe_force
## # A tibble: 4 × 4
## # Groups: gender [2]
## gender force_group n pct
## <chr> <chr> <int> <dbl>
## 1 Female Gunshot 940 41.9
## 2 Female Other 1302 58.1
## 3 Male Gunshot 15672 76.7
## 4 Male Other 4759 23.3
ggplot(fe_force, aes(x = gender, y = n, fill = force_group)) +
  geom_col(width = 0.6, color = "white") +
  geom_text(
    aes(
      label = paste0(round(pct, 1), "%"),
      group = force_group
    ),
    position = position_stack(vjust = 0.5),
    color = "white", size = 3
  ) +
  labs(
    title = "Gunshot Proportion",
    x = "Gender",
    y = "Number of Cases",
    fill = "Force Type"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    plot.title = element_text(hjust = 0.5)
  )
The bar plot reveals pronounced gender differences in fatal police encounters. Male victims far outnumber female victims (20,431 versus 2,242), reflecting the overall gender imbalance in the dataset. Among men, about 76.7% of deaths were caused by gunfire, making shootings the predominant mechanism of fatal force against males. In contrast, only about 41.9% of female deaths resulted from gunfire, with the majority (58.1%) linked to other forms of force. One possible reading is that male fatalities more often occur in armed or high-threat situations, whereas female fatalities more often arise in less weapon-intensive encounters; the force variable records how a person died rather than the circumstances, so it cannot confirm this on its own. In either case, the gender disparity is both quantitative and qualitative.
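The gap in gunshot shares can be checked with a two-sample test of proportions. The sketch below uses the counts from the table above (940 of 2,242 female cases and 15,672 of 20,431 male cases) and base R's `prop.test()`. It is a quick check of the difference rather than part of the original analysis, and it assumes the cases can be treated as independent observations.

```r
# Gunshot deaths and total deaths by gender, copied from the fe_force table
gunshot <- c(Female = 940, Male = 15672)
total <- c(Female = 940 + 1302, Male = 15672 + 4759)

# Gunshot share within each gender, in percent
share <- gunshot / total * 100
round(share, 1)

# Two-sample test of equal proportions (Female minus Male)
test <- prop.test(x = gunshot, n = total)
test$p.value
test$conf.int
```

The confidence interval refers to the difference in proportions (female minus male), so an interval that lies entirely below zero points to a lower gunshot share among women.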
This section maps the geographic distribution of fatal police encounters across the United States. By plotting incident locations by gender, we can visually assess whether the spatial patterns of male and female fatalities differ, for instance whether cases are more concentrated in certain states or metropolitan regions.
Individual incident locations (latitude–longitude) reveal precise spatial patterns, but they are often too granular and scattered to identify broader geographic trends. By aggregating to states, we can compare how gender disparities in fatal encounters vary across different political, legal, and institutional contexts.
State-level analysis is also theoretically and policy relevant. Law enforcement systems, criminal justice policies, and data reporting standards in the United States are largely organized and regulated at the state level. Differences in state laws, policing practices, demographic composition, and levels of urbanization may all contribute to variations in the frequency and characteristics of fatal incidents.
# Keep only valid coordinates
fe_map <- fe_filtered %>%
  filter(!is.na(latitude), !is.na(longitude))
fe_sf <- st_as_sf(fe_map, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)
# Get state polygons as an sf object (planar geometry, s2 switched off)
sf::sf_use_s2(FALSE)
states_sf <- maps::map("state", fill = TRUE, plot = FALSE) %>%
  st_as_sf() %>%
  st_set_crs(4326) %>%
  mutate(state_lower = ID)
# Spatial join: assign each incident to the state polygon that contains it
fe_by_state <- fe_sf %>%
  st_join(states_sf, join = st_within, left = FALSE) %>%
  st_drop_geometry()
# Calculate the percentage of each gender within each state
state_gender <- fe_by_state %>%
  count(state_lower, gender, name = "n") %>%
  group_by(state_lower) %>%
  mutate(total = sum(n), pct = n / total * 100) %>%
  ungroup()
female_share <- state_gender %>%
  filter(gender == "Female") %>%
  transmute(state_lower, female_n = n, total, female_share = female_n / total * 100)
# Attribute join: attach the female share to the state polygons
states_choro <- states_sf %>%
  left_join(female_share, by = "state_lower")
# Visualize
tmap_mode("view")
tm_shape(states_choro) +
  tm_polygons(
    col = "female_share",
    palette = "Blues",
    style = "cont",
    id = "state_lower",
    popup.vars = c(
      "Female share (%)" = "female_share",
      "Female n" = "female_n",
      "Total" = "total"
    )
  ) +
  tm_layout(title = "Female share by state")
# Get one representative point inside each state for the bubbles
state_cent <- states_choro %>%
  st_point_on_surface() %>%
  select(state_lower, total)
# Visualize
tm_shape(states_choro) +
  tm_polygons(
    col = "female_share",
    palette = "Blues",
    style = "cont",
    id = "state_lower",
    popup.vars = c(
      "Female share (%)" = "female_share",
      "Female n" = "female_n",
      "Total cases" = "total"
    ),
    title = "Female share (%)"
  ) +
  tm_shape(state_cent) +
  tm_bubbles(size = "total", col = "gray20", alpha = 0.5,
             scale = 1.2, title.size = "Total cases") +
  tm_basemap("Esri.WorldGrayCanvas") +
  tm_view(alpha = 0.95)
Female victims constitute a small minority of all fatal encounters in every state, but the proportion varies regionally. Kansas stands out with the highest female share, followed by several states in the Southeast and Midwest, such as Mississippi and Iowa. In contrast, most Western and Northeastern states show relatively lower percentages of female victims. Importantly, the states with the largest total number of cases (California, Texas, and Florida, for example) do not necessarily have higher female proportions, indicating that incident volume and gender composition are not directly linked. These regional patterns suggest that gendered dynamics in fatal police encounters may be shaped by state-level contexts, including demographic structures, policing practices, and reporting systems.
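Because state totals differ widely, the female share in a state with few cases is much less precise than the share in a state with many. A minimal sketch of that point, using hypothetical counts (the state labels and numbers below are invented for illustration and not taken from the data) and exact binomial intervals from base R's `binom.test()`:

```r
# Hypothetical female counts and totals for two states of very different size
counts <- data.frame(
  state = c("small state", "large state"),
  female_n = c(6, 300),
  total = c(40, 3000)
)

# Exact 95% binomial interval for each state's share, in percent
ci <- t(mapply(
  function(x, n) as.numeric(binom.test(x, n)$conf.int) * 100,
  counts$female_n, counts$total
))

counts$share <- counts$female_n / counts$total * 100
counts$lower <- ci[, 1]
counts$upper <- ci[, 2]
counts$width <- counts$upper - counts$lower
counts
```

Running the same calculation on `female_n` and `total` from `female_share` would indicate which of the high or low state shares rest on enough cases to be read with confidence.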
This section examines the intersection of gender and race among fatal police encounters. By cross-tabulating these two variables, we can assess whether gender differences in fatal incidents vary across racial groups.
# Reclassify and clean the data
fe_clean_race <- fe_filtered %>%
  mutate(
    race_clean = str_to_lower(str_trim(race)),
    race_clean = case_when(
      str_detect(race_clean, "white|european") ~ "White",
      str_detect(race_clean, "black|african") ~ "Black",
      str_detect(race_clean, "hispanic|latino") ~ "Hispanic/Latino",
      str_detect(race_clean, "asian|pacific") ~ "Asian/PI",
      str_detect(race_clean, "native|american indian|alaska") ~ "Native/AIAN",
      str_detect(race_clean, "middle east|arab") ~ "Middle Eastern"
    )
  )
gender_race_clean <- fe_clean_race %>%
  count(gender, race_clean, name = "n") %>%
  group_by(gender) %>%
  mutate(pct = n / sum(n) * 100)
gender_race_clean
## # A tibble: 12 × 4
## # Groups: gender [2]
## gender race_clean n pct
## <chr> <chr> <int> <dbl>
## 1 Female Asian/PI 59 2.63
## 2 Female Black 595 26.5
## 3 Female Hispanic/Latino 336 15.0
## 4 Female Middle Eastern 4 0.178
## 5 Female Native/AIAN 35 1.56
## 6 Female White 1213 54.1
## 7 Male Asian/PI 426 2.09
## 8 Male Black 6409 31.4
## 9 Male Hispanic/Latino 3830 18.7
## 10 Male Middle Eastern 49 0.240
## 11 Male Native/AIAN 288 1.41
## 12 Male White 9429 46.2
To make the visualization clearer and more interpretable, race categories are simplified. I display the three major racial groups (White, Black, and Hispanic/Latino) that together account for the vast majority of cases, while all remaining smaller categories are grouped into “Other”. This approach allows for easier comparison across genders without losing the broader racial context.
# Simplify race categories
gender_race_simplified <- fe_clean_race %>%
  mutate(
    race_simplified = case_when(
      race_clean %in% c("White", "Black", "Hispanic/Latino") ~ race_clean,
      TRUE ~ "Other"
    )
  ) %>%
  count(gender, race_simplified, name = "n") %>%
  group_by(gender) %>%
  mutate(pct = n / sum(n) * 100)
# Visualize
ggplot(gender_race_simplified, aes(x = gender, y = pct, fill = race_simplified)) +
  geom_col(color = "white", width = 0.7) +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            position = position_stack(vjust = 0.5),
            color = "white", size = 3) +
  scale_y_continuous(limits = c(0, 100), expand = c(0, 0)) +
  labs(
    title = "Racial composition within each gender",
    x = "Gender",
    y = "Percentage",
    fill = "Race"
  ) +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(hjust = 0.5))
The plot reveals clear gendered patterns in the racial composition of fatal police encounters. Among female victims, White individuals account for the majority (54.1%), followed by Black (26.5%) and Hispanic/Latino (15.0%) victims. Among male victims, the share of Black individuals rises to 31.4% and that of Hispanic/Latino individuals to 18.7%, while White males make up a smaller share (46.2%). Men of color, particularly Black and Hispanic men, therefore represent a larger proportion of male fatalities than women of color do of female fatalities, which indicates that racial disparities are more pronounced among men. Overall, the figure shows that race and gender intersect in shaping patterns of exposure to fatal police force, with minority men the most overrepresented group within these data.
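The same counts can be read in the other direction: instead of the racial composition within each gender, the female share within each racial group. A minimal sketch using the counts from the cross-tabulation above (these are shares among recorded deaths and not population rates):

```r
# Counts copied from the gender_race_clean table
tab <- data.frame(
  race = c("Asian/PI", "Black", "Hispanic/Latino",
           "Middle Eastern", "Native/AIAN", "White"),
  female = c(59, 595, 336, 4, 35, 1213),
  male = c(426, 6409, 3830, 49, 288, 9429)
)

# Female share of recorded deaths within each racial group, in percent
tab$female_share <- tab$female / (tab$female + tab$male) * 100

# Sort from the lowest to the highest female share
tab[order(tab$female_share), ]
```

A lower female share among Black and Hispanic/Latino decedents than among White decedents is the same pattern described above, seen from the other side of the table.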