library(ggplot2)
library(tidyverse)
library(tidyr)
library(dplyr)
library(janitor)
library(stringr)
library(sf)
library(maps)
library(tmap)

Fatal Encounters

In this assignment, I aim to analyze gender differences in fatal police encounters in the United States, examine their spatial distribution, and explore how gender intersects with race in shaping these patterns.

Preprocess

Import data

path_raw <- "FATAL ENCOUNTERS DOT ORG SPREADSHEET (See Read me tab) - Form Responses.csv"
fe_raw <- readr::read_csv(
  path_raw,
  na = c("", "NA", "N/A", "None", "NULL"),
  guess_max = 200000,
  progress = FALSE
) %>%
  clean_names() %>%
  remove_empty(c("rows","cols"))
## New names:
## • `` -> `...32`
## • `` -> `...33`
## Rows: 31498 Columns: 35
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (28): Name, Age, Gender, Race, Race with imputations, Imputation probabi...
## dbl  (6): Unique ID, Longitude, UID Temporary, ...33, Unique ID formula, Uni...
## lgl  (1): ...32
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data cleaning

I applied the following filtering rules:

  1. Gender filtering
  • Excluded observations with missing (NA) gender information.
  • Excluded cases labeled as “Transgender”. This category is excluded not because it is unimportant but because it contains only 23 cases in this dataset. So few cases lead to unstable estimates, wide confidence intervals, and potential disclosure risks in subgroup analyses.
  2. Race filtering
  • Excluded observations with missing (NA) race information.
  • Excluded cases labeled as “Race unspecified”. These entries lack clear racial identification, which prevents meaningful intersectional comparisons between gender and race categories.
  • Excluded one record whose race field contains a person's name (“Christopher Anthony Alexander”), an apparent data-entry error.
  3. Force information filtering
  • Kept only cases with non-missing values in highest_level_of_force, since this variable is essential for later analysis.
  4. Geolocation cleaning
  • Converted latitude and longitude to numeric format.
  • Excluded observations with missing coordinates or coordinates outside the valid geographic range (latitude between −90 and 90, longitude between −180 and 180).
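To illustrate the point about small subgroups, the sketch below compares exact binomial confidence intervals for the same proportion at two sample sizes. The group sizes (23 and 3031) are the Transgender and Female counts in this dataset; the 40% share is an invented figure used only for illustration, not a result from the data.

```r
# Hypothetical illustration: the 40% share is made up. Only the group sizes
# (23 and 3031) are taken from the gender counts in the data.
ci_width <- function(n, share = 0.4) {
  x  <- round(n * share)            # hypothetical number of cases with the trait
  ci <- binom.test(x, n)$conf.int   # exact (Clopper-Pearson) 95% interval
  c(n = n, lower = ci[1], upper = ci[2], width = ci[2] - ci[1])
}

small <- ci_width(23)    # a group the size of the Transgender category
large <- ci_width(3031)  # a group the size of the Female category

print(round(rbind(small, large), 3))
```

The interval for the group of 23 is many times wider than the one for the group of 3031, which is the instability the filtering rule is meant to avoid.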
# Count the number of NAs in each column
fe_raw %>% map_dbl(., ~sum(is.na(.x)))
##                                             unique_id 
##                                                     1 
##                                                  name 
##                                                     0 
##                                                   age 
##                                                  1221 
##                                                gender 
##                                                   144 
##                                                  race 
##                                                     1 
##                                 race_with_imputations 
##                                                   868 
##                                imputation_probability 
##                                                   884 
##                          url_of_image_pls_no_hotlinks 
##                                                 16773 
##      date_of_injury_resulting_in_death_month_day_year 
##                                                     0 
##                            location_of_injury_address 
##                                                   557 
##                                location_of_death_city 
##                                                    36 
##                                                 state 
##                                                     1 
##                            location_of_death_zip_code 
##                                                   182 
##                              location_of_death_county 
##                                                    15 
##                                          full_address 
##                                                     1 
##                                              latitude 
##                                                     1 
##                                             longitude 
##                                                     1 
##                           agency_or_agencies_involved 
##                                                    78 
##                                highest_level_of_force 
##                                                     4 
##                                         uid_temporary 
##                                                 25969 
##                                        name_temporary 
##                                                 25969 
##                                         armed_unarmed 
##                                                 14427 
##                                        alleged_weapon 
##                                                 20546 
##                          aggressive_physical_movement 
##                                                 18847 
##                                   fleeing_not_fleeing 
##                                                 14420 
##                                      description_temp 
##                                                 27431 
##                                              url_temp 
##                                                 28281 
##                                     brief_description 
##                                                     2 
## dispositions_exclusions_internal_use_not_for_analysis 
##                                                     3 
##                      intended_use_of_force_developing 
##                                                     3 
##                              supporting_document_link 
##                                                     2 
##                                                   x33 
##                                                 31497 
##                                     unique_id_formula 
##                                                 31496 
##                           unique_identifier_redundant 
##                                                     1
# Check the unique values in gender
fe_raw %>% count(gender, sort = TRUE)
## # A tibble: 4 × 2
##   gender          n
##   <chr>       <int>
## 1 Male        28300
## 2 Female       3031
## 3 <NA>          144
## 4 Transgender    23
# Check the unique values in race
fe_raw %>% count(race, sort = TRUE)
## # A tibble: 12 × 2
##    race                                                          n
##    <chr>                                                     <int>
##  1 European-American/White                                   10614
##  2 Race unspecified                                           8779
##  3 African-American/Black                                     7008
##  4 Hispanic/Latino                                            4192
##  5 Asian/Pacific Islander                                      485
##  6 Native American/Alaskan                                     323
##  7 Middle Eastern                                               53
##  8 European-American/European-American/White                    37
##  9 African-American/Black African-American/Black Not imputed     4
## 10 Christopher Anthony Alexander                                 1
## 11 european-American/White                                       1
## 12 <NA>                                                          1
# Check the unique values in highest_level_of_force
fe_raw %>% count(highest_level_of_force, sort = TRUE)
## # A tibble: 19 × 2
##    highest_level_of_force                n
##    <chr>                             <int>
##  1 Gunshot                           22238
##  2 Vehicle                            6624
##  3 Tasered                             936
##  4 Medical emergency                   397
##  5 Asphyxiated/Restrained              347
##  6 Drowned                             203
##  7 Beaten/Bludgeoned with instrument   182
##  8 Drug overdose                       182
##  9 Undetermined                        101
## 10 Fell from a height                   82
## 11 Other                                65
## 12 Stabbed                              52
## 13 Burned/Smoke inhalation              45
## 14 Chemical agent/Pepper spray          35
## 15 <NA>                                  4
## 16 Asphyxiation/Restrained               2
## 17 Asphyxiation/Restrain                 1
## 18 Less-than-lethal force                1
## 19 Restrain/Asphyxiation                 1
# Examine latitude and longitude
str(fe_raw$latitude)
##  chr [1:31498] "34.7452955" "32.3793294" "32.3793294" "31.5307934" ...
str(fe_raw$longitude)
##  num [1:31498] -80.4 -88.7 -88.7 -82.6 -117 ...
# Filter and clean
fe_filtered <- fe_raw %>%
  filter(
    !is.na(gender),
    !str_detect(gender, "Transgender"),
    !is.na(race),
    !str_detect(race, "Race unspecified|Christopher Anthony Alexander"),
    !is.na(highest_level_of_force)
  ) %>%
  mutate(
    latitude = as.numeric(latitude),
    longitude = as.numeric(longitude)
  ) %>%
  filter(
    !is.na(latitude), !is.na(longitude),
    between(latitude, -90, 90),
    between(longitude, -180, 180)
  )

cat("Rows kept after filtering:", nrow(fe_filtered), "/", nrow(fe_raw), "\n")
## Rows kept after filtering: 22673 / 31498
# Check the number of NAs in each column
fe_filtered %>% map_dbl(., ~sum(is.na(.x)))
##                                             unique_id 
##                                                     0 
##                                                  name 
##                                                     0 
##                                                   age 
##                                                   211 
##                                                gender 
##                                                     0 
##                                                  race 
##                                                     0 
##                                 race_with_imputations 
##                                                     7 
##                                imputation_probability 
##                                                    22 
##                          url_of_image_pls_no_hotlinks 
##                                                  7967 
##      date_of_injury_resulting_in_death_month_day_year 
##                                                     0 
##                            location_of_injury_address 
##                                                   282 
##                                location_of_death_city 
##                                                    18 
##                                                 state 
##                                                     0 
##                            location_of_death_zip_code 
##                                                    96 
##                              location_of_death_county 
##                                                    10 
##                                          full_address 
##                                                     0 
##                                              latitude 
##                                                     0 
##                                             longitude 
##                                                     0 
##                           agency_or_agencies_involved 
##                                                    43 
##                                highest_level_of_force 
##                                                     0 
##                                         uid_temporary 
##                                                 18208 
##                                        name_temporary 
##                                                 18208 
##                                         armed_unarmed 
##                                                  8461 
##                                        alleged_weapon 
##                                                 13628 
##                          aggressive_physical_movement 
##                                                 12034 
##                                   fleeing_not_fleeing 
##                                                  8455 
##                                      description_temp 
##                                                 19372 
##                                              url_temp 
##                                                 20101 
##                                     brief_description 
##                                                     1 
## dispositions_exclusions_internal_use_not_for_analysis 
##                                                     2 
##                      intended_use_of_force_developing 
##                                                     0 
##                              supporting_document_link 
##                                                     1 
##                                                   x33 
##                                                 22672 
##                                     unique_id_formula 
##                                                 22672 
##                           unique_identifier_redundant 
##                                                     0

After filtering, the variables gender, race, latitude, longitude, and highest_level_of_force contain no missing values, so I proceed with the analysis.
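The overall row count (22673 of 31498 kept) does not show which rule removes the most cases. One way to audit this is an attrition table that applies the rules one at a time. The sketch below uses a small invented data frame in place of fe_raw and is written in base R so that it runs without extra packages; the rule names are hypothetical labels, not part of the original code.

```r
# Toy data standing in for fe_raw (all values are invented for illustration).
toy <- data.frame(
  gender = c("Male", "Female", NA, "Transgender", "Male", "Male"),
  race   = c("European-American/White", "Race unspecified",
             "African-American/Black", "Hispanic/Latino", NA,
             "African-American/Black"),
  highest_level_of_force = c("Gunshot", "Vehicle", "Gunshot", "Tasered",
                             "Gunshot", NA)
)

# Each rule is a logical vector: TRUE means the row passes that rule.
rules <- list(
  gender_present  = !is.na(toy$gender),
  not_transgender = is.na(toy$gender) | toy$gender != "Transgender",
  race_present    = !is.na(toy$race),
  race_specified  = is.na(toy$race) | toy$race != "Race unspecified",
  force_present   = !is.na(toy$highest_level_of_force)
)

# Apply the rules cumulatively and record how many rows survive each step.
keep <- rep(TRUE, nrow(toy))
attrition <- data.frame(step = "start", rows_left = nrow(toy))
for (rule_name in names(rules)) {
  keep <- keep & rules[[rule_name]]
  attrition <- rbind(attrition,
                     data.frame(step = rule_name, rows_left = sum(keep)))
}

print(attrition)
```

Run against the real data, the same loop would show how much of the reduction comes from the “Race unspecified” rule, which is likely the largest single source given its 8779 raw cases.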

Data analysis

Gender-focused analysis

Gender distribution

First, I examine the overall gender distribution.

gender_plot <- fe_filtered %>%
  count(gender, name = "n") %>%
  mutate(pct = n / sum(n) * 100)
gender_plot
## # A tibble: 2 × 3
##   gender     n   pct
##   <chr>  <int> <dbl>
## 1 Female  2242  9.89
## 2 Male   20431 90.1
ggplot(gender_plot, aes(x = gender, y = pct, fill = gender)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(round(pct, 1), "%")), 
            vjust = -0.3, size = 4) +
  scale_y_continuous(limits = c(0, 100), expand = c(0, 0)) +
  labs(
    title = "Gender Differences",
    x = "Gender",
    y = "Percentage"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5)
  )

Gender differences by highest level of force

To focus on the most severe use-of-force incidents, this section groups cases by whether the highest level of force was recorded as Gunshot or as any other category. The goal is to compare, within each gender, the share of deaths that were fatal shootings.

fe_force <- fe_filtered %>%
  mutate(
    force_group = ifelse(highest_level_of_force == "Gunshot", "Gunshot", "Other")
  ) %>%
  count(gender, force_group, name = "n") %>%
  group_by(gender) %>%
  mutate(pct = n / sum(n) * 100)

fe_force
## # A tibble: 4 × 4
## # Groups:   gender [2]
##   gender force_group     n   pct
##   <chr>  <chr>       <int> <dbl>
## 1 Female Gunshot       940  41.9
## 2 Female Other        1302  58.1
## 3 Male   Gunshot     15672  76.7
## 4 Male   Other        4759  23.3
ggplot(fe_force, aes(x = gender, y = n, fill = force_group)) +
  geom_col(width = 0.6, color = "white") +
  geom_text(
    aes(
      label = paste0(round(pct, 1), "%"),
      group = force_group
    ),
    position = position_stack(vjust = 0.5),
    color = "white", size = 3
  ) +
  labs(
    title = "Gunshot Proportion",
    x = "Gender",
    y = "Number of Cases",
    fill = "Force Type"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    plot.title = element_text(hjust = 0.5)
  )

Gender-focused findings

The bar plots reveal pronounced gender differences in fatal police encounters. Male victims far outnumber female victims (90.1% versus 9.9% of the filtered cases). Among men, approximately 76.7% of deaths were caused by gunfire, making shootings the predominant mechanism of fatal force against males. In contrast, only about 41.9% of female deaths resulted from gunfire, with the majority falling into other force categories. One possible reading is that male fatalities more often occur in armed or high-threat situations, whereas female fatalities more often arise in less weapon-intensive encounters. The variables used here cannot confirm this, because the comparison does not draw on the armed_unarmed field. Either way, the gender disparity is both quantitative (how many deaths) and qualitative (how those deaths occur).
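The size of this gap can be summarized with a test of independence and an odds ratio. The sketch below is an addition to the original analysis: it re-enters the four counts from the fe_force table and assumes each death can be treated as an independent observation.

```r
# Counts copied from the fe_force table above.
force_tab <- matrix(
  c(940, 1302,      # Female: Gunshot, Other
    15672, 4759),   # Male:   Gunshot, Other
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("Female", "Male"),
                  force  = c("Gunshot", "Other"))
)

# Pearson chi-squared test of independence between gender and force group.
test_res <- chisq.test(force_tab)

# Odds of a gunshot death (versus any other force type) for each gender,
# then the ratio of male odds to female odds.
odds <- force_tab[, "Gunshot"] / force_tab[, "Other"]
odds_ratio <- unname(odds["Male"] / odds["Female"])

print(test_res)
print(odds_ratio)
```

An odds ratio well above 1 means that, among the recorded deaths, a male victim's death is far more likely to be a shooting than a female victim's. It describes the composition of recorded deaths, not the risk faced by men or women in the population.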

Spatial distribution of fatal encounters by gender

This section maps the geographic distribution of fatal police encounters across the United States. By locating incidents geographically and comparing them by gender, we can assess whether the spatial patterns of male and female fatalities differ, for instance whether female victims make up a larger share of cases in certain states or regions.

Conduct analysis at the state level

Individual incident locations (latitude–longitude) reveal precise spatial patterns, but they are often too granular and scattered to identify broader geographic trends. By aggregating to states, we can compare how gender disparities in fatal encounters vary across different political, legal, and institutional contexts.

State-level analysis is also relevant for both theory and policy. Law enforcement systems, criminal justice policies, and data reporting standards in the United States are largely organized and regulated at the state level. Differences in state laws, policing practices, demographic composition, and levels of urbanization may all contribute to variations in the frequency and characteristics of fatal incidents.

# Keep only valid coordinates
fe_map <- fe_filtered %>%
  filter(!is.na(latitude), !is.na(longitude))

fe_sf <- st_as_sf(fe_map, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)
# Get states sf
sf::sf_use_s2(FALSE)

states_sf <- maps::map("state", fill = TRUE, plot = FALSE) %>%
  st_as_sf() %>%
  st_set_crs(4326) %>%
  mutate(state_lower = ID)

# Spatial join
fe_by_state <- fe_sf %>%
  st_join(states_sf, join = st_within, left = FALSE) %>%
  st_drop_geometry()
# Calculate the percentage
state_gender <- fe_by_state %>%
  count(state_lower, gender, name = "n") %>%
  group_by(state_lower) %>%
  mutate(total = sum(n), pct = n/total*100) %>%
  ungroup()

female_share <- state_gender %>%
  filter(gender == "Female") %>%
  transmute(state_lower, female_n = n, total, female_share = female_n/total*100)

# Attribute join: attach the female share to the state polygons
states_choro <- states_sf %>%
  left_join(female_share, by = "state_lower")
# Visualize
tmap_mode("view")

tm_shape(states_choro) +
  tm_polygons(
    col = "female_share",
    palette = "Blues",
    style = "cont",
    id = "state_lower",
    popup.vars = c(
      "Female share (%)" = "female_share",
      "Female n" = "female_n",
      "Total" = "total"
    )
  ) +
  tm_layout(title = "Female share by state")
# Get one representative point inside each state for the bubble layer
state_cent <- states_choro %>%
  st_point_on_surface() %>%
  select(state_lower, total)

# Visualize
tm_shape(states_choro) +
  tm_polygons(
    col = "female_share",
    palette = "Blues",
    style = "cont",
    id = "state_lower",
    popup.vars = c(
      "Female share (%)" = "female_share",
      "Female n"         = "female_n",
      "Total cases"      = "total"
    ),
    title = "Female share (%)"
  ) +
  tm_shape(state_cent) +
  tm_bubbles(size = "total", col = "gray20", alpha = 0.5,
             scale = 1.2, title.size = "Total cases") +
  tm_basemap("Esri.WorldGrayCanvas") +
  tm_view(alpha = 0.95)

Spatial distribution findings

Female victims constitute a small minority of all fatal encounters in every state mapped (the "state" database in the maps package covers only the contiguous United States), but the proportion varies regionally. Kansas stands out with the highest female share, followed by several states in the Southeast and Midwest such as Mississippi and Iowa. In contrast, most Western and Northeastern states show relatively lower percentages of female victims. Importantly, states with the largest total number of cases, such as California, Texas, and Florida, do not necessarily have higher female proportions, which suggests that incident volume and gender composition are not directly linked. These regional patterns suggest that gendered dynamics in fatal police encounters may be shaped by state-level contexts, including demographic structures, policing practices, and reporting systems.
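The claim that incident volume and female share are not directly linked can be checked with a rank correlation between the two columns. The sketch below shows the calculation on invented numbers that only mimic the layout of the female_share table; the state labels and counts are placeholders, not results.

```r
# Invented numbers standing in for the female_share table (not real results).
toy_states <- data.frame(
  state_lower = c("a", "b", "c", "d", "e", "f"),
  female_n    = c(12, 40, 9, 150, 20, 75),
  total       = c(100, 450, 60, 2000, 280, 900)
)
toy_states$female_share <- toy_states$female_n / toy_states$total * 100

# Spearman's rank correlation: is a larger caseload associated with a
# higher or lower female share? Ranks are used because state totals are
# highly skewed.
rho <- cor(toy_states$total, toy_states$female_share, method = "spearman")
print(rho)
```

With the real table, a coefficient near zero would support the claim, while a clearly negative value would indicate that high-volume states tend to have lower female shares. States with few cases produce noisy shares, so the result should be read with that in mind.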

Intersectional analysis: Gender × Race

This section examines the intersection of gender and race among fatal police encounters. By cross-tabulating these two variables, we can assess whether gender differences in fatal incidents vary across racial groups.

# Reclassify and clean the data
fe_clean_race <- fe_filtered %>%
  mutate(
    race_clean = str_to_lower(str_trim(race)),
    race_clean = case_when(
      str_detect(race_clean, "white|european") ~ "White",
      str_detect(race_clean, "black|african") ~ "Black",
      str_detect(race_clean, "hispanic|latino") ~ "Hispanic/Latino",
      str_detect(race_clean, "asian|pacific") ~ "Asian/PI",
      str_detect(race_clean, "native|american indian|alaska") ~ "Native/AIAN",
      str_detect(race_clean, "middle east|arab") ~ "Middle Eastern"
    )
  )

gender_race_clean <- fe_clean_race %>%
  count(gender, race_clean, name = "n") %>%
  group_by(gender) %>%
  mutate(pct = n / sum(n) * 100)

gender_race_clean
## # A tibble: 12 × 4
## # Groups:   gender [2]
##    gender race_clean          n    pct
##    <chr>  <chr>           <int>  <dbl>
##  1 Female Asian/PI           59  2.63 
##  2 Female Black             595 26.5  
##  3 Female Hispanic/Latino   336 15.0  
##  4 Female Middle Eastern      4  0.178
##  5 Female Native/AIAN        35  1.56 
##  6 Female White            1213 54.1  
##  7 Male   Asian/PI          426  2.09 
##  8 Male   Black            6409 31.4  
##  9 Male   Hispanic/Latino  3830 18.7  
## 10 Male   Middle Eastern     49  0.240
## 11 Male   Native/AIAN       288  1.41 
## 12 Male   White            9429 46.2

To make the visualization clearer and more interpretable, race categories are simplified. I display the three major racial groups (White, Black, and Hispanic/Latino) that together account for the vast majority of cases, while all remaining smaller categories are grouped into “Other”. This approach allows for easier comparison across genders without losing the broader racial context.

# Simplify race categories
gender_race_simplified <- fe_clean_race %>%
  mutate(
    race_simplified = case_when(
      race_clean %in% c("White", "Black", "Hispanic/Latino") ~ race_clean,
      TRUE ~ "Other"
    )
  ) %>%
  count(gender, race_simplified, name = "n") %>%
  group_by(gender) %>%
  mutate(pct = n / sum(n) * 100)

# Visualize
ggplot(gender_race_simplified, aes(x = gender, y = pct, fill = race_simplified)) +
  geom_col(color = "white", width = 0.7) +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            position = position_stack(vjust = 0.5),
            color = "white", size = 3) +
  scale_y_continuous(limits = c(0, 100), expand = c(0, 0)) +
  labs(
    title = "Racial composition within each gender",
    x = "Gender",
    y = "Percentage",
    fill = "Race"
  ) +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(hjust = 0.5))

Intersectional findings

The plot reveals clear gendered patterns in the racial composition of fatal police encounters. Among female victims, White individuals account for the majority (54.1%), followed by Black (26.5%) and Hispanic/Latino (15.0%) victims. Among male victims, the Black share rises to about 31% and the Hispanic/Latino share to nearly 19%, while White males make up a smaller share (46.2%). Men of color, particularly Black and Hispanic men, therefore represent a larger proportion of male fatalities than women of color do of female fatalities. Because these figures are shares of recorded deaths rather than rates per population, they describe composition, not risk. With that caveat, the figure suggests that race and gender intersect in shaping exposure to fatal police force, and that the racial skew is stronger among men.
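The same table can be read in the other direction, as the share of victims within each racial group who are female. The sketch below is an addition to the original analysis and re-enters the counts for the three largest groups from the gender_race_clean table.

```r
# Counts copied from the gender_race_clean table above.
race_counts <- data.frame(
  race   = c("White", "Black", "Hispanic/Latino"),
  female = c(1213, 595, 336),
  male   = c(9429, 6409, 3830)
)

# Share of victims within each racial group who are female (row percentages).
race_counts$female_share <- race_counts$female /
  (race_counts$female + race_counts$male) * 100

print(race_counts)
```

A higher female share among White victims than among Black or Hispanic/Latino victims is the same pattern as in the plot, seen from the race side rather than the gender side.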

My findings

  • Tell a narrative account of your findings (no more than 300 words), supported by at least four interesting graphics/maps.
  • Response: Across all analyses, a consistent gender disparity emerges in fatal police encounters in the United States. Quantitative summaries show that men account for the overwhelming majority of victims, with women representing just under ten percent of cases. Yet intersectional and spatial analyses reveal more complexity beneath this pattern. Women’s fatalities are disproportionately White, while Black and Hispanic men make up a larger share of male cases, which is consistent with compounding racialized risks. Spatially, the share of female victims varies by state, reaching its highest levels in Kansas and parts of the Southeast. Together, these findings suggest that gendered exposure to fatal police violence is not only numerically unequal but also geographically and racially differentiated, pointing to the importance of examining how local contexts shape patterns of police use of force.