library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ lubridate 1.9.4     ✔ tibble    3.3.0
## ✔ purrr     1.1.0     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(ggpmisc)
## Loading required package: ggpp
## Registered S3 methods overwritten by 'ggpp':
##   method                  from   
##   heightDetails.titleGrob ggplot2
##   widthDetails.titleGrob  ggplot2
## 
## Attaching package: 'ggpp'
## 
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(usmap)
library(tmap)
library(stringr)
library(sf)
## Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
library(viridis)
## Loading required package: viridisLite

Instructions

Fatal Encounters

In 2020, after many frustrations in obtaining the data on police killings of civilians, Brian Burghart of USC began to collect his own data from various sources in the US. He called the database Fatal Encounters.

The database, as you will see, is quite comprehensive - including name, gender, race, location, disposition of the case, etc. Your assignment is to tell a compelling story about police killings in this country – it does not have to be comprehensive. Any particular aspect of this dataset might be interesting to highlight and tell a unique story about. There are many stories hidden in the data – find your own!

Directions

  1. Download the data from this link.
    • Click ‘Download FE Database’, which will direct you to Google Sheets.
    • Export the ‘Form Responses’ sheet into a CSV file (File -> Download -> .csv).
  2. Import the data into R and explore.
  3. Tell a narrative account of your findings (no more than 300 words), supported by at least four interesting graphics/maps.

Loading data

fe_form_responses <- read.csv("FATAL ENCOUNTERS_Form Responses.csv")

# Clean the data: coerce coordinates to numeric first (malformed entries become NA),
# then keep only rows with state info and valid coordinates
fe_clean <- fe_form_responses %>%
  mutate(Latitude = as.numeric(Latitude),
         Longitude = as.numeric(Longitude)) %>%
  filter(!is.na(State), State != "",
         !is.na(Latitude), !is.na(Longitude)) %>%
  mutate(State = toupper(State))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Latitude = as.numeric(Latitude)`.
## Caused by warning:
## ! NAs introduced by coercion
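
The coercion warning above means that some coordinate entries are not valid numbers. A minimal sketch of how the offending values could be inspected before they are silently turned into NA is shown here; the `raw_lat` vector is a made-up example, not values from the Fatal Encounters file.

```r
# Hypothetical raw latitude strings; the trailing comma mimics a data-entry typo
raw_lat <- c("34.05", "40.71", "38.9,", "")

# Coerce quietly, then compare against the raw strings
parsed <- suppressWarnings(as.numeric(raw_lat))

# Values that were present in the raw data but failed to parse
bad_values <- raw_lat[is.na(parsed) & !is.na(raw_lat) & raw_lat != ""]
print(bad_values)
```

Applied to the real column, the same comparison would list the entries to correct by hand or drop.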

Plot 1

# Summarize by state
state_counts <- fe_clean %>%
  group_by(State) %>%
  summarise(total_killings = n()) %>%
  arrange(desc(total_killings))

# Select only the top 10 states for the graph
top_states <- state_counts %>%
  slice_max(total_killings, n = 10)

ggplot(top_states, aes(x = reorder(State, total_killings), y = total_killings)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  labs(title = "Top 10 States with Most Police Killings",
       x = "State",
       y = "Total Killings (Absolute Counts)") +
  theme_minimal(base_size = 13)

Plot 1 is a bar chart of the ten states with the most police killings. California, Texas, and Florida have the highest counts. These are absolute counts, and the three states are also the three most populous in the country, so population size confounds the ranking: a per-capita rate is needed before states can be compared directly. Even so, the sheer volume of incidents in these states warrants investigation into state-specific legal, institutional, and cultural factors.
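
The population caveat can be made concrete by converting counts into a rate. The sketch below assumes a population table with one row per state; both `toy_counts` and `toy_pop` are made-up placeholders, not values from the Fatal Encounters data or the Census.

```r
library(dplyr)

# Placeholder counts and populations, for illustration only
toy_counts <- tibble(State = c("AA", "BB", "CC"),
                     total_killings = c(500, 120, 60))
toy_pop <- tibble(State = c("AA", "BB", "CC"),
                  population = c(40e6, 3e6, 1e6))

# Join counts to population and express killings per million residents
state_rates <- toy_counts %>%
  inner_join(toy_pop, by = "State") %>%
  mutate(rate_per_million = total_killings / population * 1e6) %>%
  arrange(desc(rate_per_million))

print(state_rates)
```

In this toy example the state with the largest count has the lowest rate, which is the kind of reversal a per-capita view can reveal.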

Plot 2

# Convert Date to Date format
fe_form_responses$Date.of.injury.resulting.in.death <- as.Date(fe_form_responses$Date.of.injury.resulting.in.death, format = "%m/%d/%Y")

# Extract year (as an integer, so the x-axis is continuous) and month
fe_form_responses$Year <- as.integer(format(fe_form_responses$Date.of.injury.resulting.in.death, "%Y"))
fe_form_responses$Month <- format(fe_form_responses$Date.of.injury.resulting.in.death, "%m")

# Count incidents by year, dropping rows whose date failed to parse
yearly_data <- fe_form_responses %>%
  filter(!is.na(Year)) %>%
  group_by(Year) %>%
  summarise(Incidents = n())

# Create a line plot of incidents over time
ggplot(yearly_data, aes(x = Year, y = Incidents)) +
  geom_line(color = "blue") +
  labs(title = "Police Killings Over Time",
       x = "Year",
       y = "Number of Incidents") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Plot 2 shows the trend in police killings from 1999 to 2021. The annual count rises through the early 2000s and then levels off at roughly 1,000 to 1,500 fatalities per year, so the series is better described as an increase followed by a plateau than as a steady linear climb. The persistence of these incidents suggests that interventions so far have been insufficient to reduce police killings.
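
One way to check whether a series keeps climbing or levels off is to compare fitted slopes for an early and a late period. This is only a sketch: `toy_yearly` holds invented counts, the 2004/2005 split is an arbitrary assumption, and `segment_slope()` is a hypothetical helper.

```r
# Invented yearly counts: a rise followed by a plateau
toy_yearly <- data.frame(
  Year = 2000:2009,
  Incidents = c(600, 700, 800, 900, 1000, 1200, 1200, 1200, 1200, 1200)
)

# Hypothetical helper: least-squares slope of Incidents on Year
segment_slope <- function(df) {
  unname(coef(lm(Incidents ~ Year, data = df))["Year"])
}

early_slope <- segment_slope(subset(toy_yearly, Year <= 2004))
late_slope <- segment_slope(subset(toy_yearly, Year >= 2005))

print(c(early = early_slope, late = late_slope))
```

A clearly positive early slope next to a late slope near zero would support the plateau reading over a single linear trend.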

Plot 3

# Convert Age to numeric, handling non-numeric values
fe_form_responses$Age <- as.numeric(fe_form_responses$Age)
## Warning: NAs introduced by coercion
# Filter out rows with missing or non-numeric age data
age_data <- fe_form_responses %>%
  filter(!is.na(Age) & Age > 0)

# Create a histogram of age distribution
ggplot(age_data, aes(x = Age)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Age Distribution of Police Killing Victims",
       x = "Age",
       y = "Frequency")

Plot 3 shows the age distribution of police killing victims in the United States. The distribution is not normal but heavily right-skewed, with a sharp modal peak in the 20 to 40 age range. This contrasts with general mortality, which is concentrated at older ages, and indicates that young adults face a disproportionately high risk of police-involved death.
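
The skew described here can also be stated as a number. The sketch below computes a moment-based skewness in base R; `toy_ages` is an invented vector and `sample_skewness()` is a hypothetical helper, so the result says nothing about the real data.

```r
# Invented ages, for illustration only
toy_ages <- c(20, 22, 25, 27, 30, 31, 35, 40, 55, 80)

# Hypothetical helper: standardised third central moment
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / (mean(centered^2)^1.5)
}

age_skew <- sample_skewness(toy_ages)
age_mean <- mean(toy_ages)
age_median <- median(toy_ages)

print(c(skewness = age_skew, mean = age_mean, median = age_median))
```

A positive skewness together with a mean above the median is the numerical signature of a right-skewed distribution.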

Plot 4

# Ensure Latitude and Longitude are numeric
fe_form_responses$Latitude <- as.numeric(fe_form_responses$Latitude)
## Warning: NAs introduced by coercion
fe_form_responses$Longitude <- as.numeric(fe_form_responses$Longitude)

# Filter out rows with missing coordinates or race information, and collapse
# Race into the three groups named in the colour scale below
map_data <- fe_form_responses %>%
  filter(!is.na(Latitude) & !is.na(Longitude) & !is.na(Race)) %>%
  mutate(RaceGroup = case_when(
    Race == "African-American/Black" ~ "African-American/Black",
    Race == "European-American/White" ~ "White",
    TRUE ~ "Other"
  )) %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)

# Plot map
ggplot(data = map_data) +
  geom_sf(aes(color = RaceGroup), alpha = 0.6) +
  scale_color_manual(values = c("African-American/Black" = "red", "White" = "blue", "Other" = "green")) +
  theme_minimal() +
  labs(title = "Police Killings by Race",
       subtitle = "Geographic Distribution",
       color = "Race")

The map in Plot 4 shows police killings by race, with particular interest in African-American/Black victims. It shows a clear regional pattern: these incidents are concentrated in states such as Georgia, Alabama, Louisiana, and Florida, and are otherwise scattered mostly across the eastern United States. This may reflect the large number of African-American/Black residents in those states. The Midwest, by comparison, shows a contrasting pattern.

Plot 5

# Ensure Latitude and Longitude are numeric
fe_form_responses$Latitude <- as.numeric(fe_form_responses$Latitude)
fe_form_responses$Longitude <- as.numeric(fe_form_responses$Longitude)

# Filter & recode Race into desired groups
map_data_sel <- fe_form_responses %>%
  filter(!is.na(Latitude), !is.na(Longitude), !is.na(Race)) %>%
  mutate(
    Race2 = case_when(
      Race == "African-American/Black" ~ "African-American/Black",
      Race == "European-American/White" ~ "European-American/White",
      Race == "Hispanic/Latino" ~ "Hispanic/Latino",
      TRUE ~ "Other"
    )
  ) %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)

# Plot with tmap
tmap_mode("plot")
## ℹ tmap modes "plot" - "view"
## ℹ toggle with `tmap::ttm()`
# tmap v4 syntax: fill, fill.scale, fill.legend, fill_alpha, and tm_title()
tm_shape(map_data_sel) +
  tm_dots(
    fill = "Race2",
    fill.scale = tm_scale(values = c(
      "African-American/Black" = "red",
      "European-American/White" = "blue",
      "Hispanic/Latino" = "orange",
      "Other" = "gray"
    )),
    fill.legend = tm_legend(title = "Race"),
    fill_alpha = 0.6,
    size = 0.1
  ) +
  tm_title("Police Killings by Selected Races") +
  tm_layout(legend.outside = TRUE)

Plot 5 shows how police killings are distributed across racial groups, focusing on African-American/Black, European-American/White, and Hispanic/Latino victims, with the remaining groups categorised as Other. The map shows a clear regional pattern: the Midwest and parts of the West show a higher concentration of European-American/White victims, while Hispanic/Latino victims are more frequent in the Southwest, notably in Texas, Arizona, and California.
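
A regional pattern read off a dot map can be checked by tabulating the share of each racial group within each state. The following is a minimal sketch on invented rows; `toy_cases` is a placeholder and only borrows the `State` and `Race2` column names used in this report.

```r
library(dplyr)

# Invented cases, for illustration only
toy_cases <- tibble(
  State = c("TX", "TX", "TX", "TX", "OH", "OH"),
  Race2 = c("Hispanic/Latino", "Hispanic/Latino", "European-American/White",
            "Other", "European-American/White", "African-American/Black")
)

# Count cases per state and group, then convert to within-state shares
race_shares <- toy_cases %>%
  count(State, Race2, name = "n_cases") %>%
  group_by(State) %>%
  mutate(share = n_cases / sum(n_cases)) %>%
  ungroup()

print(race_shares)
```

Shares computed this way sum to 1 within each state, so states with very different totals can be compared on the same scale.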

Plot 6

# Convert Age to numeric, handling non-numeric or missing entries
df_plot <- map_data_sel %>%
  mutate(Age = as.numeric(Age)) %>%
  filter(!is.na(Age))

ggplot(df_plot, aes(x = Race2, y = Age, fill = Race2)) +
  geom_boxplot() +
  scale_fill_manual(values = c(
    "African-American/Black" = "red",
    "European-American/White" = "blue",
    "Hispanic/Latino" = "orange",
    "Other" = "gray"
  )) +
  labs(
    title = "Age Distribution of Police Killing Victims by Race",
    x = "Race",
    y = "Age",
    fill = "Race"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The boxplot in Plot 6 shows that, across all racial groups, the median age of victims is in the early 30s, an age at which people are often in their most productive years. African-American/Black and Hispanic/Latino victims tend to be slightly younger than their White counterparts, suggesting that younger people of color are particularly vulnerable. There are also victims over 60, who appear as outliers, which shows that police killings are not confined to one age group or race. Overall, Plot 6 indicates that victims of police encounters skew young, particularly among marginalized racial groups.
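
The medians read off the boxplot can be tabulated directly. The sketch below uses an invented `toy_victims` table with placeholder group labels; only the `Race2` and `Age` column names are taken from this report.

```r
library(dplyr)

# Invented victims, for illustration only; "A" and "B" are placeholder groups
toy_victims <- tibble(
  Race2 = c("A", "A", "A", "B", "B", "B"),
  Age = c(22, 30, 41, 28, 35, NA)
)

# Median age, interquartile range, and sample size per group
age_summary <- toy_victims %>%
  filter(!is.na(Age)) %>%
  group_by(Race2) %>%
  summarise(median_age = median(Age),
            iqr_age = IQR(Age),
            n = n())

print(age_summary)
```

Reporting the group size next to each median helps show whether a small difference between groups rests on enough cases to be meaningful.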