Source: Scientific American

Introduction

  1. Police violence is a significant issue today, particularly concerning its impact on marginalized communities. The dataset used in this project was obtained from Mapping Police Violence, a database that tracks incidents of police killings in the United States. The dataset includes various variables such as the victim’s name, age, cause of death, race, gender, date of the incident, encounter type, and more.

There was a lot of cleaning done to create my visualizations. The cleaning process began with Excel files, which were converted to CSV for easier handling. Since the variables I wanted were spread across files, they were merged, focusing on aligning them by state. Then, I removed any missing values and converted ages to numbers for a scatter plot. For a ridge plot showing age distribution, I grouped ages into categories like teens (ages 13 to 19). Similarly, for a horizontal bar graph, I replaced missing values and filtered the data to focus on specific columns. Next, for an area chart, I grouped data by year and type. Finally, for a stacked bar graph, I replaced missing values and filtered the data accordingly.

The website features a section detailing their data collection process. Here, they explain how their dataset is sourced from official police use of force data collection programs in states such as California, Texas, and Virginia. They supplement this with nationwide data from crowdsourced databases like The Gun Violence Archive and the Fatal Encounters database. Additionally, they conducted comprehensive research, scouring social media, obituaries, criminal records databases, police reports, and other sources to ascertain the race of 90 percent of all victims in their database.I chose this topic and dataset because police violence is a pressing social issue that crosses with systemic racism and injustice. It’s important to talk about these issues because they affect real people’s lives and have long-lasting consequences.

# Load required libraries


library(tidyverse)
library(highcharter)
library(lubridate)
library(leaflet)
library(ggridges)
library(ggthemes)
library(htmlwidgets)
library(plotly)

# Set working directory
setwd("/Users/rebeccambaho/Downloads/project 3")
# Load datasets
Policekillings <- read_csv("Policekillings.csv")
MergedDataset <- read_csv("MergedDataset.csv")

Base plot

# Clean and wrangle data
cleaned_data <- MergedDataset %>%
  filter(!is.na(`Victim's age`), !is.na(`# Black people killed`)) %>%
  mutate(`Victim's age` = as.numeric(ifelse(is.na(`Victim's age`), NA, `Victim's age`))) %>%
  filter(!is.na(`Victim's age`), `Victim's age` > 0) 
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Victim's age = as.numeric(ifelse(is.na(`Victim's age`), NA,
##   `Victim's age`))`.
## Caused by warning:
## ! NAs introduced by coercion
# Scatter plot with theme_solarized
scatter_plot <- ggplot(cleaned_data, aes(x = `Victim's age`, y = `# Black people killed`)) +
  geom_point(color = "darkorange") +
  labs(title = "Police Killings Among Black People by Age",
       x = "Victim's Age",
       y = "Number of Black People Killed",
       caption = "Source: Mapping police voilence") +  
  theme_solarized() +  
  scale_colour_solarized('brown')  

scatter_plot

The scatterplot shows how the age of victims relates to the number of black people killed by police. Each dot represents one incident, with the victim’s age on the horizontal axis and the number of black people killed on the vertical axis. The graph results are quite interesting because they display a clear trend. It seems like there is a straight-line relationship between the variables.

# Linear regression
model <- lm(`# Black people killed` ~ `Victim's age`, data = cleaned_data)
summary(model)
## 
## Call:
## lm(formula = `# Black people killed` ~ `Victim's age`, data = cleaned_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -141.09  -86.92  -28.23  130.07  167.49 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    145.01547    2.80079  51.777  < 2e-16 ***
## `Victim's age`  -0.23120    0.07124  -3.245  0.00118 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 102.2 on 12151 degrees of freedom
## Multiple R-squared:  0.000866,   Adjusted R-squared:  0.0007838 
## F-statistic: 10.53 on 1 and 12151 DF,  p-value: 0.001176
# Diagnostic plots
plot(model)

The linear regression analysis tells us that there’s a connection between the age of victims and the number of black people killed by police. Specifically, as victims get older, the number of black people killed tends to decrease. For every year increase in age, there’s about a quarter-person fewer being killed. However, this model isn’t great at explaining why police killings happen among black people, as it only explains a tiny bit of the differences we see in the data. While the relationship between age and killings appears real (because of the low p-value), age alone doesn’t tell us much about why police killings happen. There are likely other important factors that aren’t included in this analysis.

Plot 1: Ridge plot

# Data Cleaning and Manipulation
MergedDataset_Clean <- MergedDataset %>%
  filter(!is.na(`Victim's age`) & !is.na(`# Black people killed`)) %>%
  select(`Victim's age`, `# Black people killed`)
# Print the cleaned dataset to check for missing values
print(MergedDataset_Clean)
## # A tibble: 12,704 × 2
##    `Victim's age` `# Black people killed`
##    <chr>                            <dbl>
##  1 45                                   4
##  2 23                                   4
##  3 27                                   4
##  4 35                                   4
##  5 28                                   4
##  6 21                                   4
##  7 27                                   4
##  8 40                                   4
##  9 28                                   4
## 10 23                                   4
## # ℹ 12,694 more rows
categorize_age <- function(age) {
  if (is.na(age)) {
    return("Unknown")
  } else if (age < 1) {
    return("Babies")
  } else if (age >= 1 & age <= 12) {
    return("Children")
  } else if (age >= 13 & age <= 19) {
    return("Teen")
  } else if (age >= 20 & age <= 39) {
    return("Adult")
  } else if (age >= 40 & age <= 59) {
    return("Middle Age")
  } else if (age >= 60) {
    return("Senior")
  } else {
    return("Unknown")
  }
}
# Apply age categorization and remove Unknown category
MergedDataset_Clean <- MergedDataset_Clean %>%
  mutate(Age_Group = sapply(`Victim's age`, categorize_age)) %>%
  filter(!is.na(Age_Group))
# create the ridge plot
ggplot(MergedDataset_Clean, aes(x = `# Black people killed`, y = Age_Group, fill = Age_Group)) +
  geom_density_ridges(alpha = 0.6) + 
  theme_stata() +  # Add theme_stata
  theme(legend.position = "none", 
        axis.title.y = element_text(hjust = 0.5), 
        axis.title.x = element_text(hjust = 0.5),
        axis.text.y = element_text(angle = 0, hjust = 1),
        plot.caption = element_text(size = 8, hjust = 1)) + 
  labs(title = "Distribution of Black People Killed by Age Group",
       caption = "Source: Mapping Police Violence") +  
  scale_fill_brewer(palette = "YlGnBu")
## Picking joint bandwidth of 30.1

The graph illustrates the density distribution of black people killed across various age groups. It shows how the number of fatalities varies among different age categories. We can see that children seemed to be the highest.

Plot 2: Horizontal Bar Graph

Link to plot: https://public.tableau.com/views/Horizontalbargraphofpolicekillingsbyrace/HorizontalBarGraph?:language=en-US&:sid=&:display_count=n&:origin=viz_share_link

This Tableau visualization depicts the number of police killings categorized by race and gender, presented as a horizontal double bar chart. Each racial group is depicted along the vertical axis, with separate bars for different sexes. The length of each bar corresponds to the count of police killings, facilitating a straightforward comparison between different racial groups and genders. From the graph, we gather several insights. Firstly, more men were affected than any other sex. Particularly, white men experienced the highest rate of victimization compared to any other racial group, which was surprising.

Plot 3: Area Chart

# Define main encounter types
main_encounter_types <- c("Violent Crime", "Non-Violent Offense", "Domestic Disturbance", "Person with a Weapon", "Traffic Stop", "Mental Health/Welfare Check", "Other Crimes Against People", "None/Unknown")
# Data Cleaning and Manipulation
Policekillings_clean <- Policekillings %>%
  filter(!is.na(`Date of Incident (month/day/year)`), 
         !is.na(`Encounter Type`)) %>%
  mutate(Date_of_Incident = lubridate::mdy(`Date of Incident (month/day/year)`),
         Year = lubridate::year(Date_of_Incident),
         Encounter_Type = stringr::str_extract(`Encounter Type`, 
                                                paste(main_encounter_types, collapse = "|"))) %>%
  filter(!is.na(Encounter_Type)) %>%
  count(Year, Encounter_Type, name = "Police_Killings_Count", .groups = "drop")
# Aggregate data by year and encounter type
Policekillings_aggregated <- Policekillings_clean %>%
  group_by(Year, Encounter_Type) %>%
  summarise(Police_Killings_Count = sum(Police_Killings_Count), .groups = "drop")
# Convert encounter type to factor to maintain order
Policekillings_aggregated$Encounter_Type <- factor(Policekillings_aggregated$Encounter_Type,
                                                   levels = main_encounter_types)
# Create the Highcharter chart with a source caption
hc <- highchart() %>%
  hc_chart(type = "area", marginTop = 80) %>%  # Increase marginTop to create space for the title
  hc_title(text = "Police Killings Over Years by Encounter Type") %>%
  hc_xAxis(categories = unique(Policekillings_aggregated$Year)) %>%
  hc_yAxis(title = list(text = "Number of Police Killings")) %>%
  hc_plotOptions(area = list(stacking = "normal", lineColor = "#ffffff",
                             lineWidth = 0.2)) %>%
  hc_add_series(data = Policekillings_aggregated,
                type = "area",
                hcaes(x = Year, y = Police_Killings_Count, 
                      group = Encounter_Type,
                      name = Encounter_Type),  
                showInLegend = TRUE,
                shape = "square") %>%  
  hc_legend(align = "right", verticalAlign = "top", layout = "vertical", y = 50) %>%  # Adjust legend position
  hc_colors(c("rgb(141,211,199)", "rgb(255,255,179)", "rgb(190,186,218)", 
              "rgb(251,128,114)", "rgb(128,177,211)", "rgb(253,180,98)", 
              "rgb(179,222,105)", "rgb(252,205,229)")) %>%
  hc_add_theme(hc_theme_ffx()) %>%  # Add FFX theme
  hc_credits(enabled = TRUE, text = "Source: Mapping police violence", position = list(align = "right", x = -10))

# Print the chart
hc

The graph illustrates the fluctuation in the number of police killings across various encounter types over the years. Each color represents a distinct encounter type, such as Violent Crime or Traffic Stop. By analyzing the chart, we gain valuable insights into trends in law enforcement incidents. It is evident that encounters involving violent crime were the most prevalent. Additionally, 2023 stands out as a peak year for encounters, although other years also exhibit notable peaks.

Plot 4: Stacked Bar Graph

# Data Preprocessing
MergedDataset <- MergedDataset %>%
  mutate(Race = ifelse(is.na(`Victim's race`), "Unknown", `Victim's race`),
         Fleeing = ifelse(is.na(`Fleeing (Source: WaPo and Review of Cases Not Included in WaPo Database)`), "Unknown", 
                          `Fleeing (Source: WaPo and Review of Cases Not Included in WaPo Database)`),
         Fleeing = ifelse(Fleeing %in% c("Foot", "Car"), "Fleeing", Fleeing))
# Filter the dataset based on race categories and fleeing statuses
MergedDataset <- MergedDataset %>%
  filter(Race %in% c("White", "Black", "Hispanic", "Asian", "Pacific Islander", "Unknown Race", "Native American"),
         Fleeing %in% c("Not Fleeing", "Fleeing"))

#  Calculate the counts for each combination
counts <- MergedDataset %>%
  group_by(Race, Fleeing) %>%
  summarise(count = n(), .groups = 'drop') # explicitly setting .groups to 'drop'
# Check the structure of the counts data frame
str(counts)
## tibble [12 × 3] (S3: tbl_df/tbl/data.frame)
##  $ Race   : chr [1:12] "Asian" "Asian" "Black" "Black" ...
##  $ Fleeing: chr [1:12] "Fleeing" "Not Fleeing" "Fleeing" "Not Fleeing" ...
##  $ count  : int [1:12] 38 135 942 1420 598 1107 47 91 17 23 ...
# Pivot the data for highcharter
pivot_data <- counts %>%
  pivot_wider(names_from = Fleeing,
              values_from = count,
              values_fill = 0)
# Create the stacked bar chart
plot1 <- highchart() |> 
  hc_chart(type = "bar") |> 
  hc_title(text = "Police Killing Victims: Armed Status by Race and Fleeing Type", style = list(color = "white")) |> 
  hc_xAxis(categories = pivot_data$Race, labels = list(style = list(color = "white"))) |> 
  hc_yAxis(title = list(text = "Count", style = list(color = "white")), labels = list(style = list(color = "white"))) |> 
  hc_legend(enabled = TRUE, itemStyle = list(color = "white")) |> 
  hc_plotOptions(series = list(stacking = "normal")) |> 
  hc_add_series(name = "Not Fleeing", data = pivot_data$`Not Fleeing`, color = "blue") |> 
  hc_add_series(name = "Fleeing", data = pivot_data$Fleeing, color = "aliceblue") |> 
  hc_add_theme(hc_theme_monokai()) |>  # Apply the monokai theme
  hc_tooltip(shared = TRUE, borderColor = "black", style = list(color = "white"), backgroundColor = "black") |> 
  hc_caption(text = "Source: Mapping police voilence", align = "right", style = list(color = "white"))  

# Display the chart
plot1

This stacked bar graph shows the number of police killing victims based on their race and whether they were fleeing or not. The graph reveals that the number of victims who were not fleeing (depicted in blue) surpasses those who were fleeing (depicted in light blue)

Plot 5: Map

# Calculate total number of victims per state
data <- read_csv("MergedDataset.csv")
state_victims <- aggregate(`Victim's name` ~ State, data, FUN = length)
# Create a color palette
palette <- colorNumeric(palette = "Reds", domain = state_victims$`Victim's name`)
leaflet(data) %>%
  addTiles() %>%
  addCircleMarkers(
    ~Longitude, ~Latitude,
    radius = 5,
    color = ~palette(state_victims$`Victim's name`[match(State, state_victims$State)]),
    fillOpacity = 0.7,
    weight = 1,
    popup = ~paste("State: ", State, "<br>",
                   "Total Victims: ", state_victims$`Victim's name`[match(State, state_victims$State)], "<br>",
                   "Victim's name: ", `Victim's name`, "<br>",
                   "Victim's age: ", `Victim's age`, "<br>",
                   "Date of Incident: ", `Date of Incident (month/day/year)`, "<br>",
                   "Cause of death: ", `Cause of death`, "<br>",
                   "Media description: ", `Media description of the circumstances surrounding the death`)
  ) %>%
  addLegend("bottomright", pal = palette, values = state_victims$`Victim's name`, 
            title = "Total Victims", opacity = 1) %>%
  addControl(
    position = "topleft",
    html = "<h3>Map of Police Killings across America</h3>"
  ) %>%
  addControl(
    position = "bottomleft",
    html = "<p>Source: Mapping police violence</p>"
  )
## Warning in validateCoords(lng, lat, funcName): Data contains 158836 rows with
## either missing or invalid lat/lon values and will be ignored

The visualization depicts incidents of police violence in the United States, with each point representing a victim. Hovering over a point reveals details such as the victim’s age and cause of death. The graph indicates a higher concentration of victims in the eastern regions, with comparatively fewer incidents recorded in the western and central plains. But larger amount of victims happen to be on the west.

Summary Essay

  1. The ACLU found that police violence disproportionately affects communities of color, especially Black and Indigenous people. Black individuals are more likely to be killed by police, even when unarmed, and many of these incidents involve minor issues or mental health crises. This highlights the need for different approaches to public safety and crisis response. This pattern of discrimination extends to schools, where police often target Black, Brown, Indigenous, and disabled students. The Department of Education’s delay in releasing recent data on this issue makes it harder for communities to push for changes like reducing police presence and increasing mental health support in schools. It’s crucial to act quickly to ensure fairness, transparency, and better policies that prioritize the well-being of all students (Resendes, 2023).

  2. I found it intriguing to observe that white individuals experienced the highest victimization rate compared to other racial groups in the bar plot visualization. The results from the map were also fascinating, as they revealed a concentration of activities on the eastern side but more severe damage on the western side. I wanted to include several aspects in the analysis, like a packed bubble chart using R or a streamgraph. Unfortunately, I faced technical issues with R, and the streamgraph package wasn’t compatible with my R setup, preventing me from creating this visualization.

References

Resendes, W. (2023, February 27). Police in schools continue to target black, Brown, and indigenous students with disabilities. the Trump administration has data that’s likely to prove it.: ACLU. American Civil Liberties Union. https://www.aclu.org/news/criminal-law-reform/police-in-schools-continue-to-target-black-brown-and-indigenous-students-with-disabilities-the-trump-administration-has-data-thats-likely-to-prove-it