Police Shootings

Introduction

This dataset, sourced from Kaggle (a platform for publicly available datasets), covers police-related gun violence from 2015 to 2022. The objective of this analysis is to explore the geographic regions, neighborhoods, and demographic groups most affected by police shootings. In particular, I aim to investigate differences in the number of incidents involving males versus females, track changes in the frequency of shootings over time, and analyze whether the demographic profile of those affected has shifted during this period. By examining these trends, the goal is to gain deeper insight into the patterns of police violence and its evolving impact on different communities.

library(tidyverse)
library(tidyr)
library(leaflet)
library(sf)
library(tigris)
library(lubridate)
library(scales)
library(highcharter)
library(reshape2)
library(viridis)
setwd("C:/Users/eyong/Downloads/police shooting")
# Read the Kaggle CSV from the working directory
police <- read_csv("US Police shootings in from 2015-22.csv")

Selecting the columns and cleaning the NAs

police_clean <- police %>%
  # Convert date to proper format and extract year
  mutate(
    date = as.Date(date),
    year = year(date),
    manner_of_death = factor(manner_of_death, levels = c("shot", "shot and Tasered")),
    signs_of_mental_illness = as.logical(signs_of_mental_illness),
    armed = factor(armed)
  ) %>%
  # Select relevant columns
  select(date, year, manner_of_death, city, state, flee, threat_level, body_camera,
         signs_of_mental_illness, armed, longitude, latitude, age, race, gender) %>%
  # Remove any rows with NA in the chosen columns
  filter(
    !is.na(longitude), !is.na(latitude), !is.na(manner_of_death), !is.na(armed),
    !is.na(age), !is.na(race), !is.na(signs_of_mental_illness), !is.na(gender),
    !is.na(flee), !is.na(threat_level)
  )
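
Because the filter above drops every row with a missing value in any selected column, it can help to check how many values are missing per column before filtering. The following is a minimal sketch on a small made-up table; `toy_police` and its values are hypothetical and are not rows from the Kaggle file.

```r
library(dplyr)

# Hypothetical toy data standing in for the raw `police` table
toy_police <- tibble(
  age    = c(34, NA, 51, 27),
  race   = c("W", "B", NA, "H"),
  gender = c("M", "M", "F", NA)
)

# Number of missing values in each column
na_counts <- toy_police %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of rows that survive a complete-case filter on all columns
rows_kept <- toy_police %>%
  filter(if_all(everything(), ~ !is.na(.x))) %>%
  nrow()

print(na_counts)
print(rows_kept)
```

Comparing `rows_kept` with `nrow(toy_police)` shows how much data the filter removes. The same comparison on the real table would indicate whether some years or groups lose more rows than others.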

1. Time Analysis

Incidents by year: I want to examine how many police shootings were recorded in each year from 2015 to 2022. The first step is to create a count for each year.

yearly_incidents <- police_clean %>%
  group_by(year) %>%
  summarise(count = n())
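
To make the change over time explicit, the yearly counts can be extended with a year-over-year difference and percentage change. This is a sketch only; `toy_yearly` holds made-up counts and does not reproduce the dataset's figures.

```r
library(dplyr)

# Hypothetical yearly counts (not taken from the dataset)
toy_yearly <- tibble(
  year  = 2015:2018,
  count = c(100, 110, 99, 99)
)

# Difference and percentage change relative to the previous year
yearly_change <- toy_yearly %>%
  arrange(year) %>%
  mutate(
    change     = count - lag(count),
    pct_change = 100 * change / lag(count)
  )

print(yearly_change)
```

The first year has no predecessor, so its `change` and `pct_change` are `NA`.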

Create an interactive Highcharts plot

highchart() %>%
  hc_chart(type = "line") %>%
  hc_title(text = "Police Shooting Incidents by Year (2015-2022)") %>%
  hc_xAxis(categories = yearly_incidents$year) %>%
  hc_yAxis(title = list(text = "Number of Incidents")) %>%
  hc_series(
    list(
      name = "Incidents",
      data = yearly_incidents$count,
      color = "blue",
      marker = list(symbol = "circle", radius = 5, lineColor = "red", lineWidth = 2)
    )
  ) %>%
  hc_tooltip(
    headerFormat = '<b>{point.key}</b><br>',
    pointFormat = '{series.name}: {point.y}'
  )

2. Age Distribution Analysis

ggplot(police_clean, aes(x = age)) +
  geom_histogram(binwidth = 5, fill = "red", color = "black") +
  theme_minimal() +
  labs(title = "Age Distribution of Shooting Victims",
       x = "Age",
       y = "Count")

Aggregate the data by state, counting incidents by gender, by race, and by each combination of race and gender

state_incidents <- police_clean %>%
  group_by(state) %>%
  summarise(
    count = n(),  # Total number of incidents per state
    avg_age = mean(age, na.rm = TRUE),  # Average age of individuals involved
    shot_count = sum(manner_of_death == "shot", na.rm = TRUE),  # Count of incidents involving shooting
    taser_count = sum(manner_of_death == "shot and Tasered", na.rm = TRUE),  # Count of incidents involving taser
    mental_illness_count = sum(signs_of_mental_illness == TRUE, na.rm = TRUE),  # Count of incidents with mental illness signs
    
    # Gender-based counts
    gender_F_count = sum(gender == "F", na.rm = TRUE),
    gender_M_count = sum(gender == "M", na.rm = TRUE),
    
    # Race-based counts
    race_W_count = sum(race == "W", na.rm = TRUE),
    race_B_count = sum(race == "B", na.rm = TRUE),
    race_H_count = sum(race == "H", na.rm = TRUE),
    race_A_count = sum(race == "A", na.rm = TRUE),
    race_O_count = sum(race == "O", na.rm = TRUE),
    
    # Gender and race combinations
    race_B_M_count = sum(race == "B" & gender == "M", na.rm = TRUE),  # Black Male
    race_B_F_count = sum(race == "B" & gender == "F", na.rm = TRUE),  # Black Female
    race_W_M_count = sum(race == "W" & gender == "M", na.rm = TRUE),  # White Male
    race_W_F_count = sum(race == "W" & gender == "F", na.rm = TRUE),  # White Female
    race_H_M_count = sum(race == "H" & gender == "M", na.rm = TRUE),  # Hispanic Male
    race_H_F_count = sum(race == "H" & gender == "F", na.rm = TRUE),  # Hispanic Female
    race_A_M_count = sum(race == "A" & gender == "M", na.rm = TRUE),  # Asian Male
    race_A_F_count = sum(race == "A" & gender == "F", na.rm = TRUE),  # Asian Female
    race_O_M_count = sum(race == "O" & gender == "M", na.rm = TRUE),  # Other Male
    race_O_F_count = sum(race == "O" & gender == "F", na.rm = TRUE)   # Other Female
  ) %>%
  ungroup()
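
The race-by-gender columns above can also be produced without writing one `sum()` per combination. The sketch below uses `count()` and `pivot_wider()` on a hypothetical toy table; `toy_clean` is made up, and only the column naming is chosen to mirror `state_incidents`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column names as police_clean
toy_clean <- tibble(
  state  = c("CA", "CA", "CA", "TX", "TX"),
  race   = c("W", "B", "B", "H", "W"),
  gender = c("M", "M", "F", "M", "M")
)

# One row per state, one column per race/gender combination seen in the data
race_gender_wide <- toy_clean %>%
  count(state, race, gender) %>%
  pivot_wider(
    names_from  = c(race, gender),
    values_from = n,
    names_glue  = "race_{race}_{gender}_count",
    values_fill = 0
  )

print(race_gender_wide)
```

One difference from the explicit version: a combination that never occurs anywhere in the data gets no column here, whereas the explicit `sum()` calls always create a column of zeros.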

# Check the resulting dataframe to confirm the columns
head(state_incidents)

## # A tibble: 6 × 23
##   state count avg_age shot_count taser_count mental_illness_count gender_F_count
##   <chr> <int>   <dbl>      <int>       <int>                <int>          <int>
## 1 AK       31    32.7         30           1                    6              1
## 2 AL       93    40.4         85           8                   23              8
## 3 AR       66    39.9         64           2                    9              0
## 4 AZ      218    35.9        209           9                   46             19
## 5 CA      743    34.9        693          50                  178             39
## 6 CO      180    35.9        176           4                   25              5
## # ℹ 16 more variables: gender_M_count <int>, race_W_count <int>,
## #   race_B_count <int>, race_H_count <int>, race_A_count <int>,
## #   race_O_count <int>, race_B_M_count <int>, race_B_F_count <int>,
## #   race_W_M_count <int>, race_W_F_count <int>, race_H_M_count <int>,
## #   race_H_F_count <int>, race_A_M_count <int>, race_A_F_count <int>,
## #   race_O_M_count <int>, race_O_F_count <int>

Regression analysis

# Multiple linear regression on all state-level predictors
model_1 <- lm(count ~ avg_age + shot_count + gender_F_count + mental_illness_count +
                gender_M_count + race_W_count + race_B_count + race_H_count +
                race_A_count + race_O_count + race_B_M_count + race_B_F_count +
                race_W_M_count + race_W_F_count + race_H_M_count + race_H_F_count +
                race_A_M_count + race_A_F_count + race_O_M_count + race_O_F_count,
              data = state_incidents)
## Model for race
model_race <- lm(count ~ race_W_count + race_B_count + gender_M_count + race_H_count +
                   race_A_count + race_O_count + race_B_M_count + race_B_F_count +
                   race_W_M_count + race_W_F_count + race_H_M_count + race_H_F_count +
                   race_A_M_count + race_A_F_count + race_O_M_count + race_O_F_count,
                 data = state_incidents)
summary(model_race)

## 
## Call:
## lm(formula = count ~ race_W_count + race_B_count + gender_M_count + 
##     race_H_count + race_A_count + race_O_count + race_B_M_count + 
##     race_B_F_count + race_W_M_count + race_W_F_count + race_H_M_count + 
##     race_H_F_count + race_A_M_count + race_A_F_count + race_O_M_count + 
##     race_O_F_count, data = state_incidents)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59048 -0.13429 -0.01197  0.07828  1.57680 
## 
## Coefficients: (5 not defined because of singularities)
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -0.03503    0.09146  -0.383 0.703833    
## race_W_count    1.03410    0.02914  35.487  < 2e-16 ***
## race_B_count    1.04130    0.06474  16.083  < 2e-16 ***
## gender_M_count  1.07233    0.02971  36.099  < 2e-16 ***
## race_H_count    1.24194    0.09632  12.894 1.21e-15 ***
## race_A_count    0.82543    0.22745   3.629 0.000815 ***
## race_O_count    1.17778    0.46091   2.555 0.014621 *  
## race_B_M_count -1.11676    0.07346 -15.201  < 2e-16 ***
## race_B_F_count       NA         NA      NA       NA    
## race_W_M_count -1.10640    0.04464 -24.787  < 2e-16 ***
## race_W_F_count       NA         NA      NA       NA    
## race_H_M_count -1.32721    0.09697 -13.687  < 2e-16 ***
## race_H_F_count       NA         NA      NA       NA    
## race_A_M_count -0.84620    0.26161  -3.235 0.002483 ** 
## race_A_F_count       NA         NA      NA       NA    
## race_O_M_count -1.28489    0.50288  -2.555 0.014633 *  
## race_O_F_count       NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3449 on 39 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 5.831e+05 on 11 and 39 DF,  p-value: < 2.2e-16
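
The "not defined because of singularities" note in this output is consistent with each female count being an exact linear combination of other predictors (for example, race_B_F_count equals race_B_count minus race_B_M_count when gender is only M or F), so `lm()` cannot estimate a separate coefficient for it. The sketch below reproduces the effect on hypothetical numbers; every value in `toy` is invented for illustration.

```r
# Hypothetical counts: a group total and the male part of it
toy <- data.frame(
  group_total = c(10, 14, 9, 20, 16, 12),
  group_male  = c(9, 12, 9, 17, 15, 10),
  y           = c(21, 30, 17, 41, 35, 26)
)

# The female count is fully determined by the other two columns
toy$group_female <- toy$group_total - toy$group_male

fit <- lm(y ~ group_total + group_male + group_female, data = toy)

# The redundant predictor gets an NA coefficient
print(coef(fit))

# alias() reports which linear dependency caused it
print(alias(fit))
```

Dropping the redundant columns from the formula gives the same fitted values without the `NA` rows in the summary.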

## Model for Hispanic, Black, and White males

model_race_major <- lm(count ~ race_H_M_count + race_W_M_count +race_B_M_count , data = state_incidents)
summary(model_race_major)

## 
## Call:
## lm(formula = count ~ race_H_M_count + race_W_M_count + race_B_M_count, 
##     data = state_incidents)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.302 -2.588 -1.168  2.383 17.120 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.53205    1.02919   1.489    0.143    
## race_H_M_count  1.17392    0.01912  61.386   <2e-16 ***
## race_W_M_count  1.06970    0.02887  37.052   <2e-16 ***
## race_B_M_count  1.03501    0.03615  28.629   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.45 on 47 degrees of freedom
## Multiple R-squared:  0.9988, Adjusted R-squared:  0.9987 
## F-statistic: 1.283e+04 on 3 and 47 DF,  p-value: < 2.2e-16

Plotting the data and the regression lines

state_incidents_long <- state_incidents %>%
  select(count, race_H_M_count, race_W_M_count, race_B_M_count) %>%
  pivot_longer(cols = starts_with("race"), 
               names_to = "race_category", 
               values_to = "race_count") %>%
  mutate(race_name = case_when(
    race_category == "race_H_M_count" ~ "Hispanic",
    race_category == "race_W_M_count" ~ "White",
    race_category == "race_B_M_count" ~ "Black",
    TRUE ~ "Other"
  ))

# Create the plot with a legend for race names
ggplot(state_incidents_long, aes(x = race_count, y = count, color = race_name)) +
  geom_point(alpha = 0.5) + # Scatter plot for race count vs count
  geom_smooth(method = "lm", se = FALSE) + # Regression line for each race category
  labs(title = "Incident Count vs. Race Category Counts",
       x = "Race Category Count",
       y = "Incident Count",
       color = "Race") + # Legend title
  scale_color_manual(values = c("Hispanic" = "blue", "White" = "red", "Black" = "green")) + # Manually set colors
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

Explanation

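The coefficients for Hispanic, White, and Black males are all close to 1, and the model explains almost all of the variation in state totals (R-squared 0.9988). This is expected, because these three counts are themselves components of each state's total. I read the results as suggesting that Black males have a higher likelihood of being shot, but the regression alone cannot establish this: the Black male coefficient (1.035) is the smallest of the three, and likelihood depends on the size of each group's population, which this model does not include.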
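
Likelihood is a rate rather than a count, so comparing groups requires dividing incidents by the size of each group's population. The sketch below uses hypothetical numbers throughout: both the incident counts and the populations are invented for illustration, and real population figures would have to come from an external source such as census estimates.

```r
library(dplyr)

# Hypothetical incident counts and populations for two groups
toy_rates <- tibble(
  group      = c("Group A", "Group B"),
  incidents  = c(300, 150),
  population = c(60e6, 15e6)
) %>%
  # Incidents per one million people in each group
  mutate(rate_per_million = incidents / population * 1e6)

print(toy_rates)
```

In this toy example Group A has twice as many incidents as Group B but half the rate, which is why raw counts and rates can point in opposite directions.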

Convert states_shp to a spatial dataframe

states_shp <- states(cb = TRUE) %>%
  st_as_sf()

## Retrieving data for the year 2021


# Merge state spatial data with the aggregated incident data
state_geo_data <- states_shp %>%
  left_join(state_incidents, by = c("STUSPS" = "state"))

# Ensure there are no missing values in the merged data
state_geo_data <- state_geo_data %>%
  filter(!is.na(count))

pal <- colorNumeric(
  palette = "Reds",  
  domain = state_geo_data$count, 
  na.color = "gray"  
)

Map

state_geo_data_filtered <- state_geo_data %>%
  filter(count > 0)

leaflet(state_geo_data_filtered) %>%
  addProviderTiles(providers$Esri.WorldStreetMap) %>%
  addPolygons(
    fillColor = ~pal(count),
    weight = 1,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    popup = ~paste0(
      "<strong>State: </strong>", STUSPS, "<br>",
      "<strong>Incidents: </strong>", count, "<br>",
      "<strong>Average Age: </strong>", round(avg_age, 1), "<br>",
      "<strong>Shooting Incidents: </strong>", shot_count, "<br>",
      "<strong>Taser Incidents: </strong>", taser_count, "<br>",
      "<strong>Incidents with Mental Illness: </strong>", mental_illness_count, "<br>",
      "<strong>Female Incidents: </strong>", gender_F_count, "<br>",
      "<strong>Male Incidents: </strong>", gender_M_count, "<br>",
      "<strong>White Incidents: </strong>", race_W_count, "<br>",
      "<strong>Black Incidents: </strong>", race_B_count, "<br>",
      "<strong>Hispanic Incidents: </strong>", race_H_count, "<br>",
      "<strong>Asian Incidents: </strong>", race_A_count, "<br>",
      "<strong>Other Race Incidents: </strong>", race_O_count, "<br>",
      "<strong>Black Male Incidents: </strong>", race_B_M_count, "<br>",
      "<strong>Black Female Incidents: </strong>", race_B_F_count, "<br>",
      "<strong>White Male Incidents: </strong>", race_W_M_count, "<br>",
      "<strong>White Female Incidents: </strong>", race_W_F_count, "<br>",
      "<strong>Hispanic Male Incidents: </strong>", race_H_M_count, "<br>",
      "<strong>Hispanic Female Incidents: </strong>", race_H_F_count, "<br>",
      "<strong>Asian Male Incidents: </strong>", race_A_M_count, "<br>",
      "<strong>Asian Female Incidents: </strong>", race_A_F_count, "<br>",
      "<strong>Other Male Incidents: </strong>", race_O_M_count, "<br>",
      "<strong>Other Female Incidents: </strong>", race_O_F_count
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~count,
    title = "Total Incidents by State",
    opacity = 1
  
  ) %>%
  setView(lng = -98.5795, lat = 39.8283, zoom = 3)  # Center the map on the U.S.

## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
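
The warning above appears because the tigris shapefile uses the NAD83 datum while leaflet expects WGS84. One common way to address it is to reproject with `sf::st_transform()` before mapping. The sketch below does this on a single hypothetical point rather than on the full states layer; `pt_nad83` is an illustrative stand-in for `states_shp`.

```r
library(sf)

# Hypothetical single point in NAD83 (EPSG:4269), standing in for states_shp
pt_nad83 <- st_sf(
  name     = "example",
  geometry = st_sfc(st_point(c(-98.5795, 39.8283)), crs = 4269)
)

# Reproject to WGS84 (EPSG:4326), the datum leaflet expects
pt_wgs84 <- st_transform(pt_nad83, 4326)

print(st_crs(pt_wgs84)$epsg)
```

In the pipeline above, this would correspond to piping the result of `states(cb = TRUE)` through `st_transform(4326)` before the join.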

Conclusion

As seen in the map of shooting incidents between 2015 and 2022, California and Texas record the highest numbers of shootings. By clicking a state on the map, users can explore detailed information, such as the total number of incidents, the racial demographics of those affected, and the gender breakdown (male vs. female) of those involved.

Regression analysis suggests that Black males are statistically more likely to be involved in shootings, yet the map also shows a higher raw number of shooting incidents among White males. These two observations are not directly comparable: raw counts depend on group size, and White Americans constitute the largest demographic group in the U.S., so a larger count does not by itself imply a higher rate. Missing or incomplete data may also contribute to the difference. The reliability of these figures is uncertain because we cannot fully verify the sources or methods used in data collection.

Despite these fluctuations, the yearly counts show a decline in shooting incidents from 2015 to 2022, which may suggest that certain policy interventions or government actions are having a positive impact. Alternatively, the reduction could be partly due to the COVID-19 pandemic, which led to restrictions on movement and social activities, potentially lowering overall violence during that time. Part of the decline may also be an artifact of data cleaning, since every row with a missing value in the selected columns was removed, and this could affect some years more than others.

However, this dataset has limitations: it contains numerous missing values, and potentially relevant variables are not accounted for, so it does not provide a complete or fully accurate picture. While we can only work with the available data, I recommend consulting additional sources to gain a better understanding of the broader trends in gun violence.

For instance, a report from the Centers for Disease Control and Prevention (CDC) on gun violence statistics provides a comprehensive look at trends in firearm-related deaths across the United States: CDC - Gun Violence Statistics.

Additionally, the Gun Violence Archive is a valuable resource for up-to-date information on gun-related incidents in the U.S. Their interactive map allows users to explore shooting events by location, date, and other variables: Gun Violence Archive.

To understand the broader policy context, you can explore articles that examine gun violence prevention strategies. The Violence Policy Center has a detailed overview of how different states approach gun laws and their effectiveness in curbing violence: Violence Policy Center - Gun Violence Prevention.

For a more in-depth analysis of the social and psychological factors behind gun violence, consider reading The Atlantic’s article on the rise of mass shootings in the U.S., and how policy changes might be able to address the root causes: The Atlantic - Mass Shootings in America.

Finally, for insight into the challenges of analyzing gun violence data, an article from Nature highlights issues related to data gaps and how these affect our understanding of public health crises like gun violence: Nature - Data and Statistics on Gun Violence.

Challenges I Encountered

One of the main challenges I faced with this dataset was the presence of many categorical variables that needed to be converted into numeric values for analysis. This required careful handling to ensure accurate data processing and representation. Additionally, while performing linear regression analysis, it wasn’t immediately clear which groups had the highest likelihood of being affected by gun violence. However, after plotting the data, it became evident that Black males were disproportionately represented in the results. The visualization of the data allowed me to observe trends and make more informed conclusions about the factors influencing shooting incidents.