This dataset, sourced from Kaggle (a platform for publicly available datasets), covers police-related gun violence in the United States from 2015 to 2022. The objective of this analysis is to explore the geographic regions, neighborhoods, and demographic groups most affected by police shootings. In particular, I aim to investigate differences in the number of incidents involving males versus females, track changes in the frequency of shootings over time, and analyze whether the demographic profile of those affected has shifted during this period. By examining these trends, the goal is to gain deeper insight into the patterns of police violence and its evolving impact on different communities.
library(tidyverse)
library(tidyr)
library(leaflet)
library(sf)
library(tigris)
library(lubridate)
library(scales)
library(highcharter)
library(reshape2)
library(viridis)
setwd("C:/Users/eyong/Downloads/police shooting")
police <- read_csv("US Police shootings in from 2015-22.csv")
police_clean <- police %>%
# Convert date to proper format and extract year
mutate(
date = as.Date(date),
year = year(date),
manner_of_death = factor(manner_of_death, levels = c("shot", "shot and Tasered")),
signs_of_mental_illness = as.logical(signs_of_mental_illness),
armed = factor(armed)
) %>%
# Select relevant columns
select(date, year, manner_of_death, city, state, flee, threat_level, body_camera,
signs_of_mental_illness, armed, longitude, latitude, age, race, gender) %>%
# Remove rows with NA in any of the chosen columns
filter(
!is.na(longitude), !is.na(latitude), !is.na(manner_of_death), !is.na(armed),
!is.na(age), !is.na(race), !is.na(signs_of_mental_illness), !is.na(gender),
!is.na(flee), !is.na(threat_level)
)
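Because the filter above drops every row that has a missing value in any of the listed columns, it can help to check which columns account for most of the loss before filtering. The following is a minimal sketch in base R; the `toy` data frame and its values are invented for illustration and are not taken from the dataset.

```r
# Toy stand-in for the raw data; all values are made up for illustration.
toy <- data.frame(
  age       = c(34, NA, 51, 27, NA),
  race      = c("W", "B", NA, "H", "W"),
  longitude = c(-118.2, -95.4, -87.6, NA, -112.1)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows that survive a filter requiring every column to be present
rows_kept <- sum(complete.cases(toy))
print(rows_kept)
```

Running the same two calls on the real `police` data frame, restricted to the selected columns, would show how much of the data the cleaning step discards.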
Incidents by year
I want to examine how many police shootings were recorded in each year from 2015 to 2022.
# Count the incidents recorded in each year
yearly_incidents <- police_clean %>%
group_by(year) %>%
summarise(count = n())
highchart() %>%
hc_chart(type = "line") %>%
hc_title(text = "Police Shooting Incidents by Year (2015-2022)") %>%
hc_xAxis(categories = yearly_incidents$year) %>%
hc_yAxis(title = list(text = "Number of Incidents")) %>%
hc_series(
list(
name = "Incidents",
data = yearly_incidents$count,
color = "blue",
marker = list(symbol = "circle", radius = 5, lineColor = "red", lineWidth = 2)
)
) %>%
hc_tooltip(
headerFormat = '<b>{point.key}</b><br>',
pointFormat = '{series.name}: {point.y}'
)
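The yearly counts above are computed from `police_clean`, so they reflect both the number of recorded incidents and the number of rows that survived the missing-value filter. One way to separate the two effects is to compare the counts per year before and after filtering. The sketch below uses made-up data: the `raw` data frame and its values are hypothetical.

```r
# Hypothetical raw records: one row per incident, with some races missing.
raw <- data.frame(
  year = c(2015, 2015, 2015, 2021, 2021, 2021),
  race = c("W", "B", "H", "W", NA, NA)
)

# Use a factor so that years with zero surviving rows still appear in the table
year_f <- factor(raw$year)

raw_counts   <- table(year_f)                     # all recorded incidents
clean_counts <- table(year_f[!is.na(raw$race)])   # incidents kept after filtering

# Share of each year's records that survive the filter
share_kept <- clean_counts / raw_counts
print(share_kept)
```

If the share kept differs a lot between years, a trend in the cleaned counts may partly reflect data completeness rather than a change in the number of incidents.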
## Age Distribution Analysis
ggplot(police_clean, aes(x = age)) +
geom_histogram(binwidth = 5, fill = "red", color = "black") +
theme_minimal() +
labs(title = "Age Distribution of Shooting Victims",
x = "Age",
y = "Count")
state_incidents <- police_clean %>%
group_by(state) %>%
summarise(
count = n(), # Total number of incidents per state
avg_age = mean(age, na.rm = TRUE), # Average age of individuals involved
shot_count = sum(manner_of_death == "shot", na.rm = TRUE), # Count of incidents involving shooting
taser_count = sum(manner_of_death == "shot and Tasered", na.rm = TRUE), # Count of incidents involving taser
mental_illness_count = sum(signs_of_mental_illness == TRUE, na.rm = TRUE), # Count of incidents with mental illness signs
# Gender-based counts
gender_F_count = sum(gender == "F", na.rm = TRUE),
gender_M_count = sum(gender == "M", na.rm = TRUE),
# Race-based counts
race_W_count = sum(race == "W", na.rm = TRUE),
race_B_count = sum(race == "B", na.rm = TRUE),
race_H_count = sum(race == "H", na.rm = TRUE),
race_A_count = sum(race == "A", na.rm = TRUE),
race_O_count = sum(race == "O", na.rm = TRUE),
# Gender and race combinations
race_B_M_count = sum(race == "B" & gender == "M", na.rm = TRUE), # Black Male
race_B_F_count = sum(race == "B" & gender == "F", na.rm = TRUE), # Black Female
race_W_M_count = sum(race == "W" & gender == "M", na.rm = TRUE), # White Male
race_W_F_count = sum(race == "W" & gender == "F", na.rm = TRUE), # White Female
race_H_M_count = sum(race == "H" & gender == "M", na.rm = TRUE), # Hispanic Male
race_H_F_count = sum(race == "H" & gender == "F", na.rm = TRUE), # Hispanic Female
race_A_M_count = sum(race == "A" & gender == "M", na.rm = TRUE), # Asian Male
race_A_F_count = sum(race == "A" & gender == "F", na.rm = TRUE), # Asian Female
race_O_M_count = sum(race == "O" & gender == "M", na.rm = TRUE), # Other Male
race_O_F_count = sum(race == "O" & gender == "F", na.rm = TRUE) # Other Female
) %>%
ungroup()
# Check the resulting dataframe to confirm the columns
head(state_incidents)
## # A tibble: 6 × 23
## state count avg_age shot_count taser_count mental_illness_count gender_F_count
## <chr> <int> <dbl> <int> <int> <int> <int>
## 1 AK 31 32.7 30 1 6 1
## 2 AL 93 40.4 85 8 23 8
## 3 AR 66 39.9 64 2 9 0
## 4 AZ 218 35.9 209 9 46 19
## 5 CA 743 34.9 693 50 178 39
## 6 CO 180 35.9 176 4 25 5
## # ℹ 16 more variables: gender_M_count <int>, race_W_count <int>,
## # race_B_count <int>, race_H_count <int>, race_A_count <int>,
## # race_O_count <int>, race_B_M_count <int>, race_B_F_count <int>,
## # race_W_M_count <int>, race_W_F_count <int>, race_H_M_count <int>,
## # race_H_F_count <int>, race_A_M_count <int>, race_A_F_count <int>,
## # race_O_M_count <int>, race_O_F_count <int>
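The race-by-gender columns above are written out one `sum()` at a time. As an alternative sketch (not what the code above does), the same kind of columns can be generated with `count()` and `pivot_wider()`. The `toy` data frame is invented for illustration, and the column naming pattern is chosen to mimic the names used above.

```r
library(dplyr)
library(tidyr)

# Made-up incident records for illustration only
toy <- tibble(
  state  = c("CA", "CA", "TX", "TX", "TX"),
  race   = c("W", "B", "W", "H", "W"),
  gender = c("M", "M", "F", "M", "M")
)

# One row per state, one column per race/gender combination
wide <- toy %>%
  count(state, race, gender) %>%
  pivot_wider(
    names_from  = c(race, gender),
    values_from = n,
    values_fill = 0,                               # combinations absent in a state become 0
    names_glue  = "race_{race}_{gender}_count"
  )

print(wide)
```

This creates columns only for combinations that occur in the data, so a fixed set of columns would still need the explicit approach or a completed grid of categories.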
# Multiple linear regression on all state-level predictors
model_1 <- lm(
  count ~ avg_age + shot_count + gender_F_count + mental_illness_count +
    gender_M_count + race_W_count + race_B_count + race_H_count +
    race_A_count + race_O_count +
    race_B_M_count + race_B_F_count + race_W_M_count + race_W_F_count +
    race_H_M_count + race_H_F_count + race_A_M_count + race_A_F_count +
    race_O_M_count + race_O_F_count,
  data = state_incidents
)
## Model for race
model_race <- lm(count ~ race_W_count + race_B_count+gender_M_count+race_H_count +race_A_count +race_O_count + race_B_M_count+race_B_F_count+race_W_M_count+race_W_F_count+ race_H_M_count+ race_H_F_count+race_A_M_count+race_A_F_count+race_O_M_count+race_O_F_count, data = state_incidents)
summary(model_race)
##
## Call:
## lm(formula = count ~ race_W_count + race_B_count + gender_M_count +
## race_H_count + race_A_count + race_O_count + race_B_M_count +
## race_B_F_count + race_W_M_count + race_W_F_count + race_H_M_count +
## race_H_F_count + race_A_M_count + race_A_F_count + race_O_M_count +
## race_O_F_count, data = state_incidents)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59048 -0.13429 -0.01197 0.07828 1.57680
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03503 0.09146 -0.383 0.703833
## race_W_count 1.03410 0.02914 35.487 < 2e-16 ***
## race_B_count 1.04130 0.06474 16.083 < 2e-16 ***
## gender_M_count 1.07233 0.02971 36.099 < 2e-16 ***
## race_H_count 1.24194 0.09632 12.894 1.21e-15 ***
## race_A_count 0.82543 0.22745 3.629 0.000815 ***
## race_O_count 1.17778 0.46091 2.555 0.014621 *
## race_B_M_count -1.11676 0.07346 -15.201 < 2e-16 ***
## race_B_F_count NA NA NA NA
## race_W_M_count -1.10640 0.04464 -24.787 < 2e-16 ***
## race_W_F_count NA NA NA NA
## race_H_M_count -1.32721 0.09697 -13.687 < 2e-16 ***
## race_H_F_count NA NA NA NA
## race_A_M_count -0.84620 0.26161 -3.235 0.002483 **
## race_A_F_count NA NA NA NA
## race_O_M_count -1.28489 0.50288 -2.555 0.014633 *
## race_O_F_count NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3449 on 39 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 5.831e+05 on 11 and 39 DF, p-value: < 2.2e-16
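The note "(5 not defined because of singularities)" and the R-squared of 1 in the output above are consistent with predictors that are exact sums of one another: each race total is the sum of its male and female counts, and the state total appears to be close to the sum of the race totals. The sketch below reproduces this behavior on simulated data; all variable names and values are invented for illustration.

```r
set.seed(1)
n <- 30

# Simulated subgroup counts for 30 hypothetical states
w_m   <- rpois(n, 20)   # one group, male
w_f   <- rpois(n, 2)    # same group, female
w     <- w_m + w_f      # group total is an exact sum of its parts
other <- rpois(n, 10)   # everyone else
total <- w + other      # overall count is an exact sum of its components

# w_f is a linear combination of w and w_m, so lm() cannot estimate it
fit <- lm(total ~ w + w_m + w_f + other)
print(coef(fit))
```

The coefficient for `w_f` comes back as `NA`, and the fit is essentially perfect because the response is built from the predictors. A model of this form describes an accounting identity and says little about which group is more likely to be affected.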
## Model for Hispanic, Black, and White males
model_race_major <- lm(count ~ race_H_M_count + race_W_M_count +race_B_M_count , data = state_incidents)
summary(model_race_major)
##
## Call:
## lm(formula = count ~ race_H_M_count + race_W_M_count + race_B_M_count,
## data = state_incidents)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.302 -2.588 -1.168 2.383 17.120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.53205 1.02919 1.489 0.143
## race_H_M_count 1.17392 0.01912 61.386 <2e-16 ***
## race_W_M_count 1.06970 0.02887 37.052 <2e-16 ***
## race_B_M_count 1.03501 0.03615 28.629 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.45 on 47 degrees of freedom
## Multiple R-squared: 0.9988, Adjusted R-squared: 0.9987
## F-statistic: 1.283e+04 on 3 and 47 DF, p-value: < 2.2e-16
state_incidents_long <- state_incidents %>%
select(count, race_H_M_count, race_W_M_count, race_B_M_count) %>%
pivot_longer(cols = starts_with("race"),
names_to = "race_category",
values_to = "race_count") %>%
mutate(race_name = case_when(
race_category == "race_H_M_count" ~ "Hispanic",
race_category == "race_W_M_count" ~ "White",
race_category == "race_B_M_count" ~ "Black",
TRUE ~ "Other"
))
# Create the plot with a legend for race names
ggplot(state_incidents_long, aes(x = race_count, y = count, color = race_name)) +
geom_point(alpha = 0.5) + # Scatter plot for race count vs count
geom_smooth(method = "lm", se = FALSE) + # Regression line for each race category
labs(title = "Incident Count vs. Race Category Counts",
x = "Race Category Count",
y = "Incident Count",
color = "Race") + # Legend title
scale_color_manual(values = c("Hispanic" = "blue", "White" = "red", "Black" = "green")) + # Manually set colors
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
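As the plot shows, I read the regression as suggesting that Black males have a higher likelihood of being shot, although the model relates subgroup counts to state totals and does not adjust for the size of each group's population.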
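Counts and likelihood are different quantities: a group can have more incidents in total and still have a lower rate once population size is taken into account. A minimal sketch of a per-capita comparison follows. The group labels, incident counts, and populations are hypothetical numbers chosen only to illustrate the arithmetic, not figures from the dataset or from any census source.

```r
# Hypothetical counts and populations, invented purely for illustration
groups <- data.frame(
  group      = c("A", "B"),
  incidents  = c(300, 150),
  population = c(6000000, 1500000)
)

# Incidents per million residents
groups$rate_per_million <- groups$incidents / groups$population * 1e6
print(groups)
```

In this made-up example, group A has twice as many incidents as group B but half the rate per million residents. Applying the same calculation to the real counts would require population figures from an external source.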
states_shp <- states(cb = TRUE) %>%
st_as_sf()
## Retrieving data for the year 2021
# Merge state spatial data with the aggregated incident data
state_geo_data <- states_shp %>%
left_join(state_incidents, by = c("STUSPS" = "state"))
# Ensure there are no missing values in the merged data
state_geo_data <- state_geo_data %>%
filter(!is.na(count))
pal <- colorNumeric(
palette = "Reds",
domain = state_geo_data$count,
na.color = "gray"
)
state_geo_data_filtered <- state_geo_data %>%
filter(count > 0)
leaflet(state_geo_data_filtered) %>%
addProviderTiles(providers$Esri.WorldStreetMap) %>%
addPolygons(
fillColor = ~pal(count),
weight = 1,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7,
popup = ~paste0(
"<strong>State: </strong>", STUSPS, "<br>",
"<strong>Incidents: </strong>", count, "<br>",
"<strong>Average Age: </strong>", round(avg_age, 1), "<br>",
"<strong>Shooting Incidents: </strong>", shot_count, "<br>",
"<strong>Taser Incidents: </strong>", taser_count, "<br>",
"<strong>Incidents with Mental Illness: </strong>", mental_illness_count, "<br>",
"<strong>Female Incidents: </strong>", gender_F_count, "<br>",
"<strong>Male Incidents: </strong>", gender_M_count, "<br>",
"<strong>White Incidents: </strong>", race_W_count, "<br>",
"<strong>Black Incidents: </strong>", race_B_count, "<br>",
"<strong>Hispanic Incidents: </strong>", race_H_count, "<br>",
"<strong>Asian Incidents: </strong>", race_A_count, "<br>",
"<strong>Other Race Incidents: </strong>", race_O_count, "<br>",
"<strong>Black Male Incidents: </strong>", race_B_M_count, "<br>",
"<strong>Black Female Incidents: </strong>", race_B_F_count, "<br>",
"<strong>White Male Incidents: </strong>", race_W_M_count, "<br>",
"<strong>White Female Incidents: </strong>", race_W_F_count, "<br>",
"<strong>Hispanic Male Incidents: </strong>", race_H_M_count, "<br>",
"<strong>Hispanic Female Incidents: </strong>", race_H_F_count, "<br>",
"<strong>Asian Male Incidents: </strong>", race_A_M_count, "<br>",
"<strong>Asian Female Incidents: </strong>", race_A_F_count, "<br>",
"<strong>Other Male Incidents: </strong>", race_O_M_count, "<br>",
"<strong>Other Female Incidents: </strong>", race_O_F_count
)
) %>%
addLegend(
position = "bottomright",
pal = pal,
values = ~count,
title = "Total Incidents by State",
opacity = 1
) %>%
setView(lng = -98.5795, lat = 39.8283, zoom = 3) # Center the map on the U.S.
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
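The warning above indicates that the state boundaries use the NAD83 datum while leaflet expects WGS84. One common way to address this is to reproject the layer with `sf::st_transform()` before mapping. The sketch below demonstrates the call on a single point rather than on the state polygons; applying it to `states_shp` would be an assumption about how to adapt the code above, not something the original code does.

```r
library(sf)

# A single point in NAD83 (EPSG:4269), used here only to demonstrate the call
pt_nad83 <- st_sfc(st_point(c(-98.5795, 39.8283)), crs = 4269)

# Reproject to WGS84 (EPSG:4326), the coordinate system leaflet expects
pt_wgs84 <- st_transform(pt_nad83, 4326)

print(st_crs(pt_wgs84)$epsg)
```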
As seen in the map displaying shooting incidents between 2015 and 2022, California and Texas record the highest numbers of shootings. By interacting with the map, users can explore detailed information for each state, such as the number of victims, the racial demographics of those affected, and the gender breakdown (male vs. female) of those involved in these incidents.
Regression analysis suggests that Black males are statistically more likely to be involved in shootings, yet the map also reveals a higher number of shooting incidents among White males. This disparity can be attributed to several factors, including missing or incomplete data. The reliability of these figures is uncertain because we cannot fully verify the sources or methods used in data collection. Additionally, it’s important to consider that White Americans constitute the largest demographic group in the U.S., which could skew the data when comparing incidents across racial groups.
In the cleaned data, shooting incidents decline noticeably from 2015 to 2022, which may suggest that certain policy interventions or government actions are having a positive impact. Alternatively, the reduction could be partly due to the COVID-19 pandemic, which led to restrictions on movement and social activities, potentially lowering overall violence during that time. Because rows with missing values were removed before counting, part of the decline may also reflect less complete records in later years rather than a real change.
However, this dataset has limitations, with numerous missing values and potential variables that are not accounted for, meaning the data doesn’t provide a complete or fully accurate picture. While we can only work with the available data, I recommend consulting additional sources to gain a better understanding of the broader trends in gun violence.
For instance, a report from the Centers for Disease Control and Prevention (CDC) on gun violence statistics provides a comprehensive look at trends in firearm-related deaths across the United States: CDC - Gun Violence Statistics.
Additionally, the Gun Violence Archive is a valuable resource for up-to-date information on gun-related incidents in the U.S. Their interactive map allows users to explore shooting events by location, date, and other variables: Gun Violence Archive.
To understand the broader policy context, you can explore articles that examine gun violence prevention strategies. The Violence Policy Center has a detailed overview of how different states approach gun laws and their effectiveness in curbing violence: Violence Policy Center - Gun Violence Prevention.
For a more in-depth analysis of the social and psychological factors behind gun violence, consider reading The Atlantic’s article on the rise of mass shootings in the U.S., and how policy changes might be able to address the root causes: The Atlantic - Mass Shootings in America.
Finally, for insight into the challenges of analyzing gun violence data, an article from Nature highlights issues related to data gaps and how these affect our understanding of public health crises like gun violence: Nature - Data and Statistics on Gun Violence.
One of the main challenges I faced with this dataset was the presence of many categorical variables that needed to be converted into numeric values for analysis. This required careful handling to ensure accurate data processing and representation. Additionally, while performing linear regression analysis, it wasn’t immediately clear which groups had the highest likelihood of being affected by gun violence. However, after plotting the data, it became evident that Black males were disproportionately represented in the results. The visualization of the data allowed me to observe trends and make more informed conclusions about the factors influencing shooting incidents.