I have data on fatal police shootings that includes details such as the gender and age of the person killed, whether mental illness was involved, and the state where the incident occurred. My plan is to first group the data to find which states have the highest number of fatal shootings. Then I want to analyze the race of those involved and investigate whether mental illness played a role in the incidents. The data source is the Washington Post.
Load the library and the dataset.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/DATA110")
revise <- read_csv('fatal-police-shootings-data.csv') # read the data
Rows: 9497 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): date, threat_type, flee_status, armed_with, city, county, state, l...
dbl (4): id, latitude, longitude, age
lgl (2): was_mental_illness_related, body_camera
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#install.packages("viridis") # install the package for colors
library(viridis)
Loading required package: viridisLite
Make all headers lowercase and remove spaces
names(revise) <- tolower(names(revise)) # lowercase the column names
names(revise) <- gsub(" ", "", names(revise)) # delete the spaces
head(revise)
# A tibble: 6 × 19
id date threat_type flee_status armed_with city county state latitude
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 3 1/2/2015 point not gun Shelt… Mason WA 47.2
2 4 1/2/2015 point not gun Aloha Washi… OR 45.5
3 5 1/3/2015 move not unarmed Wichi… Sedgw… KS 37.7
4 8 1/4/2015 point not replica San F… San F… CA 37.8
5 9 1/4/2015 point not other Evans Weld CO 40.4
6 11 1/4/2015 attack not gun Guthr… Logan OK 35.9
# ℹ 10 more variables: longitude <dbl>, location_precision <chr>, name <chr>,
# age <dbl>, gender <chr>, race <chr>, race_source <chr>,
# was_mental_illness_related <lgl>, body_camera <lgl>, agency_ids <chr>
Select the columns of interest and sort by state
gr_state <- revise |>
  select(state, id, threat_type, city, gender, race, was_mental_illness_related, age) |>
  arrange(state)
head(gr_state) # show the frame
# A tibble: 6 × 8
state id threat_type city gender race was_mental_illness_r…¹ age
<chr> <dbl> <chr> <chr> <chr> <chr> <lgl> <dbl>
1 AK 131 shoot Anchorage male W FALSE 33
2 AK 836 point Fairbanks male N FALSE 19
3 AK 816 shoot Fairbanks male N FALSE 33
4 AK 953 attack Kenai Penin… male W FALSE 49
5 AK 1166 threat Spenard male N TRUE 49
6 AK 1255 point Barrow male N FALSE 36
# ℹ abbreviated name: ¹was_mental_illness_related
Group the data by state
states <- gr_state |>
  group_by(state) |> # group by state
  summarise(n_dead = n(), .groups = "drop") |>
  arrange(desc(n_dead)) # sort by number of deaths, highest first
head(states)
# A tibble: 6 × 2
state n_dead
<chr> <int>
1 CA 1319
2 TX 895
3 FL 608
4 AZ 426
5 GA 358
6 CO 342
Find the 10 states with the most fatal police shootings.
top_states <- states |>
  slice_max(n_dead, n = 10) # keep the 10 states with the most fatal police shootings
Plot the 10 states with the most fatal police shootings.
ggplot(top_states, aes(x = state, y = n_dead, fill = state)) +
  geom_col() +
  scale_fill_viridis_d(option = 'magma', name = "states",
                       # labels are listed in alphabetical order of the state abbreviations
                       labels = c("Arizona", "California", "Colorado", "Florida", "Georgia",
                                  "North Carolina", "Ohio", "Tennessee", "Texas", "Washington")) +
  labs(title = 'Top 10 States Most Affected by Fatal Police Shootings',
       x = "States",
       y = 'Number of Deaths',
       caption = 'Washington Post') +
  theme_dark() + # change the background; applied first so it does not reset the legend tweaks
  theme(legend.key.size = unit(0.1, "cm"), # code from AI, adjusts the legend size
        legend.spacing.y = unit(0.5, "cm"))
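The hard-coded legend labels in the plot above only line up if they are typed in the same alphabetical order as the state abbreviations. One way to avoid that dependency is to look the names up instead. This is a minimal sketch, assuming every abbreviation is one of the 50 states (R's built-in `state.abb` and `state.name` do not include DC); `abbr` is a hypothetical stand-in for `top_states$state`.

```r
# Named lookup vector: abbreviation -> full state name (built-in R datasets)
state_lookup <- setNames(state.name, state.abb)

# Hypothetical example values standing in for top_states$state
abbr <- c("CA", "TX", "FL", "AZ", "GA", "CO")

# Translate each abbreviation to its full name
full <- unname(state_lookup[abbr])
full
```

The resulting names could be stored in a new column and mapped to `fill`, so the legend no longer relies on label order.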
head(mental_health)
# A tibble: 6 × 3
state was_mental_illness_related illness
<chr> <lgl> <int>
1 AZ FALSE 359
2 AZ TRUE 67
3 CA FALSE 1058
4 CA TRUE 261
5 CO FALSE 301
6 CO TRUE 41
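The code that built `mental_health` does not appear in this write-up; only its first rows are shown. A minimal sketch that would produce a table of the same shape, assuming it was derived from `gr_state` restricted to the states in `top_states`, might look like this. The `_demo` objects are hypothetical stand-ins so the example runs on its own.

```r
library(dplyr)

# Hypothetical stand-ins for gr_state and top_states
gr_state_demo <- data.frame(
  state = c("CA", "CA", "CA", "TX", "TX", "AK"),
  was_mental_illness_related = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)
top_states_demo <- data.frame(state = c("CA", "TX"))

# Assumed reconstruction: keep only the top states, then count cases
# per state and mental-illness flag
mental_health_demo <- gr_state_demo |>
  semi_join(top_states_demo, by = "state") |>
  count(state, was_mental_illness_related, name = "illness")

mental_health_demo
```

In the notebook itself, the real `gr_state` and `top_states` would replace the demo objects.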
Fatal Police Incidents and Mental Health
ggplot(mental_health, aes(x = state, y = illness, fill = was_mental_illness_related)) +
  geom_col(position = "stack") + # choose the plot style
  labs(title = "Shooting Cases Related to Mental Illness",
       x = "States",
       y = "Number of Cases",
       fill = "Mental Illness Related?",
       caption = "Washington Post") +
  theme_light()
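The plan in the introduction also calls for a breakdown by race, which is not shown in this write-up. A minimal sketch of one way to do it, assuming the `race` codes in `gr_state` and using hypothetical sample rows so the example runs on its own:

```r
library(dplyr)

# Hypothetical sample rows standing in for gr_state
race_demo <- data.frame(
  state = c("CA", "CA", "TX", "TX", "TX", "AK"),
  race  = c("W", "H", "B", "W", NA, "N")
)

race_counts <- race_demo |>
  filter(state %in% c("CA", "TX")) |>        # stand-in for top_states$state
  count(state, race, name = "n_cases") |>    # cases per state and race code
  group_by(state) |>
  mutate(share = n_cases / sum(n_cases)) |>  # share of each race within a state
  ungroup()

race_counts
```

Rows with a missing `race` value are kept as their own group here, so they remain visible rather than being silently dropped.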
Essay
To clean and analyze my data, I first grouped all fatal police shootings by state, then identified the top 10 states with the highest number of incidents. After that, I broke the data down by race to see which racial groups were most affected. I also examined whether the cases were related to mental illness. Something that really surprised me was seeing Colorado in the top 10, even though it's only the 21st most populous state. That imbalance made me wonder what other factors might be at play. One thing I wish the dataset included is whether the police officers involved were found guilty or not guilty after the incident. That kind of information would help us better understand accountability in these cases.