Project 1

Author

Djeneba Kounta

Image

Description

I have data on fatal police shootings that includes details such as the victim's gender, age, and race, whether mental illness was involved, and the state where the incident occurred. My plan is to first group the data to find which states have the highest number of fatal shootings. Then I want to analyze the race of those involved and investigate whether mental illness played a role in the incidents. The data source is the Washington Post.

Load the library and the dataset.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/DATA110")
revise <- read_csv('fatal-police-shootings-data.csv') # read the data from the working directory
Rows: 9497 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): date, threat_type, flee_status, armed_with, city, county, state, l...
dbl  (4): id, latitude, longitude, age
lgl  (2): was_mental_illness_related, body_camera

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#install.packages("viridis") # install packages for colors
library(viridis)
Loading required package: viridisLite

Make all headers lowercase and remove spaces

names(revise) <- tolower(names(revise)) # lowercase the column names
names(revise) <- gsub(" ","",names(revise)) # delete spaces from the column names
head(revise)
# A tibble: 6 × 19
     id date     threat_type flee_status armed_with city   county state latitude
  <dbl> <chr>    <chr>       <chr>       <chr>      <chr>  <chr>  <chr>    <dbl>
1     3 1/2/2015 point       not         gun        Shelt… Mason  WA        47.2
2     4 1/2/2015 point       not         gun        Aloha  Washi… OR        45.5
3     5 1/3/2015 move        not         unarmed    Wichi… Sedgw… KS        37.7
4     8 1/4/2015 point       not         replica    San F… San F… CA        37.8
5     9 1/4/2015 point       not         other      Evans  Weld   CO        40.4
6    11 1/4/2015 attack      not         gun        Guthr… Logan  OK        35.9
# ℹ 10 more variables: longitude <dbl>, location_precision <chr>, name <chr>,
#   age <dbl>, gender <chr>, race <chr>, race_source <chr>,
#   was_mental_illness_related <lgl>, body_camera <lgl>, agency_ids <chr>

Select the relevant columns and sort by state

gr_state <- revise|>
  select(state, id, threat_type, city, gender, race, was_mental_illness_related, age ) |>
  arrange(state)

head(gr_state) # show the frame 
# A tibble: 6 × 8
  state    id threat_type city         gender race  was_mental_illness_r…¹   age
  <chr> <dbl> <chr>       <chr>        <chr>  <chr> <lgl>                  <dbl>
1 AK      131 shoot       Anchorage    male   W     FALSE                     33
2 AK      836 point       Fairbanks    male   N     FALSE                     19
3 AK      816 shoot       Fairbanks    male   N     FALSE                     33
4 AK      953 attack      Kenai Penin… male   W     FALSE                     49
5 AK     1166 threat      Spenard      male   N     TRUE                      49
6 AK     1255 point       Barrow       male   N     FALSE                     36
# ℹ abbreviated name: ¹​was_mental_illness_related

Group the data by state

states <- gr_state |>
  group_by(state) |>  # group by state
  summarise(n_dead = n(), .groups = "drop") |>  # count the deaths in each state
  arrange(desc(n_dead))  # sort from most to fewest deaths
head(states)
# A tibble: 6 × 2
  state n_dead
  <chr>  <int>
1 CA      1319
2 TX       895
3 FL       608
4 AZ       426
5 GA       358
6 CO       342

Find the 10 states most affected by fatal police shootings.

top_states <- states |>
  slice_max(n_dead, n=10) # keep the 10 states with the most fatal police shootings
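
One detail worth knowing about `slice_max()`: by default it keeps ties at the cutoff (`with_ties = TRUE`), so it can return more than the requested number of rows. The sketch below shows this on toy counts; the state codes `S1` to `S4` and their numbers are invented for illustration and do not come from the dataset.

```r
library(dplyr)

# Toy counts, invented for illustration only (not from the dataset)
toy <- tibble(
  state  = c("S1", "S2", "S3", "S4"),
  n_dead = c(50, 40, 40, 10)
)

# Default behaviour: ties at the cutoff are kept, so asking for 2 rows returns 3
with_ties <- toy |> slice_max(n_dead, n = 2)

# with_ties = FALSE returns exactly 2 rows
exact <- toy |> slice_max(n_dead, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(exact)
```

If exactly 10 states are needed for the plot, passing `with_ties = FALSE` is one way to guarantee it.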

Plot the 10 states with the most fatal police shootings.

ggplot(top_states, aes(x = state, y = n_dead, fill = state)) +
  geom_col() +
  scale_fill_viridis_d(
    option = 'magma',
    name = "States",
    labels = c("Arizona", "California", "Colorado", "Florida", "Georgia",
               "North Carolina", "Ohio", "Tennessee", "Texas", "Washington")) +
  labs(
    title = 'Top 10 States Most Affected by Fatal Police Shootings',
    x = "States",
    y = 'Number of Deaths',
    caption = 'Source: Washington Post'
  ) +
  theme_dark() + # change the background; applied first so it does not overwrite the theme() tweaks below
  theme(
    legend.key.size = unit(0.1, "cm"),  # code from AI, adjusts the legend size
    legend.spacing.y = unit(0.5, "cm"))
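
The legend labels in the plot above are typed by hand in alphabetical order of the abbreviations, which breaks silently if the set of top states changes. A possible alternative, sketched here under the assumption that only the 50 states matter (base R's `state.abb` and `state.name` do not include DC or territories), is to look the names up and to sort the bars by count. The six rows are copied from the `head(states)` output; `top6`, `state_lookup`, and `state_name` are illustrative names.

```r
library(ggplot2)

# Counts copied from the head(states) output (first six states only)
top6 <- data.frame(
  state  = c("CA", "TX", "FL", "AZ", "GA", "CO"),
  n_dead = c(1319, 895, 608, 426, 358, 342)
)

# state.abb and state.name ship with base R; DC and territories are not included
state_lookup <- setNames(state.name, state.abb)
top6$state_name <- unname(state_lookup[top6$state])

# reorder() sorts the bars by count (largest first) instead of alphabetically
p <- ggplot(top6, aes(x = reorder(state_name, -n_dead), y = n_dead)) +
  geom_col() +
  labs(x = "State", y = "Number of Deaths")
p
```

With full names on the x axis, the fill legend becomes redundant and could be dropped.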

10 most affected states by fatal police shootings

victimes_by_races <- revise |> # create a new frame 
  filter(state %in% top_states$state) |>
  filter(!is.na(race)) |> #  remove na
  group_by(state, race) |>
  summarise(numbre = n(), .groups = "drop") |>
  arrange(state, desc(numbre))

head(victimes_by_races)
# A tibble: 6 × 3
  state race  numbre
  <chr> <chr>  <int>
1 AZ    W        173
2 AZ    H        140
3 AZ    B         37
4 AZ    N         16
5 CA    H        509
6 CA    W        327

Top 10 States by Fatal Police Shootings

ggplot(victimes_by_races) +
  geom_col(aes(x = state, y = numbre, fill = race), position = "dodge") +
  scale_fill_viridis_d(
    option = 'plasma',
    name = "Race",
    labels = c("Asian", "Black", "Hispanic", "Native American",
               "Native American, Hispanic", "White",
               "White, Black, and Native American")) +
  labs(
    x = "State",
    y = "Number of Victims",
    title = "Race of Victims of Fatal Police Shootings",
    caption = "Source: Washington Post") +
  theme_bw() + # applied first so it does not overwrite the theme() tweaks below
  theme(
    legend.key.size = unit(0.1, "cm"),  # code from AI, adjusts the legend size
    legend.spacing.y = unit(0.5, "cm"))
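
Positional legend labels only line up if the race codes in the data sort exactly as expected. A more defensive option, shown here as a rough sketch, is to translate the codes with a named lookup before plotting. The rows are copied from the `head(victimes_by_races)` output, and the code-to-label mapping is taken from the legend labels used in the plot; `by_race`, `race_labels`, and `race_label` are illustrative names.

```r
library(dplyr)

# Rows copied from the head(victimes_by_races) output
by_race <- tibble(
  state  = c("AZ", "AZ", "AZ", "AZ", "CA", "CA"),
  race   = c("W", "H", "B", "N", "H", "W"),
  numbre = c(173, 140, 37, 16, 509, 327)
)

# Code-to-label mapping, based on the legend labels used in the plot
race_labels <- c(A = "Asian", B = "Black", H = "Hispanic",
                 N = "Native American", W = "White")

# Codes missing from the lookup (for example combined codes) keep their raw value
labelled <- by_race |>
  mutate(race_label = coalesce(unname(race_labels[race]), race))

labelled
```

Mapping `fill = race_label` in the plot would then label the legend without a `labels =` argument.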

Mental Illness Involvement in Fatal Police Incidents

mental_health <- revise |>
  filter(state %in% top_states$state) |>
  group_by(state, was_mental_illness_related) |>
  summarise(illness = n(), .groups = "drop") |>
  arrange(state) 
  
head(mental_health)
# A tibble: 6 × 3
  state was_mental_illness_related illness
  <chr> <lgl>                        <int>
1 AZ    FALSE                          359
2 AZ    TRUE                            67
3 CA    FALSE                         1058
4 CA    TRUE                           261
5 CO    FALSE                          301
6 CO    TRUE                            41


Fatal Police Incidents and Mental Health

ggplot(mental_health, aes(x = state, y = illness, fill = was_mental_illness_related)) +
  geom_col(position = "stack") +  # choose the plot style
  labs(
    title = "Fatal Police Shootings Related to Mental Illness",
    x = "States",
    y = "Number of Cases",
    fill = "Mental Illness Related?",
    caption = "Source: Washington Post") +
  theme_light()
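
Stacked counts make it hard to compare states of very different sizes, so it may help to look at the share of cases flagged as mental-illness related within each state. The sketch below uses only the six rows shown in the `head(mental_health)` output; `mh` and `mh_share` are illustrative names.

```r
library(dplyr)

# Rows copied from the head(mental_health) output
mh <- tibble(
  state = c("AZ", "AZ", "CA", "CA", "CO", "CO"),
  was_mental_illness_related = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  illness = c(359, 67, 1058, 261, 301, 41)
)

# Share of each state's cases that were flagged as mental-illness related
mh_share <- mh |>
  group_by(state) |>
  mutate(share = illness / sum(illness)) |>
  ungroup() |>
  filter(was_mental_illness_related)

mh_share
```

For these three states the shares come out to roughly 16% (AZ), 20% (CA), and 12% (CO). Using `geom_col(position = "fill")` in the plot would show the same proportions graphically.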

Essay

To clean and analyze my data, I first grouped all fatal police shootings by state, then identified the 10 states with the highest number of incidents. After that, I broke the data down by race to see which racial groups were most affected. I also examined whether the cases were related to mental illness. Something that really surprised me was seeing Colorado in the top 10, even though it is only the 21st most populous state. That imbalance made me wonder what other factors might be at play. One thing I wish the dataset included is whether the police officers involved were found guilty or not guilty after the incident. That kind of information would help us better understand accountability in these cases.
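
One way to follow up on the Colorado observation would be to compare states by deaths per million residents rather than by raw counts. The sketch below only illustrates the calculation: the states `S1` and `S2`, their counts, and their populations are made up, and a real analysis would need an actual population table joined on the state abbreviation.

```r
library(dplyr)

# Hypothetical states and made-up numbers, for illustration only
counts     <- tibble(state = c("S1", "S2"), n_dead = c(1200, 300))
population <- tibble(state = c("S1", "S2"), pop = c(40e6, 6e6))

# Join counts to population and compute deaths per million residents
rates <- counts |>
  left_join(population, by = "state") |>
  mutate(per_million = n_dead / pop * 1e6) |>
  arrange(desc(per_million))

rates
```

In this toy example the smaller state ranks first once population is taken into account, which is the kind of pattern a per-capita view can reveal.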