Fatal Police Shootings – Project 1

Author

BG

1. Brief Introduction

This project explores data about fatal police shootings in the United States. The dataset includes information such as date, state, age, race, and other details. The main goal is to understand basic patterns in the data and see how these events are distributed across different states.

In this analysis, I focus on counting the number of shootings per state and identifying the top 10 states with the highest numbers. I also use simple summaries and graphs to make the data easier to understand.

2. Load Library

library(zoo)

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
getwd()
[1] "/Users/bettyovalle/Desktop/College/007 – Spring 2026/DATA 110/week 7"

3. Load Dataset

setwd("/Users/bettyovalle/Desktop/College/007 – Spring 2026/DATA 110/week 7")
Shootings <- read_csv("fatal-police-shootings-data.csv")
Rows: 9497 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (12): threat_type, flee_status, armed_with, city, county, state, locati...
dbl   (4): id, latitude, longitude, age
lgl   (2): was_mental_illness_related, body_camera
date  (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

4. Examine data

head(Shootings)
# A tibble: 6 × 19
     id date       threat_type flee_status armed_with city          county state
  <dbl> <date>     <chr>       <chr>       <chr>      <chr>         <chr>  <chr>
1     3 2015-01-02 point       not         gun        Shelton       Mason  WA   
2     4 2015-01-02 point       not         gun        Aloha         Washi… OR   
3     5 2015-01-03 move        not         unarmed    Wichita       Sedgw… KS   
4     8 2015-01-04 point       not         replica    San Francisco San F… CA   
5     9 2015-01-04 point       not         other      Evans         Weld   CO   
6    11 2015-01-04 attack      not         gun        Guthrie       Logan  OK   
# ℹ 11 more variables: latitude <dbl>, longitude <dbl>,
#   location_precision <chr>, name <chr>, age <dbl>, gender <chr>, race <chr>,
#   race_source <chr>, was_mental_illness_related <lgl>, body_camera <lgl>,
#   agency_ids <chr>

5. Cleaning Data

names(Shootings) <- toupper(names(Shootings))

How the information looks

summary(Shootings)
       ID             DATE            THREAT_TYPE        FLEE_STATUS       
 Min.   :    3   Min.   :2015-01-02   Length:9497        Length:9497       
 1st Qu.: 2657   1st Qu.:2017-06-05   Class :character   Class :character  
 Median : 5249   Median :2019-11-02   Mode  :character   Mode  :character  
 Mean   : 5219   Mean   :2019-09-29                                        
 3rd Qu.: 7787   3rd Qu.:2022-02-16                                        
 Max.   :10284   Max.   :2024-03-26                                        
                                                                           
  ARMED_WITH            CITY              COUNTY             STATE          
 Length:9497        Length:9497        Length:9497        Length:9497       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
    LATITUDE       LONGITUDE          LOCATION_PRECISION     NAME          
 Min.   :19.50   Min.   :-9.007e+15   Length:9497        Length:9497       
 1st Qu.:33.46   1st Qu.:-1.120e+02   Class :character   Class :character  
 Median :36.08   Median :-9.400e+01   Mode  :character   Mode  :character  
 Mean   :36.64   Mean   :-1.066e+12                                        
 3rd Qu.:40.03   3rd Qu.:-8.300e+01                                        
 Max.   :71.30   Max.   :-6.800e+01                                        
 NA's   :1051    NA's   :1051                                              
      AGE           GENDER              RACE           RACE_SOURCE       
 Min.   : 2.00   Length:9497        Length:9497        Length:9497       
 1st Qu.:28.00   Class :character   Class :character   Class :character  
 Median :35.00   Mode  :character   Mode  :character   Mode  :character  
 Mean   :37.42                                                           
 3rd Qu.:45.00                                                           
 Max.   :92.00                                                           
 NA's   :387                                                             
 WAS_MENTAL_ILLNESS_RELATED BODY_CAMERA      AGENCY_IDS       
 Mode :logical              Mode :logical   Length:9497       
 FALSE:7601                 FALSE:7905      Class :character  
 TRUE :1896                 TRUE :1592      Mode  :character  
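
The summary reports a minimum LONGITUDE of about -9.007e+15, which cannot be a real coordinate (longitudes lie between -180 and 180) and which pulls the mean down to roughly -1.066e+12. Below is a minimal sketch of one way to screen out such rows before averaging coordinates. It runs on a small made-up tibble, not the real data; the name `toy` and every value in it are illustrative assumptions.

```r
library(dplyr)

# Hypothetical rows standing in for the real data; the second longitude
# mimics the implausible minimum reported by summary().
toy <- tibble(
  STATE     = c("CA", "CA", "TX"),
  LATITUDE  = c(34.1, 36.5, 30.3),
  LONGITUDE = c(-118.2, -9.007e15, -97.7)
)

# Keep only coordinates inside the valid geographic range.
# filter() also drops rows whose coordinates are NA.
toy_valid <- toy |>
  filter(between(LATITUDE, -90, 90), between(LONGITUDE, -180, 180))

toy_valid
```

Applying the same filter before computing per-state coordinate averages would probably keep one bad value from distorting a state's mean longitude, though that has not been checked against the full dataset here.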
                                                              
                                                              
                                                              
                                                              

Create Variable #1

Shootings_year <- Shootings |>
  mutate(year = format(DATE, "%Y")) |>
  group_by(year) |>
  summarise(total = n())
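
`format(DATE, "%Y")` returns the year as text, so ggplot2 treats it as a discrete axis, and that changes how `geom_line()` groups the points. One possible alternative, sketched here on toy data (`toy` and `toy_year` are illustrative names, not objects from this project), is to extract the year as a number with `lubridate::year()`:

```r
library(dplyr)
library(lubridate)

# Hypothetical dates standing in for the DATE column.
toy <- tibble(DATE = as.Date(c("2015-01-02", "2015-06-10", "2016-03-04")))

toy_year <- toy |>
  mutate(year = year(DATE)) |>  # numeric year, e.g. 2015
  count(year, name = "total")   # shorthand for group_by(year) |> summarise(total = n())

toy_year
```

With a numeric year the x axis is continuous, so a line layer can connect the yearly totals without an explicit group aesthetic.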

Plot 1

Using Unit 3 notes

Plot1 <- ggplot(Shootings_year, aes(x = year, y = total, group = 1)) +
  # year is a character column, so group = 1 tells geom_line() to join all points
  geom_line() +
  geom_point(color = "lightblue") +
  labs(title = "Fatal Police Shootings per Year",
    x = "Year",
    y = "Number of Shootings")
Plot1

Create Variable #2

Using Unit 5 notes

Shootings_race_year <- Shootings |>
  mutate(year = format(DATE, "%Y")) |>
  group_by(year, RACE) |>
  summarise(total = n(), .groups = "drop")
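
When counting by two variables, any combination that never occurs is simply absent from the result, which can leave gaps in a per-group time series. The sketch below, on made-up data, shows one way to fill those gaps with explicit zeros using `tidyr::complete()`. The tibble `toy` and the labels "group1" and "group2" are placeholders, not values from the dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical year/race pairs; "group2" has no row for 2016.
toy <- tibble(
  year = c("2015", "2015", "2016"),
  RACE = c("group1", "group2", "group1")
)

toy_counts <- toy |>
  count(year, RACE, name = "total") |>           # one row per observed pair
  complete(year, RACE, fill = list(total = 0L))  # add unobserved pairs with total = 0

toy_counts
```

Whether a zero is the right value for a missing combination depends on the question being asked, so treat this as an option and not a required step.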

Plot 2

plot2 <- Shootings_race_year |>
  ggplot(aes(x = year, y = total, color = RACE, group = RACE)) +
  # one line per race; group is needed because year is a character column
  geom_line() +
  # points inherit the race color instead of being overdrawn in light blue
  geom_point() +
  facet_wrap(~RACE) +
  labs(title = "Fatal Police Shootings by Race Over Time",
    x = "Year",
    y = "Number of Shootings") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90))

plot2

Create Variable #3

by_state <- Shootings |>
  group_by(STATE) |>
  summarise(
    count = n(),
    avg_age = mean(AGE, na.rm = TRUE),
    avg_lat = mean(LATITUDE, na.rm = TRUE),
    avg_long = mean(LONGITUDE, na.rm = TRUE),
    .groups = "drop") |>
  arrange(desc(count)) |>
  filter(count > 10)
head(by_state)
# A tibble: 6 × 5
  STATE count avg_age avg_lat avg_long
  <chr> <int>   <dbl>   <dbl>    <dbl>
1 CA     1319    35.8    35.5   -119. 
2 TX      895    36.4    30.9    -97.6
3 FL      608    39.3    28.2    -81.8
4 AZ      426    36.7    33.4   -112. 
5 GA      358    38.0    33.4    -84.0
6 CO      342    36.5    39.5   -105. 

Top 10 States

topten_states <- by_state |>
  head(10)
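
Taking `head(10)` works here because `by_state` was already sorted with `arrange(desc(count))`. A sketch of an alternative that does not depend on the earlier sort is `dplyr::slice_max()`, shown on a made-up table; `toy_by_state`, `top2`, and the state codes are illustrative assumptions.

```r
library(dplyr)

# Hypothetical per-state counts, deliberately left unsorted.
toy_by_state <- tibble(
  STATE = c("AA", "BB", "CC", "DD"),
  count = c(5L, 20L, 12L, 8L)
)

# Select the rows with the largest counts, returned in descending order.
top2 <- toy_by_state |>
  slice_max(count, n = 2)

top2
```

By default `slice_max()` keeps ties, so it can return more than `n` rows; passing `with_ties = FALSE` limits the result to exactly `n`.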

Plot 3

ggplot(topten_states, aes(x = reorder(STATE, count), y = count)) +
  geom_col(fill = "lightblue") +
  coord_flip() +
  labs(title = "Top 10 Fatal Police Shootings by State",
    x = "State",
    y = "Number of Shootings") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 11))

Conclusion

I started by loading the dataset with read_csv and inspecting it with head and summary. Then I converted the column names to uppercase with toupper so that they all follow one consistent format. When calculating averages, I excluded missing values with na.rm = TRUE. After that, I used group_by and summarise to count the number of shootings per state and to calculate simple averages such as age, which organized the data into a smaller table for analysis.

My visualization shows the 10 states with the highest number of fatal police shootings, and the bar chart makes them easy to compare. One interesting pattern is that a few states have much higher numbers than the others: California (1,319) and Texas (895) are well ahead of third-place Florida (608). I also tried to work with time trends and race comparisons, but I had some difficulties making those parts of the code work correctly.

For this analysis, I also used notes from Unit 3 and Unit 5 class lectures, which helped guide the data cleaning process.