Overview

This file will help you subset data from The Washington Post’s Fatal Force database. The Fatal Force database is a comprehensive collection of data on fatal shootings by police officers in the United States, meticulously compiled by The Washington Post since 2015. This dataset provides researchers with detailed information about incidents of fatal force, including the race of the deceased, circumstances surrounding the shooting, whether the individual was armed, and any mental health crises involved.

Set up your work environment

Open up a new .Rmd file.

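Use {r setup, include=FALSE} as the header of your first code chunk. With include=FALSE, knitr runs the chunk but leaves its code and output out of the knitted document.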

knitr::opts_chunk$set(echo = TRUE)

# install packages (only needed once; uncomment to run)
# install.packages("tidyverse", repos = "https://cloud.r-project.org")

# load libraries
library(tidyverse) # collection of essential packages for data science
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr) # dplyr is already attached by tidyverse; loading it again is harmless

Load the data

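Read the raw CSV directly from The Washington Post’s GitHub repository.

Because the file is read from the live URL every time you knit, you always get the most up-to-date data. As a result, the row counts you see may differ from the ones shown below.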

fatal <- read.csv("https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/v2/fatal-police-shootings-data.csv")
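Because the CSV is read from a live URL, it can be worth confirming that the columns this guide relies on are present before going further. The sketch below uses a hypothetical helper, check_fatal_columns(), whose expected names are taken from the names(fatal) output shown later; it assumes those names stay stable in future versions of the file.

```r
# Hypothetical helper (not part of the Post's repository):
# stop early if any column used later in this guide is missing.
check_fatal_columns <- function(df,
                                expected = c("id", "date", "state", "age", "race",
                                             "was_mental_illness_related",
                                             "body_camera")) {
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage right after reading the data:
# check_fatal_columns(fatal)
```

If the Post renames a column, the error message lists exactly which names to update in the rest of the script.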

Prepare the data

We will first convert the data frame into a tibble using the as_tibble() function.

Then we will use the names(), glimpse(), str(), head(), and tail() functions to inspect the data's structure and contents.

# convert the data frame to a tibble for more readable printing
fatal <- as_tibble(fatal)

Examine the data.

names(fatal)
##  [1] "id"                         "date"                      
##  [3] "threat_type"                "flee_status"               
##  [5] "armed_with"                 "city"                      
##  [7] "county"                     "state"                     
##  [9] "latitude"                   "longitude"                 
## [11] "location_precision"         "name"                      
## [13] "age"                        "gender"                    
## [15] "race"                       "race_source"               
## [17] "was_mental_illness_related" "body_camera"               
## [19] "agency_ids"
glimpse(fatal)
## Rows: 10,247
## Columns: 19
## $ id                         <int> 3, 4, 5, 8, 9, 11, 13, 15, 16, 17, 19, 21, …
## $ date                       <chr> "2015-01-02", "2015-01-02", "2015-01-03", "…
## $ threat_type                <chr> "point", "point", "move", "point", "point",…
## $ flee_status                <chr> "not", "not", "not", "not", "not", "not", "…
## $ armed_with                 <chr> "gun", "gun", "unarmed", "replica", "other"…
## $ city                       <chr> "Shelton", "Aloha", "Wichita", "San Francis…
## $ county                     <chr> "Mason", "Washington", "Sedgwick", "San Fra…
## $ state                      <chr> "WA", "OR", "KS", "CA", "CO", "OK", "AZ", "…
## $ latitude                   <dbl> 47.24683, 45.48742, 37.69477, 37.76291, 40.…
## $ longitude                  <dbl> -123.12159, -122.89170, -97.28055, -122.422…
## $ location_precision         <chr> "not_available", "not_available", "not_avai…
## $ name                       <chr> "Tim Elliot", "Lewis Lee Lembke", "John Pau…
## $ age                        <int> 53, 47, 23, 32, 39, 18, 22, 35, 34, 47, 25,…
## $ gender                     <chr> "male", "male", "male", "male", "male", "ma…
## $ race                       <chr> "A", "W", "H", "W", "H", "W", "H", "W", "W"…
## $ race_source                <chr> "not_available", "not_available", "not_avai…
## $ was_mental_illness_related <chr> "True", "False", "False", "True", "False", …
## $ body_camera                <chr> "False", "False", "False", "False", "False"…
## $ agency_ids                 <chr> "73", "70", "238", "196", "473", "101", "19…
str(fatal)
## tibble [10,247 × 19] (S3: tbl_df/tbl/data.frame)
##  $ id                        : int [1:10247] 3 4 5 8 9 11 13 15 16 17 ...
##  $ date                      : chr [1:10247] "2015-01-02" "2015-01-02" "2015-01-03" "2015-01-04" ...
##  $ threat_type               : chr [1:10247] "point" "point" "move" "point" ...
##  $ flee_status               : chr [1:10247] "not" "not" "not" "not" ...
##  $ armed_with                : chr [1:10247] "gun" "gun" "unarmed" "replica" ...
##  $ city                      : chr [1:10247] "Shelton" "Aloha" "Wichita" "San Francisco" ...
##  $ county                    : chr [1:10247] "Mason" "Washington" "Sedgwick" "San Francisco" ...
##  $ state                     : chr [1:10247] "WA" "OR" "KS" "CA" ...
##  $ latitude                  : num [1:10247] 47.2 45.5 37.7 37.8 40.4 ...
##  $ longitude                 : num [1:10247] -123.1 -122.9 -97.3 -122.4 -104.7 ...
##  $ location_precision        : chr [1:10247] "not_available" "not_available" "not_available" "not_available" ...
##  $ name                      : chr [1:10247] "Tim Elliot" "Lewis Lee Lembke" "John Paul Quintero" "Matthew Hoffman" ...
##  $ age                       : int [1:10247] 53 47 23 32 39 18 22 35 34 47 ...
##  $ gender                    : chr [1:10247] "male" "male" "male" "male" ...
##  $ race                      : chr [1:10247] "A" "W" "H" "W" ...
##  $ race_source               : chr [1:10247] "not_available" "not_available" "not_available" "not_available" ...
##  $ was_mental_illness_related: chr [1:10247] "True" "False" "False" "True" ...
##  $ body_camera               : chr [1:10247] "False" "False" "False" "False" ...
##  $ agency_ids                : chr [1:10247] "73" "70" "238" "196" ...
head(fatal)
## # A tibble: 6 × 19
##      id date      threat_type flee_status armed_with city  county state latitude
##   <int> <chr>     <chr>       <chr>       <chr>      <chr> <chr>  <chr>    <dbl>
## 1     3 2015-01-… point       not         gun        Shel… Mason  WA        47.2
## 2     4 2015-01-… point       not         gun        Aloha Washi… OR        45.5
## 3     5 2015-01-… move        not         unarmed    Wich… Sedgw… KS        37.7
## 4     8 2015-01-… point       not         replica    San … San F… CA        37.8
## 5     9 2015-01-… point       not         other      Evans Weld   CO        40.4
## 6    11 2015-01-… attack      not         gun        Guth… Logan  OK        35.9
## # ℹ 10 more variables: longitude <dbl>, location_precision <chr>, name <chr>,
## #   age <int>, gender <chr>, race <chr>, race_source <chr>,
## #   was_mental_illness_related <chr>, body_camera <chr>, agency_ids <chr>
tail(fatal)
## # A tibble: 6 × 19
##      id date      threat_type flee_status armed_with city  county state latitude
##   <int> <chr>     <chr>       <chr>       <chr>      <chr> <chr>  <chr>    <dbl>
## 1 11099 2024-11-… point       ""          gun        Mari… Jacks… FL        30.8
## 2 11095 2024-11-… undetermin… "not"       unknown    Greer Green… SC        35.0
## 3 11101 2024-11-… point       "not"       gun        Lind… Smith  TX        32.5
## 4 11102 2024-11-… move        ""          replica    Phoe… Maric… AZ        33.5
## 5 11103 2024-11-… shoot       ""          gun        Bowl… Warren KY        37.0
## 6 11105 2024-11-… shoot       "car"       gun;other  Jack… Jacks… MI        42.2
## # ℹ 10 more variables: longitude <dbl>, location_precision <chr>, name <chr>,
## #   age <int>, gender <chr>, race <chr>, race_source <chr>,
## #   was_mental_illness_related <chr>, body_camera <chr>, agency_ids <chr>
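The tail() output shows that some missing values are stored as empty strings rather than NA (for example, flee_status is "" in several rows). Functions such as is.na() and count() treat "" as a real category, so it can help to convert blanks to NA. A minimal sketch is below; blank_to_na() is a hypothetical helper and the rows are made up for illustration.

```r
# Made-up rows standing in for the real data (values are illustrative only)
sample_rows <- data.frame(
  flee_status = c("not", "", "car", ""),
  armed_with  = c("gun", "unknown", "", "replica"),
  stringsAsFactors = FALSE
)

# Replace empty strings with NA in every character column
blank_to_na <- function(df) {
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      blanks <- which(df[[col]] == "")  # which() ignores any existing NAs
      df[[col]][blanks] <- NA
    }
  }
  df
}

cleaned <- blank_to_na(sample_rows)
colSums(is.na(cleaned))  # number of missing values per column
```

On the real data this would be fatal <- blank_to_na(fatal). Two alternatives: pass na.strings = c("", "NA") to read.csv() so blanks become NA at read time, or use dplyr's mutate(across(where(is.character), ~ na_if(.x, ""))).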

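Next, we will convert several columns to more appropriate types: date becomes a Date, age becomes numeric, and the "True"/"False" text in was_mental_illness_related and body_camera becomes logical. We will also add a year column and move state and year to the front.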

# change vars to appropriate formats
fatal$date <- as.Date(fatal$date) # parse "YYYY-MM-DD" strings as Dates
fatal$age <- as.numeric(fatal$age) # store age as a double
fatal$was_mental_illness_related <- as.logical(fatal$was_mental_illness_related) # "True"/"False" -> TRUE/FALSE
fatal$body_camera <- as.logical(fatal$body_camera)

# add a four-digit year column (character, e.g. "2015")
fatal$year <- format(fatal$date, "%Y")

# move state and year to the front of the tibble
fatal <- fatal %>% relocate(state, year)
fatal
## # A tibble: 10,247 × 20
##    state year     id date       threat_type flee_status armed_with city   county
##    <chr> <chr> <int> <date>     <chr>       <chr>       <chr>      <chr>  <chr> 
##  1 WA    2015      3 2015-01-02 point       not         gun        Shelt… Mason 
##  2 OR    2015      4 2015-01-02 point       not         gun        Aloha  Washi…
##  3 KS    2015      5 2015-01-03 move        not         unarmed    Wichi… Sedgw…
##  4 CA    2015      8 2015-01-04 point       not         replica    San F… San F…
##  5 CO    2015      9 2015-01-04 point       not         other      Evans  Weld  
##  6 OK    2015     11 2015-01-04 attack      not         gun        Guthr… Logan 
##  7 AZ    2015     13 2015-01-05 shoot       car         gun        Chand… Maric…
##  8 KS    2015     15 2015-01-06 point       not         gun        Assar… Saline
##  9 IA    2015     16 2015-01-06 accident    not         unarmed    Burli… Des M…
## 10 PA    2015     17 2015-01-06 point       not         replica    Knoxv… Alleg…
## # ℹ 10,237 more rows
## # ℹ 11 more variables: latitude <dbl>, longitude <dbl>,
## #   location_precision <chr>, name <chr>, age <dbl>, gender <chr>, race <chr>,
## #   race_source <chr>, was_mental_illness_related <lgl>, body_camera <lgl>,
## #   agency_ids <chr>
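These conversions work because as.logical() recognises the strings "True" and "False" used in this file, while any string it does not recognise (including an empty string) becomes NA. Likewise, format(date, "%Y") returns the four-digit year as text. The snippet below checks this on a few hand-typed values in the same style as the raw CSV.

```r
# Hand-typed values in the style of the raw CSV (illustrative only)
flags <- c("True", "False", "", "True")
as.logical(flags)
# returns TRUE FALSE NA TRUE

dates <- as.Date(c("2015-01-02", "2024-11-30"))
format(dates, "%Y")
# returns "2015" "2024"
```

Using "%Y" rather than a hard-coded "20%y" keeps the year correct for any century.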

Analyze the data

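Count the number of fatal shootings recorded in each year. Keep in mind that the most recent year is incomplete: in this snapshot the last rows are dated November 2024, so the 2024 count covers only part of the year.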

# get counts by year
fatal %>% count(year)
## # A tibble: 10 × 2
##    year      n
##    <chr> <int>
##  1 2015    995
##  2 2016    959
##  3 2017    984
##  4 2018    992
##  5 2019    994
##  6 2020   1021
##  7 2021   1050
##  8 2022   1097
##  9 2023   1164
## 10 2024    991
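count() accepts more than one variable, which produces a cross-tabulation. The sketch below counts incidents by year and by whether a body camera was recorded. It uses a few made-up rows so it runs without downloading the data; on the real tibble the equivalent call would be fatal %>% count(year, body_camera).

```r
library(dplyr)

# Made-up rows with the same column types as the prepared data (illustrative only)
toy <- tibble(
  year        = c("2015", "2015", "2015", "2016", "2016"),
  body_camera = c(FALSE, FALSE, TRUE, TRUE, FALSE)
)

# One row per (year, body_camera) combination; n holds the number of incidents
by_year_camera <- toy %>% count(year, body_camera)
by_year_camera
```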

You may continue from this point with your own analyses.
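Since the goal of this guide is subsetting, here is one way to pull out a slice of the data with dplyr's filter(), select(), and arrange(). The rows are made up, and the choice of state ("CA"), year, and columns is arbitrary; on the real data, replace toy with fatal.

```r
library(dplyr)

# Made-up rows mimicking the prepared `fatal` tibble (illustrative only)
toy <- tibble(
  state = c("CA", "CA", "TX", "CA"),
  year  = c("2015", "2016", "2016", "2016"),
  date  = as.Date(c("2015-01-04", "2016-03-10", "2016-05-01", "2016-07-22")),
  armed_with = c("replica", "gun", "gun", "unarmed"),
  was_mental_illness_related = c(TRUE, FALSE, FALSE, TRUE)
)

# Keep California incidents from 2016, only the columns of interest, in date order
ca_2016 <- toy %>%
  filter(state == "CA", year == "2016") %>%
  select(date, armed_with, was_mental_illness_related) %>%
  arrange(date)

ca_2016
```

Note that armed_with can hold several values joined by a semicolon (for example, "gun;other" in the tail() output), so an exact match such as armed_with == "gun" misses those rows; filter(str_detect(armed_with, "gun")) from stringr is one way to include them.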