This file will help you load, clean, and subset data from The Washington Post’s Fatal Force database. The Fatal Force database is a collection of data on fatal shootings by police officers in the United States, compiled by The Washington Post since 2015. For each incident, the dataset records details such as the race of the deceased, the circumstances surrounding the shooting, whether the individual was armed, and whether a mental health crisis was involved.
Open up a new .Rmd file.
Use {r setup, include=FALSE} as the header of your first code chunk, and place the following line inside it.
knitr::opts_chunk$set(echo = TRUE)
# install packages
# install.packages("tidyverse", repos = "http://cran.us.r-project.org")
# load libraries
library(tidyverse) # collection of essential packages for data science
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr) # makes data manipulation easier; already attached by tidyverse, so this call is optional
Read in the raw data from the Washington Post’s repository.
This will ensure that you get the most up-to-date data.
fatal <- read.csv("https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/v2/fatal-police-shootings-data.csv")
We will first turn the data frame into a tibble using the as_tibble() function.
From there, we will use the names(), glimpse(), str(), head(), and tail() functions to understand the data.
# examine fatal force data
fatal <- as_tibble(fatal) # turn data into a tibble
Examine the data.
names(fatal)
## [1] "id" "date"
## [3] "threat_type" "flee_status"
## [5] "armed_with" "city"
## [7] "county" "state"
## [9] "latitude" "longitude"
## [11] "location_precision" "name"
## [13] "age" "gender"
## [15] "race" "race_source"
## [17] "was_mental_illness_related" "body_camera"
## [19] "agency_ids"
glimpse(fatal)
## Rows: 10,247
## Columns: 19
## $ id <int> 3, 4, 5, 8, 9, 11, 13, 15, 16, 17, 19, 21, …
## $ date <chr> "2015-01-02", "2015-01-02", "2015-01-03", "…
## $ threat_type <chr> "point", "point", "move", "point", "point",…
## $ flee_status <chr> "not", "not", "not", "not", "not", "not", "…
## $ armed_with <chr> "gun", "gun", "unarmed", "replica", "other"…
## $ city <chr> "Shelton", "Aloha", "Wichita", "San Francis…
## $ county <chr> "Mason", "Washington", "Sedgwick", "San Fra…
## $ state <chr> "WA", "OR", "KS", "CA", "CO", "OK", "AZ", "…
## $ latitude <dbl> 47.24683, 45.48742, 37.69477, 37.76291, 40.…
## $ longitude <dbl> -123.12159, -122.89170, -97.28055, -122.422…
## $ location_precision <chr> "not_available", "not_available", "not_avai…
## $ name <chr> "Tim Elliot", "Lewis Lee Lembke", "John Pau…
## $ age <int> 53, 47, 23, 32, 39, 18, 22, 35, 34, 47, 25,…
## $ gender <chr> "male", "male", "male", "male", "male", "ma…
## $ race <chr> "A", "W", "H", "W", "H", "W", "H", "W", "W"…
## $ race_source <chr> "not_available", "not_available", "not_avai…
## $ was_mental_illness_related <chr> "True", "False", "False", "True", "False", …
## $ body_camera <chr> "False", "False", "False", "False", "False"…
## $ agency_ids <chr> "73", "70", "238", "196", "473", "101", "19…
str(fatal)
## tibble [10,247 × 19] (S3: tbl_df/tbl/data.frame)
## $ id : int [1:10247] 3 4 5 8 9 11 13 15 16 17 ...
## $ date : chr [1:10247] "2015-01-02" "2015-01-02" "2015-01-03" "2015-01-04" ...
## $ threat_type : chr [1:10247] "point" "point" "move" "point" ...
## $ flee_status : chr [1:10247] "not" "not" "not" "not" ...
## $ armed_with : chr [1:10247] "gun" "gun" "unarmed" "replica" ...
## $ city : chr [1:10247] "Shelton" "Aloha" "Wichita" "San Francisco" ...
## $ county : chr [1:10247] "Mason" "Washington" "Sedgwick" "San Francisco" ...
## $ state : chr [1:10247] "WA" "OR" "KS" "CA" ...
## $ latitude : num [1:10247] 47.2 45.5 37.7 37.8 40.4 ...
## $ longitude : num [1:10247] -123.1 -122.9 -97.3 -122.4 -104.7 ...
## $ location_precision : chr [1:10247] "not_available" "not_available" "not_available" "not_available" ...
## $ name : chr [1:10247] "Tim Elliot" "Lewis Lee Lembke" "John Paul Quintero" "Matthew Hoffman" ...
## $ age : int [1:10247] 53 47 23 32 39 18 22 35 34 47 ...
## $ gender : chr [1:10247] "male" "male" "male" "male" ...
## $ race : chr [1:10247] "A" "W" "H" "W" ...
## $ race_source : chr [1:10247] "not_available" "not_available" "not_available" "not_available" ...
## $ was_mental_illness_related: chr [1:10247] "True" "False" "False" "True" ...
## $ body_camera : chr [1:10247] "False" "False" "False" "False" ...
## $ agency_ids : chr [1:10247] "73" "70" "238" "196" ...
head(fatal)
## # A tibble: 6 × 19
## id date threat_type flee_status armed_with city county state latitude
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 3 2015-01-… point not gun Shel… Mason WA 47.2
## 2 4 2015-01-… point not gun Aloha Washi… OR 45.5
## 3 5 2015-01-… move not unarmed Wich… Sedgw… KS 37.7
## 4 8 2015-01-… point not replica San … San F… CA 37.8
## 5 9 2015-01-… point not other Evans Weld CO 40.4
## 6 11 2015-01-… attack not gun Guth… Logan OK 35.9
## # ℹ 10 more variables: longitude <dbl>, location_precision <chr>, name <chr>,
## # age <int>, gender <chr>, race <chr>, race_source <chr>,
## # was_mental_illness_related <chr>, body_camera <chr>, agency_ids <chr>
tail(fatal)
## # A tibble: 6 × 19
## id date threat_type flee_status armed_with city county state latitude
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 11099 2024-11-… point "" gun Mari… Jacks… FL 30.8
## 2 11095 2024-11-… undetermin… "not" unknown Greer Green… SC 35.0
## 3 11101 2024-11-… point "not" gun Lind… Smith TX 32.5
## 4 11102 2024-11-… move "" replica Phoe… Maric… AZ 33.5
## 5 11103 2024-11-… shoot "" gun Bowl… Warren KY 37.0
## 6 11105 2024-11-… shoot "car" gun;other Jack… Jacks… MI 42.2
## # ℹ 10 more variables: longitude <dbl>, location_precision <chr>, name <chr>,
## # age <int>, gender <chr>, race <chr>, race_source <chr>,
## # was_mental_illness_related <chr>, body_camera <chr>, agency_ids <chr>
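The tail() output above shows empty strings ("") in flee_status, which suggests that some missing values are stored as empty text rather than as NA. The sketch below shows one possible way to convert them, using a small made-up data frame (toy) in place of the real table; the toy values are illustrative, and whether every empty string in the real data should be treated as missing is an assumption you should check against the Post's documentation.

```r
library(dplyr)

# Toy rows that mimic the shape of the real data (values are made up)
toy <- data.frame(
  id          = c(1L, 2L, 3L),
  flee_status = c("not", "", "car"),
  armed_with  = c("gun", "replica", "")
)

# Replace empty strings with NA in every character column
toy_clean <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Count the missing values in each column
colSums(is.na(toy_clean))
```

The same mutate() call could be applied to fatal itself once you have decided that empty strings should count as missing.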
We will convert some of the columns to more appropriate types.
# convert variables to appropriate types
fatal$date <- as.Date(fatal$date) # check/change to date format
fatal$age <- as.numeric(fatal$age)
fatal$was_mental_illness_related <- as.logical(fatal$was_mental_illness_related)
fatal$body_camera <- as.logical(fatal$body_camera)
# extract the four-digit year (YYYY) from the date
fatal.year <- format(fatal$date, format = "%Y")
fatal$year <- fatal.year # add a year column to the df
fatal <- fatal %>% relocate(state, year) # move state and year to the front
fatal
## # A tibble: 10,247 × 20
## state year id date threat_type flee_status armed_with city county
## <chr> <chr> <int> <date> <chr> <chr> <chr> <chr> <chr>
## 1 WA 2015 3 2015-01-02 point not gun Shelt… Mason
## 2 OR 2015 4 2015-01-02 point not gun Aloha Washi…
## 3 KS 2015 5 2015-01-03 move not unarmed Wichi… Sedgw…
## 4 CA 2015 8 2015-01-04 point not replica San F… San F…
## 5 CO 2015 9 2015-01-04 point not other Evans Weld
## 6 OK 2015 11 2015-01-04 attack not gun Guthr… Logan
## 7 AZ 2015 13 2015-01-05 shoot car gun Chand… Maric…
## 8 KS 2015 15 2015-01-06 point not gun Assar… Saline
## 9 IA 2015 16 2015-01-06 accident not unarmed Burli… Des M…
## 10 PA 2015 17 2015-01-06 point not replica Knoxv… Alleg…
## # ℹ 10,237 more rows
## # ℹ 11 more variables: latitude <dbl>, longitude <dbl>,
## # location_precision <chr>, name <chr>, age <dbl>, gender <chr>, race <chr>,
## # race_source <chr>, was_mental_illness_related <lgl>, body_camera <lgl>,
## # agency_ids <chr>
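Note that the year column produced by format() is a character vector (<chr> in the output above). If a numeric year is more convenient, for example for arithmetic or plotting on a continuous axis, one alternative is the year() function from lubridate, which is attached with the tidyverse. This is a minimal sketch on two illustrative dates, not a change to the workflow above.

```r
library(lubridate)

# Two illustrative dates (not taken from the database)
dates <- as.Date(c("2015-01-02", "2024-11-30"))

year_chr <- format(dates, "%Y") # character years, as in the tutorial
year_num <- year(dates)         # numeric years from lubridate

class(year_chr)
class(year_num)
```

To use this with the real data, you could write fatal$year <- year(fatal$date); the counts by year would be the same, but the column type would be numeric rather than character.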
Get counts by year.
# get counts by year
fatal %>% count(year)
## # A tibble: 10 × 2
## year n
## <chr> <int>
## 1 2015 995
## 2 2016 959
## 3 2017 984
## 4 2018 992
## 5 2019 994
## 6 2020 1021
## 7 2021 1050
## 8 2022 1097
## 9 2023 1164
## 10 2024 991
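Note that the 2024 count covers only part of the year: the most recent records in the tail() output above are from November 2024.
You may continue from this point with your own analyses.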
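As a starting point for subsetting, the sketch below uses filter() and select() from dplyr on a small made-up data frame (toy) that borrows a few column names from fatal. The rows are invented for illustration; swap toy for fatal to run the same steps on the real data.

```r
library(dplyr)
library(stringr)

# Toy rows using column names from the real data (values are made up)
toy <- data.frame(
  state      = c("WA", "CA", "CA", "MI"),
  year       = c("2015", "2015", "2024", "2024"),
  armed_with = c("gun", "replica", "unarmed", "gun;other"),
  was_mental_illness_related = c(TRUE, TRUE, FALSE, FALSE)
)

# Keep only the rows for one state
ca_only <- toy %>% filter(state == "CA")

# Combine conditions: one state in one year (year is stored as character)
ca_2024 <- toy %>% filter(state == "CA", year == "2024")

# armed_with can hold several values separated by ";", so match on a substring
gun_cases <- toy %>% filter(str_detect(armed_with, "gun"))

# Keep only the columns needed for an analysis
ca_small <- ca_only %>% select(state, year, armed_with)
```

Because str_detect() matches substrings, it also catches combined entries such as "gun;other", which an exact comparison with == would miss.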