I will be using the Fatal Police Shootings data set from the openintro.org website. It has 6421 observations and 12 variables. I will use two of them, date and body_camera, to answer my question: has the proportion of fatal police shootings captured on body camera changed over time? The data were collected by the Washington Post from 2015 through 2021 and are periodically updated as more information becomes available.
I chose this data set because I am curious to see how quickly new technology is being adopted. I wanted to know whether, as I suspected, body camera usage increased over time.
I am going to use the date and body_camera columns to answer the question. First I will look at the structure and the first rows of the data and check for missing values. Then I will use group_by() and summarise() to count incidents by year and look at the proportion of incidents in which body cameras were used.
Finally, I will create tables, display the relevant numbers, create a scatterplot of the results, and discuss them.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded
library(lubridate)
setwd("~/Downloads/Data 101 Course materials/Data Sets")
f_p_s <- read.csv("fatal_police_shootings.csv")
str(f_p_s)
## 'data.frame': 6421 obs. of 12 variables:
## $ date : chr "2015-01-02" "2015-01-02" "2015-01-03" "2015-01-04" ...
## $ manner_of_death : chr "shot" "shot" "shot and Tasered" "shot" ...
## $ armed : chr "gun" "gun" "unarmed" "toy weapon" ...
## $ age : int 53 47 23 32 39 18 22 35 34 47 ...
## $ gender : chr "M" "M" "M" "M" ...
## $ race : chr "A" "W" "H" "W" ...
## $ city : chr "Shelton" "Aloha" "Wichita" "San Francisco" ...
## $ state : chr "WA" "OR" "KS" "CA" ...
## $ signs_of_mental_illness: chr "True" "False" "False" "True" ...
## $ threat_level : chr "attack" "attack" "other" "attack" ...
## $ flee : chr "Not fleeing" "Not fleeing" "Not fleeing" "Not fleeing" ...
## $ body_camera : chr "False" "False" "False" "False" ...
head(f_p_s)
## date manner_of_death armed age gender race city state
## 1 2015-01-02 shot gun 53 M A Shelton WA
## 2 2015-01-02 shot gun 47 M W Aloha OR
## 3 2015-01-03 shot and Tasered unarmed 23 M H Wichita KS
## 4 2015-01-04 shot toy weapon 32 M W San Francisco CA
## 5 2015-01-04 shot nail gun 39 M H Evans CO
## 6 2015-01-04 shot gun 18 M W Guthrie OK
## signs_of_mental_illness threat_level flee body_camera
## 1 True attack Not fleeing False
## 2 False attack Not fleeing False
## 3 False other Not fleeing False
## 4 True attack Not fleeing False
## 5 False attack Not fleeing False
## 6 False attack Not fleeing False
colSums(is.na(f_p_s))
## date manner_of_death armed
## 0 0 0
## age gender race
## 285 0 0
## city state signs_of_mental_illness
## 0 0 0
## threat_level flee body_camera
## 0 0 0
Although the age column has 285 NA values, this is not an issue because I am not using the age variable.
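Note that is.na() only counts true NA values. With read.csv(), a blank field in a character column is read as an empty string "" rather than NA, so a character column can report 0 missing values and still contain blanks. A minimal sketch on a small hypothetical data frame (the values are made up for illustration, not taken from the data set) that counts both:

```r
# Hypothetical rows for illustration; not taken from fatal_police_shootings.csv
toy <- data.frame(
  date  = c("2015-01-02", "2015-01-02", "2015-01-03"),
  armed = c("gun", "", "unarmed"),
  age   = c(53L, NA, 23L),
  stringsAsFactors = FALSE
)

# True NA values per column (what colSums(is.na()) reports)
na_counts <- colSums(is.na(toy))

# Empty strings per column; %in% returns FALSE (not NA) for NA entries
blank_counts <- sapply(toy, function(col) sum(col %in% ""))

na_counts
blank_counts
```

On the real data the same sapply() call could be run on f_p_s; a non-zero count in date or body_camera would matter for this analysis.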
f_p_s$year <- format(as.Date(f_p_s$date), "%Y")
This extracts the year from each incident's date so that the incidents can be grouped by year.
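format() returns the year as text (the tibble below shows year as <chr>). A minimal sketch, using the first four dates shown by str() above, of extracting the year as an integer instead, which can be used directly in arithmetic or as a numeric plot axis:

```r
# First four dates from the str() output above
dates <- c("2015-01-02", "2015-01-02", "2015-01-03", "2015-01-04")

# Same extraction as above, converted from text to an integer year
year_num <- as.integer(format(as.Date(dates), "%Y"))
year_num
```

lubridate::year(), which is already attached here through the tidyverse, also returns a numeric year.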
df <- f_p_s |>
  group_by(year, body_camera) |>
  summarise(number_incidents = n())
## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by year and body_camera.
## ℹ Output is grouped by year.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(year, body_camera))` for per-operation grouping
## (`?dplyr::dplyr_by`) instead.
df
## # A tibble: 14 × 3
## # Groups: year [7]
## year body_camera number_incidents
## <chr> <chr> <int>
## 1 2015 False 918
## 2 2015 True 75
## 3 2016 False 814
## 4 2016 True 145
## 5 2017 False 878
## 6 2017 True 108
## 7 2018 False 869
## 8 2018 True 121
## 9 2019 False 863
## 10 2019 True 136
## 11 2020 False 847
## 12 2020 True 174
## 13 2021 False 365
## 14 2021 True 108
False means no body camera footage was reported. True means news sources reported that the incident was recorded on a body camera.
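Because body_camera is stored as the text "True" or "False", comparing it with "True" gives a logical vector, and the mean of a logical vector is the proportion of TRUE values. A minimal sketch that rebuilds the 2015 values from the counts in the table above (918 "False", 75 "True"):

```r
# Rebuild the 2015 body_camera values from the counts in the table above
body_camera_2015 <- c(rep("False", 918), rep("True", 75))

# "True"/"False" text -> logical; mean() treats TRUE as 1 and FALSE as 0
on_camera <- body_camera_2015 == "True"
prop_2015 <- mean(on_camera)
prop_2015
```

Applied to the full data, f_p_s |> group_by(year) |> summarise(prop = mean(body_camera == "True")) would compute every year's proportion in one step.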
total <- xtabs(~ year, data = f_p_s)
total
## year
## 2015 2016 2017 2018 2019 2020 2021
## 993 959 986 990 999 1021 473
This shows the total number of fatal police shootings in each year.
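The yearly totals and the body camera counts can also be read off a single two-way table. A minimal sketch in base R that rebuilds one row per incident (year and body_camera only) from the counts in the df table above, cross-tabulates them, and adds totals with addmargins():

```r
# Counts per year and body_camera value, copied from the df table above
counts <- data.frame(
  year = rep(2015:2021, each = 2),
  body_camera = rep(c("False", "True"), times = 7),
  number_incidents = c(918, 75, 814, 145, 878, 108, 869, 121,
                       863, 136, 847, 174, 365, 108)
)

# Expand to one row per incident (6421 rows, two columns)
incidents <- counts[rep(seq_len(nrow(counts)), counts$number_incidents),
                    c("year", "body_camera")]

# Two-way table of counts; the "Sum" column holds the yearly totals
tab <- addmargins(xtabs(~ year + body_camera, data = incidents))
tab
```

On the real data, addmargins(xtabs(~ year + body_camera, data = f_p_s)) should give the same table directly.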
df_true <- f_p_s |>
  group_by(year, body_camera) |>
  summarise(number_incidents = n()) |>
  filter(body_camera == "True")
## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by year and body_camera.
## ℹ Output is grouped by year.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(year, body_camera))` for per-operation grouping
## (`?dplyr::dplyr_by`) instead.
df_true
## # A tibble: 7 × 3
## # Groups: year [7]
## year body_camera number_incidents
## <chr> <chr> <int>
## 1 2015 True 75
## 2 2016 True 145
## 3 2017 True 108
## 4 2018 True 121
## 5 2019 True 136
## 6 2020 True 174
## 7 2021 True 108
This shows the number of fatal police shootings captured on body camera in each year.
#2015
75/993
## [1] 0.0755287
#2016
145/959
## [1] 0.1511992
#2017
108/986
## [1] 0.1095335
#2018
121/990
## [1] 0.1222222
#2019
136/999
## [1] 0.1361361
#2020
174/1021
## [1] 0.1704212
#2021
108/473
## [1] 0.2283298
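Rather than dividing each year by hand, prop.table() with margin = 1 divides every cell of a two-way table by its row total. A minimal sketch using the counts from the df table above; putting number_incidents on the left of the xtabs() formula sums those counts instead of counting rows:

```r
# Counts per year and body_camera value, copied from the df table above
counts <- data.frame(
  year = rep(2015:2021, each = 2),
  body_camera = rep(c("False", "True"), times = 7),
  number_incidents = c(918, 75, 814, 145, 878, 108, 869, 121,
                       863, 136, 847, 174, 365, 108)
)

# Sum number_incidents into a year x body_camera table
tab <- xtabs(number_incidents ~ year + body_camera, data = counts)

# Row proportions: each year's counts divided by that year's total
props <- prop.table(tab, margin = 1)

# Proportion captured on body camera, named by year
prop_true <- props[, "True"]
yrs <- as.integer(names(prop_true))
round(prop_true, 4)
```

These are the same seven divisions as above, and plot(yrs, prop_true) would draw the scatterplot without retyping the numbers.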
years <- 2015:2021
proportion_body_camera <- c(0.0755287, 0.1511992, 0.1095335, 0.1222222, 0.1361361, 0.1704212, 0.2283298)
plot(years, proportion_body_camera, xlab = "Year", ylab = "Proportion captured on body camera")
2021 had the highest proportion of fatal shootings captured on body camera, at almost 23% (a sharp increase from previous years), and 2015 had the lowest, at about 7.5%. In 2016 the proportion roughly doubled, to about 15%. It then dropped to almost 11% in 2017 and increased every year after that through 2021.
In conclusion, the proportion of fatal shootings captured on body camera has clearly increased since 2015. The largest increase took place between 2015 and 2016, and despite a decrease in 2017, the proportion increased every year through 2021.
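The conclusion above rests on reading the plot. One way to check whether the upward drift is larger than year-to-year noise is the chi-squared test for trend in proportions, prop.trend.test() from R's built-in stats package. A minimal sketch, assuming the years are equally spaced scores 1 to 7 (the function's default), using the counts from the tables above:

```r
# Body camera counts and yearly totals, copied from df_true and total above
on_camera <- c(75, 145, 108, 121, 136, 174, 108)
totals    <- c(993, 959, 986, 990, 999, 1021, 473)

# Chi-squared test for a linear trend in the proportion on_camera / totals
# (default scores are 1, 2, ..., 7, i.e. one step per year)
trend <- prop.trend.test(on_camera, totals)
trend
```

A small p-value supports a trend in the reported proportion, but it cannot separate footage that was reported from footage that exists but was never made public.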
It would be interesting to compare these results with current data to see whether this trend has continued. It would also be worthwhile to examine whether increased body camera use has led to fewer fatal police shootings, for example by deterring people who would threaten to harm police officers and by giving officers' actions more transparency and accountability.
It is important to note that body camera use in this data set is based on news reports, i.e. on whether it was made public that body camera footage of a fatal police shooting existed. More footage may exist than we are aware of. It would help to know whether there is any requirement to disclose this information, so we could judge whether we are seeing the complete picture of body camera use during fatal police shootings. Although the Washington Post updates this data set as more information becomes available, this point requires further investigation.
Additionally, the Covid-19 pandemic and the resulting lockdowns and social distancing may have contributed to the lower number of fatal police shootings recorded in 2021 and may have affected the proportion of body camera use that year. However, one would expect the same effect in 2020, and the 2020 numbers do not show it. The 2021 count (473) is also less than half of every other year's (959 to 1,021), so it is worth checking whether the data set covers all of 2021 before attributing the drop to the pandemic.
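Whether 2021 is a complete year in the data set can be checked by looking at the latest date recorded for each year. A minimal sketch on a few hypothetical dates (made up for illustration, not taken from the data set); on the real data, dates would be as.Date(f_p_s$date):

```r
# Hypothetical dates for illustration only
dates <- as.Date(c("2020-03-15", "2020-12-31", "2021-02-10", "2021-06-30"))
yr <- format(dates, "%Y")

# Latest recorded date within each year; split() keeps the Date class
last_by_year <- lapply(split(dates, yr), max)
last_by_year
```

If the latest 2021 date in the real data falls well before December 31, the 2021 count covers only part of the year.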
Sources:
https://www.openintro.org/data/index.php?data=fatal_police_shootings
https://www.statology.org/italics-in-r/ (for the scatterplot)
For further reading: