Sungji Peter Shin
3.3.2019
This dataset, collected by The Washington Post since January 1, 2015, contains a record of every fatal shooting in the United States by a police officer in the line of duty. Each record includes “the race of the deceased, the circumstances of the shooting, whether the person was armed and whether the person was experiencing a mental-health-crisis” (Tate et al., 2016). The data are limited to incidents in which a police officer, in the line of duty, shoots and kills a civilian; deaths of people in police custody, fatal shootings by off-duty officers, and non-shooting deaths are excluded.
library(Zelig)
Loading required package: survival
shootings <- read.csv(file = 'E:/SOC712/Fatal_police_shootings.csv', header = TRUE, sep = ",")
head(shootings)
dim(shootings)
[1] 2142 14
# rows containing missing values are excluded.
shooting <- na.omit(shootings)
dim(shooting)
[1] 2099 14
The original dataset has 14 variables and 2,142 cases. The 43 cases with a missing value in any variable are excluded from this analysis (N = 2,099). My dependent variable is ‘brutal’, which indicates the level of brutality in each police shooting incident. This binary variable takes two values: 0 = shot; 1 = shot and Tasered. ‘Shot and Tasered’ represents an increased level of brutality in the incident. The aim of this assignment is to build models that show how the mean of the dependent variable varies as the values of the predictors (independent variables) change. My hypotheses are the following:
1) Black and Hispanic individuals are more likely to be subjected to an increased level of brutality;
2) Among Black and Hispanic individuals, young males aged between 20 and 40 are more likely to be subjected to an increased level of brutality;
3) Individuals who try to flee are more likely to be subjected to an increased level of brutality;
4) When the police officer's body camera is off, individuals are more likely to be subjected to an increased level of brutality;
5) Individuals who show signs of mental illness are more likely to be subjected to an increased level of brutality; and
6) Individuals who pose a threat are more likely to be subjected to an increased level of brutality.
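For reference, each model in this analysis is a binary logistic regression (glm with family = binomial), which links the mean of the dependent variable to the predictors through the log-odds. In generic notation (the symbols $p_i$, $\beta_j$ and $x_{ij}$ are introduced here for illustration and do not appear in the R output):

```latex
\[
p_i = \Pr(\text{brutal}_i = 1), \qquad
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},
\qquad\text{so}\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})}} .
\]
```

A positive $\beta_j$ therefore raises the odds of ‘shot and Tasered’, and $e^{\beta_j}$ is the corresponding odds ratio.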
library(dplyr)
library(pander)
# recode manner_of_death into 'brutal': 0 = shot and 1 = shot and Tasered
# (factor() keeps the integer codes 1 and 2 even if read.csv returns strings)
shooting <- shooting %>%
  mutate(brutality = as.integer(factor(manner_of_death))) %>%
  select(brutality, manner_of_death, everything())
shooting <- shooting %>%
  mutate(brutal = sjmisc::rec(brutality, rec = '1=0; 2=1')) %>%
  select(brutal, everything()) %>%
  select(-brutality)
# collapse the 'Car' and 'Foot' values of flee into a single 'Flee' category
shooting <- shooting %>%
  mutate(run = sjmisc::rec(flee, rec = 'Car=Flee; Foot=Flee; else=copy')) %>%
  select(run, flee, everything())
head(shooting)
# simple example of obtaining probabilities (P)
demo_df <- shooting %>%
  group_by(race) %>%
  summarise(py1 = mean(brutal)) %>%
  mutate(py0 = 1 - py1) %>%
  pandoc.table()
-------------------------
 race     py1      py0
------ --------- --------
        0.09877   0.9012
  A     0.09375   0.9062
  B     0.05805   0.9419
  H     0.06648   0.9335
  N        0        1
  O     0.1429    0.8571
  W     0.0744    0.9256
-------------------------
m0 <- glm(brutal ~ race, family = binomial, data = shooting)
summary(m0)
Call:
glm(formula = brutal ~ race, family = binomial, data = shooting)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.5553 -0.3932 -0.3932 -0.3458 2.3860
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.21102 0.37242 -5.937 2.91e-09 ***
raceA -0.05767 0.71170 -0.081 0.935
raceB -0.57559 0.41587 -1.384 0.166
raceH -0.43101 0.42817 -1.007 0.314
raceN -14.35505 453.47148 -0.032 0.975
raceO 0.41926 0.65602 0.639 0.523
raceW -0.31002 0.39081 -0.793 0.428
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1065.1 on 2098 degrees of freedom
Residual deviance: 1056.5 on 2092 degrees of freedom
AIC: 1070.5
Number of Fisher Scoring iterations: 15
The first model, m0, uses ‘race’ as the only independent variable (A = Asian; B = Black; H = Hispanic; N = Native American; O = Other; W = White). The reference category is the group of records whose race is blank in the data, so each coefficient compares a racial group with that group. I expected positive coefficients for both Black and Hispanic individuals, but both coefficients are negative and neither is statistically significant.
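As a rough check on how to read these coefficients, the m0 estimates can be reproduced from the group proportions in the table above. The sketch below hard-codes the rounded py1 values for the blank race category and for race B (the variable names are illustrative, not from the original script), so the results agree with the glm output only up to rounding.

```r
# Proportions of 'shot and Tasered' (py1) copied from the table above
py1_blank <- 0.09877  # records whose race is blank (the baseline level)
py1_black <- 0.05805  # race == "B"

# qlogis() is the logit function: log(p / (1 - p))
logit_blank <- qlogis(py1_blank)
logit_black <- qlogis(py1_black)

# The intercept of m0 is the log-odds of the baseline category,
# and raceB is the difference in log-odds between B and the baseline
intercept_check <- logit_blank
raceB_check <- logit_black - logit_blank

round(c(intercept = intercept_check, raceB = raceB_check), 3)
```

Both values agree closely with the reported intercept (-2.21102) and raceB (-0.57559), which suggests that the coefficients are contrasts against the blank race category rather than against White.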
m1 <- glm(brutal ~ race + gender + age, family = binomial, data = shooting)
summary(m1)
Call:
glm(formula = brutal ~ race + gender + age, family = binomial,
data = shooting)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.5622 -0.3956 -0.3735 -0.3481 2.4633
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.372e+00 6.672e-01 -3.555 0.000378 ***
raceA -6.545e-02 7.138e-01 -0.092 0.926937
raceB -5.867e-01 4.267e-01 -1.375 0.169157
raceH -4.466e-01 4.368e-01 -1.022 0.306574
raceN -1.434e+01 4.533e+02 -0.032 0.974763
raceO 4.154e-01 6.618e-01 0.628 0.530199
raceW -3.125e-01 3.928e-01 -0.796 0.426216
genderM 2.070e-01 4.712e-01 0.439 0.660366
age -8.603e-04 6.959e-03 -0.124 0.901613
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1065.1 on 2098 degrees of freedom
Residual deviance: 1056.3 on 2090 degrees of freedom
AIC: 1074.3
Number of Fisher Scoring iterations: 15
The second model, m1, adds two further demographic variables, ‘gender’ and ‘age’, to the first model, m0. As I expected, males have higher odds than females of being subjected to an increased level of brutality (odds ratio e^0.2070 = 1.23), but the difference is not statistically significant. Age is not statistically significant either.
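The odds ratio quoted for genderM can be reproduced, together with an approximate 95% Wald interval, from the estimate and standard error in the output above. This is a minimal sketch with the two numbers hard-coded; the variable names are illustrative.

```r
# Estimate and standard error for genderM, copied from summary(m1)
b_male  <- 0.2070
se_male <- 0.4712

# Odds ratio: exponentiate the log-odds coefficient
or_male <- exp(b_male)

# Approximate 95% Wald interval, built on the log-odds scale and then exponentiated
z <- qnorm(0.975)
ci_male <- exp(b_male + c(-1, 1) * z * se_male)

round(c(OR = or_male, lower = ci_male[1], upper = ci_male[2]), 2)
```

The interval spans 1, which is consistent with the non-significant p-value. With the fitted object available, exp(cbind(coef(m1), confint.default(m1))) should give the same kind of summary for every coefficient.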
m2 <- glm(brutal ~ race + gender + age + body_camera + run, family = binomial, data = shooting)
summary(m2)
Call:
glm(formula = brutal ~ race + gender + age + body_camera + run,
family = binomial, data = shooting)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6786 -0.4123 -0.3814 -0.3080 2.5974
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.337993 0.962937 -2.428 0.0152 *
raceA -0.101737 0.716781 -0.142 0.8871
raceB -0.554854 0.428704 -1.294 0.1956
raceH -0.442194 0.438818 -1.008 0.3136
raceN -14.372576 450.778059 -0.032 0.9746
raceO 0.398778 0.664515 0.600 0.5484
raceW -0.304256 0.394875 -0.771 0.4410
genderM 0.190879 0.471883 0.405 0.6858
age -0.004195 0.007067 -0.594 0.5528
body_cameraTrue 0.334284 0.253478 1.319 0.1872
runFlee -0.426715 0.765392 -0.558 0.5772
runNot fleeing 0.184517 0.742282 0.249 0.8037
runOther 0.296708 0.836726 0.355 0.7229
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1065.1 on 2098 degrees of freedom
Residual deviance: 1046.7 on 2086 degrees of freedom
AIC: 1072.7
Number of Fisher Scoring iterations: 15
The third model, m2, adds two more variables, ‘body_camera’ and ‘run’, which indicate whether the police officer's body camera was on and whether the deceased tried to flee. The results run opposite to hypotheses 3 and 4: when the body camera is on, an increased level of brutality is more likely (positive coefficient, 0.33), and individuals who tried to flee are less likely to be subjected to an increased level of brutality (negative coefficient, -0.43). Again, none of the independent variables in this model is statistically significant.
m3 <- glm(brutal ~ race + gender + age + body_camera + run + signs_of_mental_illness*threat_level, family = binomial, data = shooting)
summary(m3)
Call:
glm(formula = brutal ~ race + gender + age + body_camera + run +
signs_of_mental_illness * threat_level, family = binomial,
data = shooting)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8736 -0.3944 -0.3386 -0.2959 2.6869
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.945609 0.992328 -2.968 0.00299 **
raceA -0.221708 0.726658 -0.305 0.76029
raceB -0.469771 0.433126 -1.085 0.27810
raceH -0.437204 0.442814 -0.987 0.32348
raceN -14.407387 445.120876 -0.032 0.97418
raceO 0.375709 0.672919 0.558 0.57662
raceW -0.297437 0.398983 -0.745 0.45598
genderM 0.318667 0.477191 0.668 0.50426
age -0.003212 0.007152 -0.449 0.65333
body_cameraTrue 0.213899 0.257184 0.832 0.40558
runFlee -0.351240 0.784414 -0.448 0.65432
runNot fleeing 0.200085 0.764860 0.262 0.79363
runOther 0.419347 0.852318 0.492 0.62271
signs_of_mental_illnessTrue 0.287517 0.272511 1.055 0.29140
threat_levelother 0.789002 0.223705 3.527 0.00042 ***
threat_levelundetermined 0.415081 0.406494 1.021 0.30720
signs_of_mental_illnessTrue:threat_levelother 0.129023 0.377595 0.342 0.73258
signs_of_mental_illnessTrue:threat_levelundetermined -14.264544 721.666596 -0.020 0.98423
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1065.1 on 2098 degrees of freedom
Residual deviance: 1019.6 on 2081 degrees of freedom
AIC: 1055.6
Number of Fisher Scoring iterations: 15
In the last model, m3, the only statistically significant predictor is threat_level (other), whose coefficient of 0.79 corresponds to an odds ratio of about 2.2 relative to the reference threat level. The interaction between signs_of_mental_illness and threat_level, which captures any combined effect of the two predictors, is not statistically significant.
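The very large standard errors on raceN (about 450) and on the signs_of_mental_illnessTrue:threat_levelundetermined interaction (about 722) are the usual signature of a cell with no events; the table of proportions shows py1 = 0 for race N. The sketch below reproduces that pattern on made-up toy data (not the shooting data; all names are hypothetical) to show why such estimates should not be interpreted.

```r
# Toy data (not the shooting data): group "N" has no events at all
toy <- data.frame(
  group = factor(rep(c("ref", "N"), times = c(100, 30)), levels = c("ref", "N")),
  y     = c(rep(c(1, 0), times = c(10, 90)), rep(0, 30))
)

# glm() still returns a fit, sometimes with a warning about fitted probabilities of 0 or 1
fit_toy <- suppressWarnings(glm(y ~ group, family = binomial, data = toy))

# The estimate for groupN drifts toward -Inf and its standard error becomes very large
toy_est <- coef(summary(fit_toy))["groupN", c("Estimate", "Std. Error")]
toy_est
```

In this situation the Wald z test is uninformative, so the near-1 p-values on those two rows say little about whether the groups actually differ.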
anova(m0, m1, m2, m3, test = 'Chisq')
Analysis of Deviance Table
Model 1: brutal ~ race
Model 2: brutal ~ race + gender + age
Model 3: brutal ~ race + gender + age + body_camera + run
Model 4: brutal ~ race + gender + age + body_camera + run + signs_of_mental_illness *
threat_level
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 2092 1056.5
2 2090 1056.3 2 0.2192 0.89620
3 2086 1046.7 4 9.5534 0.04866 *
4 2081 1019.6 5 27.1226 5.399e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lmtest::lrtest(m0, m1, m2, m3)
Likelihood ratio test
Model 1: brutal ~ race
Model 2: brutal ~ race + gender + age
Model 3: brutal ~ race + gender + age + body_camera + run
Model 4: brutal ~ race + gender + age + body_camera + run + signs_of_mental_illness *
threat_level
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -528.25
2 9 -528.14 2 0.2192 0.89620
3 13 -523.36 4 9.5534 0.04866 *
4 18 -509.80 5 27.1226 5.399e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
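The last row of the likelihood ratio test can be reproduced by hand from the log-likelihoods reported above. This is a small sketch with the values hard-coded (variable names are illustrative); because the log-likelihoods are rounded to two decimals, the result matches the printed statistic only approximately.

```r
# Log-likelihoods and parameter counts copied from the lrtest() output above
ll_m2 <- -523.36; df_m2 <- 13
ll_m3 <- -509.80; df_m3 <- 18

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (ll_m3 - ll_m2)
lr_df   <- df_m3 - df_m2

# Upper-tail chi-squared probability
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = lr_p)
```

The same statistic appears in the anova() table because, for these nested binomial models, the drop in residual deviance equals twice the gain in log-likelihood.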
library(texreg)
htmlreg(list(m0, m1, m2, m3), doctype = FALSE)
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | -2.21*** | -2.37*** | -2.34* | -2.95** |
| | (0.37) | (0.67) | (0.96) | (0.99) |
| raceA | -0.06 | -0.07 | -0.10 | -0.22 |
| | (0.71) | (0.71) | (0.72) | (0.73) |
| raceB | -0.58 | -0.59 | -0.55 | -0.47 |
| | (0.42) | (0.43) | (0.43) | (0.43) |
| raceH | -0.43 | -0.45 | -0.44 | -0.44 |
| | (0.43) | (0.44) | (0.44) | (0.44) |
| raceN | -14.36 | -14.34 | -14.37 | -14.41 |
| | (453.47) | (453.27) | (450.78) | (445.12) |
| raceO | 0.42 | 0.42 | 0.40 | 0.38 |
| | (0.66) | (0.66) | (0.66) | (0.67) |
| raceW | -0.31 | -0.31 | -0.30 | -0.30 |
| | (0.39) | (0.39) | (0.39) | (0.40) |
| genderM | | 0.21 | 0.19 | 0.32 |
| | | (0.47) | (0.47) | (0.48) |
| age | | -0.00 | -0.00 | -0.00 |
| | | (0.01) | (0.01) | (0.01) |
| body_cameraTrue | | | 0.33 | 0.21 |
| | | | (0.25) | (0.26) |
| runFlee | | | -0.43 | -0.35 |
| | | | (0.77) | (0.78) |
| runNot fleeing | | | 0.18 | 0.20 |
| | | | (0.74) | (0.76) |
| runOther | | | 0.30 | 0.42 |
| | | | (0.84) | (0.85) |
| signs_of_mental_illnessTrue | | | | 0.29 |
| | | | | (0.27) |
| threat_levelother | | | | 0.79*** |
| | | | | (0.22) |
| threat_levelundetermined | | | | 0.42 |
| | | | | (0.41) |
| signs_of_mental_illnessTrue:threat_levelother | | | | 0.13 |
| | | | | (0.38) |
| signs_of_mental_illnessTrue:threat_levelundetermined | | | | -14.26 |
| | | | | (721.67) |
| AIC | 1070.50 | 1074.28 | 1072.73 | 1055.61 |
| BIC | 1110.05 | 1125.12 | 1146.17 | 1157.29 |
| Log Likelihood | -528.25 | -528.14 | -523.36 | -509.80 |
| Deviance | 1056.50 | 1056.28 | 1046.73 | 1019.61 |
| Num. obs. | 2099 | 2099 | 2099 | 2099 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | | | | |
The likelihood ratio tests above show that the last model, m3, fits the data best. In the table, however, m3 has the lowest AIC but also the highest BIC, because BIC penalizes the additional parameters more heavily. Since AIC and the likelihood ratio tests both favor m3, I treat m3 as the best-fitting of the four models.
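To see why AIC and BIC rank the models differently, both criteria can be recomputed from the log-likelihoods and parameter counts reported in the likelihood ratio test. The sketch below hard-codes those rounded values (variable names are illustrative), so the results can differ from the table in the second decimal.

```r
n_obs  <- 2099                                   # Num. obs. from the table
loglik <- c(m0 = -528.25, m1 = -528.14, m2 = -523.36, m3 = -509.80)
k      <- c(m0 = 7, m1 = 9, m2 = 13, m3 = 18)    # estimated parameters per model

aic <- -2 * loglik + 2 * k           # constant penalty of 2 per parameter
bic <- -2 * loglik + log(n_obs) * k  # penalty of log(2099), about 7.65, per parameter

round(rbind(AIC = aic, BIC = bic), 2)
```

Moving from m0 to m3 adds 11 parameters, which costs 22 AIC points but about 84 BIC points, while the deviance falls by only about 37. That is why m3 has the lowest AIC and m0 the lowest BIC.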
For both males and females, the chance of being subjected to an increased level of brutality by a police officer decreases as the age of the deceased increases. At every age, however, males have a higher likelihood than females.
At every level of threat posed by the deceased, males have a higher likelihood than females of being subjected to an increased level of brutality.
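To show how such comparisons follow from the fitted model, the sketch below converts m3 coefficients into predicted probabilities for one illustrative profile. The profile (race W, age 35, not fleeing, body camera False, no signs of mental illness) is an assumption chosen here for illustration, and the coefficients are hard-coded from summary(m3) rather than taken from the model object.

```r
# Coefficients copied from summary(m3)
b <- c(intercept = -2.945609, raceW = -0.297437, genderM = 0.318667,
       age = -0.003212, not_fleeing = 0.200085,
       threat_other = 0.789002, threat_undetermined = 0.415081)

# Linear predictor for a female with the illustrative profile at the reference threat level
lp_base <- b[["intercept"]] + b[["raceW"]] + 35 * b[["age"]] + b[["not_fleeing"]]

# Every combination of gender and threat level
grid <- expand.grid(gender = c("F", "M"),
                    threat = c("reference", "other", "undetermined"),
                    stringsAsFactors = FALSE)

# Shifts on the log-odds scale (no interaction term: mental illness is False here)
threat_shift <- c(reference = 0, other = b[["threat_other"]],
                  undetermined = b[["threat_undetermined"]])
grid$lp <- lp_base + (grid$gender == "M") * b[["genderM"]] +
  unname(threat_shift[grid$threat])

# plogis() is the inverse logit: it maps log-odds back to probabilities
grid$prob <- plogis(grid$lp)
grid
```

Because genderM is positive and enters additively on the log-odds scale, the male probability exceeds the female one at every threat level for any fixed profile. Given the large standard error on genderM, the gap is better read as a point estimate than as an established difference.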
Tate et al. (2016, July 7). How The Washington Post is examining police shootings in the United States. The Washington Post. Retrieved from https://www.washingtonpost.com/national/how-the-washington-post-is-examining-police-shootings-in-the-united-states/2016/07/07/d9c52238-43ad-11e6-8856-f26de2537a9d_story.html?utm_term=.db17ec01fa1c