Sungji Peter Shin
3.3.2019

About dataset: Fatal Police Shootings

The Washington Post has collected this dataset, which records every fatal shooting in the United States by a police officer in the line of duty, since January 1, 2015. The records include “the race of the deceased, the circumstances of the shooting, whether the person was armed and whether the person was experiencing a mental-health-crisis” (Tate et al., 2016), among other details. The data are limited to incidents in which a police officer, in the line of duty, shoots and kills a civilian; deaths of people in police custody, fatal shootings by off-duty officers, and non-shooting deaths are excluded.

library(Zelig)
Loading required package: survival
shootings <- read.csv(file = 'E:/SOC712/Fatal_police_shootings.csv', header = TRUE, sep = ",")
head(shootings)
dim(shootings)
[1] 2142   14
# rows containing missing values are excluded. 
shooting <- na.omit(shootings)
dim(shooting)
[1] 2099   14
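
One caveat about the missing-value step: na.omit() only drops rows that contain NA, and read.csv() by default leaves empty text fields as empty strings rather than NA. The small sketch below uses a made-up three-row CSV (not the real data) to show the difference, and how the na.strings argument of read.csv() could be used if blank text fields should also count as missing.

```r
# Toy CSV (hypothetical values, not the real dataset):
# row 2 has a blank race, row 3 has a blank age.
csv_text <- "race,age\nW,34\n,51\nB,\n"

# Default behaviour: a blank text field stays "" while a blank number becomes NA.
default_read <- read.csv(text = csv_text, header = TRUE)
rows_default <- nrow(na.omit(default_read))   # only the row with the missing age is dropped

# Treating empty strings as missing as well.
strict_read <- read.csv(text = csv_text, header = TRUE, na.strings = c("", "NA"))
rows_strict <- nrow(na.omit(strict_read))     # the row with the blank race is dropped too

c(default = rows_default, strict = rows_strict)
```

This would explain why a blank race category can survive na.omit() and still appear in grouped summaries of the data.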

The original dataset has 14 variables and 2,142 cases. The 43 cases with a missing value in at least one variable are excluded from this analysis (N = 2,099). My dependent variable is ‘brutal’, which indicates the level of brutality in each police shooting incident. This binary variable takes two values: 0 = shot; 1 = shot and Tasered. ‘Shot and Tasered’ represents an increased level of brutality. The aim of this assignment is to build models that show how the mean of the dependent variable (that is, the probability of ‘shot and Tasered’) varies as the values of the predictors (independent variables) change. My hypotheses are the following:
1) Black and Hispanic individuals are more likely to be subjected to an increased level of brutality;
2) Among Black and Hispanic individuals, young males aged between 20 and 40 are more likely to be subjected to an increased level of brutality;
3) Individuals who try to flee are more likely to be subjected to an increased level of brutality;
4) When the police officer's body camera is off, individuals are more likely to be subjected to an increased level of brutality;
5) Individuals who show signs of mental illness are more likely to be subjected to an increased level of brutality; and
6) Individuals who pose a threat are more likely to be subjected to an increased level of brutality.

library(dplyr)
library(pander)
# recode 'manner_of_death' as the binary outcome 'brutal': 0 = shot and 1 = shot and Tasered
# factor() keeps the recode working when read.csv() returns character columns (R >= 4.0)
shooting <- shooting %>%
  mutate(brutality = as.integer(factor(manner_of_death))) %>%
  mutate(brutal = sjmisc::rec(brutality, rec = '1=0; 2=1')) %>%
  select(brutal, manner_of_death, everything(), -brutality)
# collapse 'Car' and 'Foot' into a single 'Flee' category in the new variable 'run'
shooting <- shooting %>%
  mutate(run = sjmisc::rec(flee, rec = 'Car=Flee; Foot=Flee; else=copy')) %>%
  select(run, flee, everything())
head(shooting)
# simple example of obtaining probabilities by race: py1 = P(brutal = 1), py0 = P(brutal = 0)
demo_df <- shooting %>%
  group_by(race) %>%
  summarise(py1 = mean(brutal)) %>%
  mutate(py0 = 1 - py1)
pandoc.table(demo_df)

-------------------------
 race     py1      py0   
------ --------- --------
        0.09877   0.9012 

  A     0.09375   0.9062 

  B     0.05805   0.9419 

  H     0.06648   0.9335 

  N        0        1    

  O     0.1429    0.8571 

  W     0.0744    0.9256 
-------------------------

Four models using one or more independent variables:

m0 <- glm(brutal ~ race, family = binomial, data = shooting)
summary(m0)

Call:
glm(formula = brutal ~ race, family = binomial, data = shooting)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.5553  -0.3932  -0.3932  -0.3458   2.3860  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.21102    0.37242  -5.937 2.91e-09 ***
raceA        -0.05767    0.71170  -0.081    0.935    
raceB        -0.57559    0.41587  -1.384    0.166    
raceH        -0.43101    0.42817  -1.007    0.314    
raceN       -14.35505  453.47148  -0.032    0.975    
raceO         0.41926    0.65602   0.639    0.523    
raceW        -0.31002    0.39081  -0.793    0.428    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1065.1  on 2098  degrees of freedom
Residual deviance: 1056.5  on 2092  degrees of freedom
AIC: 1070.5

Number of Fisher Scoring iterations: 15

The first model, m0, uses race as the only independent variable (A = Asian; B = Black; H = Hispanic; N = Native American; O = Other; W = White). The reference category is the group with no recorded race, which appears as the blank row in the probability table above. I expected the coefficients for Black and Hispanic to be positive, but both are negative and neither is statistically significant.
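
The link between the probability table and the m0 coefficients can be checked by hand. In a logistic regression with a single categorical predictor, the intercept is the log-odds of the outcome in the reference category (here the blank race row), and each coefficient is the difference in log-odds from that category. The sketch below plugs in the rounded probabilities from the table; the variable names are my own, and small differences from the glm output are due to rounding.

```r
# Probabilities of brutal = 1 copied from the probability table (rounded values).
p_reference <- 0.09877   # blank (unrecorded) race, the reference category
p_black     <- 0.05805   # race = B

# qlogis() is the logit function: log(p / (1 - p)).
intercept_by_hand <- qlogis(p_reference)
raceB_by_hand     <- qlogis(p_black) - qlogis(p_reference)

# Compare with the (Intercept) and raceB rows of summary(m0).
round(c(intercept = intercept_by_hand, raceB = raceB_by_hand), 3)
```

Read this way, the negative raceB coefficient says only that the observed share of 'shot and Tasered' cases is lower for race B than for cases with no recorded race.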

m1 <- glm(brutal ~ race + gender + age, family = binomial, data = shooting)
summary(m1)

Call:
glm(formula = brutal ~ race + gender + age, family = binomial, 
    data = shooting)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.5622  -0.3956  -0.3735  -0.3481   2.4633  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.372e+00  6.672e-01  -3.555 0.000378 ***
raceA       -6.545e-02  7.138e-01  -0.092 0.926937    
raceB       -5.867e-01  4.267e-01  -1.375 0.169157    
raceH       -4.466e-01  4.368e-01  -1.022 0.306574    
raceN       -1.434e+01  4.533e+02  -0.032 0.974763    
raceO        4.154e-01  6.618e-01   0.628 0.530199    
raceW       -3.125e-01  3.928e-01  -0.796 0.426216    
genderM      2.070e-01  4.712e-01   0.439 0.660366    
age         -8.603e-04  6.959e-03  -0.124 0.901613    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1065.1  on 2098  degrees of freedom
Residual deviance: 1056.3  on 2090  degrees of freedom
AIC: 1074.3

Number of Fisher Scoring iterations: 15

The second model, m1, adds two more demographic variables, gender and age, to the first model, m0. As I expected, males have higher estimated odds of being subjected to an increased level of brutality than females (odds ratio e^0.2070 ≈ 1.23), but the difference is not statistically significant. Age is not statistically significant either.
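
To make the odds-ratio reading concrete, the estimate and standard error for genderM in the m1 output can be converted into an odds ratio with an approximate 95% Wald confidence interval. This is a sketch using the printed numbers; with the fitted object available, exp(cbind(OR = coef(m1), confint.default(m1))) would give the same kind of table for every coefficient.

```r
# Estimate and standard error for genderM, copied from summary(m1).
estimate  <- 0.2070
std_error <- 0.4712

# Odds ratio and approximate 95% Wald interval on the odds-ratio scale.
z <- qnorm(0.975)                                  # about 1.96
odds_ratio <- exp(estimate)
ci <- exp(estimate + c(-1, 1) * z * std_error)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 2)
```

The interval spans 1, which agrees with the non-significant p-value reported for genderM.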

m2 <- glm(brutal ~ race + gender + age + body_camera + run, family = binomial, data = shooting)
summary(m2)

Call:
glm(formula = brutal ~ race + gender + age + body_camera + run, 
    family = binomial, data = shooting)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6786  -0.4123  -0.3814  -0.3080   2.5974  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)  
(Intercept)      -2.337993   0.962937  -2.428   0.0152 *
raceA            -0.101737   0.716781  -0.142   0.8871  
raceB            -0.554854   0.428704  -1.294   0.1956  
raceH            -0.442194   0.438818  -1.008   0.3136  
raceN           -14.372576 450.778059  -0.032   0.9746  
raceO             0.398778   0.664515   0.600   0.5484  
raceW            -0.304256   0.394875  -0.771   0.4410  
genderM           0.190879   0.471883   0.405   0.6858  
age              -0.004195   0.007067  -0.594   0.5528  
body_cameraTrue   0.334284   0.253478   1.319   0.1872  
runFlee          -0.426715   0.765392  -0.558   0.5772  
runNot fleeing    0.184517   0.742282   0.249   0.8037  
runOther          0.296708   0.836726   0.355   0.7229  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1065.1  on 2098  degrees of freedom
Residual deviance: 1046.7  on 2086  degrees of freedom
AIC: 1072.7

Number of Fisher Scoring iterations: 15

The third model, m2, adds two more variables, body_camera and run, which indicate whether the police officer's body camera was on and whether the deceased tried to flee. The results run contrary to hypotheses 3 and 4: the coefficient for body_cameraTrue is positive (0.33), so an increased level of brutality was more likely, not less, when the body camera was on, and the coefficient for runFlee is negative (-0.43), so individuals who tried to flee were less likely to be subjected to an increased level of brutality. Again, none of the independent variables in this model is statistically significant.
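
Each run coefficient is a contrast with the factor's reference level, which here appears to be the blank category, since Flee, Not fleeing and Other all have their own rows in the output. A more direct check of hypothesis 3 compares Flee with Not fleeing. The sketch below derives that contrast from the printed m2 estimates and shows, on a toy factor with made-up values, how relevel() could set 'Not fleeing' as the baseline before refitting.

```r
# Coefficients copied from summary(m2); both are relative to the same reference level.
coef_flee        <- -0.426715
coef_not_fleeing <-  0.184517

# Log-odds difference and odds ratio for Flee versus Not fleeing.
log_odds_diff <- coef_flee - coef_not_fleeing
odds_ratio_flee_vs_not <- exp(log_odds_diff)
round(odds_ratio_flee_vs_not, 2)

# Toy factor (hypothetical values) to illustrate changing the reference level.
run_toy <- factor(c("", "Flee", "Not fleeing", "Other", "Flee"))
run_releveled <- relevel(run_toy, ref = "Not fleeing")
levels(run_releveled)   # "Not fleeing" now comes first and would be the baseline
```

The subtraction gives only the point estimate; refitting the model with the releveled factor would also give a standard error and p-value for this contrast.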

m3 <- glm(brutal ~ race + gender + age + body_camera + run + signs_of_mental_illness*threat_level, family = binomial, data = shooting)
summary(m3)

Call:
glm(formula = brutal ~ race + gender + age + body_camera + run + 
    signs_of_mental_illness * threat_level, family = binomial, 
    data = shooting)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8736  -0.3944  -0.3386  -0.2959   2.6869  

Coefficients:
                                                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)                                           -2.945609   0.992328  -2.968  0.00299 ** 
raceA                                                 -0.221708   0.726658  -0.305  0.76029    
raceB                                                 -0.469771   0.433126  -1.085  0.27810    
raceH                                                 -0.437204   0.442814  -0.987  0.32348    
raceN                                                -14.407387 445.120876  -0.032  0.97418    
raceO                                                  0.375709   0.672919   0.558  0.57662    
raceW                                                 -0.297437   0.398983  -0.745  0.45598    
genderM                                                0.318667   0.477191   0.668  0.50426    
age                                                   -0.003212   0.007152  -0.449  0.65333    
body_cameraTrue                                        0.213899   0.257184   0.832  0.40558    
runFlee                                               -0.351240   0.784414  -0.448  0.65432    
runNot fleeing                                         0.200085   0.764860   0.262  0.79363    
runOther                                               0.419347   0.852318   0.492  0.62271    
signs_of_mental_illnessTrue                            0.287517   0.272511   1.055  0.29140    
threat_levelother                                      0.789002   0.223705   3.527  0.00042 ***
threat_levelundetermined                               0.415081   0.406494   1.021  0.30720    
signs_of_mental_illnessTrue:threat_levelother          0.129023   0.377595   0.342  0.73258    
signs_of_mental_illnessTrue:threat_levelundetermined -14.264544 721.666596  -0.020  0.98423    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1065.1  on 2098  degrees of freedom
Residual deviance: 1019.6  on 2081  degrees of freedom
AIC: 1055.6

Number of Fisher Scoring iterations: 15

In the last model, m3, only one coefficient is statistically significant: threat_level (other), with an estimate of 0.79 (p < 0.001). The interaction term, which captures the combined (synergistic) effect of threat_level and signs_of_mental_illness, is not statistically significant.
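
The very large estimates and standard errors for raceN (about -14.4, SE about 445) and for the interaction of mental illness with an undetermined threat level (about -14.3, SE about 722) are typical of a category in which the outcome never occurs; the probability table shows py1 = 0 for race N. The toy example below, with made-up data, reproduces the pattern: the group with no events gets a large negative estimate and a huge standard error, so its Wald test is uninformative.

```r
# Made-up data: group "a" has 5 events out of 20, group "b" has 0 events out of 20.
toy <- data.frame(
  group = factor(rep(c("a", "b"), each = 20)),
  y     = c(rep(1, 5), rep(0, 15), rep(0, 20))
)

# glm may warn about fitted probabilities of 0 or 1; silenced for this illustration.
fit <- suppressWarnings(glm(y ~ group, family = binomial, data = toy))

# Estimate and standard error for group "b" relative to group "a".
separation_row <- coef(summary(fit))["groupb", c("Estimate", "Std. Error")]
separation_row
```

For such coefficients a likelihood ratio test, or merging the sparse category with another one, is usually more informative than the z value in the summary table.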

Likelihood ratio test:

anova(m0, m1, m2, m3, test = 'Chisq')
Analysis of Deviance Table

Model 1: brutal ~ race
Model 2: brutal ~ race + gender + age
Model 3: brutal ~ race + gender + age + body_camera + run
Model 4: brutal ~ race + gender + age + body_camera + run + signs_of_mental_illness * 
    threat_level
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1      2092     1056.5                          
2      2090     1056.3  2   0.2192   0.89620    
3      2086     1046.7  4   9.5534   0.04866 *  
4      2081     1019.6  5  27.1226 5.399e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lmtest::lrtest(m0, m1, m2, m3)
Likelihood ratio test

Model 1: brutal ~ race
Model 2: brutal ~ race + gender + age
Model 3: brutal ~ race + gender + age + body_camera + run
Model 4: brutal ~ race + gender + age + body_camera + run + signs_of_mental_illness * 
    threat_level
  #Df  LogLik Df   Chisq Pr(>Chisq)    
1   7 -528.25                          
2   9 -528.14  2  0.2192    0.89620    
3  13 -523.36  4  9.5534    0.04866 *  
4  18 -509.80  5 27.1226  5.399e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
library(texreg)
htmlreg(list(m0, m1, m2, m3), doctype = FALSE)
Statistical models
-----------------------------------------------------------------------------------------------
                                                      Model 1    Model 2    Model 3    Model 4
-----------------------------------------------------------------------------------------------
(Intercept)                                           -2.21***   -2.37***   -2.34*     -2.95**
                                                      (0.37)     (0.67)     (0.96)     (0.99)
raceA                                                 -0.06      -0.07      -0.10      -0.22
                                                      (0.71)     (0.71)     (0.72)     (0.73)
raceB                                                 -0.58      -0.59      -0.55      -0.47
                                                      (0.42)     (0.43)     (0.43)     (0.43)
raceH                                                 -0.43      -0.45      -0.44      -0.44
                                                      (0.43)     (0.44)     (0.44)     (0.44)
raceN                                                 -14.36     -14.34     -14.37     -14.41
                                                      (453.47)   (453.27)   (450.78)   (445.12)
raceO                                                 0.42       0.42       0.40       0.38
                                                      (0.66)     (0.66)     (0.66)     (0.67)
raceW                                                 -0.31      -0.31      -0.30      -0.30
                                                      (0.39)     (0.39)     (0.39)     (0.40)
genderM                                                          0.21       0.19       0.32
                                                                 (0.47)     (0.47)     (0.48)
age                                                              -0.00      -0.00      -0.00
                                                                 (0.01)     (0.01)     (0.01)
body_cameraTrue                                                             0.33       0.21
                                                                            (0.25)     (0.26)
runFlee                                                                     -0.43      -0.35
                                                                            (0.77)     (0.78)
runNot fleeing                                                              0.18       0.20
                                                                            (0.74)     (0.76)
runOther                                                                    0.30       0.42
                                                                            (0.84)     (0.85)
signs_of_mental_illnessTrue                                                            0.29
                                                                                       (0.27)
threat_levelother                                                                      0.79***
                                                                                       (0.22)
threat_levelundetermined                                                               0.42
                                                                                       (0.41)
signs_of_mental_illnessTrue:threat_levelother                                          0.13
                                                                                       (0.38)
signs_of_mental_illnessTrue:threat_levelundetermined                                   -14.26
                                                                                       (721.67)
-----------------------------------------------------------------------------------------------
AIC                                                   1070.50    1074.28    1072.73    1055.61
BIC                                                   1110.05    1125.12    1146.17    1157.29
Log Likelihood                                        -528.25    -528.14    -523.36    -509.80
Deviance                                              1056.50    1056.28    1046.73    1019.61
Num. obs.                                             2099       2099       2099       2099
-----------------------------------------------------------------------------------------------
***p < 0.001, **p < 0.01, *p < 0.05 (standard errors in parentheses)

The likelihood ratio tests above show that the last model, m3, fits the data significantly better than m2 (p < 0.001). In the table, however, m3 has the lowest AIC but also the highest BIC. BIC penalizes each additional parameter more heavily than AIC, so m3, with 18 parameters, scores worse on BIC (1157.29) than m0, with 7 parameters (1110.05). Because the likelihood ratio tests and AIC both favor m3, I treat m3 as the best-fitting of the four models.
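
The figures in this comparison can be reproduced from the log-likelihoods alone. The likelihood ratio statistic is twice the difference in log-likelihood between two nested models, referred to a chi-squared distribution with degrees of freedom equal to the number of added parameters; AIC is -2 logLik + 2k and BIC is -2 logLik + k log(n), where k is the number of parameters and n the number of observations. The sketch below uses the rounded values printed above, so the last digit may differ.

```r
# Values copied from the lrtest() output and the model table.
loglik_m2 <- -523.36; k_m2 <- 13
loglik_m3 <- -509.80; k_m3 <- 18
n_obs <- 2099

# Likelihood ratio test of m3 against m2.
lr_stat <- 2 * (loglik_m3 - loglik_m2)
lr_df   <- k_m3 - k_m2
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Information criteria for m3.
aic_m3 <- -2 * loglik_m3 + 2 * k_m3
bic_m3 <- -2 * loglik_m3 + k_m3 * log(n_obs)

round(c(LR = lr_stat, AIC = aic_m3, BIC = bic_m3), 2)
p_value
```

Since log(2099) is about 7.65, each extra parameter costs roughly 7.65 points on BIC against 2 on AIC, which is why the two criteria rank the models differently.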

Plotting

For both males and females, the predicted probability of being subjected to an increased level of brutality by a police officer decreases as the age of the deceased increases. At any age, however, males have a higher predicted probability than females.

At every level of threat posed by the deceased, males have a higher predicted probability of being subjected to an increased level of brutality than females.
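
The plots themselves are not reproduced in this text. As a rough illustration of the pattern described, and not the code behind the original plots, predicted probabilities can be computed from printed coefficients with the inverse logit. The sketch below uses the m1 estimates for race W (an arbitrary choice of category) at several ages; the helper name predict_brutal is hypothetical.

```r
# Coefficients copied from summary(m1).
b_intercept <- -2.372
b_raceW     <- -0.3125
b_genderM   <-  0.2070
b_age       <- -0.0008603

# Hypothetical helper: predicted probability of brutal = 1 for race W.
predict_brutal <- function(age, male) {
  linear_predictor <- b_intercept + b_raceW + b_genderM * male + b_age * age
  plogis(linear_predictor)   # inverse logit
}

ages <- c(20, 40, 60)
prob_male   <- predict_brutal(ages, male = 1)
prob_female <- predict_brutal(ages, male = 0)
round(rbind(male = prob_male, female = prob_female), 4)
```

Because the age coefficient in m1 is very small, the decline with age is slight in this illustration, while the gap between males and females is the same on the log-odds scale at every age.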

Reference

Tate et al. (2016, July 7). How The Washington Post is examining police shootings in the United States. The Washington Post. Retrieved from https://www.washingtonpost.com/national/how-the-washington-post-is-examining-police-shootings-in-the-united-states/2016/07/07/d9c52238-43ad-11e6-8856-f26de2537a9d_story.html?utm_term=.db17ec01fa1c

