I have chosen the dataset “Gun Deaths in the U.S.: 2012-2014” from Kaggle. The dataset contains information about the sex, age, race, education, police involvement, and intent of gun deaths between 2012 and 2014.

Importing Packages

library(dplyr)
library(Zelig)
library(pander)
library(visreg)
library(readr)
gun_data <- read_csv("/Users/rachel_ramphal/Documents/guns.csv")
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   year = col_double(),
##   month = col_character(),
##   intent = col_character(),
##   police = col_double(),
##   sex = col_character(),
##   age = col_double(),
##   race = col_character(),
##   hispanic = col_double(),
##   place = col_character(),
##   education = col_double()
## )

Models

Model 1

For the first model I will examine the independent variable, age, and the dependent variable, police_new (this variable determines if there was any police involvement) to determine if there is a significant relationship between the two.

model_1 <- glm(police ~ age, family = "binomial", data = gun_data)
summary(model_1)
## 
## Call:
## glm(formula = police ~ age, family = "binomial", data = gun_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.2658  -0.1944  -0.1613  -0.1338   3.2783  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.32544    0.06313  -52.68   <2e-16 ***
## age         -0.02349    0.00158  -14.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 14772  on 100779  degrees of freedom
## Residual deviance: 14526  on 100778  degrees of freedom
##   (18 observations deleted due to missingness)
## AIC: 14530
## 
## Number of Fisher Scoring iterations: 7
coef(model_1)
## (Intercept)         age 
## -3.32543802 -0.02348987

This first model shows that the intercept shows that if age = 0, then there is a -3.325 chance of police involvement in a gun related death. However, the coefficient for age is negative. Therefore we can assume there is not a statistically significant relationship between age and police involvement.

Model 2

For this second model, I have introduced two more independent variables, race and sex, in order to determine their significance to police involvement in the gun related death.

model_2 <- glm(police ~ age + race + sex, family = "binomial", data = gun_data)
summary(model_2)
## 
## Call:
## glm(formula = police ~ age + race + sex, family = "binomial", 
##     data = gun_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3620  -0.1909  -0.1554  -0.1226   3.5838  
## 
## Coefficients:
##                                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)                        -4.138176   0.228572 -18.104  < 2e-16
## age                                -0.020740   0.001724 -12.032  < 2e-16
## raceBlack                          -0.604649   0.192958  -3.134  0.00173
## raceHispanic                        0.182220   0.194938   0.935  0.34991
## raceNative American/Native Alaskan  0.116695   0.274925   0.424  0.67123
## raceWhite                          -0.560773   0.189592  -2.958  0.00310
## sexM                                1.263397   0.129516   9.755  < 2e-16
##                                       
## (Intercept)                        ***
## age                                ***
## raceBlack                          ** 
## raceHispanic                          
## raceNative American/Native Alaskan    
## raceWhite                          ** 
## sexM                               ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 14772  on 100779  degrees of freedom
## Residual deviance: 14265  on 100773  degrees of freedom
##   (18 observations deleted due to missingness)
## AIC: 14279
## 
## Number of Fisher Scoring iterations: 8
coef(model_2)
##                        (Intercept)                                age 
##                        -4.13817619                        -0.02073968 
##                          raceBlack                       raceHispanic 
##                        -0.60464875                         0.18222022 
## raceNative American/Native Alaskan                          raceWhite 
##                         0.11669487                        -0.56077274 
##                               sexM 
##                         1.26339655

In this second model, age continues to be insignificant to police involvement. However, we can see that gender is significant. There is a statistically significant relationship between sex (male) and police involvement. There is also a significant relationship between race (Hispanic & Native American/Native Alaskan) and police involvement.

Model 3

For this third model, I have added the variables location (place) and education.

model_3 <- glm(police ~ age + race * sex + place  + education, family = "binomial", data = gun_data)
summary(model_3)
## 
## Call:
## glm(formula = police ~ age + race * sex + place + education, 
##     family = "binomial", data = gun_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.1111  -0.0183  -0.0110  -0.0067   4.5940  
## 
## Coefficients:
##                                           Estimate Std. Error z value
## (Intercept)                             -3.880e+01  9.732e+03  -0.004
## age                                     -2.535e-02  1.475e-02  -1.719
## raceBlack                               -7.503e-01  8.171e+03   0.000
## raceHispanic                            -5.343e-01  8.599e+03   0.000
## raceNative American/Native Alaskan      -4.552e-01  1.313e+04   0.000
## raceWhite                                2.255e-01  7.861e+03   0.000
## sexM                                    -2.823e-01  8.626e+03   0.000
## placeHome                                1.480e+01  5.853e+03   0.003
## placeIndustrial/construction             1.726e-01  9.902e+03   0.000
## placeOther specified                     1.585e+01  5.853e+03   0.003
## placeOther unspecified                   1.773e+01  5.853e+03   0.003
## placeResidential institution             4.155e-01  1.043e+04   0.000
## placeSchool/instiution                   3.308e-01  7.520e+03   0.000
## placeSports                              3.901e-01  1.256e+04   0.000
## placeStreet                              1.721e+01  5.853e+03   0.003
## placeTrade/service area                  1.668e+01  5.853e+03   0.003
## education                               -2.704e-01  2.745e-01  -0.985
## raceBlack:sexM                           1.552e+01  8.984e+03   0.002
## raceHispanic:sexM                        1.607e+01  9.375e+03   0.002
## raceNative American/Native Alaskan:sexM  3.830e-01  1.433e+04   0.000
## raceWhite:sexM                           1.631e+01  8.703e+03   0.002
##                                         Pr(>|z|)  
## (Intercept)                               0.9968  
## age                                       0.0856 .
## raceBlack                                 0.9999  
## raceHispanic                              1.0000  
## raceNative American/Native Alaskan        1.0000  
## raceWhite                                 1.0000  
## sexM                                      1.0000  
## placeHome                                 0.9980  
## placeIndustrial/construction              1.0000  
## placeOther specified                      0.9978  
## placeOther unspecified                    0.9976  
## placeResidential institution              1.0000  
## placeSchool/instiution                    1.0000  
## placeSports                               1.0000  
## placeStreet                               0.9977  
## placeTrade/service area                   0.9977  
## education                                 0.3245  
## raceBlack:sexM                            0.9986  
## raceHispanic:sexM                         0.9986  
## raceNative American/Native Alaskan:sexM   1.0000  
## raceWhite:sexM                            0.9985  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 363.35  on 99342  degrees of freedom
## Residual deviance: 323.22  on 99322  degrees of freedom
##   (1455 observations deleted due to missingness)
## AIC: 365.22
## 
## Number of Fisher Scoring iterations: 23

From this third model, I can interpret that education level has a small significant to police involvement and location has a high significance with police involvement. I can also see from this model that police involvement in the gun related death is higher for Black, Hispanic, and Native American/Native Alaskan males.

AIC vs. BIC

Statistical models
Model 1 Model 2 Model 3
(Intercept) -3.33*** -4.14*** -38.80
(0.06) (0.23) (9731.84)
age -0.02*** -0.02*** -0.03
(0.00) (0.00) (0.01)
raceBlack -0.60** -0.75
(0.19) (8170.76)
raceHispanic 0.18 -0.53
(0.19) (8599.49)
raceNative American/Native Alaskan 0.12 -0.46
(0.27) (13133.05)
raceWhite -0.56** 0.23
(0.19) (7860.81)
sexM 1.26*** -0.28
(0.13) (8625.51)
placeHome 14.80
(5852.96)
placeIndustrial/construction 0.17
(9901.52)
placeOther specified 15.85
(5852.96)
placeOther unspecified 17.73
(5852.96)
placeResidential institution 0.42
(10427.97)
placeSchool/instiution 0.33
(7520.39)
placeSports 0.39
(12558.14)
placeStreet 17.21
(5852.96)
placeTrade/service area 16.68
(5852.96)
education -0.27
(0.27)
raceBlack:sexM 15.52
(8983.82)
raceHispanic:sexM 16.07
(9375.44)
raceNative American/Native Alaskan:sexM 0.38
(14329.21)
raceWhite:sexM 16.31
(8702.88)
AIC 14530.14 14279.27 365.22
BIC 14549.18 14345.91 564.85
Log Likelihood -7263.07 -7132.63 -161.61
Deviance 14526.14 14265.27 323.22
Num. obs. 100780 100780 99343
p < 0.001, p < 0.01, p < 0.05

Both the AIC and BIC test show that models get better as more variables are added. From this it is seen that model 3 is the best model because it has the lowest values.

Visuals

Graph 1 - Model 2

visreg(model_2, "age", by ="police",  scale = "response", line = list (col = "slateblue2"), fill = list(col = "violet"))

This graph shows that as age increases the likelihood of police involvement in the gun related death decreases.

Graph 2 - Model 3

visreg(model_3, "age", by ="police", scale = "response", line = list (col = "dodgerblue1"), fill = list(col = "turquoise2"))

This graph (using model 3) is consistent with the first (using model 2) and shows that as age increases the likelihood of police involvement in the gun related death decreases.

Graph 3 - Model 3

visreg(model_3, "education", by ="police",  scale = "response", line = list (col = "red"), fill = list(col = "black"))

This graph doesn’t show a clear significance between level of education and police involvement in the gun related death.