I have chosen the dataset “Gun Deaths in the U.S.: 2012-2014” from Kaggle. The dataset contains information about the sex, age, race, education, police involvement, and intent of gun deaths between 2012 and 2014.
library(dplyr)
library(Zelig)
library(pander)
library(visreg)
library(readr)
gun_data <- read_csv("/Users/rachel_ramphal/Documents/guns.csv")
## Parsed with column specification:
## cols(
## X1 = col_double(),
## year = col_double(),
## month = col_character(),
## intent = col_character(),
## police = col_double(),
## sex = col_character(),
## age = col_double(),
## race = col_character(),
## hispanic = col_double(),
## place = col_character(),
## education = col_double()
## )
For the first model I will examine the independent variable, age, and the dependent variable, police_new (this variable determines if there was any police involvement) to determine if there is a significant relationship between the two.
model_1 <- glm(police ~ age, family = "binomial", data = gun_data)
summary(model_1)
##
## Call:
## glm(formula = police ~ age, family = "binomial", data = gun_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.2658 -0.1944 -0.1613 -0.1338 3.2783
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.32544 0.06313 -52.68 <2e-16 ***
## age -0.02349 0.00158 -14.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 14772 on 100779 degrees of freedom
## Residual deviance: 14526 on 100778 degrees of freedom
## (18 observations deleted due to missingness)
## AIC: 14530
##
## Number of Fisher Scoring iterations: 7
coef(model_1)
## (Intercept) age
## -3.32543802 -0.02348987
This first model shows that the intercept shows that if age = 0, then there is a -3.325 chance of police involvement in a gun related death. However, the coefficient for age is negative. Therefore we can assume there is not a statistically significant relationship between age and police involvement.
For this second model, I have introduced two more independent variables, race and sex, in order to determine their significance to police involvement in the gun related death.
model_2 <- glm(police ~ age + race + sex, family = "binomial", data = gun_data)
summary(model_2)
##
## Call:
## glm(formula = police ~ age + race + sex, family = "binomial",
## data = gun_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3620 -0.1909 -0.1554 -0.1226 3.5838
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.138176 0.228572 -18.104 < 2e-16
## age -0.020740 0.001724 -12.032 < 2e-16
## raceBlack -0.604649 0.192958 -3.134 0.00173
## raceHispanic 0.182220 0.194938 0.935 0.34991
## raceNative American/Native Alaskan 0.116695 0.274925 0.424 0.67123
## raceWhite -0.560773 0.189592 -2.958 0.00310
## sexM 1.263397 0.129516 9.755 < 2e-16
##
## (Intercept) ***
## age ***
## raceBlack **
## raceHispanic
## raceNative American/Native Alaskan
## raceWhite **
## sexM ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 14772 on 100779 degrees of freedom
## Residual deviance: 14265 on 100773 degrees of freedom
## (18 observations deleted due to missingness)
## AIC: 14279
##
## Number of Fisher Scoring iterations: 8
coef(model_2)
## (Intercept) age
## -4.13817619 -0.02073968
## raceBlack raceHispanic
## -0.60464875 0.18222022
## raceNative American/Native Alaskan raceWhite
## 0.11669487 -0.56077274
## sexM
## 1.26339655
In this second model, age continues to be insignificant to police involvement. However, we can see that gender is significant. There is a statistically significant relationship between sex (male) and police involvement. There is also a significant relationship between race (Hispanic & Native American/Native Alaskan) and police involvement.
For this third model, I have added the variables location (place) and education.
model_3 <- glm(police ~ age + race * sex + place + education, family = "binomial", data = gun_data)
summary(model_3)
##
## Call:
## glm(formula = police ~ age + race * sex + place + education,
## family = "binomial", data = gun_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1111 -0.0183 -0.0110 -0.0067 4.5940
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -3.880e+01 9.732e+03 -0.004
## age -2.535e-02 1.475e-02 -1.719
## raceBlack -7.503e-01 8.171e+03 0.000
## raceHispanic -5.343e-01 8.599e+03 0.000
## raceNative American/Native Alaskan -4.552e-01 1.313e+04 0.000
## raceWhite 2.255e-01 7.861e+03 0.000
## sexM -2.823e-01 8.626e+03 0.000
## placeHome 1.480e+01 5.853e+03 0.003
## placeIndustrial/construction 1.726e-01 9.902e+03 0.000
## placeOther specified 1.585e+01 5.853e+03 0.003
## placeOther unspecified 1.773e+01 5.853e+03 0.003
## placeResidential institution 4.155e-01 1.043e+04 0.000
## placeSchool/instiution 3.308e-01 7.520e+03 0.000
## placeSports 3.901e-01 1.256e+04 0.000
## placeStreet 1.721e+01 5.853e+03 0.003
## placeTrade/service area 1.668e+01 5.853e+03 0.003
## education -2.704e-01 2.745e-01 -0.985
## raceBlack:sexM 1.552e+01 8.984e+03 0.002
## raceHispanic:sexM 1.607e+01 9.375e+03 0.002
## raceNative American/Native Alaskan:sexM 3.830e-01 1.433e+04 0.000
## raceWhite:sexM 1.631e+01 8.703e+03 0.002
## Pr(>|z|)
## (Intercept) 0.9968
## age 0.0856 .
## raceBlack 0.9999
## raceHispanic 1.0000
## raceNative American/Native Alaskan 1.0000
## raceWhite 1.0000
## sexM 1.0000
## placeHome 0.9980
## placeIndustrial/construction 1.0000
## placeOther specified 0.9978
## placeOther unspecified 0.9976
## placeResidential institution 1.0000
## placeSchool/instiution 1.0000
## placeSports 1.0000
## placeStreet 0.9977
## placeTrade/service area 0.9977
## education 0.3245
## raceBlack:sexM 0.9986
## raceHispanic:sexM 0.9986
## raceNative American/Native Alaskan:sexM 1.0000
## raceWhite:sexM 0.9985
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 363.35 on 99342 degrees of freedom
## Residual deviance: 323.22 on 99322 degrees of freedom
## (1455 observations deleted due to missingness)
## AIC: 365.22
##
## Number of Fisher Scoring iterations: 23
From this third model, I can interpret that education level has a small significant to police involvement and location has a high significance with police involvement. I can also see from this model that police involvement in the gun related death is higher for Black, Hispanic, and Native American/Native Alaskan males.
| Model 1 | Model 2 | Model 3 | ||
|---|---|---|---|---|
| (Intercept) | -3.33*** | -4.14*** | -38.80 | |
| (0.06) | (0.23) | (9731.84) | ||
| age | -0.02*** | -0.02*** | -0.03 | |
| (0.00) | (0.00) | (0.01) | ||
| raceBlack | -0.60** | -0.75 | ||
| (0.19) | (8170.76) | |||
| raceHispanic | 0.18 | -0.53 | ||
| (0.19) | (8599.49) | |||
| raceNative American/Native Alaskan | 0.12 | -0.46 | ||
| (0.27) | (13133.05) | |||
| raceWhite | -0.56** | 0.23 | ||
| (0.19) | (7860.81) | |||
| sexM | 1.26*** | -0.28 | ||
| (0.13) | (8625.51) | |||
| placeHome | 14.80 | |||
| (5852.96) | ||||
| placeIndustrial/construction | 0.17 | |||
| (9901.52) | ||||
| placeOther specified | 15.85 | |||
| (5852.96) | ||||
| placeOther unspecified | 17.73 | |||
| (5852.96) | ||||
| placeResidential institution | 0.42 | |||
| (10427.97) | ||||
| placeSchool/instiution | 0.33 | |||
| (7520.39) | ||||
| placeSports | 0.39 | |||
| (12558.14) | ||||
| placeStreet | 17.21 | |||
| (5852.96) | ||||
| placeTrade/service area | 16.68 | |||
| (5852.96) | ||||
| education | -0.27 | |||
| (0.27) | ||||
| raceBlack:sexM | 15.52 | |||
| (8983.82) | ||||
| raceHispanic:sexM | 16.07 | |||
| (9375.44) | ||||
| raceNative American/Native Alaskan:sexM | 0.38 | |||
| (14329.21) | ||||
| raceWhite:sexM | 16.31 | |||
| (8702.88) | ||||
| AIC | 14530.14 | 14279.27 | 365.22 | |
| BIC | 14549.18 | 14345.91 | 564.85 | |
| Log Likelihood | -7263.07 | -7132.63 | -161.61 | |
| Deviance | 14526.14 | 14265.27 | 323.22 | |
| Num. obs. | 100780 | 100780 | 99343 | |
| p < 0.001, p < 0.01, p < 0.05 | ||||
Both the AIC and BIC test show that models get better as more variables are added. From this it is seen that model 3 is the best model because it has the lowest values.
visreg(model_2, "age", by ="police", scale = "response", line = list (col = "slateblue2"), fill = list(col = "violet"))
This graph shows that as age increases the likelihood of police involvement in the gun related death decreases.
visreg(model_3, "age", by ="police", scale = "response", line = list (col = "dodgerblue1"), fill = list(col = "turquoise2"))
This graph (using model 3) is consistent with the first (using model 2) and shows that as age increases the likelihood of police involvement in the gun related death decreases.
visreg(model_3, "education", by ="police", scale = "response", line = list (col = "red"), fill = list(col = "black"))
This graph doesn’t show a clear significance between level of education and police involvement in the gun related death.