SOC 712 HW# 5

Rachel Ramphal

03/17/19

Gun Related Deaths in the United States (2012-2014)

Importing Packages

library(dplyr)
library(Zelig)
library(tidyverse)
library(pander)
library(knitr)
library(visreg)
library(readr)
gun_data <- read_csv("/Users/rachel_ramphal/Documents/Data Sets/guns.csv")
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   year = col_double(),
##   month = col_character(),
##   intent = col_character(),
##   police = col_double(),
##   sex = col_character(),
##   age = col_double(),
##   race = col_character(),
##   hispanic = col_double(),
##   place = col_character(),
##   education = col_double()
## )

Data Preview

head(gun_data)
## # A tibble: 6 x 11
##      X1  year month intent police sex     age race  hispanic place
##   <dbl> <dbl> <chr> <chr>   <dbl> <chr> <dbl> <chr>    <dbl> <chr>
## 1     1  2012 01    Suici…      0 M        34 Asia…      100 Home 
## 2     2  2012 01    Suici…      0 F        21 White      100 Stre…
## 3     3  2012 01    Suici…      0 M        60 White      100 Othe…
## 4     4  2012 02    Suici…      0 M        64 White      100 Home 
## 5     5  2012 02    Suici…      0 M        31 White      100 Othe…
## 6     6  2012 02    Suici…      0 M        17 Nati…      100 Home 
## # … with 1 more variable: education <dbl>

Creating Models

Model 1

For this first model I created a linear model of police involvement in the gun-related death (dependent variable) on education (independent variable). The police variable is binary: 0 = no police involvement, 1 = police involvement. Because the outcome is coded 0/1, the fitted values of this linear model can be read as predicted probabilities of police involvement (a linear probability model).

m1 <- lm(police ~ education, data = gun_data)
summary(m1)
## 
## Call:
## lm(formula = police ~ education, data = gun_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.01824 -0.01491 -0.01491 -0.01157  0.99510 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0215775  0.0009318  23.157   <2e-16 ***
## education   -0.0033362  0.0003726  -8.954   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1171 on 100743 degrees of freedom
##   (53 observations deleted due to missingness)
## Multiple R-squared:  0.0007951,  Adjusted R-squared:  0.0007852 
## F-statistic: 80.17 on 1 and 100743 DF,  p-value: < 2.2e-16

Police involvement was the dependent variable and education level was the independent variable. The intercept is 0.0216, which is the predicted probability of police involvement when education equals 0. Since the education codes start at 1 (1 = less than high school education), the intercept lies just outside the observed range; for the lowest education level the predicted probability is 0.0216 - 0.0033 = 0.0182.
The education coefficient is -0.0033. This means that each one-level increase in education is associated with a decrease of 0.0033 (about 0.33 percentage points) in the predicted probability of police involvement.
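
As a worked check of the interpretation above, the fitted values of Model 1 can be computed by hand from the two coefficients in the summary output. This is only a sketch: the coefficients are copied from the output, and the range of education codes (1 to 4) is an assumption for illustration, since the text only states that 1 = less than high school.

```r
# Coefficients copied from the summary(m1) output above
b0 <- 0.0215775   # intercept
b1 <- -0.0033362  # slope for education

# Illustrative education codes; 1 = less than high school (from the text).
# The upper end of the range is an assumption for illustration.
education_level <- 1:4

# Fitted value of the linear model = predicted probability of police involvement
predicted_prob <- b0 + b1 * education_level
data.frame(education_level, predicted_prob = round(predicted_prob, 4))
```

The value for education level 1 matches the largest negative residual in the output (-0.01824), which is the residual of a death with police = 0 at that level.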

Model 2

m2 <- lm(police ~ age, data = gun_data)
summary(m2)
## 
## Call:
## lm(formula = police ~ age, data = gun_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.02644 -0.01873 -0.01416 -0.00959  0.99841 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0264419  0.0009072   29.14   <2e-16 ***
## age         -0.0002857  0.0000189  -15.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.117 on 100778 degrees of freedom
##   (18 observations deleted due to missingness)
## Multiple R-squared:  0.002262,   Adjusted R-squared:  0.002252 
## F-statistic: 228.5 on 1 and 100778 DF,  p-value: < 2.2e-16

Police involvement was used as the dependent variable again, now with age as the independent variable. The intercept of this model is 0.0264, meaning the predicted probability of police involvement in the death at age 0 is 0.0264. Because this is a linear model and not a logistic regression, the coefficients are on the probability scale, not the log-odds scale.
The age coefficient is -0.000286, showing that each additional year of age is associated with a decrease of 0.000286 (about 0.03 percentage points) in the predicted probability of police being involved in the gun-related death.
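
To illustrate the difference between the probability scale and the log-odds scale, the sketch below fits both a linear model and a logistic regression to simulated data. The data are a made-up stand-in, not the guns data, and every name in the block (sim, police_sim, lpm, logit_fit) is hypothetical.

```r
set.seed(712)

# Simulated stand-in data (NOT the guns data): a rare binary outcome
# whose probability falls with age
n   <- 5000
age <- round(runif(n, 15, 90))
p   <- plogis(-3 - 0.02 * age)
police_sim <- rbinom(n, size = 1, prob = p)
sim <- data.frame(police_sim, age)

# Linear probability model: the slope is a change in probability per year
lpm <- lm(police_sim ~ age, data = sim)

# Logistic regression: the slope is a change in log odds per year
logit_fit <- glm(police_sim ~ age, family = binomial, data = sim)

coef(lpm)
coef(logit_fit)
exp(coef(logit_fit)["age"])  # odds ratio per additional year of age
```

Only the coefficients from glm() with family = binomial are log odds; the coefficients from lm() are changes in the predicted probability.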

Model 3

m3 <- lm(police ~ education + age, data = gun_data)
summary(m3)
## 
## Call:
## lm(formula = police ~ education + age, data = gun_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.02813 -0.01867 -0.01399 -0.00926  0.99832 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.025e-02  1.134e-03   26.68  < 2e-16 ***
## education   -2.117e-03  3.834e-04   -5.52 3.39e-08 ***
## age         -2.614e-04  1.948e-05  -13.42  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.117 on 100724 degrees of freedom
##   (71 observations deleted due to missingness)
## Multiple R-squared:  0.002577,   Adjusted R-squared:  0.002557 
## F-statistic: 130.1 on 2 and 100724 DF,  p-value: < 2.2e-16

Police involvement (dependent variable) was modeled with age and education (independent variables) in this linear model. The intercept is 3.025e-02, which is the predicted probability of police involvement when both education and age equal 0, a combination outside the observed data.
The education coefficient is -2.117e-03, showing that, holding age constant, each one-level increase in education is associated with a decrease of 2.117e-03 in the predicted probability of police involvement in the gun-related death. The age coefficient (-2.614e-04) shows that, holding education constant, each additional year of age is associated with a decrease of 2.614e-04 in the predicted probability of police involvement.

Model 4

m4 <- lm(police ~ education + sex + age, data = gun_data)
summary(m4)
## 
## Call:
## lm(formula = police ~ education + sex + age, data = gun_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.02947 -0.01969 -0.01434 -0.00854  1.00338 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.048e-02  1.480e-03  13.834  < 2e-16 ***
## education   -1.820e-03  3.843e-04  -4.737 2.18e-06 ***
## sexM         1.082e-02  1.055e-03  10.253  < 2e-16 ***
## age         -2.655e-04  1.947e-05 -13.635  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1169 on 100723 degrees of freedom
##   (71 observations deleted due to missingness)
## Multiple R-squared:  0.003617,   Adjusted R-squared:  0.003587 
## F-statistic: 121.9 on 3 and 100723 DF,  p-value: < 2.2e-16

In this model I added another independent variable, sex, to determine its relationship to police involvement in the gun-related death along with education and age.
The intercept (2.048e-02) is the predicted probability of police involvement for females (the reference category) when education and age both equal 0. The education coefficient (-1.820e-03) shows that, holding sex and age constant, each one-level increase in education is associated with a decrease of 1.820e-03 in the predicted probability of police involvement. The sexM coefficient shows that, compared to women with the same education and age, men have a predicted probability of police involvement that is 1.082e-02 (about 1.1 percentage points) higher. The age coefficient shows that, holding education and sex constant, each additional year of age is associated with a decrease of 2.655e-04 in the predicted probability of police involvement.
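
The interpretation of Model 4 can be made concrete by computing predicted probabilities for two otherwise identical profiles. This is a sketch under stated assumptions: the coefficients are copied from the summary output, while the helper predict_m4() and the profile (education level 2, age 30) are hypothetical choices for illustration.

```r
# Coefficients copied from the summary(m4) output above
coefs <- c(intercept = 2.048e-02, education = -1.820e-03,
           sexM = 1.082e-02, age = -2.655e-04)

# Hypothetical helper: predicted probability of police involvement
# for one profile (male = 1 for men, 0 for women)
predict_m4 <- function(education, male, age) {
  unname(coefs["intercept"] +
           coefs["education"] * education +
           coefs["sexM"] * male +
           coefs["age"] * age)
}

# Illustrative profiles: education level 2, age 30
p_female <- predict_m4(education = 2, male = 0, age = 30)
p_male   <- predict_m4(education = 2, male = 1, age = 30)

c(female = p_female, male = p_male, difference = p_male - p_female)
```

The difference between the two profiles equals the sexM coefficient, whatever education level and age are chosen, because the model contains no interaction terms.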

Comparing Models

Statistical models
              Model 1     Model 2     Model 3     Model 4
(Intercept)   0.02***     0.03***     0.03***     0.02***
              (0.00)      (0.00)      (0.00)      (0.00)
education     -0.00***                -0.00***    -0.00***
              (0.00)                  (0.00)      (0.00)
age                       -0.00***    -0.00***    -0.00***
                          (0.00)      (0.00)      (0.00)
sexM                                              0.01***
                                                  (0.00)
R^2           0.00        0.00        0.00        0.00
Adj. R^2      0.00        0.00        0.00        0.00
Num. obs.     100745      100780      100727      100727
RMSE          0.12        0.12        0.12        0.12
*** p < 0.001, ** p < 0.01, * p < 0.05

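The best model is hard to determine from this table because, rounded to two decimal places, every R^2 appears as 0.00 and every slope as 0.00. The full summaries above show small differences: R^2 rises from 0.0008 (Model 1) to 0.0023 (Model 2), 0.0026 (Model 3), and 0.0036 (Model 4), and Model 4 also has the highest adjusted R^2 (0.003587). Every coefficient is statistically significant (p < 0.001), which is not surprising with roughly 100,000 observations, but even the best model explains less than half a percent of the variation in police involvement. The relationships are therefore detectable but substantively very weak.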
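
Because the table rounds to two decimals, the fit statistics can be compared more easily by collecting the unrounded values from the four summary outputs. The sketch below does this by hand; the data frame fit and its column names are hypothetical, and the numbers are copied from the outputs above.

```r
# Fit statistics copied from the four summary() outputs above
fit <- data.frame(
  model  = c("Model 1", "Model 2", "Model 3", "Model 4"),
  r2     = c(0.0007951, 0.002262, 0.002577, 0.003617),
  adj_r2 = c(0.0007852, 0.002252, 0.002557, 0.003587),
  n      = c(100745, 100780, 100727, 100727)
)

# Express R^2 as a percentage of variance explained
fit$pct_explained <- round(100 * fit$r2, 2)
fit

# Model with the highest adjusted R^2
best <- fit$model[which.max(fit$adj_r2)]
best
```

The comparison is approximate for Models 1 and 2, which were fitted to slightly different samples because of missing values; Models 3 and 4 use the same 100,727 observations.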

Visuals

Graph 1

visreg(m2, "age", scale = "response", line = list(col = "blue4"), fill = list(col = "cornflowerblue"))

This graph shows that as age increases, the predicted probability of police involvement in the gun-related death decreases. The plot is not split by police, because police is the outcome of the model and not a predictor: the fitted line is the same whether or not police were involved in a particular death.

Graph 2

visreg(m3, "education", scale = "response", line = list(col = "gold"), fill = list(col = "deeppink"))

This graph shows that, holding age constant, the predicted probability of police involvement in the gun-related death decreases as the level of education increases. The model describes only the share of gun-related deaths that involved police; it does not show whether gun-related deaths overall are less common among individuals with higher education levels.

Graph 3

visreg(m4, "sex", scale = "response", line = list(col = "mediumpurple1"), fill = list(col = "limegreen"))

This graph shows that, holding education and age constant, males have a higher predicted probability of police involvement in their gun-related death than females. Equivalently, a larger share of women's gun-related deaths involved no police.

Graph 4

visreg(m4, "age", scale = "response", line = list(col = "midnightblue"), fill = list(col = "orange"))

This graph also shows that as age increases the predicted probability of police involvement in the gun-related death decreases. It was included to show that this still holds when education and sex are included in the model.

Conclusion

Overall, lower education levels, younger age, and being male are each associated with a higher predicted probability of police involvement in a gun-related death.
However, although these associations are statistically significant (p < 0.001 in every model), the effects are very small and the models explain well under 1% of the variation in police involvement, so their practical importance appears limited.
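
Whether the small coefficients are distinguishable from zero can be checked with approximate 95% confidence intervals built from the estimates and standard errors in the Model 4 output. This is a rough sketch: the table m4_tab is hypothetical, the values are copied from the output, and the ordinary standard errors of a linear model with a 0/1 outcome may be somewhat off, so the intervals should be read as indicative only.

```r
# Estimates and standard errors copied from the summary(m4) output above
m4_tab <- data.frame(
  term     = c("education", "sexM", "age"),
  estimate = c(-1.820e-03, 1.082e-02, -2.655e-04),
  se       = c(3.843e-04, 1.055e-03, 1.947e-05)
)

# Approximate 95% intervals using the normal critical value;
# reasonable here because the residual degrees of freedom exceed 100,000
z <- qnorm(0.975)
m4_tab$lower <- m4_tab$estimate - z * m4_tab$se
m4_tab$upper <- m4_tab$estimate + z * m4_tab$se

# TRUE when the interval does not contain zero
m4_tab$excludes_zero <- m4_tab$lower > 0 | m4_tab$upper < 0
m4_tab
```

An interval that excludes zero indicates statistical significance; it says nothing about whether an effect of that size matters in practice.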