I am working with a shortened version of the 2012 General Social Survey (GSS) data. This extract contains a reduced set of variables, some of which I examine here. I am interested in which factors predict gun ownership.

library("Zelig")
## Loading required package: MASS
## Loading required package: boot
## ## 
## ##  Zelig (Version 3.5.3, built: 2011-11-29)
## ##  Please refer to http://gking.harvard.edu/zelig for full
## ##  documentation or help.zelig() for help with commands and
## ##  models supported by Zelig.
## ##
## 
## ##  Zelig project citations:
## ##    Kosuke Imai, Gary King, and Olivia Lau. (2009).
## ##    ``Zelig: Everyone's Statistical Software,''
## ##    http://gking.harvard.edu/zelig.
## ##  and
## ##    Kosuke Imai, Gary King, and Olivia Lau. (2008).
## ##    ``Toward A Common Framework for Statistical Analysis
## ##    and Development,'' Journal of Computational and
## ##    Graphical Statistics, Vol. 17, No. 4 (December)
## ##    pp. 892-913. 
## 
## ##  To cite individual Zelig models, please use the citation format printed with
## ##  each model run and in the documentation.
## ##
library("DescTools")
library("dplyr")
## Warning: package 'dplyr' was built under R version 3.1.3
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:MASS':
## 
##     select
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("stargazer")
## 
## Please cite as: 
## 
##  Hlavac, Marek (2014). stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables.
##  R package version 5.1. http://CRAN.R-project.org/package=stargazer
library("readstata13")
## Warning: package 'readstata13' was built under R version 3.1.3
library("foreign")
library("car")
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:DescTools':
## 
##     Recode
## 
## The following object is masked from 'package:boot':
## 
##     logit
d <- read.dta("C:/Users/Abigail Walsh/Documents/Grad School/Queen's College/Basic Analytics/Basic Analytics/GSS.dta")
names(d)
##  [1] "CASEID"   "WORKBLKS" "RACDIF1"  "RACMAR"   "RACDIF2"  "RACDIF3" 
##  [7] "HELPBLK"  "HELPPOOR" "YEAR"     "SEX"      "AGE"      "RACE"    
## [13] "REALINC"  "REALRINC" "EDUC"     "DEGREE"   "PRESTG80" "PAPRES80"
## [19] "MARITAL"  "DIVORCE"  "CHILDS"   "RELIG"    "WRKSLF"   "UNEMP"   
## [25] "REGION"   "SIZE"     "RACLIVE"  "FEAR"     "GUN"      "POLVIEWS"
## [31] "FECHLD"   "FEFAM"
Final <- select(d, WORKBLKS, RACDIF1, RACDIF2, RACDIF3, RACMAR, HELPBLK, HELPPOOR, YEAR, SEX, AGE, RACE, REALINC, EDUC, DEGREE, RELIG, UNEMP, REGION, RACLIVE, FEAR, GUN, POLVIEWS)
names(Final)
##  [1] "WORKBLKS" "RACDIF1"  "RACDIF2"  "RACDIF3"  "RACMAR"   "HELPBLK" 
##  [7] "HELPPOOR" "YEAR"     "SEX"      "AGE"      "RACE"     "REALINC" 
## [13] "EDUC"     "DEGREE"   "RELIG"    "UNEMP"    "REGION"   "RACLIVE" 
## [19] "FEAR"     "GUN"      "POLVIEWS"
df <- data.frame(GUN = c("NO" , "YES", NA),stringsAsFactors=FALSE)
Final$REGION=as.numeric(Final$REGION)
Final$POLVIEWS=as.numeric(Final$POLVIEWS)
Final$RACE=as.numeric(Final$RACE)
Final$RACDIF2=as.numeric(Final$RACDIF2)

I created three models to examine the data. The first model focuses on participants' education and age. The second adds participant race and political views (here, how conservative they are). The third adds a measure of racial attitudes: whether participants attribute differences in education between Whites and Blacks to an inborn lack of ability to learn among Blacks.

model1 <- zelig(GUN ~ EDUC + AGE,
                data = Final, model="logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
model2 <- zelig(GUN ~ EDUC + AGE + RACE + POLVIEWS, data = Final, model="logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
model3 <- zelig(GUN ~ EDUC + AGE + RACE + POLVIEWS + RACDIF2, data=Final, model="logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
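
For readers without Zelig installed, the same kind of logit model can be sketched with base R's glm() and a binomial family. This is only an illustration under assumptions: the toy data frame and its coefficients are made up for the example and are not the GSS values, so the estimates will not match the models above.

```r
# Hypothetical data standing in for the GSS extract (not the real survey values).
set.seed(1)
n <- 500
toy <- data.frame(
  EDUC = sample(8:20, n, replace = TRUE),
  AGE  = sample(18:89, n, replace = TRUE)
)

# Simulate a binary outcome from an assumed logistic relationship.
eta <- -1 + 0.03 * toy$EDUC + 0.015 * toy$AGE
toy$GUN <- rbinom(n, size = 1, prob = plogis(eta))

# Base-R logistic regression with the same formula as model1.
fit1 <- glm(GUN ~ EDUC + AGE, data = toy, family = binomial(link = "logit"))

coef(summary(fit1))  # estimates are on the log-odds scale
AIC(fit1)            # same criterion used to compare the models
```

The coefficients are log-odds, so a positive estimate means the predicted probability of the outcome rises with that predictor.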

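To determine the best model, I looked at the Akaike information criterion (AIC) at the bottom of the table, where lower values indicate a better fit. Based on this information, model 3 has the lowest AIC. However, the three models were fit to different numbers of observations (N = 19,214, 16,097, and 4,313), so their AIC values are not strictly comparable.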

stargazer(model1, model2, model3, type = "html", style = "demography", title = "Table 1: Logit Models",
          covariate.labels = c("Education", "Age", "Race",
                               "Conservativism", "Racial Attitudes on Inborn Ability"),
          dep.var.labels   = "Gun Ownership")
Table 1: Logit Models
---------------------------------------------------------------------------
                                       Gun Ownership
                                       Model 1      Model 2      Model 3
---------------------------------------------------------------------------
Education                              0.026***     0.027***     0.043**
                                       (0.006)      (0.007)      (0.014)
Age                                    0.014***     0.013***     0.016***
                                       (0.001)      (0.001)      (0.002)
Race                                                -0.175***    -0.113
                                                    (0.043)      (0.079)
Conservativism                                      0.030*       0.015
                                                    (0.015)      (0.029)
Racial Attitudes on Inborn Ability                               0.001
                                                                 (0.105)
Constant                               0.481***     0.710***     0.321
                                       (0.102)      (0.175)      (0.449)
N                                      19,214       16,097       4,313
Log Likelihood                         -9,414.761   -8,037.523   -2,127.900
AIC                                    18,835.520   16,085.050   4,267.800
---------------------------------------------------------------------------
* p < .05; ** p < .01; *** p < .001
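
As a check on how the AIC row relates to the rest of the table, the values can be recomputed from the reported log likelihoods using AIC = 2k - 2 log L, where k is the number of estimated parameters. The parameter counts below are my own reading of the table (intercept plus predictors in each column).

```r
# Log likelihoods read from Table 1.
loglik <- c(model1 = -9414.761, model2 = -8037.523, model3 = -2127.900)

# Number of estimated parameters: intercept plus predictors in each model.
k <- c(model1 = 3, model2 = 5, model3 = 6)

# AIC = 2k - 2 * log likelihood
aic <- 2 * k - 2 * loglik
round(aic, 2)
```

The recomputed values agree with the AIC row to rounding, which confirms that the criterion penalizes each extra predictor by 2 points.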

The models listed above replicate a previous homework assignment. For the current assignment, I created two new models examining the influence of race and conservativism on gun ownership. The first model includes race and conservativism as separate predictors. The second model includes both predictors and adds the interaction between race and conservativism.

m4 <-zelig(GUN ~ RACE + POLVIEWS, data = Final, model = "logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
m5 <-zelig(GUN ~ RACE + POLVIEWS + RACE:POLVIEWS, data = Final, model = "logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
stargazer(m4, m5, type = "html", style = "demography", title = "Table 2: Logit Models Interaction",
          covariate.labels = c("Race",
                               "Conservativism", "Race:Conservativism"),
          dep.var.labels   = "Gun Ownership")
Table 2: Logit Models Interaction
-----------------------------------------------
                        Gun Ownership
                        Model 1      Model 2
-----------------------------------------------
Race                    -0.223***    0.077
                        (0.042)      (0.154)
Conservativism          0.047**      0.184**
                        (0.015)      (0.069)
Race:Conservativism                  -0.061*
                                     (0.030)
Constant                1.618***     0.944**
                        (0.126)      (0.356)
N                       16,174       16,174
Log Likelihood          -8,128.982   -8,126.918
AIC                     16,263.970   16,261.840
-----------------------------------------------
* p < .05; ** p < .01; *** p < .001

In the interaction table, Model 1 shows that race has a statistically significant (p < .001) negative relationship with gun ownership, while conservativism has a statistically significant (p < .01) positive relationship with gun ownership: the more conservative the participant, the more likely he or she is to own a gun. Model 2 shows that once the interaction between race and conservativism is introduced, the significant relationship between race and gun ownership disappears. The significant (p < .01) positive relationship between conservativism and gun ownership remains. The interaction between race and conservativism itself has a statistically significant (p < .05) negative relationship with gun ownership.
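
One way to read the interaction is to compute the slope of conservativism at each race code. With an interaction in the model, the conservativism coefficient on its own applies only where RACE equals 0, so the slope at an actual race code is the main effect plus the interaction coefficient times that code. The sketch below uses the rounded coefficients from the interaction model and assumes the numeric race coding used in the simulation section (2 = White, 3 = Black, 4 = Other).

```r
# Rounded coefficients from the interaction model (m5).
b_polviews    <- 0.184
b_interaction <- -0.061

# Assumed numeric coding of RACE: 2 = White, 3 = Black, 4 = Other.
race <- c(White = 2, Black = 3, Other = 4)

# Slope of POLVIEWS on the log-odds scale at each value of RACE.
slope <- b_polviews + b_interaction * race
round(slope, 3)
```

Because RACE enters the model as a number, the slope is forced to change by the same amount for each step in the race code; this is a consequence of the coding, not a separate finding.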

Simulation

I ran three simulations, one for each race category, to see how the model (race, conservativism, education, and age) predicts the likelihood of gun ownership for each group, with the other variables held at their means. Here 2 = White, 3 = Black, 4 = Other.

m1 <-zelig(GUN ~ RACE + POLVIEWS + EDUC + AGE, data = Final, model = "logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
m2 <-setx(m1, RACE="2")
m3 <-sim(m1, x=m2)
summary(m3)
## 
##   Model: logit 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE POLVIEWS     EDUC      AGE
## 4602           1    2 5.100205 12.48643 44.63403
## 
## Expected Values: E(Y|X)
##        mean         sd      2.5%     97.5%
## 1 0.8048926 0.00348936 0.7978758 0.8115482
## 
## Predicted Values: Y|X
##       0     1
## 1 0.191 0.809
m1 <-zelig(GUN ~ RACE + POLVIEWS + EDUC + AGE, data = Final, model = "logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
m2 <-setx(m1, RACE="3")
m3 <-sim(m1, x=m2)
summary(m3)
## 
##   Model: logit 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE POLVIEWS     EDUC      AGE
## 4602           1    3 5.100205 12.48643 44.63403
## 
## Expected Values: E(Y|X)
##        mean          sd      2.5%     97.5%
## 1 0.7756251 0.007040131 0.7623491 0.7893932
## 
## Predicted Values: Y|X
##       0     1
## 1 0.223 0.777
m1 <-zelig(GUN ~ RACE + POLVIEWS + EDUC + AGE, data = Final, model = "logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
m2 <-setx(m1, RACE="4")
m3 <-sim(m1, x=m2)
summary(m3)
## 
##   Model: logit 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE POLVIEWS     EDUC      AGE
## 4602           1    4 5.100205 12.48643 44.63403
## 
## Expected Values: E(Y|X)
##        mean        sd     2.5%    97.5%
## 1 0.7431881 0.0156075 0.712327 0.772102
## 
## Predicted Values: Y|X
##       0     1
## 1 0.262 0.738

Based on the simulations shown above, Whites have a predicted probability of owning a gun of 80.5%, Blacks 77.6%, and those who are neither Black nor White 74.3%.
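
The expected values in the simulation output can be approximated by hand. The sketch below assumes the simulated model matches Model 2 in Table 1 (it has the same four predictors), uses that model's rounded coefficients, and plugs in the covariate values printed under "Values of X". Because the coefficients are rounded, the results should land near the simulated means without matching them exactly.

```r
# Rounded Model 2 coefficients from Table 1 (log-odds scale).
b <- c(intercept = 0.710, EDUC = 0.027, AGE = 0.013,
       RACE = -0.175, POLVIEWS = 0.030)

# Covariate values reported under "Values of X" in the simulation output.
educ     <- 12.48643
age      <- 44.63403
polviews <- 5.100205
race     <- c(2, 3, 4)  # 2 = White, 3 = Black, 4 = Other

# Linear predictor for each race code, then the inverse logit.
eta <- b[["intercept"]] + b[["EDUC"]] * educ + b[["AGE"]] * age +
  b[["RACE"]] * race + b[["POLVIEWS"]] * polviews
p <- plogis(eta)
round(p, 3)
```

The hand-computed probabilities fall within about half a percentage point of the simulated expected values for all three groups.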

Unfortunately, every time I rerun the simulation, the predicted percentages change slightly. This happens because each simulation is built from 1,000 random draws, so no two runs are exactly alike unless the random seed is fixed with set.seed() before each call to sim(). Since I did not fix the seed, I cannot reproduce one exact percentage, so I am going to commit to those listed above, which were the result of the most recent simulation run before publishing this assignment.
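
The run-to-run variation can be illustrated, and removed, with a fixed seed. This is a minimal sketch that assumes the variation comes from random parameter draws. The toy data, the model, and the simulate_ev() helper are hypothetical and only mimic a simulation step; in the analysis itself, the analogous change would be a set.seed() call before each sim().

```r
library(MASS)  # for mvrnorm(); MASS ships with R

# Hypothetical data and model, used only to illustrate seeding.
set.seed(42)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, size = 1, prob = plogis(0.5 + 0.8 * toy$x))
fit <- glm(y ~ x, data = toy, family = binomial)

# Draw parameter vectors from their estimated sampling distribution and
# average the implied probability at x = 0.
simulate_ev <- function(fit, n_sims = 1000) {
  draws <- mvrnorm(n_sims, mu = coef(fit), Sigma = vcov(fit))
  mean(plogis(draws[, 1]))  # column 1 holds the intercept draws
}

set.seed(2015); run1 <- simulate_ev(fit)
set.seed(2015); run2 <- simulate_ev(fit)
run3 <- simulate_ev(fit)  # no reseed, so the draws differ from run1

identical(run1, run2)
c(run1 = run1, run3 = run3)
```

Runs that start from the same seed return the same value, while the unseeded run differs only in the later decimals, which is the same pattern seen when rerunning the simulations above.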

Difference of Differences

Final$polvies<-recode(Final$POLVIEWS, "'1'='1'; '2'='1'; '3'='1'; '4'='1'; '5'='2'; '6'='2'; '7'='2'; else=NA", levels=c('liberal', 'conservative'))
# setx() sets covariates by the variable names used in m5, so the arguments are POLVIEWS and RACE
pv_mean <- mean(Final$POLVIEWS, na.rm = TRUE)
pv_sd   <- sd(Final$POLVIEWS, na.rm = TRUE)
xh1 <- setx(m5, POLVIEWS = pv_mean + pv_sd, RACE = 1)
xl1 <- setx(m5, POLVIEWS = pv_mean, RACE = 1)
xh0 <- setx(m5, POLVIEWS = pv_mean + pv_sd, RACE = 2)
xl0 <- setx(m5, POLVIEWS = pv_mean, RACE = 2)

zh1 <- sim(m5, x=xh1)
zl1 <- sim(m5, x=xl1)
zh0 <- sim(m5, x=xh0)
zl0 <- sim(m5, x=xl0)

eff <- (zh1$qi$ev - zl1$qi$ev) -(zh0$qi$ev - zl0$qi$ev)
summary(eff)
##        V1            
##  Min.   :-1.986e-02  
##  1st Qu.:-4.568e-03  
##  Median : 6.416e-05  
##  Mean   :-3.470e-04  
##  3rd Qu.: 3.775e-03  
##  Max.   : 1.937e-02
hist(eff)

GLM with Categorical Data

To run this regression with categorical data, I use political views as the dependent variable. This variable has seven categories ranging from very liberal through neutral to very conservative.

mod.cat<-zelig(POLVIEWS~RACE+AGE, data=Final, model="poisson")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2007. "poisson: Poisson Regression for Event Count Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
stargazer(mod.cat, type="text")
## 
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                            POLVIEWS          
## ---------------------------------------------
## RACE                       -0.036***         
##                             (0.004)          
##                                              
## AGE                        0.002***          
##                            (0.0001)          
##                                              
## Constant                   1.634***          
##                             (0.011)          
##                                              
## ---------------------------------------------
## Observations                45,860           
## Log Likelihood            -87,985.690        
## Akaike Inf. Crit.         175,977.400        
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

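Looking at the table above, the results show a significant relationship between race and political views (p<.001) and between age and political views (p<.001). The negative coefficient on race indicates that White participants, who have the lowest of the three race codes used here, are more likely to be conservative, and the positive coefficient on age indicates that older people are slightly more likely to be conservative.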
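
Poisson coefficients are on the log scale, so exponentiating them gives the multiplicative change in the expected political views score for a one-unit increase in each predictor. The sketch below uses the rounded coefficients from the table; the 10-year comparison is my own illustrative choice.

```r
# Rounded Poisson coefficients from the table above (log scale).
b <- c(RACE = -0.036, AGE = 0.002)

# Multiplicative change in the expected score per one-unit increase.
rate_ratio <- exp(b)
round(rate_ratio, 3)

# Implied multiplicative change over a 10-year age difference.
age_10yr <- exp(10 * b[["AGE"]])
round(age_10yr, 3)
```

Read this way, each step up in the race code lowers the expected score by roughly 3.5%, and a decade of age raises it by about 2%, so both effects are small even though they are statistically significant.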

Simulation

m7 <-zelig(POLVIEWS ~ RACE + AGE, data = Final, model = "poisson")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2007. "poisson: Poisson Regression for Event Count Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
m8 <-setx(m7, RACE="2")
m9 <-sim(m7, x=m8)
summary(m9)
## 
##   Model: poisson 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE      AGE
## 3118           1    2 45.46426
## 
## Expected Values: E(Y|X)
##       mean        sd     2.5%    97.5%
## 1 5.145069 0.0110347 5.122918 5.166732
## 
## Predicted Values: Y|X
##    mean       sd 2.5%  97.5%
## 1 5.303 2.402699    1 10.025
plot(m9)

The simulation above shows the expected and predicted values of political views for White participants (RACE = 2), with age held at its mean.
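
The expected value can be approximated by exponentiating the linear predictor, and the spread of the predicted values follows from the Poisson distribution itself. The sketch below uses the rounded coefficients from the Poisson table and the values printed under "Values of X", so it should land near the simulated mean without matching it exactly, since the age coefficient is rounded to one significant digit.

```r
# Rounded Poisson coefficients from the table above.
b <- c(intercept = 1.634, RACE = -0.036, AGE = 0.002)

# Covariate values from the simulation output: White participants at mean age.
race <- 2
age  <- 45.46426

# Expected value: exponentiate the linear predictor.
mu <- exp(b[["intercept"]] + b[["RACE"]] * race + b[["AGE"]] * age)
round(mu, 2)

# Central 95% range of Poisson counts with that mean.
pred_range <- qpois(c(0.025, 0.975), lambda = mu)
pred_range
```

The upper end of that range lies above 7, the top of the political views scale, which is a reminder that a Poisson model treats the seven ordered categories as unbounded counts.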