So, I’ll start this off by saying I spent my Friday working on this assignment and started off with a much more complex model that took lots of time to process (I applied 4 group level predictors). At the end of it, I had a very hard time trying to contextualize the story that my final output was trying to tell me (not to mention my computer was starting to smoke). After rereading the asignment, I decided build a much simpler model that I’d have an easier time unpacking the results.

I will use a discrete outcome, graduating high school (or GED equivalency) [coded as hs = 1 being positive]. My relevant individual predictors are: gender and race [coded as male = 1 for male, and likewise for white, black, hispanic and other].

My group level variable will be percentage of people in poverty.

Research Question: Does poverty effect the ability of someone to graduate High School when accounting for race, gender and age?

## Warning: package 'bindrcpp' was built under R version 3.4.4

1. First fit a random intercept model

#Fitting Random intercept model

fit.mix.bin<-glmer(hs ~ male+ black+hispanic+other + (1|cofips),
                   family=binomial, data=merged, 
                   control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
library(MuMIn)
## Warning: package 'MuMIn' was built under R version 3.4.4
r.squaredGLMM(fit.mix.bin)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help
## page.
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
##                    R2m        R2c
## theoretical 0.10926565 0.15181774
## delta       0.02864403 0.03979908
cor(fitted(fit.mix.bin),fit.mix.bin@frame$`hs`)^2
## [1] 0.08071075
#Tests of coefficients
display(fit.mix.bin, detail=T, digits=4)
## glmer(formula = hs ~ male + black + hispanic + other + (1 | cofips), 
##     data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se  z value  Pr(>|z|)
## (Intercept)   3.1276   0.0318  98.2742   0.0000
## male         -0.0712   0.0172  -4.1516   0.0000
## black1       -0.9825   0.0274 -35.9069   0.0000
## hispanic1    -2.1784   0.0230 -94.8722   0.0000
## other1       -0.5745   0.0349 -16.4724   0.0000
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.4063  
##  Residual             1.0000  
## ---
## number of obs: 220071, groups: cofips, 210
## AIC = 105474, DIC = 104129.4
## deviance = 104795.4
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin, method="Wald"))
##                  2.5 %     97.5 %
## .sig01              NA         NA
## (Intercept) 21.4385767 24.2870518
## male         0.9004626  0.9630999
## black1       0.3548381  0.3950121
## hispanic1    0.1082410  0.1184354
## other1       0.5257997  0.6028261

Here we can see that all of the predictors are significant, with a negative correlation, however there is a fairly high standard deviation. R-squared is fairly low.

2. Then fit the random intercept model with a group level variable as a predictor

#adding group variable poverty
fit.mix.bin.ml<-glmer(hs ~ male + black+hispanic+other + ppov +(1|cofips),nAGQ=10,
                      family=binomial, data=merged,
                      control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))

r.squaredGLMM(fit.mix.bin.ml)
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
##                    R2m        R2c
## theoretical 0.12748285 0.16182750
## delta       0.03371534 0.04279845
display(fit.mix.bin.ml, detail=T, digits=4)
## glmer(formula = hs ~ male + black + hispanic + other + ppov + 
##     (1 | cofips), data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa", 
##     optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
##             coef.est coef.se  z value  Pr(>|z|)
## (Intercept)   3.0097   0.0343  87.7527   0.0000
## male         -0.0718   0.0172  -4.1735   0.0000
## black1       -0.9714   0.0274 -35.4549   0.0000
## hispanic1    -2.1642   0.0231 -93.6794   0.0000
## other1       -0.5674   0.0350 -16.2079   0.0000
## ppov         -0.3006   0.0459  -6.5550   0.0000
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.3672  
##  Residual             1.0000  
## ---
## number of obs: 220071, groups: cofips, 210
## AIC = 105436, DIC = 104174.1
## deviance = 104797.9
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin.ml, method="Wald"))
##                  2.5 %     97.5 %
## .sig01              NA         NA
## (Intercept) 18.9631663 21.6919943
## male         0.8999204  0.9626590
## black1       0.3587674  0.3994431
## hispanic1    0.1097529  0.1201561
## other1       0.5294175  0.6072841
## ppov         0.6767472  0.8100118

Again, all of the predictors are significant, with negative correlation. Our standard deviation has gone down, R-squared has improved, and the AIC decreased, although our DIC and deviance have both increased slightly.

3. Compare the fits using a likelihood ratio test. Is there justification for including the group level predictor?

#Comparing the two models
anova(fit.mix.bin, fit.mix.bin.ml)
## Data: merged
## Models:
## fit.mix.bin: hs ~ male + black + hispanic + other + (1 | cofips)
## fit.mix.bin.ml: hs ~ male + black + hispanic + other + ppov + (1 | cofips)
##                Df    AIC    BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
## fit.mix.bin     6 105474 105535 -52731   105462                         
## fit.mix.bin.ml  7 105436 105508 -52711   105422 39.836      1  2.762e-10
##                   
## fit.mix.bin       
## fit.mix.bin.ml ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So across the board, with regards to our likelihood ratio test, all indications show that including the group level variable is a stronger/better model. Both the AIC and BIC are smaller, the loglikelihood is closer to zero, the variance has decreased, and the CHi Sq test statistic is rather significant at almost 40, producing a very small p-value. We can state with confidencce, at the 0.05 level of significance, that there is justication for adding the group level variable.

5.Estimate a cross-level Interaction model

#cross level interaction using ppov*race
fit.mix.bin.cl<-glmer(hs~male+agez+(black+hispanic+other)*ppov+ (1|cofips),nAGQ=10,
                      family=binomial, data=merged,
                      control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))
display(fit.mix.bin.cl, detail=T)
## glmer(formula = hs ~ male + agez + (black + hispanic + other) * 
##     ppov + (1 | cofips), data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa", 
##     optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
##                coef.est coef.se z value Pr(>|z|)
## (Intercept)      3.13     0.04   85.17    0.00  
## male            -0.11     0.02   -6.27    0.00  
## agez            -0.33     0.01  -35.86    0.00  
## black1          -1.07     0.03  -36.73    0.00  
## hispanic1       -2.40     0.03  -86.60    0.00  
## other1          -0.74     0.04  -18.15    0.00  
## ppov            -0.26     0.05   -5.03    0.00  
## black1:ppov     -0.03     0.05   -0.54    0.59  
## hispanic1:ppov  -0.01     0.04   -0.29    0.77  
## other1:ppov     -0.04     0.06   -0.65    0.51  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  cofips   (Intercept) 0.37    
##  Residual             1.00    
## ---
## number of obs: 220071, groups: cofips, 210
## AIC = 104119, DIC = 102837.8
## deviance = 103467.2
exp(confint(fit.mix.bin.cl, method="Wald"))
##                     2.5 %      97.5 %
## .sig01                 NA          NA
## (Intercept)    21.2200793 24.50459172
## male            0.8672607  0.92815277
## agez            0.7075028  0.73333458
## black1          0.3232076  0.36238856
## hispanic1       0.0856753  0.09551869
## other1          0.4390057  0.51543695
## ppov            0.6936614  0.85162259
## black1:ppov     0.8771834  1.07740218
## hispanic1:ppov  0.9062339  1.07591365
## other1:ppov     0.8465772  1.08701924
anova(fit.mix.bin.ml, fit.mix.bin.cl)
## Data: merged
## Models:
## fit.mix.bin.ml: hs ~ male + black + hispanic + other + ppov + (1 | cofips)
## fit.mix.bin.cl: hs ~ male + agez + (black + hispanic + other) * ppov + (1 | cofips)
##                Df    AIC    BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit.mix.bin.ml  7 105436 105508 -52711   105422                        
## fit.mix.bin.cl 11 104119 104232 -52048   104097  1325      4  < 2.2e-16
##                   
## fit.mix.bin.ml    
## fit.mix.bin.cl ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

a. Interpret the cross level interaction if it is significant.

Based on our Z-values (all <= 2) there is not sufficient evidence to conclude that the effects of cross level interaction are significant.

b. Is there evidence that the interaction is significant?

Based on the comparison of the mixed level model to the cross level interaction model, there seems to be some evidince that interaction is significant as there is a very strong Chi sq test statistic, and correspondingly low p-value when we do a likelihood ratio test between the two. To be honest, I’m a little surprised by the result and not quite sure how to interpret it?