So, I’ll start this off by saying I spent my Friday working on this assignment and started off with a much more complex model that took lots of time to process (I applied 4 group level predictors). At the end of it, I had a very hard time trying to contextualize the story that my final output was trying to tell me (not to mention my computer was starting to smoke). After rereading the asignment, I decided build a much simpler model that I’d have an easier time unpacking the results.
I will use a discrete outcome, graduating high school (or GED equivalency) [coded as hs = 1 being positive]. My relevant individual predictors are: gender and race [coded as male = 1 for male, and likewise for white, black, hispanic and other].
My group level variable will be percentage of people in poverty.
Research Question: Does poverty effect the ability of someone to graduate High School when accounting for race, gender and age?
## Warning: package 'bindrcpp' was built under R version 3.4.4
#Fitting Random intercept model
fit.mix.bin<-glmer(hs ~ male+ black+hispanic+other + (1|cofips),
family=binomial, data=merged,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
library(MuMIn)
## Warning: package 'MuMIn' was built under R version 3.4.4
r.squaredGLMM(fit.mix.bin)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help
## page.
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
## R2m R2c
## theoretical 0.10926565 0.15181774
## delta 0.02864403 0.03979908
cor(fitted(fit.mix.bin),fit.mix.bin@frame$`hs`)^2
## [1] 0.08071075
#Tests of coefficients
display(fit.mix.bin, detail=T, digits=4)
## glmer(formula = hs ~ male + black + hispanic + other + (1 | cofips),
## data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa",
## optCtrl = list(maxfun = 2e+05)))
## coef.est coef.se z value Pr(>|z|)
## (Intercept) 3.1276 0.0318 98.2742 0.0000
## male -0.0712 0.0172 -4.1516 0.0000
## black1 -0.9825 0.0274 -35.9069 0.0000
## hispanic1 -2.1784 0.0230 -94.8722 0.0000
## other1 -0.5745 0.0349 -16.4724 0.0000
##
## Error terms:
## Groups Name Std.Dev.
## cofips (Intercept) 0.4063
## Residual 1.0000
## ---
## number of obs: 220071, groups: cofips, 210
## AIC = 105474, DIC = 104129.4
## deviance = 104795.4
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin, method="Wald"))
## 2.5 % 97.5 %
## .sig01 NA NA
## (Intercept) 21.4385767 24.2870518
## male 0.9004626 0.9630999
## black1 0.3548381 0.3950121
## hispanic1 0.1082410 0.1184354
## other1 0.5257997 0.6028261
Here we can see that all of the predictors are significant, with a negative correlation, however there is a fairly high standard deviation. R-squared is fairly low.
#adding group variable poverty
fit.mix.bin.ml<-glmer(hs ~ male + black+hispanic+other + ppov +(1|cofips),nAGQ=10,
family=binomial, data=merged,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))
r.squaredGLMM(fit.mix.bin.ml)
## Warning: The null model is correct only if all variables used by the
## original model remain unchanged.
## R2m R2c
## theoretical 0.12748285 0.16182750
## delta 0.03371534 0.04279845
display(fit.mix.bin.ml, detail=T, digits=4)
## glmer(formula = hs ~ male + black + hispanic + other + ppov +
## (1 | cofips), data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa",
## optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
## coef.est coef.se z value Pr(>|z|)
## (Intercept) 3.0097 0.0343 87.7527 0.0000
## male -0.0718 0.0172 -4.1735 0.0000
## black1 -0.9714 0.0274 -35.4549 0.0000
## hispanic1 -2.1642 0.0231 -93.6794 0.0000
## other1 -0.5674 0.0350 -16.2079 0.0000
## ppov -0.3006 0.0459 -6.5550 0.0000
##
## Error terms:
## Groups Name Std.Dev.
## cofips (Intercept) 0.3672
## Residual 1.0000
## ---
## number of obs: 220071, groups: cofips, 210
## AIC = 105436, DIC = 104174.1
## deviance = 104797.9
#confidence intervals for the Odds ratio
exp(confint(fit.mix.bin.ml, method="Wald"))
## 2.5 % 97.5 %
## .sig01 NA NA
## (Intercept) 18.9631663 21.6919943
## male 0.8999204 0.9626590
## black1 0.3587674 0.3994431
## hispanic1 0.1097529 0.1201561
## other1 0.5294175 0.6072841
## ppov 0.6767472 0.8100118
Again, all of the predictors are significant, with negative correlation. Our standard deviation has gone down, R-squared has improved, and the AIC decreased, although our DIC and deviance have both increased slightly.
#Comparing the two models
anova(fit.mix.bin, fit.mix.bin.ml)
## Data: merged
## Models:
## fit.mix.bin: hs ~ male + black + hispanic + other + (1 | cofips)
## fit.mix.bin.ml: hs ~ male + black + hispanic + other + ppov + (1 | cofips)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit.mix.bin 6 105474 105535 -52731 105462
## fit.mix.bin.ml 7 105436 105508 -52711 105422 39.836 1 2.762e-10
##
## fit.mix.bin
## fit.mix.bin.ml ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So across the board, with regards to our likelihood ratio test, all indications show that including the group level variable is a stronger/better model. Both the AIC and BIC are smaller, the loglikelihood is closer to zero, the variance has decreased, and the CHi Sq test statistic is rather significant at almost 40, producing a very small p-value. We can state with confidencce, at the 0.05 level of significance, that there is justication for adding the group level variable.
#cross level interaction using ppov*race
fit.mix.bin.cl<-glmer(hs~male+agez+(black+hispanic+other)*ppov+ (1|cofips),nAGQ=10,
family=binomial, data=merged,
control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=4e5)))
display(fit.mix.bin.cl, detail=T)
## glmer(formula = hs ~ male + agez + (black + hispanic + other) *
## ppov + (1 | cofips), data = merged, family = binomial, control = glmerControl(optimizer = "bobyqa",
## optCtrl = list(maxfun = 4e+05)), nAGQ = 10)
## coef.est coef.se z value Pr(>|z|)
## (Intercept) 3.13 0.04 85.17 0.00
## male -0.11 0.02 -6.27 0.00
## agez -0.33 0.01 -35.86 0.00
## black1 -1.07 0.03 -36.73 0.00
## hispanic1 -2.40 0.03 -86.60 0.00
## other1 -0.74 0.04 -18.15 0.00
## ppov -0.26 0.05 -5.03 0.00
## black1:ppov -0.03 0.05 -0.54 0.59
## hispanic1:ppov -0.01 0.04 -0.29 0.77
## other1:ppov -0.04 0.06 -0.65 0.51
##
## Error terms:
## Groups Name Std.Dev.
## cofips (Intercept) 0.37
## Residual 1.00
## ---
## number of obs: 220071, groups: cofips, 210
## AIC = 104119, DIC = 102837.8
## deviance = 103467.2
exp(confint(fit.mix.bin.cl, method="Wald"))
## 2.5 % 97.5 %
## .sig01 NA NA
## (Intercept) 21.2200793 24.50459172
## male 0.8672607 0.92815277
## agez 0.7075028 0.73333458
## black1 0.3232076 0.36238856
## hispanic1 0.0856753 0.09551869
## other1 0.4390057 0.51543695
## ppov 0.6936614 0.85162259
## black1:ppov 0.8771834 1.07740218
## hispanic1:ppov 0.9062339 1.07591365
## other1:ppov 0.8465772 1.08701924
anova(fit.mix.bin.ml, fit.mix.bin.cl)
## Data: merged
## Models:
## fit.mix.bin.ml: hs ~ male + black + hispanic + other + ppov + (1 | cofips)
## fit.mix.bin.cl: hs ~ male + agez + (black + hispanic + other) * ppov + (1 | cofips)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit.mix.bin.ml 7 105436 105508 -52711 105422
## fit.mix.bin.cl 11 104119 104232 -52048 104097 1325 4 < 2.2e-16
##
## fit.mix.bin.ml
## fit.mix.bin.cl ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on our Z-values (all <= 2) there is not sufficient evidence to conclude that the effects of cross level interaction are significant.
Based on the comparison of the mixed level model to the cross level interaction model, there seems to be some evidince that interaction is significant as there is a very strong Chi sq test statistic, and correspondingly low p-value when we do a likelihood ratio test between the two. To be honest, I’m a little surprised by the result and not quite sure how to interpret it?