This is document will demonstrate how to calculate and plot the predictions of a mixed-effects logistic regression. This based largely on Ben Bolker’s Owls data example [pdf], & help from Morgan Sonderegger. This document also greatly benefitted from the R Markdown Reference Guide [pdf].
First, we have to load the following libraries:
library(languageR)
library(lme4)
library(lmerTest)
library(optimx)
library(arm)
library(ggplot2)
languageR contains the freely available data set we’ll be using,lme4 is what we need to fit the model,ggplot2 for making nice plots. (Useful ggplot2 reference: Cookbook for R)Let’s fit a model with the following information:
CaseMarking as the dependent variable: 1 corresponds to ergative marking, 0 to other (this is modified from the original data set)TextSpeakerWordOrderAgeGroupAnimacyOfSubjectwarlpiri.lmer = glmer(CaseMarking ~ WordOrder + AgeGroup +
AnimacyOfSubject + (1|Text) + (1|Speaker),
control=glmerControl(optimizer="optimx",optCtrl=list(method="nlminb")),
family = "binomial", data = warlpiri)
summary(warlpiri.lmer)
## Generalized linear mixed model fit by maximum likelihood ['glmerMod']
## Family: binomial ( logit )
## Formula: CaseMarking ~ WordOrder + AgeGroup + AnimacyOfSubject + (1 | Text) + (1 | Speaker)
## Data: warlpiri
##
## AIC BIC logLik deviance
## 301.7711 324.8671 -144.8856 289.7711
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 0.28524 0.5341
## Text (Intercept) 0.02204 0.1485
## Number of obs: 347, groups: Speaker, 27; Text, 3
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.1072 0.3225 6.534 6.4e-11 ***
## WordOrdersubNotInitial 0.6326 0.3620 1.748 0.0805 .
## AgeGroupchild -0.7154 0.3819 -1.873 0.0610 .
## AnimacyOfSubjectinanimate -0.7821 0.3593 -2.177 0.0295 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) WrdONI AgGrpc
## WrdOrdrsbNI -0.200
## AgeGropchld -0.661 -0.081
## AnmcyOfSbjc -0.263 -0.138 0.027
Set up the frame that gives you all possible combinations of values.
pframe0 <- with(warlpiri, expand.grid(WordOrder=levels(WordOrder),AgeGroup=levels(AgeGroup),AnimacyOfSubject=levels(AnimacyOfSubject)))
pframe0
## WordOrder AgeGroup AnimacyOfSubject
## 1 subInitial adult animate
## 2 subNotInitial adult animate
## 3 subInitial child animate
## 4 subNotInitial child animate
## 5 subInitial adult inanimate
## 6 subNotInitial adult inanimate
## 7 subInitial child inanimate
## 8 subNotInitial child inanimate
Now pframe0 contains all possible values for a given observation.
(How do you do this if there’s an interaction in your model?)
And from that, you can get a matrix with all the different possible values that a given observation might have.
mm <- model.matrix(~WordOrder+AgeGroup+AnimacyOfSubject,data=pframe0)
pframe1 <- data.frame(pframe0,eta=mm%*%fixef(warlpiri.lmer))
pframe1 <- with(pframe1,data.frame(pframe1,CaseMarking=invlogit(eta)))
pframe1$pse <- diag(mm %*% tcrossprod(vcov(warlpiri.lmer), mm))
Which makes these the bounds of the 95% confidence interval for the average word/speaker prediction, in log-odds.
pframe1$hi <- with(pframe1, eta + 1.96*pse)
pframe1$low <- with(pframe1, eta - 1.96*pse)
Translate these numbers to probablity using the invlogit() function.
pframe1$hi.prob <- invlogit(pframe1$hi)
pframe1$low.prob <- invlogit(pframe1$low)
Now you can make a nice plot of your predicted probabilities with 95% confidence intervals!
ggplot(aes(x=AgeGroup, ymin=low.prob, ymax=hi.prob,y=CaseMarking), data=pframe1) + geom_errorbar(aes(color=AnimacyOfSubject),width=0.2) + geom_point(aes(color=AnimacyOfSubject)) + coord_cartesian(ylim=c(0, 1)) + facet_wrap(~AnimacyOfSubject+WordOrder) + ggtitle('Predicted probability ')
ggplot(aes(x=AnimacyOfSubject, ymin=low.prob, ymax=hi.prob,y=CaseMarking), data=pframe1) + geom_errorbar(aes(width=0.2,color=AgeGroup))+ coord_cartesian(ylim=c(0, 1)) + ggtitle('Predicted probability ') + facet_grid(~WordOrder)