Model predictions for logistic regressions

This is document will demonstrate how to calculate and plot the predictions of a mixed-effects logistic regression. This based largely on Ben Bolker’s Owls data example [pdf], & help from Morgan Sonderegger. This document also greatly benefitted from the R Markdown Reference Guide [pdf].

First, we have to load the following libraries:

library(languageR)
library(lme4)
library(lmerTest)
library(optimx)
library(arm)
library(ggplot2)

languageR contains the freely available data set we’ll be using,
lme4 is what we need to fit the model,
ggplot2 for making nice plots. (Useful ggplot2 reference: Cookbook for R)

Fitting a model

Let’s fit a model with the following information:

CaseMarking as the dependent variable: 1 corresponds to ergative marking, 0 to other (this is modified from the original data set)
Random effects:
- Text
- Speaker
Fixed Effects:
- WordOrder
- AgeGroup
- AnimacyOfSubject

warlpiri.lmer = glmer(CaseMarking ~ WordOrder + AgeGroup +
                        AnimacyOfSubject + (1|Text) + (1|Speaker),
                      control=glmerControl(optimizer="optimx",optCtrl=list(method="nlminb")),
                      family = "binomial", data = warlpiri)

summary(warlpiri.lmer)

## Generalized linear mixed model fit by maximum likelihood ['glmerMod']
##  Family: binomial ( logit )
## Formula: CaseMarking ~ WordOrder + AgeGroup + AnimacyOfSubject + (1 |      Text) + (1 | Speaker) 
##    Data: warlpiri 
## 
##       AIC       BIC    logLik  deviance 
##  301.7711  324.8671 -144.8856  289.7711 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 0.28524  0.5341  
##  Text    (Intercept) 0.02204  0.1485  
## Number of obs: 347, groups: Speaker, 27; Text, 3
## 
## Fixed effects:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 2.1072     0.3225   6.534  6.4e-11 ***
## WordOrdersubNotInitial      0.6326     0.3620   1.748   0.0805 .  
## AgeGroupchild              -0.7154     0.3819  -1.873   0.0610 .  
## AnimacyOfSubjectinanimate  -0.7821     0.3593  -2.177   0.0295 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WrdONI AgGrpc
## WrdOrdrsbNI -0.200              
## AgeGropchld -0.661 -0.081       
## AnmcyOfSbjc -0.263 -0.138  0.027

Getting the predictions

Set up the frame that gives you all possible combinations of values.

pframe0 <- with(warlpiri, expand.grid(WordOrder=levels(WordOrder),AgeGroup=levels(AgeGroup),AnimacyOfSubject=levels(AnimacyOfSubject)))

pframe0

##       WordOrder AgeGroup AnimacyOfSubject
## 1    subInitial    adult          animate
## 2 subNotInitial    adult          animate
## 3    subInitial    child          animate
## 4 subNotInitial    child          animate
## 5    subInitial    adult        inanimate
## 6 subNotInitial    adult        inanimate
## 7    subInitial    child        inanimate
## 8 subNotInitial    child        inanimate

Now pframe0 contains all possible values for a given observation.

(How do you do this if there’s an interaction in your model?)

And from that, you can get a matrix with all the different possible values that a given observation might have.

mm <- model.matrix(~WordOrder+AgeGroup+AnimacyOfSubject,data=pframe0)

pframe1 <- data.frame(pframe0,eta=mm%*%fixef(warlpiri.lmer))
pframe1 <- with(pframe1,data.frame(pframe1,CaseMarking=invlogit(eta)))

pframe1$pse <- diag(mm %*% tcrossprod(vcov(warlpiri.lmer), mm))

Which makes these the bounds of the 95% confidence interval for the average word/speaker prediction, in log-odds.

pframe1$hi <- with(pframe1, eta + 1.96*pse)
pframe1$low <- with(pframe1, eta - 1.96*pse)

Translate these numbers to probablity using the invlogit() function.

pframe1$hi.prob <- invlogit(pframe1$hi)
pframe1$low.prob <- invlogit(pframe1$low)

Now you can make a nice plot of your predicted probabilities with 95% confidence intervals!

ggplot(aes(x=AgeGroup, ymin=low.prob, ymax=hi.prob,y=CaseMarking), data=pframe1) + geom_errorbar(aes(color=AnimacyOfSubject),width=0.2) + geom_point(aes(color=AnimacyOfSubject)) + coord_cartesian(ylim=c(0, 1)) + facet_wrap(~AnimacyOfSubject+WordOrder) + ggtitle('Predicted probability ')

ggplot(aes(x=AnimacyOfSubject, ymin=low.prob, ymax=hi.prob,y=CaseMarking), data=pframe1) + geom_errorbar(aes(width=0.2,color=AgeGroup))+ coord_cartesian(ylim=c(0, 1)) + ggtitle('Predicted probability ') + facet_grid(~WordOrder)

Model predictions for logistic regressions

oriana

February 17, 2015

Fitting a model

Getting the predictions