In the following, we will use the cake dataset which examines the flexibility of cakes from various recipes baked at several temperatures. This is the only linear and relatively easy example from the datasets in lme4 with multiple categorical predictors.

From the documentation:

Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units and the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that the replicate numbering represents temporal ordering.

Variable Description
replicate a factor with levels 1 to 15
recipe a factor with levels A, B and C
temp numeric value of the baking temperature (degrees F)
temperature an ordered factor with levels 175 < 185 < 195 < 205 < 215 < 225
angle a numeric vector giving the angle at which the cake broke.

The replicate factor is nested within the recipe factor, and temperature is nested within replicate.

We will use the following basic model, as computed in the example documentation. For now, we will use ML estimation, as some prominent members of the community, e.g. Doug Bates, have expressed some skepticism at the use of REML over ML (12).

library(lme4)
## Loading required package: Matrix
m <- lmer(angle ~ recipe * temperature + (1|recipe:replicate), cake, REML= FALSE)

Vanilla lme4

ANOVA

anova(m)
## Analysis of Variance Table
##                    Df  Sum Sq Mean Sq F value
## recipe              2   10.19    5.09  0.2666
## temperature         5 2100.30  420.06 21.9856
## recipe:temperature 10  205.98   20.60  1.0781

The anova() method in lme4 can be used for an ANOVA in addition to its common use as likelihood-ratio test. (This is alsotrue of anova() for lm objects.) In this particular case, there is no significant interaction (even without \(p\)-values, this is obvious from the \(F\)-value), but even if there were, you wouldn’t see “where” the interaction is occurring. In traditional analyses, this is where you would have to “resolve” the interaction.

Likelihood-Ratio Test

If we want \(p\)-values for our ANOVA output or we want to compare random effects, the traditional way to do this with mixed models is with the likelihood-ratio test for nested models. There are two practical concerns here:

  1. The “deviance” of REML-fitted models is dependent on the fixed-effects parameterization and thus we cannot REML-fitted models with different fixed effects. We can however compare REML-fitted models with different random effects. Later versions of lme4 will automatically catch refit REML-fitted models with ML, but realistically you shouldn’t depend on this behavior and you should instead just fit the original models with ML.
  2. It can be a real pain to spell out the various model formulations.

Luckily, there are a few tricks to make this easier. drop1() just iterates through the model dropping the uppermost terms individually. In our case, that corresponds to the interaction term. drop1() can optionally also provide a LR-test.

drop1(m,test="Chisq")
## Single term deletions
## 
## Model:
## angle ~ recipe * temperature + (1 | recipe:replicate)
##                    Df    AIC   LRT Pr(Chi)
## <none>                1719.0              
## recipe:temperature 10 1709.6 10.53  0.3953

Each row corresponds to a new model computed with a single-term deleted. We can also tell drop1() which terms to drop:

drop1(m,scope=c("recipe","temperature","recipe:temperature"),test="Chisq")
## Single term deletions
## 
## Model:
## angle ~ recipe * temperature + (1 | recipe:replicate)
##                    Df    AIC   LRT Pr(Chi)
## <none>                1719.0              
## recipe              0 1719.0  0.00        
## temperature         0 1719.0  0.00        
## recipe:temperature 10 1709.6 10.53  0.3953

For more information see ?drop1.merMod.

Alternatively, we can use the update() method to interactively change models:

dropped <- update(m, . ~ . - recipe:temperature)
anova(dropped,m)
## Data: cake
## Models:
## dropped: angle ~ recipe + temperature + (1 | recipe:replicate)
## m: angle ~ recipe * temperature + (1 | recipe:replicate)
##         Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## dropped 10 1709.6 1745.6 -844.79   1689.6                        
## m       20 1719.0 1791.0 -839.53   1679.0 10.53     10     0.3953
drop1(dropped)
## Single term deletions
## 
## Model:
## angle ~ recipe + temperature + (1 | recipe:replicate)
##             Df    AIC
## <none>         1709.6
## recipe       2 1706.1
## temperature  5 1785.7

Coefficient Summary

One advantage of mixed models compared to traditional repeated-measures ANOVA is that we’re explicitly calcuating a regression model. As such, we have an individual estimate for each interaction as part of our model summary. We do not have to resolve interactions explictly, we can instead simply look at the individual interaction levels in the coefficients.

summary(m,correlation=FALSE)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: angle ~ recipe * temperature + (1 | recipe:replicate)
##    Data: cake
## 
##      AIC      BIC   logLik deviance df.resid 
##   1719.1   1791.0   -839.5   1679.1      250 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.73950 -0.63226 -0.05389  0.58985  2.85039 
## 
## Random effects:
##  Groups           Name        Variance Std.Dev.
##  recipe:replicate (Intercept) 39.05    6.249   
##  Residual                     19.11    4.371   
## Number of obs: 270, groups:  recipe:replicate, 45
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           33.12222    1.67794  19.740
## recipeB               -1.47778    2.37297  -0.623
## recipeC               -1.52222    2.37297  -0.641
## temperature.L          6.43033    1.12860   5.698
## temperature.Q         -0.71285    1.12860  -0.632
## temperature.C         -2.32551    1.12860  -2.061
## temperature^4         -3.35128    1.12860  -2.969
## temperature^5         -0.15119    1.12860  -0.134
## recipeB:temperature.L  0.45419    1.59609   0.285
## recipeC:temperature.L  0.08765    1.59609   0.055
## recipeB:temperature.Q -0.23277    1.59609  -0.146
## recipeC:temperature.Q  1.21475    1.59609   0.761
## recipeB:temperature.C  2.69322    1.59609   1.687
## recipeC:temperature.C  2.63856    1.59609   1.653
## recipeB:temperature^4  3.02372    1.59609   1.894
## recipeC:temperature^4  3.13711    1.59609   1.965
## recipeB:temperature^5 -0.66354    1.59609  -0.416
## recipeC:temperature^5 -1.62525    1.59609  -1.018
## 
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it

Looking at the summary, we see detailed estimats of each part of each effect for all effect orders. This is the resolution of the interaction in a classical ANOVA and arguably the resolution of the main effects. If you’re clever with your contrast coding, you may even be able to read relevant pairwise contrasts directly from this output. If you’re less clever with your contrasts, then you should look at the lsmeans, afex and phia packages. If you want pairwise contrasts of higher-level interactions, then you should be clever with your contrast coding so that you can read them off directly, use the effects package (see below) to get a graphical summary, and ask yourself if you really need that and if anybody, including you, really understands it.

There are two disadvantages to reading your effects off this type of summary:

  1. You only see (some of the) pairwise contrasts for multilevel factors, i.e. you don’t see overall effects for multilevel factors.
  2. You don’t have any \(p\)-values.

The solution to the first problem is called tests of linear hypotheses and are generally handled by ANOVA-type output (see above and below).

There are several solutions to the second problem. The problem arises from the difficulty in determining the denominator degrees of freedom in a mixed-effects context. For models fitted with lots of data, it’s usually safe to assume that the degrees of freedom are large enough that we can interpret the \(t\)-values as \(z\)-values. If you don’t have much data or absolutely need \(p\)-values, there are a few approximations for doing so.

car: John Fox’s companion to applied regression

The car package provides ANOVA-type output for a variety of model types. You can ask for Type-II (recommended) or Type-III (for compatibility with the defaults in some commercial software) Wald tests of linear hypotheses. You can also specify the test statistic to compute. The \(\chi^2\) test statistic is equivalent to the assumption of large/infinite degrees of freedom, i.e. that the Wald tests on the individual coefficients in the model summary are \(z\) and not \(t\) tests.

library(car)
Anova(m,type="II",test.statistic="Chisq")
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: angle
##                       Chisq Df Pr(>Chisq)    
## recipe               0.5331  2     0.7660    
## temperature        109.9278  5     <2e-16 ***
## recipe:temperature  10.7807 10     0.3749    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This approximation is generally very fast.

It is also possible to use the Kenward-Roger approximation to compute \(F\)-statistics. The Kenward-Roger approximation is only valid for REML and thus we must first refit our model.

m.reml <- update(m,REML=TRUE)
Anova(m.reml,type="II",test.statistic="F")
## Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
## 
## Response: angle
##                          F Df Df.res Pr(>F)    
## recipe              0.2488  2     42 0.7809    
## temperature        20.5199  5    210 <2e-16 ***
## recipe:temperature  1.0062 10    210 0.4393    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This approximation is much slower and more memory intensive (to the point that it may not be computationally tractable for typical datasets on typical computers!).

In my experience, both test statistics deliver similar results but the \(\chi^2\) are much faster and much less memory intensive.

lmerTest: override the builtin functions with optional Satterthwaite and Kenward-Roger approximations

If you want \(p\)-values in the summary() and anova() functions, then you can use the lmerTest package as a drop-in replacement for lme4. If you’ve already computed your models, it is also trivial to convert them after the fact.

library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
# from this point on, a call to lmer() calls the version from lmerTest. To use the version from lme4, use explicit scope: 
# To avoid recomputing existing models, we can cast them to merModLmerTest and then use the functions from lmerTest
m.test <- as(m,"merModLmerTest")

Summary

There are three supported approximations for the degrees of freedom: lme4 (i.e. none), Satterthwaite (more computationally intensive, but usually doable), Kenward-Roger.

# for some reason, the lmerTest summary method doesn't support the correlation argument but the print methods do
print(summary(m.test,ddf="lme4"),correlation=FALSE)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: angle ~ recipe * temperature + (1 | recipe:replicate)
##    Data: cake
## 
##      AIC      BIC   logLik deviance df.resid 
##   1719.1   1791.0   -839.5   1679.1      250 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.73950 -0.63226 -0.05389  0.58985  2.85039 
## 
## Random effects:
##  Groups           Name        Variance Std.Dev.
##  recipe:replicate (Intercept) 39.05    6.249   
##  Residual                     19.11    4.371   
## Number of obs: 270, groups:  recipe:replicate, 45
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           33.12222    1.67794  19.740
## recipeB               -1.47778    2.37297  -0.623
## recipeC               -1.52222    2.37297  -0.641
## temperature.L          6.43033    1.12860   5.698
## temperature.Q         -0.71285    1.12860  -0.632
## temperature.C         -2.32551    1.12860  -2.061
## temperature^4         -3.35128    1.12860  -2.969
## temperature^5         -0.15119    1.12860  -0.134
## recipeB:temperature.L  0.45419    1.59609   0.285
## recipeC:temperature.L  0.08765    1.59609   0.055
## recipeB:temperature.Q -0.23277    1.59609  -0.146
## recipeC:temperature.Q  1.21475    1.59609   0.761
## recipeB:temperature.C  2.69322    1.59609   1.687
## recipeC:temperature.C  2.63856    1.59609   1.653
## recipeB:temperature^4  3.02372    1.59609   1.894
## recipeC:temperature^4  3.13711    1.59609   1.965
## recipeB:temperature^5 -0.66354    1.59609  -0.416
## recipeC:temperature^5 -1.62525    1.59609  -1.018
print(summary(m.test,ddf="Satterthwaite"),correlation=FALSE)
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
##   approximations to degrees of freedom [lmerMod]
## Formula: angle ~ recipe * temperature + (1 | recipe:replicate)
##    Data: cake
## 
##      AIC      BIC   logLik deviance df.resid 
##   1719.1   1791.0   -839.5   1679.1      250 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.73950 -0.63226 -0.05389  0.58985  2.85039 
## 
## Random effects:
##  Groups           Name        Variance Std.Dev.
##  recipe:replicate (Intercept) 39.05    6.249   
##  Residual                     19.11    4.371   
## Number of obs: 270, groups:  recipe:replicate, 45
## 
## Fixed effects:
##                        Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)            33.12222    1.67794  45.00000  19.740  < 2e-16 ***
## recipeB                -1.47778    2.37297  45.00000  -0.623  0.53659    
## recipeC                -1.52222    2.37297  45.00000  -0.641  0.52446    
## temperature.L           6.43033    1.12860 225.00000   5.698 3.78e-08 ***
## temperature.Q          -0.71285    1.12860 225.00000  -0.632  0.52828    
## temperature.C          -2.32551    1.12860 225.00000  -2.061  0.04050 *  
## temperature^4          -3.35128    1.12860 225.00000  -2.969  0.00331 ** 
## temperature^5          -0.15119    1.12860 225.00000  -0.134  0.89356    
## recipeB:temperature.L   0.45419    1.59609 225.00000   0.285  0.77624    
## recipeC:temperature.L   0.08765    1.59609 225.00000   0.055  0.95625    
## recipeB:temperature.Q  -0.23277    1.59609 225.00000  -0.146  0.88418    
## recipeC:temperature.Q   1.21475    1.59609 225.00000   0.761  0.44741    
## recipeB:temperature.C   2.69322    1.59609 225.00000   1.687  0.09291 .  
## recipeC:temperature.C   2.63856    1.59609 225.00000   1.653  0.09970 .  
## recipeB:temperature^4   3.02372    1.59609 225.00000   1.894  0.05945 .  
## recipeC:temperature^4   3.13711    1.59609 225.00000   1.965  0.05059 .  
## recipeB:temperature^5  -0.66354    1.59609 225.00000  -0.416  0.67801    
## recipeC:temperature^5  -1.62525    1.59609 225.00000  -1.018  0.30964    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, the Kenward-Roger approximation requires an REML-fitted model. If we pass lmerTest an ML-fitted model, the KR-approximation fails and a summary without any degrees of freedom correction.

m.reml.test <- as(m.reml,"merModLmerTest")
print(summary(m.reml.test,ddf="Kenward-Roger"),correlation=FALSE)
## Linear mixed model fit by REML t-tests use Kenward-Roger approximations
##   to degrees of freedom [lmerMod]
## Formula: angle ~ recipe * temperature + (1 | recipe:replicate)
##    Data: cake
## 
## REML criterion at convergence: 1638.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64661 -0.61082 -0.05207  0.56985  2.75374 
## 
## Random effects:
##  Groups           Name        Variance Std.Dev.
##  recipe:replicate (Intercept) 41.84    6.468   
##  Residual                     20.47    4.524   
## Number of obs: 270, groups:  recipe:replicate, 45
## 
## Fixed effects:
##                        Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)            33.12222    1.73683  42.00000  19.070  < 2e-16 ***
## recipeB                -1.47778    2.45625  42.00000  -0.602  0.55065    
## recipeC                -1.52222    2.45625  42.00000  -0.620  0.53878    
## temperature.L           6.43033    1.16822 210.00000   5.504 1.07e-07 ***
## temperature.Q          -0.71285    1.16822 210.00000  -0.610  0.54239    
## temperature.C          -2.32551    1.16822 210.00000  -1.991  0.04782 *  
## temperature^4          -3.35128    1.16822 210.00000  -2.869  0.00454 ** 
## temperature^5          -0.15119    1.16822 210.00000  -0.129  0.89715    
## recipeB:temperature.L   0.45419    1.65211 210.00000   0.275  0.78365    
## recipeC:temperature.L   0.08765    1.65211 210.00000   0.053  0.95774    
## recipeB:temperature.Q  -0.23277    1.65211 210.00000  -0.141  0.88809    
## recipeC:temperature.Q   1.21475    1.65211 210.00000   0.735  0.46299    
## recipeB:temperature.C   2.69322    1.65211 210.00000   1.630  0.10456    
## recipeC:temperature.C   2.63856    1.65211 210.00000   1.597  0.11175    
## recipeB:temperature^4   3.02372    1.65211 210.00000   1.830  0.06863 .  
## recipeC:temperature^4   3.13711    1.65211 210.00000   1.899  0.05895 .  
## recipeB:temperature^5  -0.66354    1.65211 210.00000  -0.402  0.68836    
## recipeC:temperature^5  -1.62525    1.65211 210.00000  -0.984  0.32637    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kenward-Roger is generally assumed to be more accurate than Satterthwaite, but the results are often quite similar.

ANOVA

lmerTest also supplies an anova() method, again with the ddf argument:

anova(m.test,ddf="lme4")
## Analysis of Variance Table
##                    Df  Sum Sq Mean Sq F value
## recipe              2   10.19    5.09  0.2666
## temperature         5 2100.30  420.06 21.9856
## recipe:temperature 10  205.98   20.60  1.0781
anova(m.test,ddf="Satterthwaite")
## Analysis of Variance Table of type III  with  Satterthwaite 
## approximation for degrees of freedom
##                     Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)    
## recipe               10.19    5.09     2    45  0.2666 0.7672    
## temperature        2100.30  420.06     5   225 21.9856 <2e-16 ***
## recipe:temperature  205.98   20.60    10   225  1.0781 0.3801    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.reml.test,ddf="Kenward-Roger")
## Analysis of Variance Table of type III  with  Kenward-Roger 
## approximation for degrees of freedom
##                     Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)    
## recipe               10.19    5.09     2    42  0.2488 0.7809    
## temperature        2100.30  420.06     5   210 20.5199 <2e-16 ***
## recipe:temperature  205.98   20.60    10   210  1.0062 0.4393    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Effects: Visualizing your model

A final option when considering interactions to look at the effects with the effects package.

library(effects)
## 
## Attaching package: 'effects'
## The following object is masked from 'package:car':
## 
##     Prestige

At the very basic, we can calculate and plot all effects in a model.

e <- allEffects(m)
print(e)
##  model: angle ~ recipe * temperature
## 
##  recipe*temperature effect
##       temperature
## recipe      175      185      195      205      215      225
##      A 29.13333 31.53333 30.80000 33.53333 38.66667 35.06667
##      B 26.86667 29.40000 31.73333 32.13333 34.46667 35.26667
##      C 27.93333 28.93333 31.73333 30.86667 34.40000 35.73333
plot(e)

There are many options for the plot function (which is based on lattice graphics); make sure to take a look at ?plot.eff!

plot(e,multiline=TRUE,confint=TRUE,ci.style="bars"
     ,main="Effect of Recipe and Temperature on Breaking Angle"
     ,xlab="Temperature (°F)"
     ,ylab="Angle (deg)")

Alternatively, you can convert the effects to a data frame and use ggplot:

# allEffects() returns a list, but for our model (where everything can interact with everything), there's only one element
e1 <- e[[1]]
e.df <- as.data.frame(e1)
library(ggplot2)
g <- ggplot(e.df,aes(x=temperature,y=fit,color=recipe,shape=recipe,ymin=lower,ymax=upper)) + 
    geom_pointrange(position=position_dodge(width=.1)) + 
    xlab("Temperature (°F)") + ylab("Angle (deg)") + ggtitle("Effect of Recipe and Temperature on Breaking Angle")
plot(g)

If you are using a REML-fitted model, you have the option of using the Kenward-Roger approximation in computing the effects. Again, we get nigh-identical results.

e.reml <- allEffects(m.reml,KR=FALSE)
e.reml.kr <- allEffects(m.reml,KR=TRUE)
plot(e.reml,multiline=TRUE,confint=TRUE,ci.style="bars"
     ,main="REML, no KR "
     ,xlab="Temperature (°F)"
     ,ylab="Angle (deg)")

plot(e.reml,multiline=TRUE,confint=TRUE,ci.style="bars"
     ,main="REML, with KR "
     ,xlab="Temperature (°F)"
     ,ylab="Angle (deg)")

nokr <- as.data.frame(e.reml[[1]])
withkr <- as.data.frame(e.reml.kr[[1]])

nokr$KR <- "with KR"
withkr$KR <- "without KR"

comp <- rbind(nokr,withkr)

comp.g <- ggplot(comp,aes(x=temperature,y=fit,color=recipe,shape=recipe,ymin=lower,ymax=upper)) + 
    geom_pointrange(position=position_dodge(width=.1)) + facet_wrap(~KR) +
    xlab("Temperature (°F)") + ylab("Angle (deg)") + ggtitle("Effect of Recipe and Temperature on Breaking Angle")
plot(comp.g)

comp.g2 <- ggplot(comp,aes(x=temperature,y=fit,color=KR,shape=KR,ymin=lower,ymax=upper)) + 
    geom_pointrange(position=position_dodge(width=.1)) + facet_wrap(~recipe) +
    xlab("Temperature (°F)") + ylab("Angle (deg)") + ggtitle("Effect of Recipe and Temperature on Breaking Angle")
plot(comp.g2)

Session Info

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_2.1.0   effects_3.1-1   lmerTest_2.0-30 car_2.1-2      
## [5] lme4_1.1-12     Matrix_1.2-6   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5         RColorBrewer_1.1-2  nloptr_1.0.4       
##  [4] plyr_1.8.3          tools_3.3.0         rpart_4.1-10       
##  [7] digest_0.6.9        evaluate_0.9        nlme_3.1-127       
## [10] gtable_0.2.0        lattice_0.20-33     mgcv_1.8-12        
## [13] yaml_2.1.13         parallel_3.3.0      SparseM_1.7        
## [16] gridExtra_2.2.1     cluster_2.0.4       stringr_1.0.0      
## [19] knitr_1.13          MatrixModels_0.4-1  grid_3.3.0         
## [22] nnet_7.3-12         data.table_1.9.6    survival_2.39-2    
## [25] foreign_0.8-66      rmarkdown_0.9.6     latticeExtra_0.6-28
## [28] minqa_1.2.4         Formula_1.2-1       magrittr_1.5       
## [31] Hmisc_3.17-4        scales_0.4.0        htmltools_0.3.5    
## [34] MASS_7.3-45         splines_3.3.0       pbkrtest_0.4-6     
## [37] colorspace_1.2-6    labeling_0.3        quantreg_5.24      
## [40] stringi_1.1.1       acepack_1.3-3.3     munsell_0.4.3      
## [43] chron_2.3-47