This is just a basic introduction to lmer
syntax for multilevel regression. As an example, we’ll analyze the effect of different diets on chick growth. A typical call to lmer
looks something like this
m <- lmer(weight ~ Time * Diet + (1 + Time | Chick), data=ChickWeight, REML=F)
The first argument is the model formula. The part to the left of the ~
is the outcome variable (in this case weight
, the chick weight in gm). The part to the right of the ~
has two components:
Time * Diet
which is a compact way of specifying all simple effects and interactions of time (number of days since birth) and diet. A less compact but more explicit way to writing that would be Time + Diet + Time:Diet
(1 + Time | Chick)
which allows individual chicks to vary randomly in terms of their intercept (starting weight) and their effect of Time
(weight change over time, also called a “random slope”, but I think that terminology can get confusing when fitting models with nonlinear predictors).The second argument is the data frame. Note that this makes it easy to fit models to subsets of the data – if you wanted to ignore diet 3, you could specify data = subset(ChickWeight, Diet != "3")
.
The last argument is optional. The default in lmer
is to fit models using the REML (REstricted Maximum Likelihood) criterion. There are good reasons for this, but we often use the likelihood ratio test to compare models based on log-likelhoods, so we should use the Maximum Likelihood (ML) criterion.
To see the results of the model, we can use the summary()
function:
summary(m)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: weight ~ Time * Diet + (1 + Time | Chick)
## Data: ChickWeight
##
## AIC BIC logLik deviance df.resid
## 4824.2 4876.5 -2400.1 4800.2 566
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7508 -0.5693 -0.0401 0.4694 3.5415
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Chick (Intercept) 103.61 10.179
## Time 10.01 3.165 -0.99
## Residual 163.36 12.781
## Number of obs: 578, groups: Chick, 50
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 33.6541 2.8023 12.009
## Time 6.2799 0.7303 8.598
## Diet2 -5.0205 4.8072 -1.044
## Diet3 -15.4038 4.8072 -3.204
## Diet4 -1.7475 4.8145 -0.363
## Time:Diet2 2.3293 1.2508 1.862
## Time:Diet3 5.1430 1.2508 4.112
## Time:Diet4 3.2528 1.2515 2.599
##
## Correlation of Fixed Effects:
## (Intr) Time Diet2 Diet3 Diet4 Tm:Dt2 Tm:Dt3
## Time -0.881
## Diet2 -0.583 0.513
## Diet3 -0.583 0.513 0.340
## Diet4 -0.582 0.513 0.339 0.339
## Time:Diet2 0.514 -0.584 -0.882 -0.300 -0.299
## Time:Diet3 0.514 -0.584 -0.300 -0.882 -0.299 0.341
## Time:Diet4 0.514 -0.584 -0.300 -0.300 -0.882 0.341 0.341
Sometimes we just want the fixed effect parameter estimates (coefficients), which we can get by using coef()
:
coef(summary(m))
## Estimate Std. Error t value
## (Intercept) 33.654113 2.8023022 12.0094514
## Time 6.279858 0.7303499 8.5984243
## Diet2 -5.020517 4.8071843 -1.0443779
## Diet3 -15.403787 4.8071843 -3.2043263
## Diet4 -1.747532 4.8145392 -0.3629697
## Time:Diet2 2.329278 1.2507884 1.8622481
## Time:Diet3 5.143013 1.2507884 4.1118171
## Time:Diet4 3.252804 1.2515387 2.5990434
One of the most challenging parts of fitting multilevel models is figuring our the right random effects. To understand the random effects, it can be helpful to look at the estimated variance and covariance block from the summary:
VarCorr(m)
## Groups Name Std.Dev. Corr
## Chick (Intercept) 10.1789
## Time 3.1645 -0.986
## Residual 12.7811
and we can get the actual random effect estimates using resid(m)
#to save space, only showing a summary of the structure and the first few values
str(resid(m))
## Named num [1:578] 13.18 6.75 -0.68 -11.11 -14.54 ...
## - attr(*, "names")= chr [1:578] "1" "2" "3" "4" ...
To make a (relatively) simple summary plot with model fit, use fortify()
within ggplot
:
ggplot(fortify(m), aes(Time, weight, color=Diet)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(aes(y=.fitted), fun.y=mean, geom="line")