Linear Mixed-Effects Models

Phillip M. Alday
12 August 2016 (Melbourne)

Some of the assumptions of linear regression

  • independence of errors
  • equal variance of errors (homoskedacity)
  • normality of errors

Today: violating independence

Sleep Study

plot of chunk unnamed-chunk-1

Simple Linear Regression

plot of chunk unnamed-chunk-2

Residuals

plot of chunk unnamed-chunk-3

Linear Regression: Single Subject

plot of chunk unnamed-chunk-4

Residuals

plot of chunk unnamed-chunk-5

Linear Regression: Per Subject

plot of chunk unnamed-chunk-6

Inference

  • Each subject-level linear regression generalizes within that subject.
  • How do we generalize across subjects?
  • Naive solution: model of models – look at the distribution of estimates

Many Models

Independent estimation

plot of chunk unnamed-chunk-7

Distribution of Estimates

plot of chunk unnamed-chunk-8

Hierarchical Models

Structuring Error

  • General linear model: \[ Y = \beta_{1}X_1 + \beta_{0} + \varepsilon \]
  • Hierarchical models:

    • per-subject correction for intercepts: \[ Y = \beta_{1}X_1 + \beta_{0} + S_0 + \varepsilon \]
    • per-subject correction for slopes: \[ Y = \left(\beta_{1} + S_1\right)X_1 + \beta_{0} + \varepsilon \]
    • per-subject correction for both: \[ Y = \left(\beta_{1} + S_1\right)X_1 + \beta_{0} + S_0 + \varepsilon \]
    • per-subject and per-item correction for both: \[ Y = \left(\beta_{1} + S_1 +I_1 \right)X_1 + \beta_{0} + I_0 + S_0 + \varepsilon \]
  • Equivalently: building-up structure in the variance component:

    1. per-observation: residual error
    2. per-group error
    3. (partial) nesting and crossing of groups

How Much Structure?

  • Too much structure leads to overparamaterization
  • Too little structure is anticonservative (cf. Barr et al. 2013)
  • Choose the maximal structure that
    1. fits your experimental design
    2. converges for your model

Hierarchical Model: Varying Intercepts

Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 | Subject)
   Data: sleepstudy

REML criterion at convergence: 1786.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2257 -0.5529  0.0109  0.5188  4.2506 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378.2   37.12   
 Residual              960.5   30.99   
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept) 251.4051     9.7467   25.79
Days         10.4673     0.8042   13.02

Correlation of Fixed Effects:
     (Intr)
Days -0.371

Hierarchical Model: Varying Intercepts and Slopes

Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825   36.84
Days          10.467      1.546    6.77

Correlation of Fixed Effects:
     (Intr)
Days -0.138

Comparing Two Models

Data: sleepstudy
Models:
m1: Reaction ~ 1 + Days + (1 | Subject)
m2: Reaction ~ 1 + Days + (1 + Days | Subject)
   Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
m1  4 1802.1 1814.8 -897.04   1794.1                             
m2  6 1763.9 1783.1 -875.97   1751.9 42.139      2  7.072e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Distribution of Estimates

plot of chunk unnamed-chunk-12

Many Models vs. Hierarchical Models

Distribution of Estimates

plot of chunk unnamed-chunk-13

Why mixed-models?

  • shrinkage (Stein paradox, variance-bias tradeoff, over- vs- underfitting)
  • multiple sources of variance simultaneously (cf. Clark 1973; Judd, Westfall & Kenny 2012)
  • better “sharing” of information
  • comprehensive framework for unbalanced, partially crossed designs including controlling for confounds (Sassenhagen and Alday, in press)

Important Details

REML vs. ML

Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825   36.84
Days          10.467      1.546    6.77

Correlation of Fixed Effects:
     (Intr)
Days -0.138

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
   Data: sleepstudy

     AIC      BIC   logLik deviance df.resid 
  1763.9   1783.1   -876.0   1751.9      174 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9416 -0.4656  0.0289  0.4636  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 565.52   23.781       
          Days         32.68    5.717   0.08
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.632   37.91
Days          10.467      1.502    6.97

Correlation of Fixed Effects:
     (Intr)
Days -0.138

REML vs. ML

  • REML: Restricted Maximum Likelihood
  • Variance is the average squared distance to the true mean
  • Variance measured to the ML-estimated mean is less than true variance in finite samples
  • This is the motivation behind Bessel's correction (\( n-1 \) in the denominator instead of \( n \) when calculating variance using SS)
  • Similar motivation for using REML in estimation of mixed-effects models: more accurate estimation of variance (and hence random effects!)
  • “log likelihood” bei REML-fitted models dependent on paramerization and thus not comparable across models
  • similarly: AIC, BIC, other measures of fit not comparable across models
  • REML-models more accurate but not comparable with each other:
    • determine model structure with comparisons between ML-estimates
    • present final model with REML-estimates
    • lme4 has some built-in protections to prevent REML-based comparisons, but don't depend on these!

Where are my p-values?

  • Degrees of freedom in hierarchical models not entirely trivial to define or determine in the general case, but possible in certain specific cases
  • No \( p \)-values for \( t \) and \( F \) without degrees of freedom
  • There are (computationally expensive) ways to approximate them in the general case:
    • Satterthwaite
    • Kenward-Roger (usually more accurate, but more expensive)
  • But these are not without controversy
  • And you should report the method used!

Doug Bates, leading expert in computational methods for GLMM

Perhaps I can try again to explain why I don’t quote p-values or, more to the point, why I do not take the “obviously correct” approach of attempting to reproduce the results provided by SAS. Let me just say that, although there are those who feel that the purpose of the R Project - indeed the purpose of any statistical computing whatsoever - is to reproduce the p-values provided by SAS, I am not a member of that group. If those people feel that I am a heretic for even suggesting that a p-value provided by SAS could be other than absolute truth and that I should be made to suffer a slow, painful death by being burned at the stake for my heresy, then I suppose that we will be able to look forward to an exciting finale to the conference dinner at UseR!2006 next month. (Well, I won’t be looking forward to such a finale but the rest of you can.)

A (probably not) final word on p-values

  • A smaller \( p \)-value is not an indication of a stronger effect.
  • If you want to compare effect sizes, then use estimates of effect size!
  • You can always achieve an arbitrarily small \( p \)-value with sufficient data.
  • \( p \)-values are likelihoods under the null hypothesis, not posterior probabilities.