STA521 Lab 09

Justin Kao

Course Tools

RStudio Online: Your Data Science Playground


  • Browser-based: Access RStudio right from your browser—no setup headaches.
  • Duke VPN or Campus Connection: Seamlessly connect from anywhere using Duke VPN (Cisco) or while on campus.
  • Cloud-Powered: Harness the power of online resources—save your computer’s CPU and GPU for other tasks.
  • Hassle-Free Setup: No need to run install.packages—everything is pre-installed and ready to go!

Source: Dr. Colin Rundel STA 523

Introduction to Linear Mixed-Effects Models

Motivation

  • Heteroscedasticity across groups

Levene’s test

Idea: If \(\sigma_j^2\) is large, then \(\left|y_{i, j}-\bar{y}_j\right|=\left|\hat{\epsilon}_{i, j}\right|\) should be large.

- Let \(z_{i, j}=\left|\hat{\epsilon}_{i j}\right|\)

- Use the ANOVA F-test for across-group differences in the \(z_{i, j}\) ’s

# group-means fit: residuals are y_{i,j} - ybar_j
fit.nels <- lm(y.nels ~ as.factor(g.nels))
# z_{i,j} = |residual|; large values flag large within-group variance
z.nels <- abs(resid(fit.nels))
# ANOVA F-test for across-group differences in the z's
anova(lm(z.nels ~ as.factor(g.nels)))
Analysis of Variance Table

Response: z.nels
                     Df Sum Sq Mean Sq F value    Pr(>F)    
as.factor(g.nels)   683  27078  39.645  1.6092 < 2.2e-16 ***
Residuals         12290 302776  24.636                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Variation Source

  school enroll flp public urbanicity hwh   ses mscore
1   1011      5   3      1      urban   2 -0.23  52.11
2   1011      5   3      1      urban   0  0.69  57.65
3   1011      5   3      1      urban   4 -0.68  66.44
4   1011      5   3      1      urban   5 -0.89  44.68
5   1011      5   3      1      urban   3 -1.28  40.57

Does Being a Public School Matter?

# per-school standard deviation of the outcome
ygstdv.nels <- c(tapply(y.nels, g.nels, sd))
# compare the spread of school-level SDs by public-school status
boxplot(ygstdv.nels ~ pub.g.nels, col = "lightblue")

Possible Explanations 1

  • Variation across schools attributable to student-level variation in SES

Possible Explanations 2

  • Variance across schools partially attributable to student-level variation in SES

Possible Explanations 3

  • Variance across schools not attributable to student-level variation in SES

General Setup of LME

\[\begin{gathered}\mathbf{y}_j=\mathbf{X}_j \boldsymbol{\beta}+\mathbf{Z}_j \mathbf{a}_j+\epsilon_j \\ \mathbf{E}\left[\begin{array}{l}\mathbf{a}_j \\ \epsilon_j\end{array}\right]=\left[\begin{array}{l}0 \\ 0\end{array}\right] \text { and } \operatorname{Cov}\left[\begin{array}{l}\mathbf{a}_j \\ \epsilon_j\end{array}\right]=\left[\begin{array}{ll}\Psi & 0 \\ 0 & \Sigma\end{array}\right]\end{gathered}\]

  • Across-group heterogeneity: \(\Psi\) is the variance-covariance in \(\mathbf{a}_1, \ldots, \mathbf{a}_m\).
  • Within-group heterogeneity: \(\Sigma\) is the variance-covariance of \(y_{1, j}, \ldots, y_{n_j, j}\).

Theory: One-way random effects model, AKA the HNM

\[ \begin{aligned} y_{i, j} & =\mu+a_j+\epsilon_{i, j} \\ \left\{a_j\right\} & \sim \text { iid } N\left(0, \tau^2\right) \\ \left\{\epsilon_{i, j}\right\} & \sim \text { iid } N\left(0, \sigma^2\right) \end{aligned} \]

Express this model as \(\mathbf{y}_j=\mathbf{X}_j \boldsymbol{\beta}+\mathbf{Z}_j \mathbf{a}_j+\boldsymbol{\epsilon}_{\boldsymbol{j}}\)

  • Regression parameters:

\[ \beta=\mu, a_j=a_j \]

  • Design matrices:

\[ \mathbf{X}_j=\mathbf{Z}_j=\left[\begin{array}{c} 1 \\ \vdots \\ 1 \end{array}\right] \quad \text { for each } j \in\{1, \ldots, m\} \]

Code: One-way random effects model, AKA the HNM
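
The summary below comes from a call along these lines (a minimal sketch; it assumes y.nels and g.nels are already in the workspace, and REML = FALSE matches the "fit by maximum likelihood" header):

library(lme4)
# random intercept for each school (grouping variable g.nels)
fit.0 <- lmer(y.nels ~ 1 + (1 | g.nels), REML = FALSE)
summary(fit.0)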

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y.nels ~ 1 + (1 | g.nels)

     AIC      BIC   logLik deviance df.resid 
 93919.3  93941.7 -46956.6  93913.3    12971 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8112 -0.6534  0.0093  0.6732  4.6999 

Random effects:
 Groups   Name        Variance Std.Dev.
 g.nels   (Intercept) 23.63    4.861   
 Residual             73.71    8.585   
Number of obs: 12974, groups:  g.nels, 684

Fixed effects:
            Estimate Std. Error t value
(Intercept)  50.9391     0.2026   251.4

Theory: Group-specific linear regression

\[ \begin{aligned} y_{i, j} & =\beta^T \mathbf{x}_{i, j}+\mathbf{a}_j^T \mathbf{x}_{i, j}+\epsilon_{i, j} \\ \left\{\mathbf{a}_j\right\} & \sim \text { iid } N(0, \Psi) \\ \left\{\epsilon_{i, j}\right\} & \sim \text { iid } N\left(0, \sigma^2\right) \end{aligned} \]

Express this model as \(\mathbf{y}_j=\mathbf{X}_j \boldsymbol{\beta}+\mathbf{Z}_j \mathbf{a}_j+\boldsymbol{\epsilon}_j\)

  • Design matrices: \[ \mathbf{X}_j=\mathbf{Z}_j=\left[\begin{array}{c} \mathbf{x}_{1, j}^{\top} \\ \vdots \\ \mathbf{x}_{n_j, j}^{\top} \end{array}\right] \quad \text { for each } j \in\{1, \ldots, m\} \]

  • Regression parameters: \(\boldsymbol{\beta}=\boldsymbol{\beta}, \mathbf{a}_j=\mathbf{a}_j\)

  • Covariance terms: \(\Psi=\operatorname{Cov}\left[\mathbf{a}_j\right], \Sigma=\sigma^2 \mathbf{I}\)

This is just a special case where \(\mathbf{X}_j=\mathbf{Z}_j\).

Code: Group-specific linear regression
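
A sketch of the corresponding call (again assuming the NELS vectors are in the workspace; ses.nels corresponds to the ses column shown earlier):

# random intercept and random SES slope for each school
fit.1 <- lmer(y.nels ~ ses.nels + (ses.nels | g.nels), REML = FALSE)
summary(fit.1)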

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y.nels ~ ses.nels + (ses.nels | g.nels)

     AIC      BIC   logLik deviance df.resid 
 92553.1  92597.9 -46270.5  92541.1    12968 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8910 -0.6382  0.0179  0.6669  4.4613 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 g.nels   (Intercept) 12.223   3.496        
          ses.nels     1.515   1.231    0.11
 Residual             67.345   8.206        
Number of obs: 12974, groups:  g.nels, 684

Fixed effects:
            Estimate Std. Error t value
(Intercept)  50.6767     0.1551   326.7
ses.nels      4.3594     0.1231    35.4

Correlation of Fixed Effects:
         (Intr)
ses.nels 0.007 

General LME
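
The output below corresponds to a fit along these lines (a sketch; flp.nels corresponds to the school-level flp column shown earlier):

# fixed effects for flp, ses, and their interaction;
# random intercept and random ses slope per school
fit.2 <- lmer(y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels +
                (ses.nels | g.nels), REML = FALSE)
summary(fit.2)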

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels + (ses.nels |  
    g.nels)

     AIC      BIC   logLik deviance df.resid 
 92396.3  92456.0 -46190.1  92380.3    12966 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9773 -0.6417  0.0201  0.6659  4.5202 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 g.nels   (Intercept)  9.012   3.002        
          ses.nels     1.571   1.254    0.06
 Residual             67.260   8.201        
Number of obs: 12974, groups:  g.nels, 684

Fixed effects:
                  Estimate Std. Error t value
(Intercept)        55.3975     0.3860 143.524
flp.nels           -2.4062     0.1819 -13.230
ses.nels            4.4909     0.3326  13.500
flp.nels:ses.nels  -0.1931     0.1587  -1.216

Correlation of Fixed Effects:
            (Intr) flp.nl ss.nls
flp.nels    -0.930              
ses.nels    -0.158  0.088       
flp.nls:ss.  0.086 -0.007 -0.926

Testing
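
The two likelihood-ratio comparisons below can be reproduced with anova() on the nested fits defined above (a sketch using the fit.0, fit.1, and fit.2 objects):

anova(fit.0, fit.1)  # does adding SES (fixed effect + random slope) improve fit?
anova(fit.1, fit.2)  # do the flp terms improve fit further?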

Data: NULL
Models:
fit.0: y.nels ~ 1 + (1 | g.nels)
fit.1: y.nels ~ ses.nels + (ses.nels | g.nels)
      npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
fit.0    3 93919 93942 -46957    93913                         
fit.1    6 92553 92598 -46271    92541 1372.2  3  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Data: NULL
Models:
fit.1: y.nels ~ ses.nels + (ses.nels | g.nels)
fit.2: y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels + (ses.nels | g.nels)
      npar   AIC   BIC logLik deviance Chisq Df Pr(>Chisq)    
fit.1    6 92553 92598 -46271    92541                        
fit.2    8 92396 92456 -46190    92380 160.8  2  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LME - Is it worth it?


Call:
lm(formula = y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.107  -5.758   0.142   5.977  33.538 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        54.8442     0.2280  240.50   <2e-16 ***
flp.nels           -2.0809     0.1075  -19.36   <2e-16 ***
ses.nels            4.9058     0.2810   17.46   <2e-16 ***
flp.nels:ses.nels  -0.1279     0.1361   -0.94    0.347    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.754 on 12970 degrees of freedom
Multiple R-squared:  0.2028,    Adjusted R-squared:  0.2026 
F-statistic:  1100 on 3 and 12970 DF,  p-value: < 2.2e-16

LME - Is it worth it?

  1. OLS is still unbiased, but not BLUE
  • Unbiasedness: The OLS estimate for \(\boldsymbol{\beta}\) remains unbiased if the random effects \(\mathbf{a}_j\) are independent of the predictors \(\mathbf{X}_j\). This means that \(\mathbb{E}\left[\mathbf{y}_j \mid \mathbf{X}_j\right]=\mathbf{X}_j \boldsymbol{\beta}\), so OLS can recover the fixed-effect coefficients without bias.

  • Not BLUE

  2. Variance of the OLS estimate is no longer \(\sigma^2\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1}\) (see the sketch after this list)
  • In the presence of random effects \(\left(\mathbf{a}_j\right)\), the true variance of \(\mathbf{y}_j\) includes both:
    • Within-group error variance ( \(\sigma^2\), from \(\epsilon_j\) ).
    • Between-group variance (from \(\mathbf{Z}_j \mathbf{a}_j\) ).
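
Concretely (a sketch in the notation above, stacking all groups so that \(\operatorname{Var}[\mathbf{y}]=\mathbf{V}\) is block-diagonal with blocks \(\mathbf{Z}_j \boldsymbol{\Psi} \mathbf{Z}_j^{\top}+\sigma^2 \mathbf{I}\)):

\[ \operatorname{Var}\left[\hat{\boldsymbol{\beta}}_{\mathrm{OLS}}\right]=\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{V} \mathbf{X}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \neq \sigma^2\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1}, \]

so the usual lm standard errors, which assume \(\operatorname{Var}[\mathbf{y}]=\sigma^2 \mathbf{I}\), are generally incorrect here.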

LME - Is it worth it?

  3. BLUE is (approximately) the MLE returned by lmer
  • The Best Linear Unbiased Estimator (BLUE) for \(\boldsymbol{\beta}\) accounts for both the fixed effects (\(\mathbf{X}_j\)) and the random effects (\(\mathbf{Z}_j \mathbf{a}_j\)) in the data. It is obtained from the mixed-effects model by maximizing the marginal likelihood of the data, \[\mathbf{y}_j \sim N\left(\mathbf{X}_j \boldsymbol{\beta}, \mathbf{V}_j\right),\] where \(\mathbf{V}_j=\mathbf{Z}_j \boldsymbol{\Psi} \mathbf{Z}_j^{\top}+\sigma^2 \mathbf{I}\) is the covariance matrix incorporating the random-effect covariance (\(\boldsymbol{\Psi}\)) and the residual variance (\(\sigma^2\)); a formula sketch follows this list.

  • The lmer function in R uses maximum likelihood (ML) methods to estimate \(\boldsymbol{\beta}\), \(\boldsymbol{\Psi}\), and \(\sigma^2\), which are optimal under the assumed model.
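
For reference, when \(\mathbf{V}_j\) is known the BLUE of \(\boldsymbol{\beta}\) is the generalized least squares (GLS) estimator below; lmer effectively plugs in the estimated \(\hat{\boldsymbol{\Psi}}\) and \(\hat{\sigma}^2\) (a standard-result sketch, not lmer's exact computational algorithm):

\[ \hat{\boldsymbol{\beta}}=\left(\sum_{j=1}^m \mathbf{X}_j^{\top} \mathbf{V}_j^{-1} \mathbf{X}_j\right)^{-1} \sum_{j=1}^m \mathbf{X}_j^{\top} \mathbf{V}_j^{-1} \mathbf{y}_j, \qquad \mathbf{V}_j=\mathbf{Z}_j \boldsymbol{\Psi} \mathbf{Z}_j^{\top}+\sigma^2 \mathbf{I}. \]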

To Sum Up

Why Use lmer?

The lmer function (from the lme4 package) provides:

  1. Efficient Estimates: It calculates fixed-effect coefficients \((\boldsymbol{\beta})\) and random-effect variances (\(\Psi\)) simultaneously, producing estimates that are optimal under the model.
  2. Correct Standard Errors: It computes standard errors that account for the hierarchical structure of the data, reflecting both within-group and between-group variability.
  3. Flexibility: Allows modeling of complex group structures (e.g., random intercepts, random slopes, or crossed random effects).

Source

  • Dr. Colin Rundel STA 523 Slides
  • Dr. David Banks STA 521 Slides
  • Dr. Peter Hoff STA 610
  • An Introduction to Statistical Learning (ISLR)

Thank you!

Happy Thanksgiving!