STA521 Lab 09

Justin Kao

Course Tools

RStudio Online: Your Data Science Playground


  • Browser-based: Access RStudio right from your browser—no setup headaches.
  • Duke VPN or Campus Connection: Seamlessly connect from anywhere using Duke VPN (Cisco) or while on campus.
  • Cloud-Powered: Harness the power of online resources—save your computer’s CPU and GPU for other tasks.
  • Hassle-Free Setup: No need to run install.packages—everything is pre-installed and ready to go!

Source: Dr. Colin Rundel STA 523

Introduction to Linear Mixed-Effects Models

Motivation

  • Heteroscedasticity across groups

Levene’s test

Idea: If \(\sigma_j^2\) is large, then \(\left|y_{i, j}-\bar{y}_j\right|=\left|\hat{\epsilon}_{i, j}\right|\) should be large.

- Let \(z_{i, j}=\left|\hat{\epsilon}_{i j}\right|\)

- Use the ANOVA F-test for across-group differences in the \(z_{i, j}\) ’s

# group-means fit: residuals are y_{i,j} - ybar_j
fit.nels <- lm(y.nels ~ as.factor(g.nels))
# z_{i,j} = |residual|; large values flag large within-group variance
z.nels <- abs(resid(fit.nels))
# ANOVA F-test for across-group differences in the z's
anova(lm(z.nels ~ as.factor(g.nels)))
Analysis of Variance Table

Response: z.nels
                     Df Sum Sq Mean Sq F value    Pr(>F)    
as.factor(g.nels)   683  27078  39.645  1.6092 < 2.2e-16 ***
Residuals         12290 302776  24.636                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Variation Source

  school enroll flp public urbanicity hwh   ses mscore
1   1011      5   3      1      urban   2 -0.23  52.11
2   1011      5   3      1      urban   0  0.69  57.65
3   1011      5   3      1      urban   4 -0.68  66.44
4   1011      5   3      1      urban   5 -0.89  44.68
5   1011      5   3      1      urban   3 -1.28  40.57

Does Being a Public School Matter?

# per-school standard deviation of the outcome
ygstdv.nels <- c(tapply(y.nels, g.nels, sd))
# compare the spread of school-level SDs by public-school status
boxplot(ygstdv.nels ~ pub.g.nels, col = "lightblue")

Possible Explanations 1

  • Variation across schools attributable to student-level variation in SES

Possible Explanations 2

  • Variance across schools partially attributable to student-level variation in SES

Possible Explanations 3

  • Variance across schools not attributable to student-level variation in SES

General Setup of LME

\[\begin{gathered}\mathbf{y}_j=\mathbf{X}_j \boldsymbol{\beta}+\mathbf{Z}_j \mathbf{a}_j+\epsilon_j \\ \mathbf{E}\left[\begin{array}{l}\mathbf{a}_j \\ \epsilon_j\end{array}\right]=\left[\begin{array}{l}0 \\ 0\end{array}\right] \text { and } \operatorname{Cov}\left[\begin{array}{l}\mathbf{a}_j \\ \epsilon_j\end{array}\right]=\left[\begin{array}{ll}\Psi & 0 \\ 0 & \Sigma\end{array}\right]\end{gathered}\]

  • Across-group heterogeneity: \(\Psi\) is the variance-covariance in \(\mathbf{a}_1, \ldots, \mathbf{a}_m\).
  • Within-group heterogeneity: \(\Sigma\) is the variance-covariance of \(y_{1, j}, \ldots, y_{n_j, j}\).

Theory: One-way random effects model, AKA the HNM

\[ \begin{aligned} y_{i, j} & =\mu+a_j+\epsilon_{i, j} \\ \left\{a_j\right\} & \sim \text { iid } N\left(0, \tau^2\right) \\ \left\{\epsilon_{i, j}\right\} & \sim \text { iid } N\left(0, \sigma^2\right) \end{aligned} \]

Express this model as \(\mathbf{y}_j=\mathbf{X}_j \boldsymbol{\beta}+\mathbf{Z}_j \mathbf{a}_j+\boldsymbol{\epsilon}_{\boldsymbol{j}}\)

  • Regression parameters:

\[ \beta=\mu, a_j=a_j \]

  • Design matrices:

\[ \mathbf{X}_j=\mathbf{Z}_j=\left[\begin{array}{c} 1 \\ \vdots \\ 1 \end{array}\right] \quad \text { for each } j \in\{1, \ldots, m\} \]

Code: One-way random effects model, AKA the HNM
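
The summary below comes from a call along these lines (a minimal sketch; it assumes y.nels and g.nels are already in the workspace, and REML = FALSE matches the "fit by maximum likelihood" header):

library(lme4)
# random intercept for each school (grouping variable g.nels)
fit.0 <- lmer(y.nels ~ 1 + (1 | g.nels), REML = FALSE)
summary(fit.0)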

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y.nels ~ 1 + (1 | g.nels)

     AIC      BIC   logLik deviance df.resid 
 93919.3  93941.7 -46956.6  93913.3    12971 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8112 -0.6534  0.0093  0.6732  4.6999 

Random effects:
 Groups   Name        Variance Std.Dev.
 g.nels   (Intercept) 23.63    4.861   
 Residual             73.71    8.585   
Number of obs: 12974, groups:  g.nels, 684

Fixed effects:
            Estimate Std. Error t value
(Intercept)  50.9391     0.2026   251.4

Theory: Group-specific linear regression

\[ \begin{aligned} y_{i, j} & =\beta^T \mathbf{x}_{i, j}+\mathbf{a}_j^T \mathbf{x}_{i, j}+\epsilon_{i, j} \\ \left\{\mathbf{a}_j\right\} & \sim \text { iid } N(0, \Psi) \\ \left\{\epsilon_{i, j}\right\} & \sim \text { iid } N\left(0, \sigma^2\right) \end{aligned} \]

Express this model as \(\mathbf{y}_j=\mathbf{X}_j \boldsymbol{\beta}+\mathbf{Z}_j \mathbf{a}_j+\boldsymbol{\epsilon}_j\)

  • Design matrices: \[ \mathbf{X}_j=\mathbf{Z}_j=\left[\begin{array}{c} \mathbf{x}_{1, j}^{\top} \\ \vdots \\ \mathbf{x}_{n_j, j}^{\top} \end{array}\right] \quad \text { for each } j \in\{1, \ldots, m\} \]

  • Regression parameters: \(\boldsymbol{\beta}=\boldsymbol{\beta}, \mathbf{a}_j=\mathbf{a}_j\)

  • Covariance terms: \(\Psi=\operatorname{Cov}\left[\mathbf{a}_j\right], \Sigma=\sigma^2 \mathbf{I}\)

This is just a special case where \(\mathbf{X}_j=\mathbf{Z}_j\).

Code: Group-specific linear regression
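
A sketch of the corresponding call (again assuming the NELS vectors are in the workspace; ses.nels corresponds to the ses column shown earlier):

# random intercept and random SES slope for each school
fit.1 <- lmer(y.nels ~ ses.nels + (ses.nels | g.nels), REML = FALSE)
summary(fit.1)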

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y.nels ~ ses.nels + (ses.nels | g.nels)

     AIC      BIC   logLik deviance df.resid 
 92553.1  92597.9 -46270.5  92541.1    12968 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8910 -0.6382  0.0179  0.6669  4.4613 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 g.nels   (Intercept) 12.223   3.496        
          ses.nels     1.515   1.231    0.11
 Residual             67.345   8.206        
Number of obs: 12974, groups:  g.nels, 684

Fixed effects:
            Estimate Std. Error t value
(Intercept)  50.6767     0.1551   326.7
ses.nels      4.3594     0.1231    35.4

Correlation of Fixed Effects:
         (Intr)
ses.nels 0.007 

General LME
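
The output below corresponds to a fit along these lines (a sketch; flp.nels corresponds to the school-level flp column shown earlier):

# fixed effects for flp, ses, and their interaction;
# random intercept and random ses slope per school
fit.2 <- lmer(y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels +
                (ses.nels | g.nels), REML = FALSE)
summary(fit.2)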

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels + (ses.nels |  
    g.nels)

     AIC      BIC   logLik deviance df.resid 
 92396.3  92456.0 -46190.1  92380.3    12966 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9773 -0.6417  0.0201  0.6659  4.5202 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 g.nels   (Intercept)  9.012   3.002        
          ses.nels     1.571   1.254    0.06
 Residual             67.260   8.201        
Number of obs: 12974, groups:  g.nels, 684

Fixed effects:
                  Estimate Std. Error t value
(Intercept)        55.3975     0.3860 143.524
flp.nels           -2.4062     0.1819 -13.230
ses.nels            4.4909     0.3326  13.500
flp.nels:ses.nels  -0.1931     0.1587  -1.216

Correlation of Fixed Effects:
            (Intr) flp.nl ss.nls
flp.nels    -0.930              
ses.nels    -0.158  0.088       
flp.nls:ss.  0.086 -0.007 -0.926

Testing
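
The two likelihood-ratio comparisons below can be reproduced with anova() on the nested fits defined above (a sketch using the fit.0, fit.1, and fit.2 objects):

anova(fit.0, fit.1)  # does adding SES (fixed effect + random slope) improve fit?
anova(fit.1, fit.2)  # do the flp terms improve fit further?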

Data: NULL
Models:
fit.0: y.nels ~ 1 + (1 | g.nels)
fit.1: y.nels ~ ses.nels + (ses.nels | g.nels)
      npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
fit.0    3 93919 93942 -46957    93913                         
fit.1    6 92553 92598 -46271    92541 1372.2  3  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Data: NULL
Models:
fit.1: y.nels ~ ses.nels + (ses.nels | g.nels)
fit.2: y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels + (ses.nels | g.nels)
      npar   AIC   BIC logLik deviance Chisq Df Pr(>Chisq)    
fit.1    6 92553 92598 -46271    92541                        
fit.2    8 92396 92456 -46190    92380 160.8  2  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LME - Is it worth it?


Call:
lm(formula = y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.107  -5.758   0.142   5.977  33.538 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        54.8442     0.2280  240.50   <2e-16 ***
flp.nels           -2.0809     0.1075  -19.36   <2e-16 ***
ses.nels            4.9058     0.2810   17.46   <2e-16 ***
flp.nels:ses.nels  -0.1279     0.1361   -0.94    0.347    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.754 on 12970 degrees of freedom
Multiple R-squared:  0.2028,    Adjusted R-squared:  0.2026 
F-statistic:  1100 on 3 and 12970 DF,  p-value: < 2.2e-16

LME - Is it worth it?

  1. OLS is still unbiased, but not BLUE
  • Unbiasedness: The OLS estimate for \(\boldsymbol{\beta}\) remains unbiased if the random effects \(\mathbf{a}_j\) are independent of the predictors \(\mathbf{X}_j\). This means that \(\mathbb{E}\left[\mathbf{y}_j \mid \mathbf{X}_j\right]=\mathbf{X}_j \boldsymbol{\beta}\), so OLS can recover the fixed-effect coefficients without bias.

  • Not BLUE

  2. Variance of the OLS estimate is no longer \(\sigma^2\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1}\) (see the sketch after this list)
  • In the presence of random effects \(\left(\mathbf{a}_j\right)\), the true variance of \(\mathbf{y}_j\) includes both:
    • Within-group error variance ( \(\sigma^2\), from \(\epsilon_j\) ).
    • Between-group variance (from \(\mathbf{Z}_j \mathbf{a}_j\) ).
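
Concretely (a sketch in the notation above, stacking all groups so that \(\operatorname{Var}[\mathbf{y}]=\mathbf{V}\) is block-diagonal with blocks \(\mathbf{Z}_j \boldsymbol{\Psi} \mathbf{Z}_j^{\top}+\sigma^2 \mathbf{I}\)):

\[ \operatorname{Var}\left[\hat{\boldsymbol{\beta}}_{\mathrm{OLS}}\right]=\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{V} \mathbf{X}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \neq \sigma^2\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1}, \]

so the usual lm standard errors, which assume \(\operatorname{Var}[\mathbf{y}]=\sigma^2 \mathbf{I}\), are generally incorrect here.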

LME - Is it worth it?

  3. BLUE is (approximately) the MLE returned by lmer
  • The Best Linear Unbiased Estimator (BLUE) for \(\boldsymbol{\beta}\) accounts for both the fixed effects (\(\mathbf{X}_j\)) and the random effects (\(\mathbf{Z}_j \mathbf{a}_j\)) in the data. It is obtained from the mixed-effects model by maximizing the marginal likelihood of the data, \[\mathbf{y}_j \sim N\left(\mathbf{X}_j \boldsymbol{\beta}, \mathbf{V}_j\right),\] where \(\mathbf{V}_j=\mathbf{Z}_j \boldsymbol{\Psi} \mathbf{Z}_j^{\top}+\sigma^2 \mathbf{I}\) is the covariance matrix incorporating the random-effect covariance (\(\boldsymbol{\Psi}\)) and the residual variance (\(\sigma^2\)); a formula sketch follows this list.

  • The lmer function in R uses maximum likelihood (ML) methods to estimate \(\boldsymbol{\beta}\), \(\boldsymbol{\Psi}\), and \(\sigma^2\), which are optimal under the assumed model.
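
For reference, when \(\mathbf{V}_j\) is known the BLUE of \(\boldsymbol{\beta}\) is the generalized least squares (GLS) estimator below; lmer effectively plugs in the estimated \(\hat{\boldsymbol{\Psi}}\) and \(\hat{\sigma}^2\) (a standard-result sketch, not lmer's exact computational algorithm):

\[ \hat{\boldsymbol{\beta}}=\left(\sum_{j=1}^m \mathbf{X}_j^{\top} \mathbf{V}_j^{-1} \mathbf{X}_j\right)^{-1} \sum_{j=1}^m \mathbf{X}_j^{\top} \mathbf{V}_j^{-1} \mathbf{y}_j, \qquad \mathbf{V}_j=\mathbf{Z}_j \boldsymbol{\Psi} \mathbf{Z}_j^{\top}+\sigma^2 \mathbf{I}. \]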

To Sum Up

Why Use lmer?

The lmer function (from the lme4 package) provides:

  1. Efficient Estimates: It calculates fixed-effect coefficients \((\boldsymbol{\beta})\) and random-effect variances (\(\Psi\)) simultaneously, producing estimates that are optimal under the model.
  2. Correct Standard Errors: It computes standard errors that account for the hierarchical structure of the data, reflecting both within-group and between-group variability.
  3. Flexibility: Allows modeling of complex group structures (e.g., random intercepts, random slopes, or crossed random effects).

Source

  • Dr. Colin Rundel STA 523 Slides
  • Dr. David Banks STA 521 Slides
  • Dr. Peter Hoff STA 610
  • An Introduction to Statistical Learning (ISLR)

Thank you!

Happy Thanksgiving!