DESCRIPTION OF THE DATASETS

During the exercise sessions, we will use 2 datasets: the hip fracture dataset and the SII dataset. A description of both datasets is given here:

SII Dataset

The data come from the SII (Study of Instructional Improvement), which was carried out to assess the math achievement scores of first- and third-grade pupils in randomly selected classrooms from a national US sample of elementary schools. The dataset includes results for 1190 first-grade pupils sampled from 312 classrooms in 107 schools.
From the sample of elementary schools, all first-grade classrooms were considered (1 to 9). Within a classroom, 1 to 10 children were sampled to take a test on math. The following variables were measured: * schoolid: school’s ID number
* classid: classroom’s ID number
* childid: pupil’s ID number
* sex: an indicator variable for sex (0 for males, 1 for females)
* mathkind: pupil’s math score in the spring of the kindergarten year
* mathgain: pupil’s gain in the math achievement score from the spring of kindergarten to the spring of first grade
* ses: pupil’s socioeconomic status
* housepov: % of households in the neighborhood of the school below the poverty level
* yearstea: years of teacher’s experience in teaching in the first grade

Hip fracture dataset

Hip fractures are a very common and important health problem among elderly people. They are often the result of an unfortunate, seemingly innocent fall in the patient’s home environment. Less fluent movements combined with bad vision and impaired reaction skills make older people more at risk for such falls. The weaker bones and especially the presence of osteoporosis also causes the hip to break more easily. The hip surgery needed is a serious operation and can involve a lot of blood loss. A relatively long recovery period (8 to 12 days) in the hospital is therefore a real necessity.
The combination of these overwhelming events can also cause the mental status of the patients to worsen. When this deterioration is very severe, typically causing a loss of orientation in time and space, this phenomenon is called a delirium. In daily practice, the ‘Mini-Mental State Examination score’ (MMSE) is sometimes being used to measure the cognitive status of patients. It is the sum of 30 binary outcomes which result from correct (1) or incorrect (0) answer to specific questions (e.g. ‘Where are you now?’). Obviously, higher MMSE values indicate better cognitive functioning.

It is of interest to study how this MMSE evolution depends on certain characteristics of the patients. The data set used for this purpose contains the MMSE score for 60 patients, at 1, 3, 5, 8 and 12 days after surgery. For each patient the following variables are measured:

IDNR: id of the patient
NEURO: pre-operative neuro-status (1: neuro-psychiatric, 0: not neuro-psychiatric)
AGE: age of the patient
MMSE: Mini-Mental State Examination score
TIME: days after surgery: 1, 3, 5, 8 and 12.

1 PART ONE: EXPLORATORY & GRAPHICAL ANALYSIS

Describe in detail why in both datasets, one cannot make use of basic statistical models, like linear regression or ANOVA. What assumptions are violated? What special kind of data is the hip fracture dataset?

Classic ANOVA and regression assume all observations are independent. Here, we have related data - observations are not independent:

hip fracture : multiple measurements (of MMSE) over time in the same individual = longitudinal study
SII : here there is non-independence at two levels:
- pupils within one class are not independent.
- classes within schools are not independent

The purpose of multilevel (or mixed) models is to account for the relatedness between these observations

#Loading the required packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.5.3

1.1 SII Dataset

# Read in dataset
siiData<-read.table("SII.txt",header=T)

# Coerce schoolid, classid, childid, sex and ses to a factor
siiData$schoolid<-as.factor(siiData$schoolid)
siiData$classid<-as.factor(siiData$classid)
siiData$childid<-as.factor(siiData$childid)
siiData$sex<-as.factor(siiData$sex)

# Rename the factor levels of 'sex' for easier interpretation
levels(siiData$sex)<-c("male","female")

# Generating the Individual-vs.-Aggregate Dot Plot 
group_means <- siiData %>%
  group_by(schoolid,sex) %>%
  summarize(Mean_mathgain = mean(mathgain), .groups = "drop")

ggplot(group_means, aes(x = schoolid, y = Mean_mathgain, col = sex)) +
  geom_point(alpha = 0.7, size = 2.5) +
  labs(
    title = "Distribution of mathgain for males and females across schools",
    x = "School ID (Cluster)",
    y = "Mathgain",
    color = "Sex of student"
  ) +
  theme_minimal()

1.2 Hip fracture dataset

# Read in dataset
hipData<-read.table("hipfracture2.txt",header = T)

# coercion of categorical variables
hipData$IDNR<-as.factor(hipData$IDNR)
hipData$NEURO<-as.factor(hipData$NEURO)

# Rename the factor levels of 'NEURO' for easier interpretation
levels(hipData$NEURO)<-c("not neuro-psy","neuro-psy")

# one plot with all the patients, using a different color for neuro and non-Neuro patients
ggplot(hipData, aes(x = TIME, y = MMSE, group = IDNR, col=NEURO)) +
  geom_line(alpha = 0.6) +
  geom_point(alpha = 0.6) +
  labs(title = "Individual Trajectory Plot", y = "MMSE")
#> Warning: Removed 30 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 33 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# two separate plots for the neuro and non-neuro persons
ggplot(hipData, aes(x = TIME, y = MMSE, group = IDNR, col = NEURO)) +
  geom_line(alpha = 0.6) +
  geom_point(alpha = 0.6) +
  # Split the plot into separate panels based on the NEURO variable
  facet_wrap(~NEURO) + 
  labs(
    title = "Individual Trajectory Plots by Neuro Status", 
    y = "MMSE",
    x = "Time"
  )
#> Warning: Removed 30 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Removed 33 rows containing missing values or values outside the scale range
#> (`geom_point()`).

2 PART TWO: EXPLORATORY & GRAPHICAL ANALYSIS

Calculate the ICC and the Design Effect for both datasets. Is there an indication that a multilevel model might be necessary?

2.1 SII Dataset

library(lme4)
#> Loading required package: Matrix
library(performance)
#> Warning: package 'performance' was built under R version 4.5.3

fit_null_sii <- lme4::lmer(mathgain ~ 1 + (1 | schoolid), data = siiData)
summary(fit_null_sii)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: mathgain ~ 1 + (1 | schoolid)
#>    Data: siiData
#> 
#> REML criterion at convergence: 11776.7
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.8127 -0.5903 -0.0186  0.5449  5.6632 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  schoolid (Intercept)  109.2   10.45   
#>  Residual             1094.0   33.08   
#> Number of obs: 1190, groups:  schoolid, 107
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   57.583      1.442   39.95

# Calculate ICC
ICC_sii <- 109.2/(109.2+1094); ICC_sii
#> [1] 0.09075798
icc(fit_null_sii)
#> # Intraclass Correlation Coefficient
#> 
#>     Adjusted ICC: 0.091
#>   Unadjusted ICC: 0.091

# Calculate Average Cluster Size
avg_n_sii <- mean(table(siiData$schoolid))

# Calculate Design Effect
design_sii <- 1+(avg_n_sii-1)*ICC_sii ; design_sii
#> [1] 1.918606

The ICC is quite low (<0.1) and the design effect is below 2. So there is very few indication that a large proportion of the variance in the data is due to the variance between schools.

2.2 Hip fracture dataset

fit_null_hip <- lme4::lmer(MMSE ~ 1 + (1 | IDNR), data = hipData)
summary(fit_null_hip)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: MMSE ~ 1 + (1 | IDNR)
#>    Data: hipData
#> 
#> REML criterion at convergence: 1483.8
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.2710 -0.3834  0.0528  0.4960  3.0246 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  IDNR     (Intercept) 56.879   7.542   
#>  Residual              6.782   2.604   
#> Number of obs: 267, groups:  IDNR, 60
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  20.2470     0.9871   20.51
ICC_hip <- 56.879/(56.879+6.782); ICC_hip
#> [1] 0.893467
icc(fit_null_hip)
#> # Intraclass Correlation Coefficient
#> 
#>     Adjusted ICC: 0.893
#>   Unadjusted ICC: 0.893

# Calculate Average Cluster Size
avg_n_hip <- mean(table(hipData$IDNR))

# Calculate Design Effect
design_hip <- 1+(avg_n_hip-1)*ICC_hip ; design_hip
#> [1] 4.573868

The ICC is very high (>0.8) and the design effect is higher than 2. This indicates that a significant proportion of the variance in the data is due to the variance between patients. A multilevel model is necessary.

3 PART THREE: FITTING A MULTILEVEL MODEL + INTERPRETING THE MODEL SUMMARY

3.1 SII Dataset

The research question we want to answer is:
Is there a difference of math achievement score between males and females?

We will now look at a random intercept model. We still only consider the clustering on school-level and ignore the clustering on the class-level (this will be included in a later stage).

Write down the full model with assumptions

$mathgain_{ij} = \alpha + a_i + \beta \cdot sex_j + \epsilon_{ij}$

$a_i \sim N(0, \sigma^2_{school})$, and $\epsilon_{ij} \sim N(0, \sigma^2_{res})$ independent

Fit the model in R, discuss the within-cluster variability and between-cluster variability

model_sii_1 <- lmer(mathgain ~ sex + (1 | schoolid), data = siiData)

summary(model_sii_1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: mathgain ~ sex + (1 | schoolid)
#>    Data: siiData
#> 
#> REML criterion at convergence: 11772.3
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.8455 -0.6000 -0.0199  0.5417  5.6377 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  schoolid (Intercept)  109.3   10.45   
#>  Residual             1093.8   33.07   
#> Number of obs: 1190, groups:  schoolid, 107
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   58.658      1.749  33.539
#> sexfemale     -2.133      1.966  -1.085
#> 
#> Correlation of Fixed Effects:
#>           (Intr)
#> sexfemale -0.566

The between-school variability is equal to $109.3$.
The within-school variability is equal to $1093.8$.
The total variability is equal to $109.3+1093.8=1203.1$.
The ICC (variability due to differences between schools) is equal to $109.3/1203.1=0.091$, meaning 9% of the variance is due to differences between schools, the rest of the variability is due to differences within a school.

What is the average mathgain for female students (across all schools)?

According to the model, the average mathgain for female students across all school is: \[ 58.658 - 2.133 = 56.525 \]

3.2 Hip fracture dataset

We want to study how MMSE develops over time, depending on certain characteristics of a patient. Fit a random intercept model to investigate this research question, taking into account the pre-operative neuro-status of the patient.

Write down the model with assumptions.

$MMSE_{ij} = \alpha + a_i + \beta_1 \cdot NEURO_j + \beta_2 \cdot TIME_j + \beta_3 \cdot NEURO_jTIME_j + \epsilon_{ij}$

$a_i \sim N(0, \sigma^2_{patient})$, and $\epsilon_{ij} \sim N(0, \sigma^2_{res})$ independent

Discuss the variability between and within patients.

model_hip_1 <- lmer(MMSE ~ TIME + NEURO + TIME:NEURO + (1 | IDNR), data = hipData)

summary(model_hip_1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: MMSE ~ TIME + NEURO + TIME:NEURO + (1 | IDNR)
#>    Data: hipData
#> 
#> REML criterion at convergence: 1452.8
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.1447 -0.4416  0.0153  0.4882  2.8102 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  IDNR     (Intercept) 41.970   6.478   
#>  Residual              6.316   2.513   
#> Number of obs: 267, groups:  IDNR, 60
#> 
#> Fixed effects:
#>                     Estimate Std. Error t value
#> (Intercept)         22.01172    1.07702  20.438
#> TIME                 0.19753    0.05240   3.770
#> NEUROneuro-psy      -8.03853    1.86904  -4.301
#> TIME:NEUROneuro-psy -0.05981    0.09642  -0.620
#> 
#> Correlation of Fixed Effects:
#>             (Intr) TIME   NEURO-
#> TIME        -0.254              
#> NEUROnr-psy -0.576  0.146       
#> TIME:NEURO-  0.138 -0.543 -0.258

The between-patient variability $\sigma^2_{patient}$ is equal to $41.970$.
The within-patient variability $\sigma^2_{res}$ is equal to $6.316$.
The total variability $\sigma^2_{patient}+\sigma^2_{res}$ is equal to $41.970+6.316=48.286$.
The ICC (variability due to differences between patients) is equal to $41.970/48.286=0.869$, meaning 86.9% of the variance is due to differences between patients, the rest of the variability is due to differences within a patient.

What is the marginal model (i.e. average model) that describes the MMSE over time for not neuro-psychiatric patients?

$MMSE_{NotNeuro} = 22.01172 + 0.19753 \cdot TIME$

4 PART FOUR: PARAMETER INFERENCE

4.1 SII Dataset

Based on the random intercept model that you fit in the previous part, give an answer to the question: Is there a difference of math achievement score between males and females?

library(lmerTest)
#> Warning: package 'lmerTest' was built under R version 4.5.3
#> 
#> Attaching package: 'lmerTest'
#> The following object is masked from 'package:lme4':
#> 
#>     lmer
#> The following object is masked from 'package:stats':
#> 
#>     step
library(pbkrtest)
#> Warning: package 'pbkrtest' was built under R version 4.5.3

model_sii_full <- lme4::lmer(mathgain ~ sex + (1 | schoolid), data = siiData, REML=FALSE)
model_sii_red <- lme4::lmer(mathgain ~  (1 | schoolid), data = siiData, REML=FALSE)

# LRT Test

anova(model_sii_red,model_sii_full)
#> Data: siiData
#> Models:
#> model_sii_red: mathgain ~ (1 | schoolid)
#> model_sii_full: mathgain ~ sex + (1 | schoolid)
#>                npar   AIC   BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
#> model_sii_red     3 11785 11800 -5889.6     11779                     
#> model_sii_full    4 11786 11806 -5889.0     11778 1.1784  1     0.2777

# Wald test

model_sii_wald <- lmerTest::lmer(mathgain ~ sex + (1 | schoolid), data = siiData)

anova(model_sii_wald, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>     Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)
#> sex 1285.8  1285.8     1 1166.3  1.1755 0.2785

# Bootstrap

bootstrap_test <- PBmodcomp(model_sii_full, model_sii_red, nsim = 500)
bootstrap_test
#> Bootstrap test; time: 27.60 sec; samples: 500; extremes: 131;
#> large : mathgain ~ sex + (1 | schoolid)
#>          stat df p.value
#> LRT    1.1784  1  0.2777
#> PBtest 1.1784     0.2635

All the tests show a p-value larger than 0.05, indicating that there is no significant difference in math achievement scores between males and females.

4.2 Hip fracture dataset

Based on the random intercept model that you fit in the previous part, is there a different evolution of MMSE score in the two neuro-status groups?


model_hip_full <- lme4::lmer(MMSE ~ TIME + NEURO + TIME:NEURO + (1 | IDNR), data = hipData, REML=FALSE)
model_hip_red <- lme4::lmer(MMSE ~ TIME + NEURO + (1 | IDNR), data = hipData, REML=FALSE)

# LRT Test

anova(model_hip_red,model_hip_full)
#> Data: hipData
#> Models:
#> model_hip_red: MMSE ~ TIME + NEURO + (1 | IDNR)
#> model_hip_full: MMSE ~ TIME + NEURO + TIME:NEURO + (1 | IDNR)
#>                npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
#> model_hip_red     5 1460.4 1478.4 -725.21    1450.4                     
#> model_hip_full    6 1462.0 1483.5 -725.01    1450.0 0.3915  1     0.5315

# Wald test

model_hip_wald <- lmerTest::lmer(MMSE ~ TIME + NEURO + TIME:NEURO + (1 | IDNR), data = hipData)

anova(model_hip_wald, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>             Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
#> TIME        76.323  76.323     1 206.509 12.0849 0.0006196 ***
#> NEURO      116.823 116.823     1  66.554 18.4975 5.693e-05 ***
#> TIME:NEURO   2.429   2.429     1 206.509  0.3846 0.5358332    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Bootstrap

#bootstrap_test <- PBmodcomp(model_hip_full, model_hip_red, nsim = 500)
#bootstrap_test

Both the LRT test and the Wald test show a p-value above 0.05, indicating that there is no significantly different evolution of MMSE score in the two neuro-status groups. This means that we can remove the interaction term between TIME and NEURO from the model.

Note that if you perform parametric bootstrapping here, you get an error (that’s why the code was not run here). This is due to the fact that there are missing data in the dataset. If you want to perform parametric bootstrapping here, you have to limit your dataset to only include the complete cases.

Based on your conclusion from the previous question, fit a new appropriate model. Is there a significant MMSE evolution over time?


model_hip_full_2 <- lme4::lmer(MMSE ~ TIME + NEURO + (1 | IDNR), data = hipData, REML=FALSE)
model_hip_red_2 <- lme4::lmer(MMSE ~  NEURO + (1 | IDNR), data = hipData, REML=FALSE)

# LRT Test

anova(model_hip_red_2,model_hip_full_2)
#> Data: hipData
#> Models:
#> model_hip_red_2: MMSE ~ NEURO + (1 | IDNR)
#> model_hip_full_2: MMSE ~ TIME + NEURO + (1 | IDNR)
#>                  npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)    
#> model_hip_red_2     4 1474.6 1488.9 -733.29    1466.6                         
#> model_hip_full_2    5 1460.4 1478.4 -725.21    1450.4 16.161  1  5.819e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Wald test

model_hip_wald_2 <- lmerTest::lmer(MMSE ~ TIME + NEURO + (1 | IDNR), data = hipData)

anova(model_hip_wald_2, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>       Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
#> TIME  105.60  105.60     1 207.139  16.781 6.018e-05 ***
#> NEURO 133.87  133.87     1  58.068  21.274 2.238e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Bootstrap

#bootstrap_test <- PBmodcomp(model_hip_full_2, model_hip_red_2, nsim = 500)
#bootstrap_test

Both the LRT test and the Wald test show a p-value below 0.05, indicating that there is a significant MMSE evolution over time.

Looking at the profiles from the hipfracture dataset (see figure produced in PART ONE of the exercises), it might be of interest to include a quadratic time effect.

hipData$TIME2 <- hipData$TIME* hipData$TIME

Fit the model. Is the quadratic time effect needed here?

model_hip_full_3 <- lme4::lmer(MMSE ~ TIME + NEURO + TIME2 + (1 | IDNR), data = hipData, REML=FALSE)
model_hip_red_3 <- lme4::lmer(MMSE ~  NEURO + TIME + (1 | IDNR), data = hipData, REML=FALSE)

# LRT Test

anova(model_hip_red_3,model_hip_full_3)
#> Data: hipData
#> Models:
#> model_hip_red_3: MMSE ~ NEURO + TIME + (1 | IDNR)
#> model_hip_full_3: MMSE ~ TIME + NEURO + TIME2 + (1 | IDNR)
#>                  npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
#> model_hip_red_3     5 1460.4 1478.4 -725.21    1450.4                     
#> model_hip_full_3    6 1460.3 1481.8 -724.16    1448.3 2.0937  1     0.1479

# Wald test

model_hip_wald_3 <- lmerTest::lmer(MMSE ~ TIME + NEURO + TIME2 + (1 | IDNR), data = hipData)

anova(model_hip_wald_3, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>        Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
#> TIME   38.825  38.825     1 205.200  6.2006   0.01356 *  
#> NEURO 133.636 133.636     1  58.068 21.3422 2.181e-05 ***
#> TIME2  13.036  13.036     1 205.338  2.0819   0.15058    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both the LRT test and the Wald test show a p-value above 0.05, indicating that there is no need for a quadratic time effect.

Include the age of the patient as an extra covariate in the model. Is there a different effect of age in both neuro groups?

model_hip_full_4 <- lme4::lmer(MMSE ~ TIME + NEURO + AGE + AGE:NEURO + (1 | IDNR), data = hipData, REML=FALSE)
model_hip_red_4 <- lme4::lmer(MMSE ~  NEURO + TIME + AGE + (1 | IDNR), data = hipData, REML=FALSE)

# LRT Test

anova(model_hip_red_4,model_hip_full_4)
#> Data: hipData
#> Models:
#> model_hip_red_4: MMSE ~ NEURO + TIME + AGE + (1 | IDNR)
#> model_hip_full_4: MMSE ~ TIME + NEURO + AGE + AGE:NEURO + (1 | IDNR)
#>                  npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
#> model_hip_red_4     6 1440.6 1462.1 -714.29    1428.6                     
#> model_hip_full_4    7 1442.5 1467.7 -714.27    1428.5 0.0285  1      0.866

# Wald test

model_hip_wald_4 <- lmerTest::lmer(MMSE ~ TIME + NEURO + AGE + AGE:NEURO + (1 | IDNR), data = hipData)

anova(model_hip_wald_4, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>            Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
#> TIME      101.546 101.546     1 207.506 16.1315 8.259e-05 ***
#> NEURO       0.479   0.479     1  55.947  0.0761 0.7837217    
#> AGE       106.330 106.330     1  55.997 16.8915 0.0001305 ***
#> NEURO:AGE   0.166   0.166     1  55.984  0.0263 0.8716731    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both the LRT test and the Wald test show a p-value above 0.05, indicating that there is not a significantly different effect of age in both neuro groups. This means that we can drop the interaction term from the model.

Is the age effect needed here?

model_hip_full_5 <- lme4::lmer(MMSE ~ TIME + NEURO + AGE + (1 | IDNR), data = hipData, REML=FALSE)
model_hip_red_5 <- lme4::lmer(MMSE ~  NEURO + TIME + (1 | IDNR), data = hipData, REML=FALSE)

# LRT Test

anova(model_hip_red_5,model_hip_full_5)
#> Data: hipData
#> Models:
#> model_hip_red_5: MMSE ~ NEURO + TIME + (1 | IDNR)
#> model_hip_full_5: MMSE ~ TIME + NEURO + AGE + (1 | IDNR)
#>                  npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)    
#> model_hip_red_5     5 1460.4 1478.4 -725.21    1450.4                         
#> model_hip_full_5    6 1440.6 1462.1 -714.29    1428.6 21.844  1  2.957e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Wald test

model_hip_wald_5 <- lmerTest::lmer(MMSE ~ TIME + NEURO + AGE + (1 | IDNR), data = hipData)

anova(model_hip_wald_5, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>       Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
#> TIME  101.49  101.49     1 207.554  16.123 8.295e-05 ***
#> NEURO 162.88  162.88     1  57.117  25.875 4.223e-06 ***
#> AGE   157.65  157.65     1  57.191  25.043 5.674e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both the LRT test and the Wald test show a p-value below 0.05, indicating that there is significant effect of AGE on the MMSE value. The variable AGE should therefore be kept in the model.

5 PART FIVE: NESTED EFFECTS

5.1 SII Dataset

Give and fit the model, testing if there is a difference in mathgain between boys and girls, taking into account clustering on both school and class levels.

Write down this model (no other covariates have to be considered yet).

$mathgain_{ijk}=\alpha +a_{1i} + a_{2i(j)} + \beta \cdot sex_k + \epsilon_{ijk}$

with $a_{1i} = $ school effect
with $a_{2i(j)}=$ class effect (within school)

Is there a difference in mathgain between boys and girls?


model_nested <- lme4::lmer(mathgain~ sex + (1|schoolid/classid), data=siiData, REML=F)
model_nested_red <- lme4::lmer(mathgain~ 1+ (1|schoolid/classid), data=siiData, REML=F)

anova(model_nested_red,model_nested )
#> Data: siiData
#> Models:
#> model_nested_red: mathgain ~ 1 + (1 | schoolid/classid)
#> model_nested: mathgain ~ sex + (1 | schoolid/classid)
#>                  npar   AIC   BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
#> model_nested_red    4 11779 11800 -5885.7     11771                     
#> model_nested        5 11780 11805 -5884.9     11770 1.4574  1     0.2273

The p-value for the LRT is greater than 0.05, so there is no significant difference in mathgain between boys and girls.

Discuss the different variance components from the model in the previous exercise. * What fraction of the total variance is due to the variability between classes? * What fraction of the total variance is due to the variability between schools?


model_nested <- lme4::lmer(mathgain~ sex + (1|schoolid/classid), data=siiData)

summary(model_nested)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: mathgain ~ sex + (1 | schoolid/classid)
#>    Data: siiData
#> 
#> REML criterion at convergence: 11764.1
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.6789 -0.5889 -0.0324  0.5396  5.6126 
#> 
#> Random effects:
#>  Groups           Name        Variance Std.Dev.
#>  classid:schoolid (Intercept)  101.04  10.052  
#>  schoolid         (Intercept)   76.96   8.773  
#>  Residual                     1026.86  32.045  
#> Number of obs: 1190, groups:  classid:schoolid, 312; schoolid, 107
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   58.615      1.747  33.547
#> sexfemale     -2.353      1.948  -1.208
#> 
#> Correlation of Fixed Effects:
#>           (Intr)
#> sexfemale -0.564

The total variance is equal to $101.04+76.96+1026.86=1204.86$
$101.04/1204.86=0.084 \rightarrow 8.4 \%$ of the total variance is due to the variability between classes.
$76.96/1204.86=0.064 \rightarrow 6.4\%$ of the total variance is due to the variability between schools.

Fit the same model, but consider the random effects for school and class levels to be crossed, rather than nested. Look at the variance components now. Do you notice a difference? Why is that?


model_crossed <- lme4::lmer(mathgain~ sex + (1|schoolid) + (1|classid), data=siiData)
summary(model_crossed)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: mathgain ~ sex + (1 | schoolid) + (1 | classid)
#>    Data: siiData
#> 
#> REML criterion at convergence: 11764.1
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.6789 -0.5889 -0.0324  0.5396  5.6126 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  classid  (Intercept)  101.04  10.052  
#>  schoolid (Intercept)   76.96   8.773  
#>  Residual             1026.86  32.045  
#> Number of obs: 1190, groups:  classid, 312; schoolid, 107
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   58.615      1.747  33.547
#> sexfemale     -2.353      1.948  -1.208
#> 
#> Correlation of Fixed Effects:
#>           (Intr)
#> sexfemale -0.564

There is no difference in the estimated variance components compared to the previous model specification. This is because the variable classid is coded with globally unique identifiers for each class across the entire dataset, rather than with identifiers that repeat within schools (e.g., class 1, 2, 3 in school 1 and again class 1, 2, 3 in school 2). As a result, each class is uniquely associated with a single school, making the specification with nested random effects mathematically equivalent to the specification with crossed random effects for this dataset.

6 PART SIX: MODEL SELECTION

6.1 SII Dataset

Given the model from the previous part, with sex as a fixed effect and a random effect for school and class. Is it necessary to take into account the nested structure of classes within schools? Test whether the nested effects model performs better than a model that only includes a random effect for school, and a model that only includes a random effect for class.

model_nested <- lme4::lmer(mathgain~ sex + (1|schoolid/classid), data=siiData)
model_school <- lme4::lmer(mathgain~ sex + (1|schoolid), data=siiData)
model_class <- lme4::lmer(mathgain~ sex + (1|classid), data=siiData)

anova(model_school,model_nested)
#> refitting model(s) with ML (instead of REML)
#> Data: siiData
#> Models:
#> model_school: mathgain ~ sex + (1 | schoolid)
#> model_nested: mathgain ~ sex + (1 | schoolid/classid)
#>              npar   AIC   BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)   
#> model_school    4 11786 11806 -5889.0     11778                        
#> model_nested    5 11780 11805 -5884.9     11770 8.1813  1   0.004232 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model_class,model_nested)
#> refitting model(s) with ML (instead of REML)
#> Data: siiData
#> Models:
#> model_class: mathgain ~ sex + (1 | classid)
#> model_nested: mathgain ~ sex + (1 | schoolid/classid)
#>              npar   AIC   BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)   
#> model_class     4 11785 11806 -5888.7     11777                        
#> model_nested    5 11780 11805 -5884.9     11770 7.4614  1   0.006303 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC(model_school,model_class,model_nested)
#>              df      AIC
#> model_school  4 11780.30
#> model_class   4 11779.89
#> model_nested  5 11774.14
BIC(model_school,model_class,model_nested)
#>              df      BIC
#> model_school  4 11800.63
#> model_class   4 11800.22
#> model_nested  5 11799.55

The LRT, AIC and BIC all indicate that the fully nested model, with random effects for school and class performs the best.

Fit one model including all covariates as main effects and as an interaction with sex, and taking into account all levels of clustering. Discuss significance of the covariates. Perform stepwise backward model selection on the fixed effects to select the best model

model_fullynested <- lmer(mathgain~ sex*(ses+housepov+yearstea+mathkind) + (1|schoolid/classid), data=siiData)
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
anova(model_fullynested)
#> Type III Analysis of Variance Table with Satterthwaite's method
#>              Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
#> sex             232     232     1 1134.74   0.3128   0.57608    
#> ses           15603   15603     1 1166.89  21.0357 4.995e-06 ***
#> housepov       2293    2293     1  116.20   3.0914   0.08134 .  
#> yearstea        196     196     1  245.22   0.2638   0.60801    
#> mathkind     316941  316941     1 1164.19 427.3025 < 2.2e-16 ***
#> sex:ses           0       0     1 1127.84   0.0005   0.98150    
#> sex:housepov   2645    2645     1 1116.88   3.5664   0.05922 .  
#> sex:yearstea   2850    2850     1 1113.87   3.8417   0.05024 .  
#> sex:mathkind    666     666     1 1134.65   0.8981   0.34349    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step2 <- lmer(mathgain~ sex*(housepov+yearstea+mathkind) + ses + (1|schoolid/classid), data=siiData)
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
anova(model_step2)
#> Type III Analysis of Variance Table with Satterthwaite's method
#>              Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
#> sex             239     239     1 1133.03   0.3227   0.57013    
#> housepov       2292    2292     1  116.08   3.0933   0.08125 .  
#> yearstea        196     196     1  244.97   0.2649   0.60725    
#> mathkind     316920  316920     1 1165.14 427.6762 < 2.2e-16 ***
#> ses           15603   15603     1 1167.85  21.0556 4.943e-06 ***
#> sex:housepov   2713    2713     1 1120.73   3.6614   0.05594 .  
#> sex:yearstea   2852    2852     1 1114.95   3.8483   0.05005 .  
#> sex:mathkind    699     699     1 1132.67   0.9427   0.33180    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step3 <- lmer(mathgain~ sex*(housepov+yearstea)+ ses + mathkind + (1|schoolid/classid), data=siiData)
anova(model_step3, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>              Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
#> sex            3023    3023     1 1137.64   4.0761   0.04373 *  
#> housepov       2299    2299     1  110.12   3.1004   0.08105 .  
#> yearstea        206     206     1  255.28   0.2781   0.59841    
#> ses           15367   15367     1 1168.28  20.7233 5.861e-06 ***
#> mathkind     315980  315980     1 1164.45 426.1224 < 2.2e-16 ***
#> sex:housepov   3171    3171     1 1124.02   4.2759   0.03888 *  
#> sex:yearstea   2828    2828     1 1119.17   3.8138   0.05108 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step4 <- lmer(mathgain~ sex*(housepov)+ ses + mathkind + yearstea + (1|schoolid/classid), data=siiData)
anova(model_step4, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>              Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
#> sex             665     665     1 1111.80   0.8952   0.34428    
#> housepov       2164    2164     1  110.04   2.9141   0.09063 .  
#> ses           15824   15824     1 1168.19  21.3127 4.333e-06 ***
#> mathkind     317988  317988     1 1164.52 428.2735 < 2.2e-16 ***
#> yearstea        220     220     1  256.34   0.2962   0.58672    
#> sex:housepov   2336    2336     1 1116.47   3.1457   0.07640 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step5 <- lmer(mathgain~ sex*housepov + ses + mathkind  + (1|schoolid/classid), data=siiData)
anova(model_step5, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>              Sum Sq Mean Sq NumDF   DenDF  F value  Pr(>F)    
#> sex             670     670     1 1112.34   0.9028 0.34224    
#> housepov       2336    2336     1  108.92   3.1476 0.07883 .  
#> ses           15898   15898     1 1169.28  21.4205 4.1e-06 ***
#> mathkind     318218  318218     1 1165.81 428.7662 < 2e-16 ***
#> sex:housepov   2369    2369     1 1117.25   3.1921 0.07426 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step6 <- lmer(mathgain~ sex + housepov + ses + mathkind  + (1|schoolid/classid), data=siiData)
anova(model_step6, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>          Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
#> sex         458     458     1 1131.05   0.6165   0.43251    
#> housepov   2346    2346     1  108.99   3.1585   0.07832 .  
#> ses       16381   16381     1 1169.41  22.0556 2.962e-06 ***
#> mathkind 319087  319087     1 1165.47 429.6178 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step7 <- lmer(mathgain~ housepov + ses + mathkind  + (1|schoolid/classid), data=siiData)
anova(model_step7, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>          Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
#> housepov   2374    2374     1  109.05   3.1965   0.07658 .  
#> ses       16514   16514     1 1170.91  22.2305 2.709e-06 ***
#> mathkind 320206  320206     1 1166.13 431.0595 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_step8 <- lmer(mathgain~ ses + mathkind  + (1|schoolid/classid), data=siiData)
anova(model_step8, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#>          Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
#> ses       18065   18065     1 1163.2  24.297 9.452e-07 ***
#> mathkind 317617  317617     1 1163.8 427.188 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The least significant variables that are excluded in each step are:
1) sex:ses
2) sex:mathkind
3) sex:yearsta
4) yearsta
5) sex:housepov
6) sex
7) housepov

The final model includes fixed effects for ses and mathkind and random effects for school and class.

7 PART SEVEN: MODEL SELECTION

7.1 SII Dataset

Check the model assumptions of the final model you have chosen in the previous exercise.

model_final <- lmer(mathgain~ ses + mathkind  + (1|schoolid/classid), data=siiData)

# Residuals vs. Fitted Values Plot (Tests Equal Variance)
plot(model_final, 
     main = "Residuals vs. Fitted Values",
     xlab = "Fitted Values", ylab = "Residuals")

# Q-Q Plot of Residuals (Tests Normality of Residuals)
qqnorm(resid(model_final), main = "Normal Q-Q Plot: Residuals")
qqline(resid(model_final), col = "red")

# Extract Empirical Bayes estimates (random effects)
eb_estimates <- ranef(model_final)

# Histogram to check for Class-level outliers
ggplot(mapping = aes(x = eb_estimates$classid[[1]])) +
  geom_histogram(bins = 30, fill = "lightblue", color = "black") +
  labs(title = "Distribution of Class Random Effects", x = "Class Intercept Deviation")

# Histogram to check for School-level outliers
ggplot(mapping = aes(x = eb_estimates$schoolid[[1]])) +
  geom_histogram(bins = 15, fill = "lightgreen", color = "black") +
  labs(title = "Distribution of School Random Effects", x = "School Intercept Deviation")

# Scatterplot
mapping_df <- unique(siiData[, c("classid", "schoolid")])
mapping_df <- na.omit(mapping_df)

mapping_df$class_eb <- eb_estimates$classid[as.character(mapping_df$classid), 1]
mapping_df$school_eb      <- eb_estimates$schoolid[as.character(mapping_df$schoolid), 1]

# Plot the relationship between the levels
ggplot(mapping_df, aes(x = school_eb, y = class_eb)) +
  geom_point(alpha = 0.5, color = "darkblue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Class Effects vs. School Effects", x = "School Random Effect", y = "Class Random Effect")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: Removed 31 rows containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Warning: Removed 31 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Linearity - Residuals vs. Fitted values - There is no clear trend in the points, so linearity is fulfilled.
Homoscedasticity - Residuals vs. Fitted values - The point are evenly distributed around 0, so constant variance is fulfilled.
Normality - QQplot - The points follow the diagonal line nicely, so normality of the residuals is fulfilled.
Outliers - Histogram of EB estimates - There might be some outlying values to be investigated.
Relationship between random effects - Scatterplot of EB values - There is a clear trend between the random effects, indicating that there is strong correlation between the random effects.

Multilevel Analysis - Exercise Solutions

dr. Annelies Agten

2026-05-25