In this case study we investigate the evolution of unaffected nail length in patients receiving one of two treatments for toenail disease. Measurements were repeatedly collected over time within patients, and patients were treated within centers.
TOENAIL DATA
The dataset contains the following variables:
nailbig: unaffected nail length (response)time: follow-up timetreat: treatment groupid: patient identifiercenter: treatment centerage: age of the patientImportant Note
This document presents a minimally working solution for the multilevel analysis of the toenail dataset. The modelling choices and analytical trajectory shown here reflect decisions that I personally made as a researcher based on my own reasoning and interpretation of the data.
However, multilevel modelling is rarely a fully fixed or linear process. Different researchers may reasonably make different choices regarding exploratory analyses, model building strategies, diagnostics, and interpretation of results.
Therefore, this document should be seen as one possible analytical pathway, rather than the single “correct” solution.
Use this analysis as a guide and as inspiration, but feel free to adapt, extend, simplify, or “pimp” the analysis according to your own statistical reasoning, research questions, and scientific style. Thoughtful motivation of your modelling decisions is more important than reproducing the exact same workflow.
# Load all the required packages
library(lme4)
#> Loading required package: Matrix
library(lmerTest)
#> Warning: package 'lmerTest' was built under R version 4.5.3
#>
#> Attaching package: 'lmerTest'
#> The following object is masked from 'package:lme4':
#>
#> lmer
#> The following object is masked from 'package:stats':
#>
#> step
library(pbkrtest)
#> Warning: package 'pbkrtest' was built under R version 4.5.3
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.5.3
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(performance)
#> Warning: package 'performance' was built under R version 4.5.3
# Load the dataset
toenail <-read.table("toenail.txt",header=T)
# Specify the categorical variables in R
toenail$treat<-as.factor(toenail$treat)
toenail$center<-as.factor(toenail$center)
toenail$id<-as.factor(toenail$id)
# Get an idea of your data
# Total number of observations
nrow(toenail)
#> [1] 1854
# Number of patients
length(unique(toenail$id))
#> [1] 298
# Number of centers
length(unique(toenail$center))
#> [1] 26
# Number of observations per patient
table(toenail$id)
#>
#> 1 2 3 4 6 7 9 10 11 12 13 15 16 17 18 19 20 21 22 23
#> 7 6 7 7 7 7 7 7 7 7 7 5 4 5 3 7 3 3 7 7
#> 24 25 27 28 29 30 31 33 35 37 38 39 40 41 45 48 49 50 51 52
#> 3 7 2 5 7 4 3 7 5 7 7 7 5 2 1 1 3 7 5 7
#> 53 54 55 56 58 59 60 61 63 64 65 66 68 69 70 72 73 75 76 78
#> 7 7 7 7 4 7 3 4 1 7 7 7 5 7 7 7 7 7 7 5
#> 79 80 81 82 83 84 85 86 87 88 89 90 93 94 95 96 97 99 101 102
#> 7 7 7 7 7 7 7 7 7 7 7 7 2 7 7 7 5 1 7 7
#> 104 105 106 107 108 109 110 111 114 116 117 118 119 120 121 123 124 125 126 127
#> 7 7 7 7 7 7 7 7 7 7 7 5 7 7 7 7 7 7 7 7
#> 129 130 131 132 133 134 136 137 138 139 140 141 142 143 144 145 146 149 150 151
#> 7 7 7 7 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> 152 154 156 157 158 160 161 162 163 164 165 166 168 169 170 172 173 174 175 176
#> 7 7 7 7 7 7 7 7 7 7 3 7 4 7 7 7 7 7 7 5
#> 177 178 180 181 182 185 186 188 189 190 191 192 193 194 195 197 198 199 200 201
#> 7 7 7 4 7 7 4 4 4 7 7 4 7 7 7 7 7 5 7 7
#> 202 203 204 205 206 207 209 210 211 212 213 214 215 216 217 218 220 221 222 223
#> 7 7 7 7 7 3 7 7 7 6 5 3 7 7 7 7 7 7 7 7
#> 224 225 226 227 228 229 230 231 232 233 234 235 237 239 240 241 242 243 245 246
#> 7 7 7 2 7 7 5 7 7 7 7 7 7 7 5 7 7 7 7 2
#> 247 248 249 250 251 252 254 255 256 258 259 260 261 262 263 264 266 269 270 271
#> 5 4 5 3 7 7 5 7 7 7 7 7 7 7 7 7 7 7 7 7
#> 273 275 276 277 278 279 280 283 284 287 288 289 290 292 293 294 295 297 298 300
#> 7 7 7 7 5 7 1 7 4 7 7 2 5 7 7 7 7 7 7 4
#> 301 302 305 306 307 308 309 310 311 312 313 314 316 319 321 324 325 327 328 330
#> 7 7 5 5 5 7 2 7 7 5 7 4 7 7 3 7 7 7 7 7
#> 331 332 333 334 335 336 337 338 340 341 343 346 350 351 352 353 354 355 356 357
#> 7 4 7 7 7 7 7 3 7 7 4 7 7 7 5 7 7 7 7 1
#> 358 359 360 361 363 364 365 366 367 368 369 372 373 374 377 381 382 383
#> 7 7 7 7 7 7 7 7 7 7 7 7 7 7 1 7 7 6
# Number of patients with 7 measurements
sum(table(toenail$id)==7)
#> [1] 226
# Number of patients per treatment group
table(toenail$treat[!duplicated(toenail$id)])
#>
#> 0 1
#> 148 150
# Summary Statistics
toenail_unique <- toenail[!duplicated(toenail$id),]
summary(toenail_unique)
#> nailbig time treat id center
#> Min. : 0.00 Min. :0 0:148 1 : 1 29 :252
#> 1st Qu.: 0.00 1st Qu.:0 1:150 2 : 1 22 : 5
#> Median : 0.25 Median :0 3 : 1 1 : 3
#> Mean : 1.89 Mean :0 4 : 1 11 : 3
#> 3rd Qu.: 3.00 3rd Qu.:0 6 : 1 14 : 3
#> Max. :13.00 Max. :0 7 : 1 36 : 3
#> (Other):292 (Other): 29
#> age
#> Min. :29.00
#> 1st Qu.:67.00
#> Median :67.00
#> Mean :64.05
#> 3rd Qu.:67.00
#> Max. :67.00
#>
mean(toenail_unique$age)
#> [1] 64.05034
mean(toenail_unique$age[toenail_unique$treat==0])
#> [1] 61.06081
mean(toenail_unique$age[toenail_unique$treat==1])
#> [1] 67
sd(toenail_unique$age[toenail_unique$treat==0])
#> [1] 9.722844
sd(toenail_unique$age[toenail_unique$treat==1])
#> [1] 0
mean(toenail$nailbig)
#> [1] 5.218608
mean(toenail$nailbig[toenail_unique$treat==0])
#> [1] 5.390776
mean(toenail$nailbig[toenail_unique$treat==1])
#> [1] 5.036111
sd(toenail$nailbig[toenail_unique$treat==0])
#> [1] 4.543127
sd(toenail$nailbig[toenail_unique$treat==1])
#> [1] 4.212292
In total 298 patients, in 26 centers were included in the study. Of these 298 patients, 148 were assigned to oral treatment ‘0’ and 150 were assigned to oral treatment ‘1’. In total the patients were measured on 7 different timepoints. 226 out of 298 patients had complete measurements, for the remaining patients there was some type of drop-out/missing values.
The mean age of the patients in the dataset is equal to 64 (61 for patients in treatment group ‘0’, with a standard deviation of 9.7, and 67 for patients in treatment group ‘1’, with a standard deviation of 0).
Here we can already make a clear conclusion: all patients that were assigned to treatement ‘1’, are 67 years old. There there is no variation of age within the treatment levels, indicating that we can’t fit an interaction term between age and treatment in later analyses.
The mean unaffected nail length of the patients in the dataset is equal to 5.22 (5.04 for patients in treatment group ‘0’, with a standard deviation of 4.54, and 5.04 for patients in treatment group ‘1’, with a standard deviation of 4.21).
Next we make some exploratory figures:
# Individual-vs-Aggregate Dot Plot
group_means <- toenail %>%
group_by(center,treat) %>%
summarize(Mean_nailbig = mean(nailbig), .groups = "drop")
ggplot(group_means, aes(x = center, y = Mean_nailbig, col = treat)) +
geom_point(alpha = 0.7, size = 2.5) +
labs(
title = "Distribution of the unaffected nail length for the treatment group and non-treatment group across centers",
x = "Center ID (Cluster)",
y = "Unaffected nail length",
color = "Treatment group"
) +
theme_minimal()
table(toenail$center[toenail$treat==1])
#>
#> 1 2 3 4 5 7 8 9 10 11 13 14 17 19 20 21 22 23 24 26
#> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 28 29 31 33 35 36
#> 0 954 0 0 0 0
From this index plot we can conclude that all of the patients that were assigned to treatment group ‘1’, were treated in the same center (the center with ID number 29). This indicates that we have a highly unbalanced design. We can also see that there is quite some between-center variability.
# one spaghetti plot with all the patients, using a different color for treatment group
ggplot(toenail, aes(x = time, y = nailbig, group = id, col=treat)) +
geom_line(alpha = 0.6) +
geom_point(alpha = 0.6) +
labs(title = "Individual Trajectory Plot", y = "Unaffected nail length")
# two separate spaghetti plots for the treatment groups
ggplot(toenail, aes(x = time, y = nailbig, group = id, col = treat)) +
geom_line(alpha = 0.6) +
geom_point(alpha = 0.6) +
# Split the plot into separate panels based on the NEURO variable
facet_wrap(~treat) +
labs(
title = "Individual Trajectory Plots by treatment groups",
y = "Unaffected nail length",
x = "Time"
)
From the spaghetti plot for the individual patient trajectories we can conclude that there is quite some heterogeneity between subjects. However, we cannot distinguish a clear trend between the two treatment groups. We can also identify some subjects with missing observations.
# One spaghetti plot with the mean trajectory over time, using a different color for treatment group
group_means2 <- toenail %>%
group_by(time,treat) %>%
summarize(Mean_nailbig = mean(nailbig), .groups = "drop")
ggplot(group_means2, aes(x = time, y = Mean_nailbig, col=treat)) +
geom_line(alpha = 0.6) +
geom_point(alpha = 0.6) +
labs(title = "Mean Trajectory Plot", y = "Unaffected nail length", color="Treatment group", x="Time")
The mean profile plot suggests an overall evolution over time in both treatment groups. Variability appears to increase slightly at later time points, suggesting that a random slope for time might be appropriate. The trend is also clearly not linear. A higher order term for the time coefficient might be necessary.
# Calculate ICC and Design Effect for model with random center effect
fit_null_center <- lme4::lmer(nailbig ~ 1 + (1 | center), data = toenail)
ICC_center <- icc(fit_null_center); ICC_center$icc[1]
#> NULL
avg_n_center <- mean(table(toenail$center))
design_center <- 1+(avg_n_center-1)*ICC_center$icc[1] ; design_center
#> numeric(0)
# Calculate ICC and Design Effect for model with random id effect
fit_null_id <- lme4::lmer(nailbig ~ 1 + (1 | id), data = toenail)
ICC_id <- icc(fit_null_id); ICC_id$icc[1]
#> NULL
avg_n_id <- mean(table(toenail$id))
design_id <- 1+(avg_n_id-1)*ICC_id$icc[1] ; design_id
#> numeric(0)
Although the ICC values for the models incorporating a random effect for center and patient respectively are rather low, the design effects for both models indicate that it is necessary to take into account the grouping structure on both levels.
We start from a fixed structure including:
* time
* treatment
* time × treatment interaction
We ignore age for now.
# Fit a nested model
model1 <- lme4::lmer(nailbig ~ time + treat + time:treat + (1 | center/id), data = toenail)
#> boundary (singular) fit: see help('isSingular')
summary(model1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: nailbig ~ time + treat + time:treat + (1 | center/id)
#> Data: toenail
#>
#> REML criterion at convergence: 9429.3
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.5212 -0.5479 0.0259 0.5783 4.4210
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> id:center (Intercept) 6.519 2.553
#> center (Intercept) 0.000 0.000
#> Residual 6.951 2.637
#> Number of obs: 1854, groups: id:center, 299; center, 26
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 2.51802 0.24673 10.205
#> time 0.56146 0.02268 24.759
#> treat1 0.24732 0.34679 0.713
#> time:treat1 0.04668 0.03142 1.486
#>
#> Correlation of Fixed Effects:
#> (Intr) time treat1
#> time -0.367
#> treat1 -0.711 0.261
#> time:treat1 0.265 -0.722 -0.368
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
When we fit this model, we get a warning saying
boundary (singular) fit: see help('isSingular'). This is
due to the fact that the estimate of the variance components (the random
effects) is very small. Looking at the variance related to differences
between centers, this is equal to 0, while the variance related to the
different patients is equal to 6.51. The proportion of the total
variance that is related to the difference between patients is equal to
\(6.51/(6.51+0+6.958)=0.48\), meaning
that almost half of the variance in the dataset can be attributed to the
repeated measurement from patients. These results clearly indicate that
taking into account clustering at the center level might night be
important. So we test for the random effect of ‘center’:
# Fit a nested model
model1 <- lme4::lmer(nailbig ~ time + treat + time:treat + (1 | center/id), data = toenail)
#> boundary (singular) fit: see help('isSingular')
model2 <- lme4::lmer(nailbig ~ time + treat + time:treat + (1 | id), data = toenail)
anova(model2,model1)
#> refitting model(s) with ML (instead of REML)
#> Data: toenail
#> Models:
#> model2: nailbig ~ time + treat + time:treat + (1 | id)
#> model1: nailbig ~ time + treat + time:treat + (1 | center/id)
#> npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
#> model2 6 9426.8 9459.9 -4707.4 9414.8
#> model1 7 9429.5 9468.2 -4707.8 9415.5 0 1 1
The p-value is very large (larger than 0.05), indicating that adding the extra random intercept for ‘center’ does not significantly improve the model. Therefore we choose to work with the reduced model, i.e. the model that only includes a random intercept for patient.
Next, we evaluate a random slope for time at the patient level to determine whether subjects differ in their longitudinal evolution.
model2 <- lme4::lmer(nailbig ~ time + treat + time:treat + (1 | id), data = toenail)
model3 <- lme4::lmer(nailbig ~ time + treat + time:treat + (1 + time | id), data = toenail)
anova(model2,model3)
#> refitting model(s) with ML (instead of REML)
#> Data: toenail
#> Models:
#> model2: nailbig ~ time + treat + time:treat + (1 | id)
#> model3: nailbig ~ time + treat + time:treat + (1 + time | id)
#> npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
#> model2 6 9426.8 9459.9 -4707.4 9414.8
#> model3 8 8713.7 8757.9 -4348.8 8697.7 717.13 2 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model3)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: nailbig ~ time + treat + time:treat + (1 + time | id)
#> Data: toenail
#>
#> REML criterion at convergence: 8708.9
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.4726 -0.5161 -0.0633 0.4546 5.6549
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> id (Intercept) 7.3745 2.7156
#> time 0.2259 0.4753 -0.39
#> Residual 3.1506 1.7750
#> Number of obs: 1854, groups: id, 298
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 2.45589 0.24074 10.201
#> time 0.59325 0.04514 13.143
#> treat1 0.27927 0.33862 0.825
#> time:treat1 0.03502 0.06287 0.557
#>
#> Correlation of Fixed Effects:
#> (Intr) time treat1
#> time -0.424
#> treat1 -0.711 0.301
#> time:treat1 0.304 -0.718 -0.426
The LRT shows a very low p-value (below 0.05), indicating that the random slope for time on the patient level significantly increases the fit performance of the model, indicating that subjects differed not only in baseline nail length but also in their rate of change over time.. Therefore we conclude that our random effects structure include a random intercept for the patient level and a random slope for time on the patient level. This finding is consistent with the exploratory spaghetti plots, which showed heterogeneous trajectories across patients.
In this model, the variance related to the random intercept for patients is equal to 7.37, the variance related to the random slope for time within patients is equal to 0.23 and the residual variance or the variance within patients is equal to 3.15.
At first, we will still ignore the variable age, and only look at the variables time & treatment. We perform Wald tests for all the covariates in the model:
model3 <- lmer(nailbig ~ time + treat + time:treat + (1 + time | id), data = toenail)
anova(model3, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#> Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> time 1187.96 1187.96 1 266.93 377.0551 <2e-16 ***
#> treat 2.14 2.14 1 295.62 0.6801 0.4102
#> time:treat 0.98 0.98 1 266.93 0.3099 0.5782
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction between time and treatment is evaluated to determine whether the treatment groups evolve differently over time. Based on the Wald test for the interaction effect, we see that the p-value is equal to 0.5782, which is above 0.05. This indicates that the interaction term is not significant, indicating that the treatment effect is constant across follow-up, and not dependent on time. Consequently, we remove the interaction effect from the model.
model4 <- lmer(nailbig ~ time + treat + age + (1 + time | id), data = toenail)
anova(model4, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#> Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> time 1191.26 1191.26 1 267.59 378.1522 <2e-16 ***
#> treat 2.07 2.07 1 294.13 0.6575 0.4181
#> age 1.32 1.32 1 308.55 0.4196 0.5176
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can now look at the effect of treatment on the response, as well as the effect of time on the response. The p-values are equal to 0.2426 and <2e-16, respectively. This indicates that there is no significant difference in the unaffected nail length between patients from different treatment groups. However, there is a significant time trend, indicating that the unaffected nail length changes over time.
We can then also include age as a covariate to the model:
model5 <- lmer(nailbig ~ time + age + (1 + time | id), data = toenail)
anova(model5, ddf = "Kenward-Roger")
#> Type III Analysis of Variance Table with Kenward-Roger's method
#> Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> time 1190.44 1190.44 1 267.50 377.8930 <2e-16 ***
#> age 3.64 3.64 1 309.32 1.1551 0.2833
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value for age is equal to 0.2833, indicating that the age of
the patient does not have a significant influence on the unaffected nail
length. Therefore, we end up with a model that only includes the effect
of time.
However, in our exploratory analysis, there were some clear indications that there might be a quadratic relationship between the response and the variable time. Therefore, we also test whether a quadratic effect of time should be added to the model:
toenail$time2 <- toenail$time^2
model6 <- lmer(nailbig ~ time + (1 + time | id), data = toenail, REML=F)
model7 <- lme4::lmer(nailbig ~ time + time2 + (1 + time | id), data = toenail, REML=F)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0021041 (tol = 0.002, component 1)
When you try to fit the model with the quadratic effect, you get a
warning
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,:Model failed to converge with max|grad| = 0.0021041 (tol = 0.002, component 1).
Our model did not converge, meaning we cannot trust the outcome of the
model.
There are several reasons why a model might fail to converge, but one reason that often occurs, is due to the fact that variables have not been standardized before fitting them in an interaction / higher order term. Since the problem occured when we added a quadratic effect to the model, it is good to check first if standardizing the variable time, before adding it to the model, fixes the convergence issue.
toenail$time <- scale(toenail$time)
toenail$time2 <- toenail$time^2
model6 <- lmer(nailbig ~ time + (1 + time | id), data = toenail, REML=F)
model7 <- lme4::lmer(nailbig ~ time + time2 + (1 + time | id), data = toenail, REML=F)
anova(model6,model7)
#> Data: toenail
#> Models:
#> model6: nailbig ~ time + (1 + time | id)
#> model7: nailbig ~ time + time2 + (1 + time | id)
#> npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
#> model6 6 8711.3 8744.5 -4349.7 8699.3
#> model7 7 8298.9 8337.6 -4142.5 8284.9 414.43 1 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By standardizing the time variable, our convergence issue was indeed solved. When we look at the LRT, we see that the p-value is very low (below 0.05), indicating that the quadratic time effect is significant and should be retained in the model.
There aren’t any other variables or interaction effects that can be tested, so our final model is the model with a fixed effect for time, and a fixed quadratic effect of time, and a random intercept and slope for time, within patients.
We refit our model, using the REML option, and check whether the model assumptions are met.
model_final <- lme4::lmer(nailbig ~ time + time2 + (1 + time | id), data = toenail, REML=T)
# Residuals vs. Fitted Values Plot (Tests Equal Variance)
plot(model_final,
main = "Residuals vs. Fitted Values",
xlab = "Fitted Values", ylab = "Residuals")
# Q-Q Plot of Residuals (Tests Normality of Residuals)
qqnorm(resid(model_final), main = "Normal Q-Q Plot: Residuals")
qqline(resid(model_final), col = "red")
# Extract Empirical Bayes estimates (random effects)
eb_estimates <- ranef(model_final)
# Histogram to check for Class-level outliers
ggplot(mapping = aes(x = eb_estimates$id[[1]])) +
geom_histogram(bins = 30, fill = "lightblue", color = "black") +
labs(title = "Distribution of patient Random Effects", x = "Patient Intercept Deviation")
# Histogram to check for School-level outliers
ggplot(mapping = aes(x = eb_estimates$id[[2]])) +
geom_histogram(bins = 15, fill = "lightgreen", color = "black") +
labs(title = "Distribution of patient Random Effects", x = "Patient Slope Deviation")
# Plot the relationship between the levels
mapping_data <- data.frame(eb_estimates$id[[1]],eb_estimates$id[[2]])
names(mapping_data) <- c("intercepts","slopes")
ggplot(mapping_data, aes(x =intercepts, y = slopes)) +
geom_point(alpha = 0.5, color = "darkblue") +
geom_smooth(method = "lm", color = "red", se = FALSE) +
labs(title = "Class Effects vs. School Effects", x = "School Random Effect", y = "Class Random Effect")
#> `geom_smooth()` using formula = 'y ~ x'