Feel free to leave a comment in case of mistakes…

Data used in the exercises

The SII data

The data come from the SII (Study of Instructional Improvement), which was carried out to assess the math achievement scores of first- and third-grade pupils in randomly selected classrooms from a national US sample of elementary schools. The dataset includes results for 1190 first-grade pupils sampled from 312 classrooms in 107 schools. From the sample of elementary schools, all first-grade classrooms were considered (1 to 9). Within a classroom, 1 to 10 children were sampled to take a test on math. The following variables were measured:

  • schoolid: school’s ID number
  • classid: classroom’s ID number
  • childid: pupil’s ID number
  • sex: an indicator variable for sex (0 for males, 1 for females)
  • mathkind: pupil’s math score in the spring of the kindergarten year
  • mathgain: pupil’s gain in the math achievement score from the spring of kindergarten to the spring of first grade
  • ses: pupil’s socioeconomic status
  • housepov: % of households in the neighborhood of the school below the poverty level
  • yearstea: years of teacher’s experience in teaching in the first grade

The hip fracture Data

Hip fractures are a very common and important health problem among elderly people. They are often the result of an unfortunate, seemingly innocent fall in the patient’s home environment. Less fluent movements combined with bad vision and impaired reaction skills makes older people more at risk for such falls. The weaker bones and especially the presence of osteoporosis also causes the hip to break more easily. The hip surgery needed is a serious operation and can involve a lot of blood loss. A relatively long recovery period (8 to 12 days) in the hospital is therefore a real necessity. The combination of these overwhelming events can also cause the mental status of the patients to worsen. When this deterioration is very severe, typically causing a loss of orientation in time and space, this phenomenon is called a delirium. In daily practice, the ‘Mini-Mental State Examination score’ (MMSE) is sometimes being used to measure the cognitive status of patients. It is the sum of 30 binary outcomes which result from correct (1) or incorrect (0) answer to specific questions (e.g. ‘Where are you now?’). Obviously, higher MMSE values indicate better cognitive functioning.

It is of interest to study how this MMSE evolution depends on certain characteristics of the patients. The data set used to this purpose contains the MMSE score for 60 patients, at 1, 3, 5, 8 and 12 days after surgery. For each patient the following variables are measured:

  • IDNR: id of the patient
  • NEURO: pre-operative neuro-status (1: neuro-psychiatric, 0: not neuro-psychiatric)
  • AGE: age of the patient
  • MMSE: Mini-Mental State Examination score
  • TIME: days after surgery: 1, 3, 5, 8 and 12.

Packages used for the exercises

library(rio)
library(nlme)
library(ggplot2)
library(lmtest)
library(kableExtra)
library(dplyr)

Exercises day 1

1

Vidoes on the theory:


Exercise

Describe in detail why in both datasets, one cannot make use of basic statistical models, like linear regression or ANOVA. What assumptions are violated? What special kind of data is the hip fracture dataset?


Solution

Classic ANOVA and regression assume all observations are independent. Here, we have related data observations that are not independent:

Hip fracture: multiple measurements (of MMSE) over time in the same individual = longitudinal study.

SII: here there is non-independence at two levels:

  • pupils within one class are not independent.
  • classes within schools are not independent.

The purpose of multilevel (or mixed) models is to account for the relatedness between these observations.


2

Exercise

Read in the data in R, check the variable types via str() or the “environment” tab in RStudio, and coerce all variables into the correct type. Assign the appropriate factor levels.


Solution

Hip data

hip <- import("hipfracture2.txt")
hip$NEURO <- factor(hip$NEURO, levels = c(0, 1), labels = c("not neuro-psychiatric", "neuro-psychiatric"))
hip$IDNR <- factor(hip$IDNR)

hip %>% 
   slice(1:6) %>% 
   kbl() %>% 
   kable_paper()
IDNR NEURO AGE MMSE TIME
1 not neuro-psychiatric 74 28 1
1 not neuro-psychiatric 74 28 3
1 not neuro-psychiatric 74 28 5
1 not neuro-psychiatric 74 26 8
1 not neuro-psychiatric 74 25 12
2 not neuro-psychiatric 67 25 1

SII data

sii <- import("SII.txt")
sii$sex <- factor(sii$sex, levels = c(0,1), labels = c("males", "females"))
sii$schoolid <- factor(sii$schoolid)
sii$classid <- factor(sii$classid)
sii$childid <- factor(sii$childid)
sii %>% 
   slice(1:6) %>% 
   kbl() %>% 
   kable_paper()
schoolid classid childid sex mathkind mathgain ses housepov yearstea
1 160 1 females 448 32 0.46 0.082 1
1 160 2 males 460 109 -0.27 0.082 1
1 160 3 females 511 56 -0.03 0.082 1
1 217 4 males 449 83 -0.38 0.082 2
1 217 5 males 425 53 -0.03 0.082 2
1 217 6 females 450 65 0.76 0.082 2

3

Exercise

SII data: Explore the mathgain values in boys and girls in the different schools using an index plot.


Solution

Mean in both sexes for each school:

sii %>% 
  group_by(schoolid, sex) %>% 
  summarise(mean = mean(mathgain)) %>% 
  kbl() %>% 
  kable_paper() %>% 
  scroll_box(width = "300px", height = "200px")
schoolid sex mean
1 males 59.285714
1 females 60.250000
2 males 68.500000
2 females 59.750000
3 males 94.636364
3 females 67.666667
4 males 6.333333
4 females 64.000000
5 males 79.000000
5 females 50.750000
6 males 83.142857
6 females 72.400000
7 males 56.250000
7 females 57.800000
8 males 41.750000
8 females 52.750000
9 males 37.666667
9 females 32.000000
10 males 82.300000
10 females 65.750000
11 males 54.181818
11 females 48.400000
12 males 56.785714
12 females 47.692308
13 males 54.500000
13 females 42.200000
14 males 44.400000
14 females 44.200000
15 males 82.571429
15 females 66.166667
16 males 96.000000
16 females 89.750000
17 males 67.285714
17 females 104.000000
18 males 117.500000
18 females 62.000000
19 males 63.800000
19 females 48.666667
20 males 57.200000
20 females 65.000000
21 males 55.307692
21 females 61.250000
22 males 46.000000
22 females 37.333333
23 males 66.333333
23 females 75.500000
24 males -16.000000
24 females 45.285714
25 males 23.000000
25 females 30.400000
26 males 72.000000
26 females 36.833333
27 males 46.500000
27 females 74.888889
28 males 25.800000
28 females 67.200000
29 males 71.500000
29 females 35.666667
30 males 27.000000
30 females 29.000000
31 males 57.375000
31 females 46.785714
32 males 68.666667
32 females 44.666667
33 males 64.000000
33 females 61.416667
34 males 35.250000
34 females 71.333333
35 males 20.166667
35 females 42.200000
36 males 74.000000
36 females 71.333333
37 males 69.000000
37 females 40.714286
38 males 51.857143
38 females 33.500000
39 males 27.857143
39 females 26.400000
40 males 28.000000
40 females 46.500000
41 males 67.857143
41 females 72.000000
42 males 61.000000
42 females 58.083333
43 males 70.000000
43 females 50.666667
44 males 74.875000
44 females 63.777778
45 males 82.666667
45 females 42.500000
46 males 58.285714
46 females 49.000000
47 males 88.571429
47 females 39.333333
48 males 52.000000
48 females 47.500000
49 males 81.714286
49 females 80.000000
50 males 41.714286
50 females 47.000000
51 males 40.000000
51 females 73.000000
52 males 48.000000
52 females 39.000000
53 males 83.000000
53 females 65.800000
54 males 40.625000
54 females 18.500000
55 males 51.714286
55 females 55.666667
56 males 55.333333
56 females 47.250000
57 males 72.666667
57 females 69.200000
58 males 43.500000
58 females 54.333333
59 males 46.333333
59 females 41.000000
60 males 73.875000
60 females 73.800000
61 males 53.600000
61 females 34.833333
62 males 62.166667
62 females 47.000000
63 males 74.000000
63 females 17.000000
64 males 56.333333
64 females 84.000000
65 males 60.750000
65 females 52.800000
66 males 61.666667
66 females 54.500000
67 males 87.833333
67 females 72.333333
68 males 60.000000
68 females 84.000000
69 males 78.333333
69 females 44.000000
70 males 96.750000
70 females 69.000000
71 males 51.923077
71 females 53.000000
72 males 54.400000
72 females 47.666667
73 males 59.500000
73 females 39.000000
74 males 58.333333
75 males 99.000000
75 females 70.333333
76 males 51.266667
76 females 34.916667
77 males 58.666667
77 females 52.250000
78 males 39.250000
78 females 31.000000
79 males 85.000000
79 females 56.555556
80 males 66.333333
80 females 80.750000
81 males 70.000000
81 females 64.428571
82 males 57.384615
82 females 55.000000
83 males 46.250000
83 females 61.000000
84 males 72.400000
84 females 86.666667
85 males 39.600000
85 females 66.000000
86 males 31.600000
86 females 41.500000
87 males 43.500000
87 females 49.250000
88 males 52.000000
88 females 36.000000
89 males 40.000000
89 females 59.500000
90 males 58.500000
90 females 34.000000
91 males 36.625000
91 females 60.250000
92 males 72.666667
92 females 59.833333
93 males 31.900000
93 females 50.909091
94 males 80.833333
94 females 83.666667
95 males 35.000000
95 females 75.500000
96 males 61.500000
96 females 41.000000
97 males 53.000000
97 females 91.500000
98 males 61.000000
98 females 37.000000
99 males 54.000000
99 females 49.666667
100 males 44.000000
100 females 69.555556
101 males 61.909091
101 females 68.800000
102 males 47.200000
102 females 58.666667
103 males 104.500000
103 females 86.000000
104 males 91.333333
104 females 84.000000
105 males 59.000000
105 females 63.800000
106 males 9.500000
107 males 40.500000
107 females 50.125000

ggplot2::ggplot(sii, aes(x = schoolid, y = mathgain, color = sex)) + 
  stat_summary(fun = mean, geom = "point") +
  theme(axis.text.x = element_text(size = 3))

4

Exercise

Hip data: Plot the MMSE evolutions over time. You can make one plot with all the patients, using a different color for neuro and non-Neuro patients, or generate two separate plots for the neuro and non-neuro persons, respectively. Discuss what you see here.


Solution

ggplot(hip, aes(x = TIME, y = MMSE, color = NEURO)) + 
  stat_summary(fun = mean, geom = "point", size = 3) + 
  geom_jitter(size = 0.3, width = 0.2) +
  geom_line(aes(group = IDNR), size = 0.2)


Exercises day 2

Vidoes on the theory:


1

Exercise

SII data Is there a difference of math achievement score between males and females? A descriptive plot was already made:

sii <- import("SII.txt")
sii$sex <- factor(sii$sex, levels = c(0,1), labels = c("males", "females"))
sii$schoolid <- factor(sii$schoolid)

ggplot2::ggplot(sii, aes(x = schoolid, y = mathgain, color = sex)) + 
  stat_summary(fun = mean, geom = "point") +
  theme(axis.text.x = element_text(size = 3))

We will now look at a mixed model. We still only consider the clustering on school-level and ignore the clustering on the class-level (this will be included in a later stage). Write down the full model with assumptions.


Solution

\(Y_{ij} = (beta_{0} + beta_{i}) + beta_{2}*sex + \epsilon_{ij}\)

with \(b_{i}\) = random intercept


2

Exercise

SII data Fit now the model in R, discuss the within-cluster variability and between-cluster variability.


Solution

Basic model output

model<-lme(mathgain~sex,random=~1|schoolid,data=sii)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: sii 
##       AIC      BIC    logLik
##   11780.3 11800.62 -5886.151
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.45341 33.07291
## 
## Fixed effects: mathgain ~ sex 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 58.65798  1.748933 1082 33.53931  0.0000
## sexfemales  -2.13315  1.965660 1082 -1.08521  0.2781
##  Correlation: 
##            (Intr)
## sexfemales -0.566
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -4.8454731 -0.6000218 -0.0198554  0.5417158  5.6376937 
## 
## Number of Observations: 1190
## Number of Groups: 107
anova.lme(model,type="marginal")
##             numDF denDF   F-value p-value
## (Intercept)     1  1082 1124.8850  <.0001
## sex             1  1082    1.1777  0.2781

Test of fixed effects (Type 3): The Score in Girls is on average about 2 point lower, however the \(se\) and thus, the p-Value is large (28%).

fixed_effects <- anova.lme(model, type = "marginal")
fixed_effects
##             numDF denDF   F-value p-value
## (Intercept)     1  1082 1124.8850  <.0001
## sex             1  1082    1.1777  0.2781

Estimating the covariance of parameters: The first row (intercepts) represents the between school variability, the second row (residual) the within school variability.

covPar <- VarCorr(model)
covPar
## schoolid = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept)  109.2739 10.45341
## Residual    1093.8173 33.07291

Total variability is decomposed as within-cluster variability (residuals) and between-cluster variability (schools).

betwVar <- as.numeric(covPar[1,1])
withVar <- as.numeric(covPar[2,1])
totVar <- betwVar+withVar
Expl <- betwVar/totVar
Expl
## [1] 0.09082761

Thus, 9% of thr variance due to variance between schools.


3

Exercise

Hip data

Fit a mixed model, taking into account both neuro-groups (exclude observations with a missing value). Discuss the variability between and within patients. Write down the model with assumptions.

  • Is there a different evolution of MMSE score in the two neuro-status groups?
  • Is there a MMSE evolution over time?

Solution

Mixed model for MMSE over time in both neuro groups: MMSE of \(measurement_{j}\) in \(ID_{i}\)

\(Y_{ij} = (beta_{0}+b_{i}) + beta_{1}*NEURO + beta_{2}*TIME + beta_{3}*NEURO*TIME + \epsilon_{ij}\)

Fit the model

model<- lme(MMSE~NEURO+TIME+NEURO:TIME,random=~1|IDNR,data=hip,na.action=na.exclude)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: hip 
##        AIC      BIC    logLik
##   1464.789 1486.222 -726.3943
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    6.478461 2.513081
## 
## Fixed effects: MMSE ~ NEURO + TIME + NEURO:TIME 
##                                 Value Std.Error  DF   t-value p-value
## (Intercept)                 22.011719 1.0770182 205 20.437648  0.0000
## NEUROneuro-psychiatric      -8.038534 1.8690377  58 -4.300894  0.0001
## TIME                         0.197531 0.0523993 205  3.769725  0.0002
## NEUROneuro-psychiatric:TIME -0.059808 0.0964234 205 -0.620264  0.5358
##  Correlation: 
##                             (Intr) NEUROn- TIME  
## NEUROneuro-psychiatric      -0.576               
## TIME                        -0.254  0.146        
## NEUROneuro-psychiatric:TIME  0.138 -0.258  -0.543
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.14473201 -0.44157131  0.01532865  0.48824740  2.81016393 
## 
## Number of Observations: 267
## Number of Groups: 60

Get the variances

covPar <- VarCorr(model)
covPar
## IDNR = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept) 41.970459 6.478461
## Residual     6.315578 2.513081
betwVar <- as.numeric(covPar[1,1])
withVar <- as.numeric(covPar[2,1])
totVar <- betwVar+withVar
Expl <- betwVar/totVar
Expl
## [1] 0.8692049

There is no different evolution over time as the p-value for the interaction term is 53%

summary(model)
## Linear mixed-effects model fit by REML
##  Data: hip 
##        AIC      BIC    logLik
##   1464.789 1486.222 -726.3943
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    6.478461 2.513081
## 
## Fixed effects: MMSE ~ NEURO + TIME + NEURO:TIME 
##                                 Value Std.Error  DF   t-value p-value
## (Intercept)                 22.011719 1.0770182 205 20.437648  0.0000
## NEUROneuro-psychiatric      -8.038534 1.8690377  58 -4.300894  0.0001
## TIME                         0.197531 0.0523993 205  3.769725  0.0002
## NEUROneuro-psychiatric:TIME -0.059808 0.0964234 205 -0.620264  0.5358
##  Correlation: 
##                             (Intr) NEUROn- TIME  
## NEUROneuro-psychiatric      -0.576               
## TIME                        -0.254  0.146        
## NEUROneuro-psychiatric:TIME  0.138 -0.258  -0.543
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.14473201 -0.44157131  0.01532865  0.48824740  2.81016393 
## 
## Number of Observations: 267
## Number of Groups: 60

When omitting the interaction term, TIME is a significant fixed effect.

model<- lme(MMSE~NEURO+TIME,random=~1|IDNR,data=hip, na.action= na.exclude)
anova.lme(model,type="marginal")
##             numDF denDF  F-value p-value
## (Intercept)     1   206 428.4846  <.0001
## NEURO           1    58  21.2744  <.0001
## TIME            1   206  16.7848   1e-04

Exercises day 3

1

Vidoes on the theory


Exercise

SII data

Give and fit the model, testing if there is a difference in mathgain between boys and girls, taking into account clustering on both school and class levels. Write down this model (no other covariates have to be considered yet).


Solution

\(Y_{ijk} = beta_{0} + a_{i} + b_{i(j)} + beta_{1}*sex + \epsilon_{ijk}\)

with \(a_{i}\) = school effect
and \(b_{i(j)}\) = class effect (within school)


2

Exercise

SII data

Is there a difference in mathgain between boys and girls?


Solution

The main effect of sex is not significant:

model<-lme(mathgain~sex,random=~1|schoolid/classid,data=sii, na.action=na.exclude)
anova.lme(model)
##             numDF denDF   F-value p-value
## (Intercept)     1   877 1583.0446  <.0001
## sex             1   877    1.4586  0.2275
summary(model)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11774.14 11799.54 -5882.068
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    8.772372
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    10.05193 32.04467
## 
## Fixed effects: mathgain ~ sex 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 58.61460  1.747250 877 33.54678  0.0000
## sexfemales  -2.35259  1.947955 877 -1.20772  0.2275
##  Correlation: 
##            (Intr)
## sexfemales -0.564
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.67888536 -0.58891986 -0.03238045  0.53965216  5.61255392 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

3

Exercise

SII data

Discuss the different variance components from the model above. What fraction of the total variance is due to the variability between classes and between schools?


Solution

VarCorr(model)
##             Variance     StdDev   
## schoolid =  pdLogChol(1)          
## (Intercept)   76.9545     8.772372
## classid =   pdLogChol(1)          
## (Intercept)  101.0413    10.051932
## Residual    1026.8606    32.044666
varSchool <- as.numeric(VarCorr(model)[2,1])
varClass <- as.numeric(VarCorr(model)[4,1])
varRes <- as.numeric(VarCorr(model)[5,1])
varTot <- varSchool + varClass + varRes

fracSchool <- varSchool/varTot
fracClass <- varClass/varTot

The fraction of the total variation explained by schools is 0.0638703 and 0.0838617 by classes.


4

Exercise

SII data

  1. What is the correlation between mathgain scores from the same school, but different classes?
  2. What is the correlation between mathgain scores from the same class?

Solution

  1. What is the correlation between mathgain scores from the same school, but different classes?
varSchool/varTot
## [1] 0.06387027
  1. What is the correlation between mathgain scores from the same class?
(varSchool+varClass)/varTot
## [1] 0.147732

One must add the variance due to school in numerator as students from same class are also within the same school!


5

Exercise

SII data

Fit 1 model including all covariates as main effects and as an interaction with sex, and taking into account all levels of clustering. Discuss significance of the covariates. Is the effect of (some of) the covariates sex-dependent? (Note: stepwise backward model building is the subject of exercise Day 4)


Solution

model<-lme(mathgain~sex*(ses+housepov+yearstea+mathkind),random=~1|schoolid/classid,data=sii, na.action=na.exclude) #short
model1 <- lme(mathgain~sex+ses+housepov+yearstea+mathkind+sex:ses+sex:housepov+sex:yearstea+sex:mathkind,random=~1|schoolid/classid,data=sii, na.action=na.exclude) #long  

summary(model)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11408.22 11474.17 -5691.11
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.195376
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:     8.51822 27.23456
## 
## Fixed effects: mathgain ~ sex * (ses + housepov + yearstea + mathkind) 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         278.63790 14.572595 871  19.120679  0.0000
## sexfemales          -11.20361 20.032320 871  -0.559277  0.5761
## ses                   5.70738  1.703249 871   3.350879  0.0008
## housepov             -5.32490 11.796167 105  -0.451410  0.6526
## yearstea              0.22939  0.141292 204   1.623505  0.1060
## mathkind             -0.47700  0.029986 871 -15.907333  0.0000
## sexfemales:ses       -0.05452  2.351317 871  -0.023187  0.9815
## sexfemales:housepov -24.27888 12.855978 871  -1.888529  0.0593
## sexfemales:yearstea  -0.34430  0.175657 871  -1.960051  0.0503
## sexfemales:mathkind   0.03950  0.041685 871   0.947695  0.3435
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:s sxfmls:h
## sexfemales          -0.672                                                     
## ses                  0.214 -0.170                                              
## housepov            -0.241  0.118  0.096                                       
## yearstea            -0.165  0.093 -0.053  0.097                                
## mathkind            -0.974  0.666 -0.231  0.085  0.032                         
## sexfemales:ses      -0.155  0.216 -0.687 -0.073  0.041  0.169                  
## sexfemales:housepov  0.163 -0.221 -0.084 -0.539 -0.075 -0.080  0.171           
## sexfemales:yearstea  0.106 -0.123  0.039 -0.061 -0.615 -0.024 -0.020    0.151  
## sexfemales:mathkind  0.657 -0.982  0.181 -0.052 -0.018 -0.675 -0.239    0.093  
##                     sxfmls:y
## sexfemales                  
## ses                         
## housepov                    
## yearstea                    
## mathkind                    
## sexfemales:ses              
## sexfemales:housepov         
## sexfemales:yearstea         
## sexfemales:mathkind -0.002  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.89285360 -0.62785115 -0.02861974  0.55658062  4.29948404 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

6

Exercise

Hip Data:

Looking at the profiles from the hipfracture dataset, it might be of interest to include a quadratic time effect. Define the quadratic time and add it to the model.

Write down and fit the model. Is the quadratic time effect needed here?

Solution

hip$TIME2<-hip$TIME^2
hip %>% 
   slice(1:6) %>% 
   kbl() %>% 
   kable_paper()
IDNR NEURO AGE MMSE TIME TIME2
1 not neuro-psychiatric 74 28 1 1
1 not neuro-psychiatric 74 28 3 9
1 not neuro-psychiatric 74 28 5 25
1 not neuro-psychiatric 74 26 8 64
1 not neuro-psychiatric 74 25 12 144
2 not neuro-psychiatric 67 25 1 1

\(_{Yij} = (beta_{0}+b_{i}) + beta_{1}*TIME + beta_{2}*TIME2 + \epsilon_{eij}\)

Fitting the model:

model<- lme(MMSE~TIME+TIME2,random=~1|IDNR,data=hip, na.action=na.exclude)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: hip 
##       AIC     BIC    logLik
##   1486.63 1504.51 -738.3152
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    7.558976 2.501436
## 
## Fixed effects: MMSE ~ TIME + TIME2 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 18.861017 1.0627175 205 17.747912  0.0000
## TIME         0.406287 0.1635060 205  2.484842  0.0138
## TIME2       -0.017868 0.0125315 205 -1.425834  0.1554
##  Correlation: 
##       (Intr) TIME  
## TIME  -0.347       
## TIME2  0.301 -0.963
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.06437286 -0.43322995  0.04405184  0.47938518  2.82910055 
## 
## Number of Observations: 267
## Number of Groups: 60

7

Exercise

Hip data

Include the age of the patient as an extra covariate in the model. Is there a different effect of age in both neuro groups?


Solution

No, because the interaction term is not significant:

model<- lme(MMSE~TIME+NEURO+AGE+AGE:NEURO,random=~1|IDNR,data=hip, na.action=na.exclude)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: hip 
##        AIC      BIC    logLik
##   1447.366 1472.345 -716.6832
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    5.460354 2.508963
## 
## Fixed effects: MMSE ~ TIME + NEURO + AGE + AGE:NEURO 
##                               Value Std.Error  DF   t-value p-value
## (Intercept)                56.50126  7.962052 206  7.096320  0.0000
## TIME                        0.17633  0.043895 206  4.017058  0.0001
## NEUROneuro-psychiatric     -4.88675 17.718753  56 -0.275796  0.7837
## AGE                        -0.43988  0.101196  56 -4.346782  0.0001
## NEUROneuro-psychiatric:AGE -0.03616  0.222840  56 -0.162276  0.8717
##  Correlation: 
##                            (Intr) TIME   NEUROn- AGE   
## TIME                       -0.035                      
## NEUROneuro-psychiatric     -0.449 -0.006               
## AGE                        -0.993  0.007  0.446        
## NEUROneuro-psychiatric:AGE  0.451  0.007 -0.996  -0.454
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.21901376 -0.47124584  0.01957349  0.46873450  2.80219031 
## 
## Number of Observations: 267
## Number of Groups: 60

However, age is a significant fixed effect:

model<- lme(MMSE~TIME+NEURO+AGE,random=~1|IDNR,data=hip, na.action=na.exclude)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: hip 
##       AIC      BIC    logLik
##   1444.22 1465.652 -716.1098
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    5.411095 2.508976
## 
## Fixed effects: MMSE ~ TIME + NEURO + AGE 
##                           Value Std.Error  DF   t-value p-value
## (Intercept)            57.08459  7.046333 206  8.101319   0e+00
## TIME                    0.17627  0.043893 206  4.015965   1e-04
## NEUROneuro-psychiatric -7.75201  1.523968  57 -5.086727   0e+00
## AGE                    -0.44734  0.089389  57 -5.004366   0e+00
##  Correlation: 
##                        (Intr) TIME   NEURO-
## TIME                   -0.044              
## NEUROneuro-psychiatric  0.006  0.008       
## AGE                    -0.992  0.011 -0.079
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.22127751 -0.47530674  0.01848392  0.47026467  2.80123379 
## 
## Number of Observations: 267
## Number of Groups: 60
anova.lme(model)
##             numDF denDF  F-value p-value
## (Intercept)     1   206 800.2194  <.0001
## TIME            1   206  16.9625   1e-04
## NEURO           1    57  30.2343  <.0001
## AGE             1    57  25.0437  <.0001

Exercises day 4

1

Videos on the theory:

Exercise

SII data

Start from the last model from day 3. Perform a backward model selection. What covariates are important in explaining the mathgain? Discuss the effects in the final model.


Solution

Stepwise backward elimination in the model with interactions between sex and other covaraites. The fullmodel:

model<-lme(mathgain~sex*(ses+housepov+yearstea+mathkind),random=~1|schoolid/classid,data=sii, na.action = na.omit)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11408.22 11474.17 -5691.11
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.195376
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:     8.51822 27.23456
## 
## Fixed effects: mathgain ~ sex * (ses + housepov + yearstea + mathkind) 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         278.63790 14.572595 871  19.120679  0.0000
## sexfemales          -11.20361 20.032320 871  -0.559277  0.5761
## ses                   5.70738  1.703249 871   3.350879  0.0008
## housepov             -5.32490 11.796167 105  -0.451410  0.6526
## yearstea              0.22939  0.141292 204   1.623505  0.1060
## mathkind             -0.47700  0.029986 871 -15.907333  0.0000
## sexfemales:ses       -0.05452  2.351317 871  -0.023187  0.9815
## sexfemales:housepov -24.27888 12.855978 871  -1.888529  0.0593
## sexfemales:yearstea  -0.34430  0.175657 871  -1.960051  0.0503
## sexfemales:mathkind   0.03950  0.041685 871   0.947695  0.3435
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:s sxfmls:h
## sexfemales          -0.672                                                     
## ses                  0.214 -0.170                                              
## housepov            -0.241  0.118  0.096                                       
## yearstea            -0.165  0.093 -0.053  0.097                                
## mathkind            -0.974  0.666 -0.231  0.085  0.032                         
## sexfemales:ses      -0.155  0.216 -0.687 -0.073  0.041  0.169                  
## sexfemales:housepov  0.163 -0.221 -0.084 -0.539 -0.075 -0.080  0.171           
## sexfemales:yearstea  0.106 -0.123  0.039 -0.061 -0.615 -0.024 -0.020    0.151  
## sexfemales:mathkind  0.657 -0.982  0.181 -0.052 -0.018 -0.675 -0.239    0.093  
##                     sxfmls:y
## sexfemales                  
## ses                         
## housepov                    
## yearstea                    
## mathkind                    
## sexfemales:ses              
## sexfemales:housepov         
## sexfemales:yearstea         
## sexfemales:mathkind -0.002  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.89285360 -0.62785115 -0.02861974  0.55658062  4.29948404 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312
  1. Omit least significant term (starting with interation terms):
model2<-update(model,.~.-sex:ses )
summary(model2)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11409.77 11470.66 -5692.884
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.196347
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.517937 27.22176
## 
## Fixed effects: mathgain ~ sex + ses + housepov + yearstea + mathkind + sex:housepov +      sex:yearstea + sex:mathkind 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         278.59085 14.390994 872  19.358694  0.0000
## sexfemales          -11.10523 19.550559 872  -0.568026  0.5702
## ses                   5.68033  1.237910 872   4.588646  0.0000
## housepov             -5.34830 11.761602 105  -0.454726  0.6502
## yearstea              0.22949  0.141126 204   1.626147  0.1055
## mathkind             -0.47689  0.029541 872 -16.143229  0.0000
## sexfemales:housepov -24.22817 12.661608 872  -1.913514  0.0560
## sexfemales:yearstea  -0.34437  0.175544 872  -1.961714  0.0501
## sexfemales:mathkind   0.03928  0.040454 872   0.970900  0.3319
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:h sxfmls:y
## sexfemales          -0.662                                                     
## ses                  0.150 -0.031                                              
## housepov            -0.256  0.137  0.063                                       
## yearstea            -0.161  0.086 -0.035  0.100                                
## mathkind            -0.973  0.655 -0.160  0.099  0.025                         
## sexfemales:housepov  0.195 -0.268  0.047 -0.536 -0.083 -0.113                  
## sexfemales:yearstea  0.104 -0.122  0.034 -0.063 -0.614 -0.021  0.157           
## sexfemales:mathkind  0.647 -0.982  0.024 -0.072 -0.008 -0.664  0.140   -0.007  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.89519431 -0.62812346 -0.02846769  0.55761536  4.29985590 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312
  1. Omit least significant term:
model3<-update(model2,.~.-sex:mathkind)
summary(model3)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11404.13 11459.96 -5691.066
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.205218
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.461796 27.23086
## 
## Fixed effects: mathgain ~ sex + ses + housepov + yearstea + mathkind + sex:housepov +      sex:yearstea 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         269.55968 10.978084 873  24.554346  0.0000
## sexfemales            7.53737  3.724963 873   2.023474  0.0433
## ses                   5.65186  1.237601 873   4.566791  0.0000
## housepov             -4.52070 11.727911 105  -0.385465  0.7007
## yearstea              0.23077  0.141006 204   1.636594  0.1033
## mathkind             -0.45787  0.022102 873 -20.716396  0.0000
## sexfemales:housepov -25.97646 12.537727 873  -2.071864  0.0386
## sexfemales:yearstea  -0.34347  0.175560 873  -1.956448  0.0507
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:h
## sexfemales          -0.187                                            
## ses                  0.176 -0.037                                     
## housepov            -0.276  0.351  0.065                              
## yearstea            -0.204  0.412 -0.035  0.100                       
## mathkind            -0.954  0.022 -0.193  0.068  0.026                
## sexfemales:housepov  0.137 -0.688  0.044 -0.533 -0.083 -0.026         
## sexfemales:yearstea  0.142 -0.674  0.035 -0.063 -0.615 -0.034  0.159  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.88869525 -0.63454292 -0.02443226  0.56174303  4.31673108 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312
  1. Omit least significant term. Note : housepow must stay in model as main effect, as long as housepov is part of an interaction term:
model4<-update(model3,.~.-sex:yearstea)
summary(model4)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11404.31 11455.07 -5692.154
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.152796
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.587203 27.24878
## 
## Fixed effects: mathgain ~ sex + ses + housepov + yearstea + mathkind + sex:housepov 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         272.57162 10.877696 874  25.057844  0.0000
## sexfemales            2.61081  2.754110 874   0.947969  0.3434
## ses                   5.73408  1.238130 874   4.631239  0.0000
## housepov             -5.96054 11.707259 105  -0.509132  0.6117
## yearstea              0.06115  0.111576 204   0.548012  0.5843
## mathkind             -0.45924  0.022113 874 -20.768311  0.0000
## sexfemales:housepov -22.01667 12.390680 874  -1.776874  0.0759
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd
## sexfemales          -0.124                                   
## ses                  0.173 -0.018                            
## housepov            -0.270  0.418  0.068                     
## yearstea            -0.150 -0.004 -0.017  0.077              
## mathkind            -0.959 -0.001 -0.192  0.066  0.007       
## sexfemales:housepov  0.118 -0.797  0.039 -0.531  0.019 -0.021
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.86847848 -0.62029105 -0.03509731  0.56863270  4.30389815 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

Now, main effect of yearstea can be removed, as yearstea is no longer part of an interaction term:

model5<-update(model4,.~.-yearstea)
summary(model5)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11400.06 11445.75 -5691.03
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.184765
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.552127 27.24273
## 
## Fixed effects: mathgain ~ sex + ses + housepov + mathkind + sex:housepov 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         273.49208 10.752582 874  25.435015  0.0000
## sexfemales            2.62108  2.753277 874   0.951986  0.3414
## ses                   5.74628  1.237654 874   4.642884  0.0000
## housepov             -6.46999 11.678182 105  -0.554024  0.5807
## mathkind             -0.45938  0.022107 874 -20.779831  0.0000
## sexfemales:housepov -22.16862 12.384934 874  -1.789966  0.0738
##  Correlation: 
##                     (Intr) sxfmls ses    houspv mthknd
## sexfemales          -0.126                            
## ses                  0.172 -0.018                     
## housepov            -0.262  0.419  0.069              
## mathkind            -0.969 -0.001 -0.192  0.066       
## sexfemales:housepov  0.122 -0.797  0.039 -0.534 -0.021
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -5.8763441 -0.6230294 -0.0302178  0.5707610  4.3001268 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

….

model6<-update(model5,.~.-sex:housepov)
summary(model6)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11408.12 11448.74 -5696.06
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.061953
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.719333 27.25296
## 
## Fixed effects: mathgain ~ sex + ses + housepov + mathkind 
##                 Value Std.Error  DF    t-value p-value
## (Intercept) 275.73178 10.677641 875  25.823287  0.0000
## sexfemales   -1.30972  1.664641 875  -0.786789  0.4316
## ses           5.83023  1.237521 875   4.711221  0.0000
## housepov    -17.53600  9.848187 105  -1.780632  0.0779
## mathkind     -0.46003  0.022116 875 -20.800621  0.0000
##  Correlation: 
##            (Intr) sxfmls ses    houspv
## sexfemales -0.048                     
## ses         0.169  0.021              
## housepov   -0.234 -0.012  0.107       
## mathkind   -0.974 -0.029 -0.191  0.065
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.78800233 -0.62708538 -0.01617522  0.54849256  4.29842019 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

….

model7<-update(model6,.~.-sex)
summary(model7)
## Linear mixed-effects model fit by REML
##  Data: sii 
##       AIC      BIC    logLik
##   11409.6 11445.14 -5697.798
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.061356
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.685155 27.25499
## 
## Fixed effects: mathgain ~ ses + housepov + mathkind 
##                 Value Std.Error  DF    t-value p-value
## (Intercept) 275.32608 10.663464 876  25.819572  0.0000
## ses           5.85139  1.237085 876   4.729980  0.0000
## housepov    -17.62905  9.841465 105  -1.791303  0.0761
## mathkind     -0.46054  0.022104 876 -20.835602  0.0000
##  Correlation: 
##          (Intr) ses    houspv
## ses       0.171              
## housepov -0.235  0.107       
## mathkind -0.977 -0.191  0.065
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.76624565 -0.63606866 -0.02284166  0.55176293  4.27668545 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

…. final model

model8<-update(model7,.~.-housepov)
summary(model8)
## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11417.21 11447.68 -5702.604
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    8.994781
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.805618 27.26732
## 
## Fixed effects: mathgain ~ ses + mathkind 
##                 Value Std.Error  DF    t-value p-value
## (Intercept) 270.78470 10.371906 876  26.107515       0
## ses           6.08792  1.230806 876   4.946288       0
## mathkind     -0.45788  0.022074 876 -20.742919       0
##  Correlation: 
##          (Intr) ses   
## ses       0.203       
## mathkind -0.992 -0.199
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.81406721 -0.63030057 -0.02714651  0.56205803  4.27064533 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

2

Exercise

Hip data

Fit the model with only main effects, no interaction effects. Obtain the Empirical-Bayes estimates for the patients. Present them in a histogram.


Solution

Fit the model

model<-lme(MMSE ~ NEURO + TIME +AGE , random=~1|IDNR,data=hip, na.action = na.omit)

Get the Bayes estimates and plot histogram

BE <- data.frame(ranef(model)[1])
names(BE) <- "Intercept"
BE$y <- dnorm(BE$Intercept)

ggplot(BE, aes(x = Intercept)) + geom_histogram()


3

Exercise

Hip data

Make a dataset with all predictions for the 60 patients.


Solution

Get predicted mean (given covariates NEURO, AGE and TIME):

predMean<-predict(model,level=0)

Get prediction for the individual (given covariates NEURO, AGE and TIME, plus individual random intercept):

predIndiv<-predict(model,level=1)

df <- data.frame(cbind(cbind(predMean,predIndiv)))
df %>% 
   slice(1:10) %>% 
   kbl() %>% 
   kable_paper()
predMean predIndiv
X1 24.15806 26.07161
X1.1 24.51061 26.42416
X1.2 24.86315 26.77670
X1.3 25.39197 27.30552
X1.4 26.09706 28.01061
X2 27.28941 24.57311
X2.1 27.64196 24.92566
X2.2 27.99450 25.27820
X2.3 28.52332 25.80702
X3 27.28941 27.18284

Case Study

1

The toenail dataset is the result of a randomized, double-blind, parallel group, multicenter study for the comparison of two oral treatments (0 and 1) for toenail dermatophyte onychomycosis (TDO). TDO is a common toenail infection, difficult to treat, and affecting more than 2 out of 100 persons. In total, 36 hospitals were randomized. All patients with an infection on the big toenail were randomized in 2 treatment groups and were followed during 12 weeks (3 months) of treatment and followed further up to a total of 12 months. Measurements were taken at baseline, every month during treatment, and every 3 months afterwards, resulting in a maximum of 7 measurements per subject. The measure of interest is the unaffected nail length in mm (‘nailbig’). Also, the age of the patient at the beginning of the study is denoted.


Exercise

First, make a descriptive plot to see how the mean unaffected nail length evolves on average over time between the two treatment groups. Comment on the graph.


Solution

Import and clean data:

nail <- import("toenail.txt")
nail$treat <- factor(nail$treat, levels = c(0,1), labels = c("placebo", "active treatment"))
nail$id <- factor(nail$id)
nail$center <- factor(nail$center)
nail$timeNum <- nail$time
nail$time <- factor(nail$time, levels = c(0,1,2,3,6,9,12), labels = c("0","1","2","3","6","9","12")) # because not linearly measured

Generate the graph:

ggplot(nail, aes(x = time, y = nailbig, color = treat)) +
    stat_summary(fun = mean, geom = "point", size = 3) +
   geom_jitter(width = 0.3, size = 0.1) + labs(x = "time in months")

In general, the unaffeted nail length is increasing over time in both groups. The increase seems not to be linear with time. The increase seems to be higher in the active group. The variation between subjects is quite large.


2

Exercise

For the model we will first choose the random structure, based on a fixed structure with time and the interaction with time and treat, ignoring the covariate age (time as continuous variable).

  1. Fit a model, taking into account both levels of clustering.
  2. Interpret the variance components. Is this something you expected in this case?
  3. Do you think it is useful to take into account both levels of clustering? (No random slope, but 2 intercepts center/id, time continuous.)

Solution

  1. Fitting the model: treat is not included because it is a RCT
model <- lme(nailbig ~ time + time:treat, random = ~1 | center/id, data = nail, na.action = na.omit)
summary(model)
## Linear mixed-effects model fit by REML
##  Data: nail 
##        AIC      BIC    logLik
##   9274.068 9367.866 -4620.034
## 
## Random effects:
##  Formula: ~1 | center
##          (Intercept)
## StdDev: 0.0008601381
## 
##  Formula: ~1 | id %in% center
##         (Intercept) Residual
## StdDev:    2.560856 2.492617
## 
## Fixed effects: nailbig ~ time + time:treat 
##                                 Value Std.Error   DF   t-value p-value
## (Intercept)                  1.693921 0.2934113 1542  5.773197  0.0000
## time1                        0.896659 0.2943568 1542  3.046163  0.0024
## time2                        1.987248 0.2972278 1542  6.685941  0.0000
## time3                        3.504213 0.3022572 1542 11.593481  0.0000
## time6                        5.621216 0.3075350 1542 18.278298  0.0000
## time9                        6.224012 0.3207449 1542 19.404868  0.0000
## time12                       6.324421 0.3216902 1542 19.659971  0.0000
## time0:treatactive treatment  0.393414 0.4138003 1542  0.950733  0.3419
## time1:treatactive treatment  0.289620 0.4175257 1542  0.693659  0.4880
## time2:treatactive treatment  0.309150 0.4209154 1542  0.734470  0.4628
## time3:treatactive treatment  0.202418 0.4271594 1542  0.473870  0.6357
## time6:treatactive treatment  0.230427 0.4351366 1542  0.529551  0.5965
## time9:treatactive treatment  0.836164 0.4499432 1542  1.858376  0.0633
## time12:treatactive treatment 0.891608 0.4517271 1542  1.973777  0.0486
##  Correlation: 
##                              (Intr) time1  time2  time3  time6  time9  time12
## time1                        -0.486                                          
## time2                        -0.481  0.490                                   
## time3                        -0.473  0.482  0.482                            
## time6                        -0.465  0.474  0.474  0.473                     
## time9                        -0.448  0.454  0.454  0.453  0.451              
## time12                       -0.446  0.453  0.453  0.452  0.449  0.445       
## time0:treatactive treatment  -0.709  0.345  0.341  0.336  0.330  0.317  0.316
## time1:treatactive treatment  -0.360 -0.363 -0.007 -0.007 -0.007 -0.006 -0.006
## time2:treatactive treatment  -0.357 -0.007 -0.371 -0.010 -0.010 -0.009 -0.009
## time3:treatactive treatment  -0.352 -0.007 -0.011 -0.382 -0.015 -0.013 -0.013
## time6:treatactive treatment  -0.345 -0.007 -0.010 -0.015 -0.393 -0.017 -0.017
## time9:treatactive treatment  -0.333 -0.007 -0.010 -0.014 -0.018 -0.421 -0.026
## time12:treatactive treatment -0.332 -0.007 -0.010 -0.014 -0.018 -0.026 -0.422
##                              tm0:tt tm1:tt tm2:tt tm3:tt tm6:tt tm9:tt
## time1                                                                 
## time2                                                                 
## time3                                                                 
## time6                                                                 
## time9                                                                 
## time12                                                                
## time0:treatactive treatment                                           
## time1:treatactive treatment   0.508                                   
## time2:treatactive treatment   0.504  0.506                            
## time3:treatactive treatment   0.497  0.498  0.498                     
## time6:treatactive treatment   0.488  0.489  0.489  0.488              
## time9:treatactive treatment   0.471  0.472  0.472  0.471  0.468       
## time12:treatactive treatment  0.469  0.470  0.470  0.469  0.466  0.461
## 
## Standardized Within-Group Residuals:
##           Min            Q1           Med            Q3           Max 
## -3.7888417338 -0.5300417022  0.0001921644  0.5666066228  5.1732155056 
## 
## Number of Observations: 1854
## Number of Groups: 
##         center id %in% center 
##             26            299
anova.lme(model)
##             numDF denDF  F-value p-value
## (Intercept)     1  1542 980.5896  <.0001
## time            6  1542 291.9657  <.0001
## time:treat      7  1542   0.9105  0.4972
  1. Variance components
VarCorr(model)
##             Variance     StdDev      
## center =    pdLogChol(1)             
## (Intercept) 7.398375e-07 0.0008601381
## id =        pdLogChol(1)             
## (Intercept) 6.557983e+00 2.5608559349
## Residual    6.213138e+00 2.4926167123

Yes, the variance between subjects is much larger than the variance between hospitals. This makes sense.

  1. The fraction explained by the center is very low. there is no difference between the models. I think one can do the analysis without considering for center clustering.
model1 <- lme(nailbig ~ time + time:treat, random = ~1 | id, data = nail, na.action = na.omit)
anova(model, model1)
##        Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## model      1 17 9274.068 9367.866 -4620.034                         
## model1     2 16 9271.112 9359.393 -4619.556 1 vs 2 0.9555801  0.3283

3

Exercise

  1. Calculate the variances of the unaffected nail length for the different time points (in both treatment groups separately).
  2. Do you think a random slope might be of interest here? If yes, fit the linear mixed model with both random intercept and slope.
  3. test if the random slope model is more likely than the random intercept model by using

Solution

  1. Calculate the variances of the unaffected nail length for the different time points (in both treatment groups separately):
nail_var <- nail %>% 
   group_by(timeNum, treat) %>% 
   summarise(Variance = var(nailbig))

nail_var %>% 
   kbl() %>% 
   kable_paper()
timeNum treat Variance
0 placebo 5.640835
0 active treatment 8.572255
1 placebo 6.440780
1 active treatment 8.449233
2 placebo 7.922346
2 active treatment 9.314727
3 placebo 10.898467
3 active treatment 10.144576
6 placebo 15.517773
6 active treatment 14.188550
9 placebo 20.460068
9 active treatment 20.571358
12 placebo 27.233707
12 active treatment 25.052586
ggplot(nail_var, aes(x = timeNum, y = Variance, fill = treat)) + geom_bar(stat = "identity") + labs(y = "total variance", x = "time")

The variance increases with time, but about at the same level in both groups.


  1. linear mixed model with both random intercept and slope
model2 <- lme(nailbig ~ timeNum + timeNum:treat, random = ~1 + timeNum | id, data = nail, na.action = na.exclude)
summary(model2)
## Linear mixed-effects model fit by REML
##  Data: nail 
##        AIC      BIC   logLik
##   8723.239 8761.904 -4354.62
## 
## Random effects:
##  Formula: ~1 + timeNum | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 2.7133258 (Intr)
## timeNum     0.4752377 -0.389
## Residual    1.7750636       
## 
## Fixed effects: nailbig ~ timeNum + timeNum:treat 
##                                   Value  Std.Error   DF   t-value p-value
## (Intercept)                   2.5970699 0.16917771 1554 15.351135  0.0000
## timeNum                       0.5820213 0.04303633 1554 13.523953  0.0000
## timeNum:treatactive treatment 0.0571236 0.05687196 1554  1.004425  0.3153
##  Correlation: 
##                               (Intr) timeNm
## timeNum                       -0.312       
## timeNum:treatactive treatment  0.002 -0.684
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.46797352 -0.51623252 -0.06386359  0.45542574  5.65730820 
## 
## Number of Observations: 1854
## Number of Groups: 298
anova.lme(model2)
##               numDF denDF  F-value p-value
## (Intercept)       1  1554 683.0674  <.0001
## timeNum           1  1554 378.9960  <.0001
## timeNum:treat     1  1554   1.0089  0.3153

Remark: R crashes (in my case) when not using time as numeric varible…

  1. Compare model with and without random slope.
anova(model1, model2)
##        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## model1     1 16 9271.112 9359.393 -4619.556                        
## model2     2  7 8723.239 8761.904 -4354.620 1 vs 2 529.8732  <.0001

The random slope improves the model…


4

Exercise

We will now look at the fixed structure, using the random structure that was chosen.

  1. We will start with ignoring the age effect. Formally test if there is an evolution over time and if it is equal in both treatment groups. What fixed effects structure would you use based on this result? Fit this model. What does this mean?
  2. Going further with the model you found above, incorporate age as an extra covariate. Does it have an effect?

Solution

  1. Test if there is an evolution over time and if it is equal in both treatment groups
anova.lme(model2)
##               numDF denDF  F-value p-value
## (Intercept)       1  1554 683.0674  <.0001
## timeNum           1  1554 378.9960  <.0001
## timeNum:treat     1  1554   1.0089  0.3153

There is a time effect, but the interaction is not significant, thus, no different evolution between groups.


  1. Incorporate age as an extra covariate (the interaction is omitted because not relevant)
model3 <- lme(nailbig ~ timeNum + age, ~1 + timeNum | id, data = nail, na.action = na.exclude)
summary(model3)
## Linear mixed-effects model fit by REML
##  Data: nail 
##        AIC      BIC   logLik
##   8725.119 8763.784 -4355.56
## 
## Random effects:
##  Formula: ~1 + timeNum | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 2.7016795 (Intr)
## timeNum     0.4750058 -0.378
## Residual    1.7748809       
## 
## Fixed effects: nailbig ~ timeNum + age 
##                 Value Std.Error   DF   t-value p-value
## (Intercept) 1.1529033 1.3497224 1554  0.854178  0.3931
## timeNum     0.6109973 0.0314125 1554 19.450765  0.0000
## age         0.0225356 0.0208990 1554  1.078312  0.2811
##  Correlation: 
##         (Intr) timeNm
## timeNum -0.041       
## age     -0.992 -0.011
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.4914541 -0.5200767 -0.0587634  0.4530281  5.6613576 
## 
## Number of Observations: 1854
## Number of Groups: 298
anova.lme(model3)
##             numDF denDF  F-value p-value
## (Intercept)     1  1554 670.7432  <.0001
## timeNum         1  1554 378.8407  <.0001
## age             1  1554   1.1628  0.2811

No main effect of age.