Feel free to leave a comment in case of mistakes…

Data used in the exercises

The SII data

The data come from the SII (Study of Instructional Improvement), which was carried out to assess the math achievement scores of first- and third-grade pupils in randomly selected classrooms from a national US sample of elementary schools. The dataset includes results for 1190 first-grade pupils sampled from 312 classrooms in 107 schools. From the sample of elementary schools, all first-grade classrooms were considered (1 to 9). Within a classroom, 1 to 10 children were sampled to take a test on math. The following variables were measured:

schoolid: school’s ID number
classid: classroom’s ID number
childid: pupil’s ID number
sex: an indicator variable for sex (0 for males, 1 for females)
mathkind: pupil’s math score in the spring of the kindergarten year
mathgain: pupil’s gain in the math achievement score from the spring of kindergarten to the spring of first grade
ses: pupil’s socioeconomic status
housepov: % of households in the neighborhood of the school below the poverty level
yearstea: years of teacher’s experience in teaching in the first grade

The hip fracture Data

Hip fractures are a very common and important health problem among elderly people. They are often the result of an unfortunate, seemingly innocent fall in the patient’s home environment. Less fluent movements combined with bad vision and impaired reaction skills makes older people more at risk for such falls. The weaker bones and especially the presence of osteoporosis also causes the hip to break more easily. The hip surgery needed is a serious operation and can involve a lot of blood loss. A relatively long recovery period (8 to 12 days) in the hospital is therefore a real necessity. The combination of these overwhelming events can also cause the mental status of the patients to worsen. When this deterioration is very severe, typically causing a loss of orientation in time and space, this phenomenon is called a delirium. In daily practice, the ‘Mini-Mental State Examination score’ (MMSE) is sometimes being used to measure the cognitive status of patients. It is the sum of 30 binary outcomes which result from correct (1) or incorrect (0) answer to specific questions (e.g. ‘Where are you now?’). Obviously, higher MMSE values indicate better cognitive functioning.

It is of interest to study how this MMSE evolution depends on certain characteristics of the patients. The data set used to this purpose contains the MMSE score for 60 patients, at 1, 3, 5, 8 and 12 days after surgery. For each patient the following variables are measured:

IDNR: id of the patient
NEURO: pre-operative neuro-status (1: neuro-psychiatric, 0: not neuro-psychiatric)
AGE: age of the patient
MMSE: Mini-Mental State Examination score
TIME: days after surgery: 1, 3, 5, 8 and 12.

Packages used for the exercises

library(rio)
library(nlme)
library(ggplot2)
library(lmtest)
library(kableExtra)
library(dplyr)

Exercises day 1

1

Vidoes on the theory:

Exercise

Describe in detail why in both datasets, one cannot make use of basic statistical models, like linear regression or ANOVA. What assumptions are violated? What special kind of data is the hip fracture dataset?

Solution

Classic ANOVA and regression assume all observations are independent. Here, we have related data observations that are not independent:

Hip fracture: multiple measurements (of MMSE) over time in the same individual = longitudinal study.

SII: here there is non-independence at two levels:

pupils within one class are not independent.
classes within schools are not independent.

The purpose of multilevel (or mixed) models is to account for the relatedness between these observations.

2

Exercise

Read in the data in R, check the variable types via str() or the “environment” tab in RStudio, and coerce all variables into the correct type. Assign the appropriate factor levels.

Solution

Hip data

hip <- import("hipfracture2.txt")
hip$NEURO <- factor(hip$NEURO, levels = c(0, 1), labels = c("not neuro-psychiatric", "neuro-psychiatric"))
hip$IDNR <- factor(hip$IDNR)

hip %>% 
   slice(1:6) %>% 
   kbl() %>% 
   kable_paper()

IDNR	NEURO	AGE	MMSE	TIME
1	not neuro-psychiatric	74	28	1
1	not neuro-psychiatric	74	28	3
1	not neuro-psychiatric	74	28	5
1	not neuro-psychiatric	74	26	8
1	not neuro-psychiatric	74	25	12
2	not neuro-psychiatric	67	25	1

SII data

sii <- import("SII.txt")
sii$sex <- factor(sii$sex, levels = c(0,1), labels = c("males", "females"))
sii$schoolid <- factor(sii$schoolid)
sii$classid <- factor(sii$classid)
sii$childid <- factor(sii$childid)
sii %>% 
   slice(1:6) %>% 
   kbl() %>% 
   kable_paper()

schoolid	classid	childid	sex	mathkind	mathgain	ses	housepov	yearstea
1	160	1	females	448	32	0.46	0.082	1
1	160	2	males	460	109	-0.27	0.082	1
1	160	3	females	511	56	-0.03	0.082	1
1	217	4	males	449	83	-0.38	0.082	2
1	217	5	males	425	53	-0.03	0.082	2
1	217	6	females	450	65	0.76	0.082	2

3

Exercise

SII data: Explore the mathgain values in boys and girls in the different schools using an index plot.

Solution

Mean in both sexes for each school:

sii %>% 
  group_by(schoolid, sex) %>% 
  summarise(mean = mean(mathgain)) %>% 
  kbl() %>% 
  kable_paper() %>% 
  scroll_box(width = "300px", height = "200px")

schoolid	sex	mean
1	males	59.285714
1	females	60.250000
2	males	68.500000
2	females	59.750000
3	males	94.636364
3	females	67.666667
4	males	6.333333
4	females	64.000000
5	males	79.000000
5	females	50.750000
6	males	83.142857
6	females	72.400000
7	males	56.250000
7	females	57.800000
8	males	41.750000
8	females	52.750000
9	males	37.666667
9	females	32.000000
10	males	82.300000
10	females	65.750000
11	males	54.181818
11	females	48.400000
12	males	56.785714
12	females	47.692308
13	males	54.500000
13	females	42.200000
14	males	44.400000
14	females	44.200000
15	males	82.571429
15	females	66.166667
16	males	96.000000
16	females	89.750000
17	males	67.285714
17	females	104.000000
18	males	117.500000
18	females	62.000000
19	males	63.800000
19	females	48.666667
20	males	57.200000
20	females	65.000000
21	males	55.307692
21	females	61.250000
22	males	46.000000
22	females	37.333333
23	males	66.333333
23	females	75.500000
24	males	-16.000000
24	females	45.285714
25	males	23.000000
25	females	30.400000
26	males	72.000000
26	females	36.833333
27	males	46.500000
27	females	74.888889
28	males	25.800000
28	females	67.200000
29	males	71.500000
29	females	35.666667
30	males	27.000000
30	females	29.000000
31	males	57.375000
31	females	46.785714
32	males	68.666667
32	females	44.666667
33	males	64.000000
33	females	61.416667
34	males	35.250000
34	females	71.333333
35	males	20.166667
35	females	42.200000
36	males	74.000000
36	females	71.333333
37	males	69.000000
37	females	40.714286
38	males	51.857143
38	females	33.500000
39	males	27.857143
39	females	26.400000
40	males	28.000000
40	females	46.500000
41	males	67.857143
41	females	72.000000
42	males	61.000000
42	females	58.083333
43	males	70.000000
43	females	50.666667
44	males	74.875000
44	females	63.777778
45	males	82.666667
45	females	42.500000
46	males	58.285714
46	females	49.000000
47	males	88.571429
47	females	39.333333
48	males	52.000000
48	females	47.500000
49	males	81.714286
49	females	80.000000
50	males	41.714286
50	females	47.000000
51	males	40.000000
51	females	73.000000
52	males	48.000000
52	females	39.000000
53	males	83.000000
53	females	65.800000
54	males	40.625000
54	females	18.500000
55	males	51.714286
55	females	55.666667
56	males	55.333333
56	females	47.250000
57	males	72.666667
57	females	69.200000
58	males	43.500000
58	females	54.333333
59	males	46.333333
59	females	41.000000
60	males	73.875000
60	females	73.800000
61	males	53.600000
61	females	34.833333
62	males	62.166667
62	females	47.000000
63	males	74.000000
63	females	17.000000
64	males	56.333333
64	females	84.000000
65	males	60.750000
65	females	52.800000
66	males	61.666667
66	females	54.500000
67	males	87.833333
67	females	72.333333
68	males	60.000000
68	females	84.000000
69	males	78.333333
69	females	44.000000
70	males	96.750000
70	females	69.000000
71	males	51.923077
71	females	53.000000
72	males	54.400000
72	females	47.666667
73	males	59.500000
73	females	39.000000
74	males	58.333333
75	males	99.000000
75	females	70.333333
76	males	51.266667
76	females	34.916667
77	males	58.666667
77	females	52.250000
78	males	39.250000
78	females	31.000000
79	males	85.000000
79	females	56.555556
80	males	66.333333
80	females	80.750000
81	males	70.000000
81	females	64.428571
82	males	57.384615
82	females	55.000000
83	males	46.250000
83	females	61.000000
84	males	72.400000
84	females	86.666667
85	males	39.600000
85	females	66.000000
86	males	31.600000
86	females	41.500000
87	males	43.500000
87	females	49.250000
88	males	52.000000
88	females	36.000000
89	males	40.000000
89	females	59.500000
90	males	58.500000
90	females	34.000000
91	males	36.625000
91	females	60.250000
92	males	72.666667
92	females	59.833333
93	males	31.900000
93	females	50.909091
94	males	80.833333
94	females	83.666667
95	males	35.000000
95	females	75.500000
96	males	61.500000
96	females	41.000000
97	males	53.000000
97	females	91.500000
98	males	61.000000
98	females	37.000000
99	males	54.000000
99	females	49.666667
100	males	44.000000
100	females	69.555556
101	males	61.909091
101	females	68.800000
102	males	47.200000
102	females	58.666667
103	males	104.500000
103	females	86.000000
104	males	91.333333
104	females	84.000000
105	males	59.000000
105	females	63.800000
106	males	9.500000
107	males	40.500000
107	females	50.125000

ggplot2::ggplot(sii, aes(x = schoolid, y = mathgain, color = sex)) + 
  stat_summary(fun = mean, geom = "point") +
  theme(axis.text.x = element_text(size = 3))

4

Exercise

Hip data: Plot the MMSE evolutions over time. You can make one plot with all the patients, using a different color for neuro and non-Neuro patients, or generate two separate plots for the neuro and non-neuro persons, respectively. Discuss what you see here.

Solution

ggplot(hip, aes(x = TIME, y = MMSE, color = NEURO)) + 
  stat_summary(fun = mean, geom = "point", size = 3) + 
  geom_jitter(size = 0.3, width = 0.2) +
  geom_line(aes(group = IDNR), size = 0.2)

Exercises day 2

Vidoes on the theory:

1

Exercise

SII data Is there a difference of math achievement score between males and females? A descriptive plot was already made:

sii <- import("SII.txt")
sii$sex <- factor(sii$sex, levels = c(0,1), labels = c("males", "females"))
sii$schoolid <- factor(sii$schoolid)

ggplot2::ggplot(sii, aes(x = schoolid, y = mathgain, color = sex)) + 
  stat_summary(fun = mean, geom = "point") +
  theme(axis.text.x = element_text(size = 3))

We will now look at a mixed model. We still only consider the clustering on school-level and ignore the clustering on the class-level (this will be included in a later stage). Write down the full model with assumptions.

Solution

\(Y_{ij} = (beta_{0} + beta_{i}) + beta_{2}*sex + \epsilon_{ij}\)

with \(b_{i}\) = random intercept

2

Exercise

SII data Fit now the model in R, discuss the within-cluster variability and between-cluster variability.

Solution

Basic model output

model<-lme(mathgain~sex,random=~1|schoolid,data=sii)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: sii 
##       AIC      BIC    logLik
##   11780.3 11800.62 -5886.151
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.45341 33.07291
## 
## Fixed effects: mathgain ~ sex 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 58.65798  1.748933 1082 33.53931  0.0000
## sexfemales  -2.13315  1.965660 1082 -1.08521  0.2781
##  Correlation: 
##            (Intr)
## sexfemales -0.566
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -4.8454731 -0.6000218 -0.0198554  0.5417158  5.6376937 
## 
## Number of Observations: 1190
## Number of Groups: 107

anova.lme(model,type="marginal")

##             numDF denDF   F-value p-value
## (Intercept)     1  1082 1124.8850  <.0001
## sex             1  1082    1.1777  0.2781

Test of fixed effects (Type 3): The Score in Girls is on average about 2 point lower, however the \(se\) and thus, the p-Value is large (28%).

fixed_effects <- anova.lme(model, type = "marginal")
fixed_effects

##             numDF denDF   F-value p-value
## (Intercept)     1  1082 1124.8850  <.0001
## sex             1  1082    1.1777  0.2781

Estimating the covariance of parameters: The first row (intercepts) represents the between school variability, the second row (residual) the within school variability.

covPar <- VarCorr(model)
covPar

## schoolid = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept)  109.2739 10.45341
## Residual    1093.8173 33.07291

Total variability is decomposed as within-cluster variability (residuals) and between-cluster variability (schools).

betwVar <- as.numeric(covPar[1,1])
withVar <- as.numeric(covPar[2,1])
totVar <- betwVar+withVar
Expl <- betwVar/totVar
Expl

## [1] 0.09082761

Thus, 9% of thr variance due to variance between schools.

3

Exercise

Hip data

Fit a mixed model, taking into account both neuro-groups (exclude observations with a missing value). Discuss the variability between and within patients. Write down the model with assumptions.

Is there a different evolution of MMSE score in the two neuro-status groups?
Is there a MMSE evolution over time?

Solution

Mixed model for MMSE over time in both neuro groups: MMSE of \(measurement_{j}\) in \(ID_{i}\)

\(Y_{ij} = (beta_{0}+b_{i}) + beta_{1}*NEURO + beta_{2}*TIME + beta_{3}*NEURO*TIME + \epsilon_{ij}\)

Fit the model

model<- lme(MMSE~NEURO+TIME+NEURO:TIME,random=~1|IDNR,data=hip,na.action=na.exclude)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: hip 
##        AIC      BIC    logLik
##   1464.789 1486.222 -726.3943
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    6.478461 2.513081
## 
## Fixed effects: MMSE ~ NEURO + TIME + NEURO:TIME 
##                                 Value Std.Error  DF   t-value p-value
## (Intercept)                 22.011719 1.0770182 205 20.437648  0.0000
## NEUROneuro-psychiatric      -8.038534 1.8690377  58 -4.300894  0.0001
## TIME                         0.197531 0.0523993 205  3.769725  0.0002
## NEUROneuro-psychiatric:TIME -0.059808 0.0964234 205 -0.620264  0.5358
##  Correlation: 
##                             (Intr) NEUROn- TIME  
## NEUROneuro-psychiatric      -0.576               
## TIME                        -0.254  0.146        
## NEUROneuro-psychiatric:TIME  0.138 -0.258  -0.543
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.14473201 -0.44157131  0.01532865  0.48824740  2.81016393 
## 
## Number of Observations: 267
## Number of Groups: 60

Get the variances

covPar <- VarCorr(model)
covPar

## IDNR = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept) 41.970459 6.478461
## Residual     6.315578 2.513081

betwVar <- as.numeric(covPar[1,1])
withVar <- as.numeric(covPar[2,1])
totVar <- betwVar+withVar
Expl <- betwVar/totVar
Expl

## [1] 0.8692049

There is no different evolution over time as the p-value for the interaction term is 53%

summary(model)

## Linear mixed-effects model fit by REML
##  Data: hip 
##        AIC      BIC    logLik
##   1464.789 1486.222 -726.3943
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    6.478461 2.513081
## 
## Fixed effects: MMSE ~ NEURO + TIME + NEURO:TIME 
##                                 Value Std.Error  DF   t-value p-value
## (Intercept)                 22.011719 1.0770182 205 20.437648  0.0000
## NEUROneuro-psychiatric      -8.038534 1.8690377  58 -4.300894  0.0001
## TIME                         0.197531 0.0523993 205  3.769725  0.0002
## NEUROneuro-psychiatric:TIME -0.059808 0.0964234 205 -0.620264  0.5358
##  Correlation: 
##                             (Intr) NEUROn- TIME  
## NEUROneuro-psychiatric      -0.576               
## TIME                        -0.254  0.146        
## NEUROneuro-psychiatric:TIME  0.138 -0.258  -0.543
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.14473201 -0.44157131  0.01532865  0.48824740  2.81016393 
## 
## Number of Observations: 267
## Number of Groups: 60

When omitting the interaction term, TIME is a significant fixed effect.

model<- lme(MMSE~NEURO+TIME,random=~1|IDNR,data=hip, na.action= na.exclude)
anova.lme(model,type="marginal")

##             numDF denDF  F-value p-value
## (Intercept)     1   206 428.4846  <.0001
## NEURO           1    58  21.2744  <.0001
## TIME            1   206  16.7848   1e-04

Exercises day 3

1

Vidoes on the theory

Exercise

SII data

Give and fit the model, testing if there is a difference in mathgain between boys and girls, taking into account clustering on both school and class levels. Write down this model (no other covariates have to be considered yet).

Solution

\(Y_{ijk} = beta_{0} + a_{i} + b_{i(j)} + beta_{1}*sex + \epsilon_{ijk}\)

with \(a_{i}\) = school effect
and \(b_{i(j)}\) = class effect (within school)

2

Exercise

SII data

Is there a difference in mathgain between boys and girls?

Solution

The main effect of sex is not significant:

model<-lme(mathgain~sex,random=~1|schoolid/classid,data=sii, na.action=na.exclude)
anova.lme(model)

##             numDF denDF   F-value p-value
## (Intercept)     1   877 1583.0446  <.0001
## sex             1   877    1.4586  0.2275

summary(model)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11774.14 11799.54 -5882.068
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    8.772372
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    10.05193 32.04467
## 
## Fixed effects: mathgain ~ sex 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 58.61460  1.747250 877 33.54678  0.0000
## sexfemales  -2.35259  1.947955 877 -1.20772  0.2275
##  Correlation: 
##            (Intr)
## sexfemales -0.564
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.67888536 -0.58891986 -0.03238045  0.53965216  5.61255392 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

3

Exercise

SII data

Discuss the different variance components from the model above. What fraction of the total variance is due to the variability between classes and between schools?

Solution

VarCorr(model)

##             Variance     StdDev   
## schoolid =  pdLogChol(1)          
## (Intercept)   76.9545     8.772372
## classid =   pdLogChol(1)          
## (Intercept)  101.0413    10.051932
## Residual    1026.8606    32.044666

varSchool <- as.numeric(VarCorr(model)[2,1])
varClass <- as.numeric(VarCorr(model)[4,1])
varRes <- as.numeric(VarCorr(model)[5,1])
varTot <- varSchool + varClass + varRes

fracSchool <- varSchool/varTot
fracClass <- varClass/varTot

The fraction of the total variation explained by schools is 0.0638703 and 0.0838617 by classes.

4

Exercise

SII data

What is the correlation between mathgain scores from the same school, but different classes?
What is the correlation between mathgain scores from the same class?

Solution

What is the correlation between mathgain scores from the same school, but different classes?

varSchool/varTot

## [1] 0.06387027

What is the correlation between mathgain scores from the same class?

(varSchool+varClass)/varTot

## [1] 0.147732

One must add the variance due to school in numerator as students from same class are also within the same school!

5

Exercise

SII data

Fit 1 model including all covariates as main effects and as an interaction with sex, and taking into account all levels of clustering. Discuss significance of the covariates. Is the effect of (some of) the covariates sex-dependent? (Note: stepwise backward model building is the subject of exercise Day 4)

Solution

model<-lme(mathgain~sex*(ses+housepov+yearstea+mathkind),random=~1|schoolid/classid,data=sii, na.action=na.exclude) #short
model1 <- lme(mathgain~sex+ses+housepov+yearstea+mathkind+sex:ses+sex:housepov+sex:yearstea+sex:mathkind,random=~1|schoolid/classid,data=sii, na.action=na.exclude) #long  

summary(model)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11408.22 11474.17 -5691.11
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.195376
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:     8.51822 27.23456
## 
## Fixed effects: mathgain ~ sex * (ses + housepov + yearstea + mathkind) 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         278.63790 14.572595 871  19.120679  0.0000
## sexfemales          -11.20361 20.032320 871  -0.559277  0.5761
## ses                   5.70738  1.703249 871   3.350879  0.0008
## housepov             -5.32490 11.796167 105  -0.451410  0.6526
## yearstea              0.22939  0.141292 204   1.623505  0.1060
## mathkind             -0.47700  0.029986 871 -15.907333  0.0000
## sexfemales:ses       -0.05452  2.351317 871  -0.023187  0.9815
## sexfemales:housepov -24.27888 12.855978 871  -1.888529  0.0593
## sexfemales:yearstea  -0.34430  0.175657 871  -1.960051  0.0503
## sexfemales:mathkind   0.03950  0.041685 871   0.947695  0.3435
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:s sxfmls:h
## sexfemales          -0.672                                                     
## ses                  0.214 -0.170                                              
## housepov            -0.241  0.118  0.096                                       
## yearstea            -0.165  0.093 -0.053  0.097                                
## mathkind            -0.974  0.666 -0.231  0.085  0.032                         
## sexfemales:ses      -0.155  0.216 -0.687 -0.073  0.041  0.169                  
## sexfemales:housepov  0.163 -0.221 -0.084 -0.539 -0.075 -0.080  0.171           
## sexfemales:yearstea  0.106 -0.123  0.039 -0.061 -0.615 -0.024 -0.020    0.151  
## sexfemales:mathkind  0.657 -0.982  0.181 -0.052 -0.018 -0.675 -0.239    0.093  
##                     sxfmls:y
## sexfemales                  
## ses                         
## housepov                    
## yearstea                    
## mathkind                    
## sexfemales:ses              
## sexfemales:housepov         
## sexfemales:yearstea         
## sexfemales:mathkind -0.002  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.89285360 -0.62785115 -0.02861974  0.55658062  4.29948404 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

6

Exercise

Hip Data:

Looking at the profiles from the hipfracture dataset, it might be of interest to include a quadratic time effect. Define the quadratic time and add it to the model.

Write down and fit the model. Is the quadratic time effect needed here?

Solution

hip$TIME2<-hip$TIME^2
hip %>% 
   slice(1:6) %>% 
   kbl() %>% 
   kable_paper()

IDNR	NEURO	AGE	MMSE	TIME	TIME2
1	not neuro-psychiatric	74	28	1	1
1	not neuro-psychiatric	74	28	3	9
1	not neuro-psychiatric	74	28	5	25
1	not neuro-psychiatric	74	26	8	64
1	not neuro-psychiatric	74	25	12	144
2	not neuro-psychiatric	67	25	1	1

\(_{Yij} = (beta_{0}+b_{i}) + beta_{1}*TIME + beta_{2}*TIME2 + \epsilon_{eij}\)

Fitting the model:

model<- lme(MMSE~TIME+TIME2,random=~1|IDNR,data=hip, na.action=na.exclude)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: hip 
##       AIC     BIC    logLik
##   1486.63 1504.51 -738.3152
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    7.558976 2.501436
## 
## Fixed effects: MMSE ~ TIME + TIME2 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 18.861017 1.0627175 205 17.747912  0.0000
## TIME         0.406287 0.1635060 205  2.484842  0.0138
## TIME2       -0.017868 0.0125315 205 -1.425834  0.1554
##  Correlation: 
##       (Intr) TIME  
## TIME  -0.347       
## TIME2  0.301 -0.963
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.06437286 -0.43322995  0.04405184  0.47938518  2.82910055 
## 
## Number of Observations: 267
## Number of Groups: 60

7

Exercise

Hip data

Include the age of the patient as an extra covariate in the model. Is there a different effect of age in both neuro groups?

Solution

No, because the interaction term is not significant:

model<- lme(MMSE~TIME+NEURO+AGE+AGE:NEURO,random=~1|IDNR,data=hip, na.action=na.exclude)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: hip 
##        AIC      BIC    logLik
##   1447.366 1472.345 -716.6832
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    5.460354 2.508963
## 
## Fixed effects: MMSE ~ TIME + NEURO + AGE + AGE:NEURO 
##                               Value Std.Error  DF   t-value p-value
## (Intercept)                56.50126  7.962052 206  7.096320  0.0000
## TIME                        0.17633  0.043895 206  4.017058  0.0001
## NEUROneuro-psychiatric     -4.88675 17.718753  56 -0.275796  0.7837
## AGE                        -0.43988  0.101196  56 -4.346782  0.0001
## NEUROneuro-psychiatric:AGE -0.03616  0.222840  56 -0.162276  0.8717
##  Correlation: 
##                            (Intr) TIME   NEUROn- AGE   
## TIME                       -0.035                      
## NEUROneuro-psychiatric     -0.449 -0.006               
## AGE                        -0.993  0.007  0.446        
## NEUROneuro-psychiatric:AGE  0.451  0.007 -0.996  -0.454
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.21901376 -0.47124584  0.01957349  0.46873450  2.80219031 
## 
## Number of Observations: 267
## Number of Groups: 60

However, age is a significant fixed effect:

model<- lme(MMSE~TIME+NEURO+AGE,random=~1|IDNR,data=hip, na.action=na.exclude)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: hip 
##       AIC      BIC    logLik
##   1444.22 1465.652 -716.1098
## 
## Random effects:
##  Formula: ~1 | IDNR
##         (Intercept) Residual
## StdDev:    5.411095 2.508976
## 
## Fixed effects: MMSE ~ TIME + NEURO + AGE 
##                           Value Std.Error  DF   t-value p-value
## (Intercept)            57.08459  7.046333 206  8.101319   0e+00
## TIME                    0.17627  0.043893 206  4.015965   1e-04
## NEUROneuro-psychiatric -7.75201  1.523968  57 -5.086727   0e+00
## AGE                    -0.44734  0.089389  57 -5.004366   0e+00
##  Correlation: 
##                        (Intr) TIME   NEURO-
## TIME                   -0.044              
## NEUROneuro-psychiatric  0.006  0.008       
## AGE                    -0.992  0.011 -0.079
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.22127751 -0.47530674  0.01848392  0.47026467  2.80123379 
## 
## Number of Observations: 267
## Number of Groups: 60

anova.lme(model)

##             numDF denDF  F-value p-value
## (Intercept)     1   206 800.2194  <.0001
## TIME            1   206  16.9625   1e-04
## NEURO           1    57  30.2343  <.0001
## AGE             1    57  25.0437  <.0001

Exercises day 4

1

Videos on the theory:

Exercise

SII data

Start from the last model from day 3. Perform a backward model selection. What covariates are important in explaining the mathgain? Discuss the effects in the final model.

Solution

Stepwise backward elimination in the model with interactions between sex and other covaraites. The fullmodel:

model<-lme(mathgain~sex*(ses+housepov+yearstea+mathkind),random=~1|schoolid/classid,data=sii, na.action = na.omit)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11408.22 11474.17 -5691.11
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.195376
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:     8.51822 27.23456
## 
## Fixed effects: mathgain ~ sex * (ses + housepov + yearstea + mathkind) 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         278.63790 14.572595 871  19.120679  0.0000
## sexfemales          -11.20361 20.032320 871  -0.559277  0.5761
## ses                   5.70738  1.703249 871   3.350879  0.0008
## housepov             -5.32490 11.796167 105  -0.451410  0.6526
## yearstea              0.22939  0.141292 204   1.623505  0.1060
## mathkind             -0.47700  0.029986 871 -15.907333  0.0000
## sexfemales:ses       -0.05452  2.351317 871  -0.023187  0.9815
## sexfemales:housepov -24.27888 12.855978 871  -1.888529  0.0593
## sexfemales:yearstea  -0.34430  0.175657 871  -1.960051  0.0503
## sexfemales:mathkind   0.03950  0.041685 871   0.947695  0.3435
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:s sxfmls:h
## sexfemales          -0.672                                                     
## ses                  0.214 -0.170                                              
## housepov            -0.241  0.118  0.096                                       
## yearstea            -0.165  0.093 -0.053  0.097                                
## mathkind            -0.974  0.666 -0.231  0.085  0.032                         
## sexfemales:ses      -0.155  0.216 -0.687 -0.073  0.041  0.169                  
## sexfemales:housepov  0.163 -0.221 -0.084 -0.539 -0.075 -0.080  0.171           
## sexfemales:yearstea  0.106 -0.123  0.039 -0.061 -0.615 -0.024 -0.020    0.151  
## sexfemales:mathkind  0.657 -0.982  0.181 -0.052 -0.018 -0.675 -0.239    0.093  
##                     sxfmls:y
## sexfemales                  
## ses                         
## housepov                    
## yearstea                    
## mathkind                    
## sexfemales:ses              
## sexfemales:housepov         
## sexfemales:yearstea         
## sexfemales:mathkind -0.002  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.89285360 -0.62785115 -0.02861974  0.55658062  4.29948404 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

Omit least significant term (starting with interation terms):

model2<-update(model,.~.-sex:ses )
summary(model2)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11409.77 11470.66 -5692.884
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.196347
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.517937 27.22176
## 
## Fixed effects: mathgain ~ sex + ses + housepov + yearstea + mathkind + sex:housepov +      sex:yearstea + sex:mathkind 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         278.59085 14.390994 872  19.358694  0.0000
## sexfemales          -11.10523 19.550559 872  -0.568026  0.5702
## ses                   5.68033  1.237910 872   4.588646  0.0000
## housepov             -5.34830 11.761602 105  -0.454726  0.6502
## yearstea              0.22949  0.141126 204   1.626147  0.1055
## mathkind             -0.47689  0.029541 872 -16.143229  0.0000
## sexfemales:housepov -24.22817 12.661608 872  -1.913514  0.0560
## sexfemales:yearstea  -0.34437  0.175544 872  -1.961714  0.0501
## sexfemales:mathkind   0.03928  0.040454 872   0.970900  0.3319
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:h sxfmls:y
## sexfemales          -0.662                                                     
## ses                  0.150 -0.031                                              
## housepov            -0.256  0.137  0.063                                       
## yearstea            -0.161  0.086 -0.035  0.100                                
## mathkind            -0.973  0.655 -0.160  0.099  0.025                         
## sexfemales:housepov  0.195 -0.268  0.047 -0.536 -0.083 -0.113                  
## sexfemales:yearstea  0.104 -0.122  0.034 -0.063 -0.614 -0.021  0.157           
## sexfemales:mathkind  0.647 -0.982  0.024 -0.072 -0.008 -0.664  0.140   -0.007  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.89519431 -0.62812346 -0.02846769  0.55761536  4.29985590 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

Omit least significant term:

model3<-update(model2,.~.-sex:mathkind)
summary(model3)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11404.13 11459.96 -5691.066
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.205218
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.461796 27.23086
## 
## Fixed effects: mathgain ~ sex + ses + housepov + yearstea + mathkind + sex:housepov +      sex:yearstea 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         269.55968 10.978084 873  24.554346  0.0000
## sexfemales            7.53737  3.724963 873   2.023474  0.0433
## ses                   5.65186  1.237601 873   4.566791  0.0000
## housepov             -4.52070 11.727911 105  -0.385465  0.7007
## yearstea              0.23077  0.141006 204   1.636594  0.1033
## mathkind             -0.45787  0.022102 873 -20.716396  0.0000
## sexfemales:housepov -25.97646 12.537727 873  -2.071864  0.0386
## sexfemales:yearstea  -0.34347  0.175560 873  -1.956448  0.0507
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd sxfmls:h
## sexfemales          -0.187                                            
## ses                  0.176 -0.037                                     
## housepov            -0.276  0.351  0.065                              
## yearstea            -0.204  0.412 -0.035  0.100                       
## mathkind            -0.954  0.022 -0.193  0.068  0.026                
## sexfemales:housepov  0.137 -0.688  0.044 -0.533 -0.083 -0.026         
## sexfemales:yearstea  0.142 -0.674  0.035 -0.063 -0.615 -0.034  0.159  
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.88869525 -0.63454292 -0.02443226  0.56174303  4.31673108 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

Omit least significant term. Note : housepow must stay in model as main effect, as long as housepov is part of an interaction term:

model4<-update(model3,.~.-sex:yearstea)
summary(model4)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11404.31 11455.07 -5692.154
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.152796
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.587203 27.24878
## 
## Fixed effects: mathgain ~ sex + ses + housepov + yearstea + mathkind + sex:housepov 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         272.57162 10.877696 874  25.057844  0.0000
## sexfemales            2.61081  2.754110 874   0.947969  0.3434
## ses                   5.73408  1.238130 874   4.631239  0.0000
## housepov             -5.96054 11.707259 105  -0.509132  0.6117
## yearstea              0.06115  0.111576 204   0.548012  0.5843
## mathkind             -0.45924  0.022113 874 -20.768311  0.0000
## sexfemales:housepov -22.01667 12.390680 874  -1.776874  0.0759
##  Correlation: 
##                     (Intr) sxfmls ses    houspv yearst mthknd
## sexfemales          -0.124                                   
## ses                  0.173 -0.018                            
## housepov            -0.270  0.418  0.068                     
## yearstea            -0.150 -0.004 -0.017  0.077              
## mathkind            -0.959 -0.001 -0.192  0.066  0.007       
## sexfemales:housepov  0.118 -0.797  0.039 -0.531  0.019 -0.021
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.86847848 -0.62029105 -0.03509731  0.56863270  4.30389815 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

Now, main effect of yearstea can be removed, as yearstea is no longer part of an interaction term:

model5<-update(model4,.~.-yearstea)
summary(model5)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11400.06 11445.75 -5691.03
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.184765
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.552127 27.24273
## 
## Fixed effects: mathgain ~ sex + ses + housepov + mathkind + sex:housepov 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)         273.49208 10.752582 874  25.435015  0.0000
## sexfemales            2.62108  2.753277 874   0.951986  0.3414
## ses                   5.74628  1.237654 874   4.642884  0.0000
## housepov             -6.46999 11.678182 105  -0.554024  0.5807
## mathkind             -0.45938  0.022107 874 -20.779831  0.0000
## sexfemales:housepov -22.16862 12.384934 874  -1.789966  0.0738
##  Correlation: 
##                     (Intr) sxfmls ses    houspv mthknd
## sexfemales          -0.126                            
## ses                  0.172 -0.018                     
## housepov            -0.262  0.419  0.069              
## mathkind            -0.969 -0.001 -0.192  0.066       
## sexfemales:housepov  0.122 -0.797  0.039 -0.534 -0.021
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -5.8763441 -0.6230294 -0.0302178  0.5707610  4.3001268 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

….

model6<-update(model5,.~.-sex:housepov)
summary(model6)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC   logLik
##   11408.12 11448.74 -5696.06
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.061953
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.719333 27.25296
## 
## Fixed effects: mathgain ~ sex + ses + housepov + mathkind 
##                 Value Std.Error  DF    t-value p-value
## (Intercept) 275.73178 10.677641 875  25.823287  0.0000
## sexfemales   -1.30972  1.664641 875  -0.786789  0.4316
## ses           5.83023  1.237521 875   4.711221  0.0000
## housepov    -17.53600  9.848187 105  -1.780632  0.0779
## mathkind     -0.46003  0.022116 875 -20.800621  0.0000
##  Correlation: 
##            (Intr) sxfmls ses    houspv
## sexfemales -0.048                     
## ses         0.169  0.021              
## housepov   -0.234 -0.012  0.107       
## mathkind   -0.974 -0.029 -0.191  0.065
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.78800233 -0.62708538 -0.01617522  0.54849256  4.29842019 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

….

model7<-update(model6,.~.-sex)
summary(model7)

## Linear mixed-effects model fit by REML
##  Data: sii 
##       AIC      BIC    logLik
##   11409.6 11445.14 -5697.798
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    9.061356
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.685155 27.25499
## 
## Fixed effects: mathgain ~ ses + housepov + mathkind 
##                 Value Std.Error  DF    t-value p-value
## (Intercept) 275.32608 10.663464 876  25.819572  0.0000
## ses           5.85139  1.237085 876   4.729980  0.0000
## housepov    -17.62905  9.841465 105  -1.791303  0.0761
## mathkind     -0.46054  0.022104 876 -20.835602  0.0000
##  Correlation: 
##          (Intr) ses    houspv
## ses       0.171              
## housepov -0.235  0.107       
## mathkind -0.977 -0.191  0.065
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.76624565 -0.63606866 -0.02284166  0.55176293  4.27668545 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

…. final model

model8<-update(model7,.~.-housepov)
summary(model8)

## Linear mixed-effects model fit by REML
##  Data: sii 
##        AIC      BIC    logLik
##   11417.21 11447.68 -5702.604
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept)
## StdDev:    8.994781
## 
##  Formula: ~1 | classid %in% schoolid
##         (Intercept) Residual
## StdDev:    8.805618 27.26732
## 
## Fixed effects: mathgain ~ ses + mathkind 
##                 Value Std.Error  DF    t-value p-value
## (Intercept) 270.78470 10.371906 876  26.107515       0
## ses           6.08792  1.230806 876   4.946288       0
## mathkind     -0.45788  0.022074 876 -20.742919       0
##  Correlation: 
##          (Intr) ses   
## ses       0.203       
## mathkind -0.992 -0.199
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.81406721 -0.63030057 -0.02714651  0.56205803  4.27064533 
## 
## Number of Observations: 1190
## Number of Groups: 
##              schoolid classid %in% schoolid 
##                   107                   312

2

Exercise

Hip data

Fit the model with only main effects, no interaction effects. Obtain the Empirical-Bayes estimates for the patients. Present them in a histogram.

Solution

Fit the model

model<-lme(MMSE ~ NEURO + TIME +AGE , random=~1|IDNR,data=hip, na.action = na.omit)

Get the Bayes estimates and plot histogram

BE <- data.frame(ranef(model)[1])
names(BE) <- "Intercept"
BE$y <- dnorm(BE$Intercept)

ggplot(BE, aes(x = Intercept)) + geom_histogram()

3

Exercise

Hip data

Make a dataset with all predictions for the 60 patients.

Solution

Get predicted mean (given covariates NEURO, AGE and TIME):

predMean<-predict(model,level=0)

Get prediction for the individual (given covariates NEURO, AGE and TIME, plus individual random intercept):

predIndiv<-predict(model,level=1)

df <- data.frame(cbind(cbind(predMean,predIndiv)))
df %>% 
   slice(1:10) %>% 
   kbl() %>% 
   kable_paper()

	predMean	predIndiv
X1	24.15806	26.07161
X1.1	24.51061	26.42416
X1.2	24.86315	26.77670
X1.3	25.39197	27.30552
X1.4	26.09706	28.01061
X2	27.28941	24.57311
X2.1	27.64196	24.92566
X2.2	27.99450	25.27820
X2.3	28.52332	25.80702
X3	27.28941	27.18284

Case Study

1

The toenail dataset is the result of a randomized, double-blind, parallel group, multicenter study for the comparison of two oral treatments (0 and 1) for toenail dermatophyte onychomycosis (TDO). TDO is a common toenail infection, difficult to treat, and affecting more than 2 out of 100 persons. In total, 36 hospitals were randomized. All patients with an infection on the big toenail were randomized in 2 treatment groups and were followed during 12 weeks (3 months) of treatment and followed further up to a total of 12 months. Measurements were taken at baseline, every month during treatment, and every 3 months afterwards, resulting in a maximum of 7 measurements per subject. The measure of interest is the unaffected nail length in mm (‘nailbig’). Also, the age of the patient at the beginning of the study is denoted.

Exercise

First, make a descriptive plot to see how the mean unaffected nail length evolves on average over time between the two treatment groups. Comment on the graph.

Solution

Import and clean data:

nail <- import("toenail.txt")
nail$treat <- factor(nail$treat, levels = c(0,1), labels = c("placebo", "active treatment"))
nail$id <- factor(nail$id)
nail$center <- factor(nail$center)
nail$timeNum <- nail$time
nail$time <- factor(nail$time, levels = c(0,1,2,3,6,9,12), labels = c("0","1","2","3","6","9","12")) # because not linearly measured

Generate the graph:

ggplot(nail, aes(x = time, y = nailbig, color = treat)) +
    stat_summary(fun = mean, geom = "point", size = 3) +
   geom_jitter(width = 0.3, size = 0.1) + labs(x = "time in months")

In general, the unaffeted nail length is increasing over time in both groups. The increase seems not to be linear with time. The increase seems to be higher in the active group. The variation between subjects is quite large.

2

Exercise

For the model we will first choose the random structure, based on a fixed structure with time and the interaction with time and treat, ignoring the covariate age (time as continuous variable).

Fit a model, taking into account both levels of clustering.
Interpret the variance components. Is this something you expected in this case?
Do you think it is useful to take into account both levels of clustering? (No random slope, but 2 intercepts center/id, time continuous.)

Solution

Fitting the model: treat is not included because it is a RCT

model <- lme(nailbig ~ time + time:treat, random = ~1 | center/id, data = nail, na.action = na.omit)
summary(model)

## Linear mixed-effects model fit by REML
##  Data: nail 
##        AIC      BIC    logLik
##   9274.068 9367.866 -4620.034
## 
## Random effects:
##  Formula: ~1 | center
##          (Intercept)
## StdDev: 0.0008601381
## 
##  Formula: ~1 | id %in% center
##         (Intercept) Residual
## StdDev:    2.560856 2.492617
## 
## Fixed effects: nailbig ~ time + time:treat 
##                                 Value Std.Error   DF   t-value p-value
## (Intercept)                  1.693921 0.2934113 1542  5.773197  0.0000
## time1                        0.896659 0.2943568 1542  3.046163  0.0024
## time2                        1.987248 0.2972278 1542  6.685941  0.0000
## time3                        3.504213 0.3022572 1542 11.593481  0.0000
## time6                        5.621216 0.3075350 1542 18.278298  0.0000
## time9                        6.224012 0.3207449 1542 19.404868  0.0000
## time12                       6.324421 0.3216902 1542 19.659971  0.0000
## time0:treatactive treatment  0.393414 0.4138003 1542  0.950733  0.3419
## time1:treatactive treatment  0.289620 0.4175257 1542  0.693659  0.4880
## time2:treatactive treatment  0.309150 0.4209154 1542  0.734470  0.4628
## time3:treatactive treatment  0.202418 0.4271594 1542  0.473870  0.6357
## time6:treatactive treatment  0.230427 0.4351366 1542  0.529551  0.5965
## time9:treatactive treatment  0.836164 0.4499432 1542  1.858376  0.0633
## time12:treatactive treatment 0.891608 0.4517271 1542  1.973777  0.0486
##  Correlation: 
##                              (Intr) time1  time2  time3  time6  time9  time12
## time1                        -0.486                                          
## time2                        -0.481  0.490                                   
## time3                        -0.473  0.482  0.482                            
## time6                        -0.465  0.474  0.474  0.473                     
## time9                        -0.448  0.454  0.454  0.453  0.451              
## time12                       -0.446  0.453  0.453  0.452  0.449  0.445       
## time0:treatactive treatment  -0.709  0.345  0.341  0.336  0.330  0.317  0.316
## time1:treatactive treatment  -0.360 -0.363 -0.007 -0.007 -0.007 -0.006 -0.006
## time2:treatactive treatment  -0.357 -0.007 -0.371 -0.010 -0.010 -0.009 -0.009
## time3:treatactive treatment  -0.352 -0.007 -0.011 -0.382 -0.015 -0.013 -0.013
## time6:treatactive treatment  -0.345 -0.007 -0.010 -0.015 -0.393 -0.017 -0.017
## time9:treatactive treatment  -0.333 -0.007 -0.010 -0.014 -0.018 -0.421 -0.026
## time12:treatactive treatment -0.332 -0.007 -0.010 -0.014 -0.018 -0.026 -0.422
##                              tm0:tt tm1:tt tm2:tt tm3:tt tm6:tt tm9:tt
## time1                                                                 
## time2                                                                 
## time3                                                                 
## time6                                                                 
## time9                                                                 
## time12                                                                
## time0:treatactive treatment                                           
## time1:treatactive treatment   0.508                                   
## time2:treatactive treatment   0.504  0.506                            
## time3:treatactive treatment   0.497  0.498  0.498                     
## time6:treatactive treatment   0.488  0.489  0.489  0.488              
## time9:treatactive treatment   0.471  0.472  0.472  0.471  0.468       
## time12:treatactive treatment  0.469  0.470  0.470  0.469  0.466  0.461
## 
## Standardized Within-Group Residuals:
##           Min            Q1           Med            Q3           Max 
## -3.7888417338 -0.5300417022  0.0001921644  0.5666066228  5.1732155056 
## 
## Number of Observations: 1854
## Number of Groups: 
##         center id %in% center 
##             26            299

anova.lme(model)

##             numDF denDF  F-value p-value
## (Intercept)     1  1542 980.5896  <.0001
## time            6  1542 291.9657  <.0001
## time:treat      7  1542   0.9105  0.4972

Variance components

VarCorr(model)

##             Variance     StdDev      
## center =    pdLogChol(1)             
## (Intercept) 7.398375e-07 0.0008601381
## id =        pdLogChol(1)             
## (Intercept) 6.557983e+00 2.5608559349
## Residual    6.213138e+00 2.4926167123

Yes, the variance between subjects is much larger than the variance between hospitals. This makes sense.

The fraction explained by the center is very low. there is no difference between the models. I think one can do the analysis without considering for center clustering.

model1 <- lme(nailbig ~ time + time:treat, random = ~1 | id, data = nail, na.action = na.omit)
anova(model, model1)

##        Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## model      1 17 9274.068 9367.866 -4620.034                         
## model1     2 16 9271.112 9359.393 -4619.556 1 vs 2 0.9555801  0.3283

3

Exercise

Calculate the variances of the unaffected nail length for the different time points (in both treatment groups separately).
Do you think a random slope might be of interest here? If yes, fit the linear mixed model with both random intercept and slope.
test if the random slope model is more likely than the random intercept model by using

Solution

Calculate the variances of the unaffected nail length for the different time points (in both treatment groups separately):

nail_var <- nail %>% 
   group_by(timeNum, treat) %>% 
   summarise(Variance = var(nailbig))

nail_var %>% 
   kbl() %>% 
   kable_paper()

timeNum	treat	Variance
0	placebo	5.640835
0	active treatment	8.572255
1	placebo	6.440780
1	active treatment	8.449233
2	placebo	7.922346
2	active treatment	9.314727
3	placebo	10.898467
3	active treatment	10.144576
6	placebo	15.517773
6	active treatment	14.188550
9	placebo	20.460068
9	active treatment	20.571358
12	placebo	27.233707
12	active treatment	25.052586

ggplot(nail_var, aes(x = timeNum, y = Variance, fill = treat)) + geom_bar(stat = "identity") + labs(y = "total variance", x = "time")

The variance increases with time, but about at the same level in both groups.

linear mixed model with both random intercept and slope

model2 <- lme(nailbig ~ timeNum + timeNum:treat, random = ~1 + timeNum | id, data = nail, na.action = na.exclude)
summary(model2)

## Linear mixed-effects model fit by REML
##  Data: nail 
##        AIC      BIC   logLik
##   8723.239 8761.904 -4354.62
## 
## Random effects:
##  Formula: ~1 + timeNum | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 2.7133258 (Intr)
## timeNum     0.4752377 -0.389
## Residual    1.7750636       
## 
## Fixed effects: nailbig ~ timeNum + timeNum:treat 
##                                   Value  Std.Error   DF   t-value p-value
## (Intercept)                   2.5970699 0.16917771 1554 15.351135  0.0000
## timeNum                       0.5820213 0.04303633 1554 13.523953  0.0000
## timeNum:treatactive treatment 0.0571236 0.05687196 1554  1.004425  0.3153
##  Correlation: 
##                               (Intr) timeNm
## timeNum                       -0.312       
## timeNum:treatactive treatment  0.002 -0.684
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.46797352 -0.51623252 -0.06386359  0.45542574  5.65730820 
## 
## Number of Observations: 1854
## Number of Groups: 298

anova.lme(model2)

##               numDF denDF  F-value p-value
## (Intercept)       1  1554 683.0674  <.0001
## timeNum           1  1554 378.9960  <.0001
## timeNum:treat     1  1554   1.0089  0.3153

Remark: R crashes (in my case) when not using time as numeric varible…

Compare model with and without random slope.

anova(model1, model2)

##        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## model1     1 16 9271.112 9359.393 -4619.556                        
## model2     2  7 8723.239 8761.904 -4354.620 1 vs 2 529.8732  <.0001

The random slope improves the model…

4

Exercise

We will now look at the fixed structure, using the random structure that was chosen.

We will start with ignoring the age effect. Formally test if there is an evolution over time and if it is equal in both treatment groups. What fixed effects structure would you use based on this result? Fit this model. What does this mean?
Going further with the model you found above, incorporate age as an extra covariate. Does it have an effect?

Solution

Test if there is an evolution over time and if it is equal in both treatment groups

anova.lme(model2)

##               numDF denDF  F-value p-value
## (Intercept)       1  1554 683.0674  <.0001
## timeNum           1  1554 378.9960  <.0001
## timeNum:treat     1  1554   1.0089  0.3153

There is a time effect, but the interaction is not significant, thus, no different evolution between groups.

Incorporate age as an extra covariate (the interaction is omitted because not relevant)

model3 <- lme(nailbig ~ timeNum + age, ~1 + timeNum | id, data = nail, na.action = na.exclude)
summary(model3)

## Linear mixed-effects model fit by REML
##  Data: nail 
##        AIC      BIC   logLik
##   8725.119 8763.784 -4355.56
## 
## Random effects:
##  Formula: ~1 + timeNum | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 2.7016795 (Intr)
## timeNum     0.4750058 -0.378
## Residual    1.7748809       
## 
## Fixed effects: nailbig ~ timeNum + age 
##                 Value Std.Error   DF   t-value p-value
## (Intercept) 1.1529033 1.3497224 1554  0.854178  0.3931
## timeNum     0.6109973 0.0314125 1554 19.450765  0.0000
## age         0.0225356 0.0208990 1554  1.078312  0.2811
##  Correlation: 
##         (Intr) timeNm
## timeNum -0.041       
## age     -0.992 -0.011
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.4914541 -0.5200767 -0.0587634  0.4530281  5.6613576 
## 
## Number of Observations: 1854
## Number of Groups: 298

anova.lme(model3)

##             numDF denDF  F-value p-value
## (Intercept)     1  1554 670.7432  <.0001
## timeNum         1  1554 378.8407  <.0001
## age             1  1554   1.1628  0.2811

No main effect of age.

Exercises on mixed linear models

Nathanael Lutz

Data used in the exercises

The SII data

The hip fracture Data

Packages used for the exercises

Exercises day 1

1

Exercise

Solution

2

Exercise

Solution

3

Exercise

Solution

4

Exercise

Solution

Exercises day 2

1

Exercise

Solution

2

Exercise

Solution

3

Exercise

Solution

Exercises day 3

1

Exercise

Solution

2

Exercise

Solution

3

Exercise

Solution

4

Exercise

Solution

5

Exercise

Solution

6

Exercise

Solution

7

Exercise

Solution

Exercises day 4

1

Exercise

Solution

2

Exercise

Solution

3

Exercise

Solution

Case Study

1

Exercise

Solution

2

Exercise

Solution

3

Exercise

Solution

4

Exercise

Solution