Replicate the analysis reported in the arrhythmogenic dose of epinephrine example.
The arrhythmogenic dose of epinephrine (ADE) is determined by infusing epinephrine into an animal until arrhythmia criterion is reached. The arrhythmia criterion was the occurrence of four intermittent or continuous premature ventricular contractions. Once the criterion has been reached, the infusion required to produce it is recorded. The ADE is then calculated by multiplying the duration of the infusion by the infusion rate. The data give the ADE for six dogs measured at ten consecutive times.
Source: Shoukri, M.M., & Chaudhary, M.A. (2007). Analysis of Correlated Data with SAS and R. p. 218.
#
pacman::p_load(tidyverse, nlme)
#
dta <- read.csv("Data/ade.csv")
#
str(dta)
'data.frame': 60 obs. of 3 variables:
$ Dog : chr "D11" "D11" "D11" "D11" ...
$ Hour: num 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
$ ADE : num 5.7 4.7 4.8 4.9 3.88 3.51 2.8 2.6 2.5 2.5 ...
head(dta)
Dog Hour ADE
1 D11 0.0 5.70
2 D11 0.5 4.70
3 D11 1.0 4.80
4 D11 1.5 4.90
5 D11 2.0 3.88
6 D11 2.5 3.51
#
dta <- mutate(dta, Hour_f = factor(Hour))
theme_set(theme_bw())
# means with error bars
ggplot(dta, aes(Hour, ADE)) +
geom_point(alpha = .3) +
stat_summary(fun.y = mean, geom = "line") +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
labs(x = "Time (in hours)", y = "ADE")
summary(m1 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
data = dta))
Generalized least squares fit by REML
Model: ADE ~ Hour
Data: dta
AIC BIC logLik
196.65 221.38 -86.326
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Hour_f
Parameter estimates:
0 0.5 1 1.5 2 2.5 3 3.5
1.000000 0.554474 0.641329 0.519215 0.431879 0.082458 0.109015 0.170461
4 4.5
0.217706 0.111331
Coefficients:
Value Std.Error t-value p-value
(Intercept) 5.4356 0.274414 19.8078 0
Hour -0.7019 0.083864 -8.3692 0
Correlation:
(Intr)
Hour -0.964
Standardized residuals:
Min Q1 Med Q3 Max
-1.38547 -0.62729 -0.18267 0.58240 2.35547
Residual standard error: 3.5086
Degrees of freedom: 60 total; 58 residual
Time effect
summary(m2 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corAR1(form = ~ Hour | Dog),
data = dta))
Generalized least squares fit by REML
Model: ADE ~ Hour
Data: dta
AIC BIC logLik
168.59 195.38 -71.296
Correlation Structure: ARMA(1,0)
Formula: ~Hour | Dog
Parameter estimate(s):
Phi1
0.75535
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Hour_f
Parameter estimates:
0 0.5 1 1.5 2 2.5 3 3.5
1.000000 0.328032 0.371947 0.285906 0.340188 0.125715 0.098970 0.168949
4 4.5
0.222191 0.099784
Coefficients:
Value Std.Error t-value p-value
(Intercept) 4.7498 0.39991 11.8774 0
Hour -0.5992 0.10967 -5.4634 0
Correlation:
(Intr)
Hour -0.94
Standardized residuals:
Min Q1 Med Q3 Max
-1.435667 -0.095305 0.292164 1.342058 5.066346
Residual standard error: 3.2785
Degrees of freedom: 60 total; 58 residual
Time effect
summary(m3 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta))
Generalized least squares fit by REML
Model: ADE ~ Hour
Data: dta
AIC BIC logLik
161.61 188.39 -67.804
Correlation Structure: Compound symmetry
Formula: ~1 | Dog
Parameter estimate(s):
Rho
0.72179
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Hour_f
Parameter estimates:
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
1.00000 0.51005 0.61404 0.52281 0.44793 0.18527 0.11295 0.16117 0.23097 0.11314
Coefficients:
Value Std.Error t-value p-value
(Intercept) 4.4950 0.307665 14.6100 0
Hour -0.5292 0.069813 -7.5799 0
Correlation:
(Intr)
Hour -0.957
Standardized residuals:
Min Q1 Med Q3 Max
-0.665284 -0.023909 0.264057 0.890736 2.444602
Residual standard error: 3.833
Degrees of freedom: 60 total; 58 residual
Time effect
summary(m3a <- gls(ADE ~ Hour,
weights = varExp(form = ~ Hour),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta))
Generalized least squares fit by REML
Model: ADE ~ Hour
Data: dta
AIC BIC logLik
163.97 174.27 -76.985
Correlation Structure: Compound symmetry
Formula: ~1 | Dog
Parameter estimate(s):
Rho
0.61459
Variance function:
Structure: Exponential of variance covariate
Formula: ~Hour
Parameter estimates:
expon
-0.42045
Coefficients:
Value Std.Error t-value p-value
(Intercept) 5.3028 0.67774 7.8243 0
Hour -0.6742 0.13462 -5.0078 0
Correlation:
(Intr)
Hour -0.982
Standardized residuals:
Min Q1 Med Q3 Max
-1.10177 -0.39452 -0.10310 0.37117 2.72638
Residual standard error: 3.08
Degrees of freedom: 60 total; 58 residual
Time effect
summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta))
Linear mixed-effects model fit by REML
Data: dta
AIC BIC logLik
153.25 182.09 -62.623
Random effects:
Formula: ~1 | Dog
(Intercept) Residual
StdDev: 0.36546 3.9282
Correlation Structure: Compound symmetry
Formula: ~1 | Dog
Parameter estimate(s):
Rho
0.84577
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Hour_f
Parameter estimates:
0 0.5 1 1.5 2 2.5 3 3.5
1.000000 0.546183 0.637856 0.539979 0.471999 0.160643 0.063004 0.216035
4 4.5
0.253378 0.182688
Fixed effects: ADE ~ Hour
Value Std.Error DF t-value p-value
(Intercept) 4.8115 0.234641 53 20.506 0
Hour -0.6126 0.053787 53 -11.390 0
Correlation:
(Intr)
Hour -0.745
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.92234 -0.12995 0.16459 0.69954 2.40956
Number of Observations: 60
Number of Groups: 6
Time effect
anova(m1, m2)
Model df AIC BIC logLik Test L.Ratio p-value
m1 1 12 196.65 221.38 -86.326
m2 2 13 168.59 195.38 -71.296 1 vs 2 30.061 <.0001
Simple model may not fit here.
anova(m1, m3, m3a)
Model df AIC BIC logLik Test L.Ratio p-value
m1 1 12 196.65 221.38 -86.326
m3 2 13 161.61 188.39 -67.804 1 vs 2 37.045 <.0001
m3a 3 5 163.97 174.27 -76.985 2 vs 3 18.363 0.0187
Compound Symmetry may be the best fit model than Unstructured model. Also, the design is a repeated measure design.
plot(m3, resid(., type = "pearson") ~ fitted(.) | Dog,
layout = c(6, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
Replicate the analysis reported in the smoker example.
In the Netherlands, residents in the rural area of Vlagtwedde in the north-east and residents in the urban, industrial area of Vlaardingen in the south-west, were recruited to participate in a study on chronic obstructive lung diseases. The participants, initially aged 15-44, were surveyed approximately every 3 years for up to 21 years. At each survey, information on respiratory symptoms and smoking status was collected by questionnaire and spirometry was performed. Pulmonary function was determined by spirometry and a measure of forced expiratory volume (FEV1) was obtained every three years for the first 15 years of the study, and also at year 19. The number of repeated measurements of FEV1 on each subject varied from 1 to 7.
Here the dataset is comprised of a sub-sample of 133 residents aged 36 or older at their entry into the study and whose smoking status did not change over the 19 years of follow-up. Each participant was either a current or former smoker. Current smoking was defined as smoking at least one cigarette per day.
Source: van der Lende, R., Kok, T.J., Peset, R., Quanjer, P.H., Schouten, J.P. & Orie, N.G.M. (1981). Decreases in VC and FEV1 with time: Indicators for effects of smoking and air pollution. Bulletin of European Physiopathology and Respiration, 17, 775-792.
#
pacman::p_load(tidyverse, nlme, car)
#
dta <- read.csv("Data/fev_smoker.csv", h = T)
#
str(dta)
'data.frame': 771 obs. of 4 variables:
$ ID : chr "P1" "P1" "P1" "P1" ...
$ Smoker: chr "Former" "Former" "Former" "Former" ...
$ Time : int 0 3 6 9 15 19 0 3 6 9 ...
$ FEV1 : num 3.4 3.4 3.45 3.2 2.95 2.4 3.1 3.15 3.5 2.95 ...
head(dta)
ID Smoker Time FEV1
1 P1 Former 0 3.40
2 P1 Former 3 3.40
3 P1 Former 6 3.45
4 P1 Former 9 3.20
5 P1 Former 15 2.95
6 P1 Former 19 2.40
dtaW <- dta %>%
spread(key = Time, value = FEV1)
names(dtaW)[3:9] <- paste0("Year", names(dtaW)[3:9])
scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker,
data = dtaW, smooth = F, by.group = TRUE,
ellipse = T, pch = c(1, 20),
col = c("firebrick", "forestgreen"))
aggregate(FEV1 ~ Time + Smoker, data = dta, mean, na.rm=T)
Time Smoker FEV1
1 0 Current 3.2293
2 3 Current 3.1195
3 6 Current 3.0915
4 9 Current 2.8707
5 12 Current 2.7979
6 15 Current 2.6810
7 19 Current 2.4977
8 0 Former 3.5191
9 3 Former 3.5778
10 6 Former 3.2646
11 9 Former 3.1717
12 12 Former 3.1362
13 15 Former 2.8700
14 19 Former 2.9079
aggregate(FEV1 ~ Time + Smoker, data = dta, var, na.rm=T)
Time Smoker FEV1
1 0 Current 0.34029
2 3 Current 0.30071
3 6 Current 0.31916
4 9 Current 0.31650
5 12 Current 0.29699
6 15 Current 0.26426
7 19 Current 0.25466
8 0 Former 0.23583
9 3 Former 0.39410
10 6 Former 0.38720
11 9 Former 0.33805
12 12 Former 0.37605
13 15 Former 0.37094
14 19 Former 0.41788
dtaW %>%
filter( Smoker == "Former") %>%
select( -(1:2)) %>%
var(na.rm = T)
Year0 Year3 Year6 Year9 Year12 Year15 Year19
Year0 0.36174 0.34426 0.35508 0.23582 0.23983 0.30918 0.21440
Year3 0.34426 0.35669 0.34792 0.24344 0.24556 0.30143 0.21413
Year6 0.35508 0.34792 0.39903 0.25833 0.25028 0.33200 0.26747
Year9 0.23582 0.24344 0.25833 0.19267 0.18278 0.23229 0.19534
Year12 0.23983 0.24556 0.25028 0.18278 0.20222 0.22011 0.14067
Year15 0.30918 0.30143 0.33200 0.23229 0.22011 0.30467 0.23510
Year19 0.21440 0.21413 0.26747 0.19534 0.14067 0.23510 0.35097
dtaW %>%
filter( Smoker == "Current") %>%
select( -(1:2)) %>%
var(na.rm = T)
Year0 Year3 Year6 Year9 Year12 Year15 Year19
Year0 0.20686 0.19587 0.15502 0.17409 0.15801 0.15272 0.15502
Year3 0.19587 0.24184 0.16763 0.17708 0.16478 0.18184 0.16750
Year6 0.15502 0.16763 0.16186 0.13285 0.14849 0.14792 0.15713
Year9 0.17409 0.17708 0.13285 0.24970 0.15014 0.15082 0.15334
Year12 0.15801 0.16478 0.14849 0.15014 0.18944 0.17100 0.15285
Year15 0.15272 0.18184 0.14792 0.15082 0.17100 0.18733 0.16541
Year19 0.15502 0.16750 0.15713 0.15334 0.15285 0.16541 0.18767
dta$Smoker <- factor(dta$Smoker, levels = c('Former', 'Current'))
#
theme_set(theme_bw())
# means with error bars
ggplot(dta, aes(Time, FEV1,
group = Smoker, shape = Smoker, linetype = Smoker)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "Mean FEV1 (liters)",
linetype = "Smoker", shape = "Smoker") +
theme(legend.justification = c(1, 1), legend.position = c(1, 1),
legend.background = element_rect(fill = "white", color = "black"))
dta <- mutate(dta, Time_f = as.factor(Time))
#
summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta))
Generalized least squares fit by REML
Model: FEV1 ~ Smoker * Time_f
Data: dta
AIC BIC logLik
1359.3 1428.7 -664.64
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.5191 0.11715 30.0405 0.0000
SmokerCurrent -0.2898 0.13205 -2.1949 0.0285
Time_f3 0.0586 0.15942 0.3679 0.7131
Time_f6 -0.2545 0.15810 -1.6097 0.1079
Time_f9 -0.3475 0.15571 -2.2315 0.0259
Time_f12 -0.3829 0.15687 -2.4411 0.0149
Time_f15 -0.6491 0.16393 -3.9597 0.0001
Time_f19 -0.6113 0.15810 -3.8664 0.0001
SmokerCurrent:Time_f3 -0.1685 0.18014 -0.9352 0.3500
SmokerCurrent:Time_f6 0.1167 0.17960 0.6495 0.5162
SmokerCurrent:Time_f9 -0.0111 0.17796 -0.0625 0.9502
SmokerCurrent:Time_f12 -0.0485 0.17949 -0.2700 0.7872
SmokerCurrent:Time_f15 0.1008 0.18685 0.5395 0.5897
SmokerCurrent:Time_f19 -0.1203 0.18159 -0.6626 0.5078
Correlation:
(Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent -0.887
Time_f3 -0.735 0.652
Time_f6 -0.741 0.657 0.544
Time_f9 -0.752 0.667 0.553 0.557
Time_f12 -0.747 0.663 0.549 0.553 0.562
Time_f15 -0.715 0.634 0.525 0.529 0.538 0.534
Time_f19 -0.741 0.657 0.544 0.549 0.557 0.553 0.529
SmokerCurrent:Time_f3 0.650 -0.733 -0.885 -0.482 -0.489 -0.486 -0.465 -0.482
SmokerCurrent:Time_f6 0.652 -0.735 -0.479 -0.880 -0.491 -0.487 -0.466 -0.483
SmokerCurrent:Time_f9 0.658 -0.742 -0.484 -0.488 -0.875 -0.492 -0.470 -0.488
SmokerCurrent:Time_f12 0.653 -0.736 -0.480 -0.484 -0.491 -0.874 -0.466 -0.484
SmokerCurrent:Time_f15 0.627 -0.707 -0.461 -0.465 -0.472 -0.468 -0.877 -0.465
SmokerCurrent:Time_f19 0.645 -0.727 -0.474 -0.478 -0.485 -0.482 -0.461 -0.871
SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent
Time_f3
Time_f6
Time_f9
Time_f12
Time_f15
Time_f19
SmokerCurrent:Time_f3
SmokerCurrent:Time_f6 0.539
SmokerCurrent:Time_f9 0.544 0.546
SmokerCurrent:Time_f12 0.539 0.541 0.546
SmokerCurrent:Time_f15 0.518 0.520 0.524 0.520
SmokerCurrent:Time_f19 0.533 0.535 0.540 0.535 0.514
Standardized residuals:
Min Q1 Med Q3 Max
-2.971581 -0.673775 0.050432 0.667481 3.139465
Residual standard error: 0.56181
Degrees of freedom: 771 total; 757 residual
summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta))
Generalized least squares fit by REML
Model: FEV1 ~ Smoker * Time_f
Data: dta
AIC BIC logLik
1378 1507.6 -660.99
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time_f * Smoker
Parameter estimates:
0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
1.0000 1.2927 1.2813 1.1972 1.2542 1.3311 1.2012
3*Current 6*Current 9*Current 12*Current 15*Current 19*Current 12*Former
1.1292 1.1633 1.1585 1.1222 1.0586 1.0392 1.2628
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.5191 0.10126 34.754 0.0000
SmokerCurrent -0.2898 0.11940 -2.427 0.0154
Time_f3 0.0586 0.15764 0.372 0.7100
Time_f6 -0.2545 0.15518 -1.640 0.1014
Time_f9 -0.3475 0.14670 -2.368 0.0181
Time_f12 -0.3829 0.15238 -2.513 0.0122
Time_f15 -0.6491 0.16034 -4.048 0.0001
Time_f19 -0.6113 0.15868 -3.852 0.0001
SmokerCurrent:Time_f3 -0.1685 0.17894 -0.941 0.3468
SmokerCurrent:Time_f6 0.1167 0.17797 0.655 0.5124
SmokerCurrent:Time_f9 -0.0111 0.17102 -0.065 0.9482
SmokerCurrent:Time_f12 -0.0485 0.17576 -0.276 0.7828
SmokerCurrent:Time_f15 0.1008 0.18257 0.552 0.5811
SmokerCurrent:Time_f19 -0.1203 0.18062 -0.666 0.5055
Correlation:
(Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent -0.848
Time_f3 -0.642 0.545
Time_f6 -0.653 0.553 0.419
Time_f9 -0.690 0.585 0.443 0.450
Time_f12 -0.665 0.564 0.427 0.434 0.459
Time_f15 -0.632 0.536 0.406 0.412 0.436 0.420
Time_f19 -0.638 0.541 0.410 0.416 0.440 0.424 0.403
SmokerCurrent:Time_f3 0.566 -0.667 -0.881 -0.369 -0.391 -0.376 -0.357 -0.361
SmokerCurrent:Time_f6 0.569 -0.671 -0.365 -0.872 -0.393 -0.378 -0.359 -0.363
SmokerCurrent:Time_f9 0.592 -0.698 -0.380 -0.386 -0.858 -0.393 -0.374 -0.378
SmokerCurrent:Time_f12 0.576 -0.679 -0.370 -0.376 -0.398 -0.867 -0.364 -0.368
SmokerCurrent:Time_f15 0.555 -0.654 -0.356 -0.362 -0.383 -0.369 -0.878 -0.354
SmokerCurrent:Time_f19 0.561 -0.661 -0.360 -0.366 -0.387 -0.373 -0.354 -0.879
SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent
Time_f3
Time_f6
Time_f9
Time_f12
Time_f15
Time_f19
SmokerCurrent:Time_f3
SmokerCurrent:Time_f6 0.448
SmokerCurrent:Time_f9 0.466 0.468
SmokerCurrent:Time_f12 0.453 0.456 0.474
SmokerCurrent:Time_f15 0.436 0.439 0.457 0.444
SmokerCurrent:Time_f19 0.441 0.444 0.462 0.449 0.432
Standardized residuals:
Min Q1 Med Q3 Max
-3.044453 -0.681392 0.048732 0.674197 2.876223
Residual standard error: 0.48562
Degrees of freedom: 771 total; 757 residual
summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time),
data = dta))
Generalized least squares fit by REML
Model: FEV1 ~ Smoker * Time_f
Data: dta
AIC BIC logLik
1360.9 1435 -664.46
Variance function:
Structure: Exponential of variance covariate
Formula: ~Time
Parameter estimates:
expon
-0.0025288
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.5191 0.11976 29.3837 0.0000
SmokerCurrent -0.2898 0.13500 -2.1469 0.0321
Time_f3 0.0586 0.16241 0.3611 0.7181
Time_f6 -0.2545 0.16054 -1.5852 0.1133
Time_f9 -0.3475 0.15764 -2.2041 0.0278
Time_f12 -0.3829 0.15827 -2.4194 0.0158
Time_f15 -0.6491 0.16458 -3.9443 0.0001
Time_f19 -0.6113 0.15826 -3.8625 0.0001
SmokerCurrent:Time_f3 -0.1685 0.18352 -0.9180 0.3589
SmokerCurrent:Time_f6 0.1167 0.18235 0.6397 0.5225
SmokerCurrent:Time_f9 -0.0111 0.18011 -0.0618 0.9508
SmokerCurrent:Time_f12 -0.0485 0.18101 -0.2678 0.7889
SmokerCurrent:Time_f15 0.1008 0.18750 0.5376 0.5910
SmokerCurrent:Time_f19 -0.1203 0.18160 -0.6626 0.5078
Correlation:
(Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent -0.887
Time_f3 -0.737 0.654
Time_f6 -0.746 0.662 0.550
Time_f9 -0.760 0.674 0.560 0.567
Time_f12 -0.757 0.671 0.558 0.565 0.575
Time_f15 -0.728 0.646 0.537 0.543 0.553 0.551
Time_f19 -0.757 0.671 0.558 0.565 0.575 0.573 0.551
SmokerCurrent:Time_f3 0.653 -0.736 -0.885 -0.487 -0.496 -0.494 -0.475 -0.494
SmokerCurrent:Time_f6 0.657 -0.740 -0.484 -0.880 -0.499 -0.497 -0.478 -0.497
SmokerCurrent:Time_f9 0.665 -0.750 -0.490 -0.496 -0.875 -0.503 -0.484 -0.503
SmokerCurrent:Time_f12 0.662 -0.746 -0.488 -0.494 -0.503 -0.874 -0.481 -0.501
SmokerCurrent:Time_f15 0.639 -0.720 -0.471 -0.477 -0.485 -0.483 -0.878 -0.483
SmokerCurrent:Time_f19 0.660 -0.743 -0.486 -0.492 -0.501 -0.499 -0.480 -0.871
SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent
Time_f3
Time_f6
Time_f9
Time_f12
Time_f15
Time_f19
SmokerCurrent:Time_f3
SmokerCurrent:Time_f6 0.545
SmokerCurrent:Time_f9 0.551 0.555
SmokerCurrent:Time_f12 0.549 0.552 0.559
SmokerCurrent:Time_f15 0.530 0.533 0.540 0.537
SmokerCurrent:Time_f19 0.547 0.550 0.557 0.554 0.535
Standardized residuals:
Min Q1 Med Q3 Max
-2.928743 -0.666776 0.050448 0.667453 3.165435
Residual standard error: 0.57437
Degrees of freedom: 771 total; 757 residual
summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time | Smoker),
data = dta))
Generalized least squares fit by REML
Model: FEV1 ~ Smoker * Time_f
Data: dta
AIC BIC logLik
1358.2 1436.9 -662.11
Variance function:
Structure: Exponential of variance covariate, different strata
Formula: ~Time | Smoker
Parameter estimates:
Former Current
0.0054259 -0.0061700
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.5191 0.12014 29.2929 0.0000
SmokerCurrent -0.2898 0.13542 -2.1403 0.0326
Time_f3 0.0586 0.16472 0.3560 0.7219
Time_f6 -0.2545 0.16458 -1.5463 0.1224
Time_f9 -0.3475 0.16320 -2.1291 0.0336
Time_f12 -0.3829 0.16574 -2.3103 0.0211
Time_f15 -0.6491 0.17524 -3.7042 0.0002
Time_f19 -0.6113 0.17030 -3.5894 0.0004
SmokerCurrent:Time_f3 -0.1685 0.18549 -0.9082 0.3640
SmokerCurrent:Time_f6 0.1167 0.18562 0.6285 0.5299
SmokerCurrent:Time_f9 -0.0111 0.18448 -0.0603 0.9519
SmokerCurrent:Time_f12 -0.0485 0.18684 -0.2594 0.7954
SmokerCurrent:Time_f15 0.1008 0.19594 0.5144 0.6071
SmokerCurrent:Time_f19 -0.1203 0.19093 -0.6302 0.5288
Correlation:
(Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent -0.887
Time_f3 -0.729 0.647
Time_f6 -0.730 0.648 0.532
Time_f9 -0.736 0.653 0.537 0.537
Time_f12 -0.725 0.643 0.529 0.529 0.534
Time_f15 -0.686 0.608 0.500 0.500 0.505 0.497
Time_f19 -0.705 0.626 0.514 0.515 0.519 0.511 0.484
SmokerCurrent:Time_f3 0.648 -0.730 -0.888 -0.473 -0.477 -0.469 -0.444 -0.457
SmokerCurrent:Time_f6 0.647 -0.730 -0.472 -0.887 -0.476 -0.469 -0.444 -0.457
SmokerCurrent:Time_f9 0.651 -0.734 -0.475 -0.475 -0.885 -0.472 -0.446 -0.459
SmokerCurrent:Time_f12 0.643 -0.725 -0.469 -0.469 -0.473 -0.887 -0.441 -0.454
SmokerCurrent:Time_f15 0.613 -0.691 -0.447 -0.448 -0.451 -0.444 -0.894 -0.433
SmokerCurrent:Time_f19 0.629 -0.709 -0.459 -0.459 -0.463 -0.456 -0.431 -0.892
SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent
Time_f3
Time_f6
Time_f9
Time_f12
Time_f15
Time_f19
SmokerCurrent:Time_f3
SmokerCurrent:Time_f6 0.533
SmokerCurrent:Time_f9 0.536 0.536
SmokerCurrent:Time_f12 0.529 0.529 0.532
SmokerCurrent:Time_f15 0.505 0.504 0.507 0.501
SmokerCurrent:Time_f19 0.518 0.517 0.521 0.514 0.490
Standardized residuals:
Min Q1 Med Q3 Max
-2.951764 -0.675274 0.046833 0.672801 2.868361
Residual standard error: 0.57615
Degrees of freedom: 771 total; 757 residual
summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
correlation = corSymm(form = ~ 1 | ID),
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta))
Generalized least squares fit by REML
Model: FEV1 ~ Smoker * Time_f
Data: dta
AIC BIC logLik
361.1 587.94 -131.55
Correlation Structure: General
Formula: ~1 | ID
Parameter estimate(s):
Correlation:
1 2 3 4 5 6
2 0.852
3 0.856 0.888
4 0.849 0.846 0.894
5 0.841 0.853 0.891 0.892
6 0.861 0.866 0.896 0.872 0.935
7 0.753 0.756 0.844 0.789 0.789 0.866
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time_f * Smoker
Parameter estimates:
0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
1.00000 1.15483 1.09813 1.00951 1.19701 1.07961 1.12418
6*Current 9*Current 12*Current 15*Current 19*Current 3*Current 12*Former
1.05188 1.10138 1.03810 1.01589 0.96763 1.03632 1.05276
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.4847 0.098506 35.375 0.0000
SmokerCurrent -0.2579 0.115994 -2.223 0.0265
Time_f3 -0.0241 0.066099 -0.365 0.7156
Time_f6 -0.2212 0.062781 -3.524 0.0005
Time_f9 -0.3289 0.059362 -5.540 0.0000
Time_f12 -0.3949 0.062233 -6.346 0.0000
Time_f15 -0.4790 0.067774 -7.068 0.0000
Time_f19 -0.5987 0.065807 -9.098 0.0000
SmokerCurrent:Time_f3 -0.0653 0.074608 -0.876 0.3816
SmokerCurrent:Time_f6 0.0639 0.071681 0.891 0.3730
SmokerCurrent:Time_f9 -0.0529 0.069333 -0.763 0.4455
SmokerCurrent:Time_f12 -0.0149 0.071996 -0.207 0.8362
SmokerCurrent:Time_f15 -0.1037 0.076426 -1.357 0.1752
SmokerCurrent:Time_f19 -0.1270 0.075320 -1.687 0.0921
Correlation:
(Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent -0.849
Time_f3 -0.129 0.110
Time_f6 -0.197 0.168 0.608
Time_f9 -0.334 0.283 0.521 0.662
Time_f12 -0.289 0.245 0.524 0.647 0.700
Time_f15 -0.062 0.053 0.515 0.588 0.572 0.719
Time_f19 -0.258 0.219 0.493 0.627 0.613 0.653 0.690
SmokerCurrent:Time_f3 0.115 -0.204 -0.886 -0.539 -0.462 -0.465 -0.456 -0.437
SmokerCurrent:Time_f6 0.173 -0.249 -0.533 -0.876 -0.579 -0.566 -0.515 -0.549
SmokerCurrent:Time_f9 0.286 -0.332 -0.446 -0.566 -0.856 -0.600 -0.490 -0.524
SmokerCurrent:Time_f12 0.250 -0.326 -0.453 -0.559 -0.605 -0.864 -0.622 -0.565
SmokerCurrent:Time_f15 0.055 -0.154 -0.457 -0.522 -0.507 -0.638 -0.887 -0.612
SmokerCurrent:Time_f19 0.225 -0.322 -0.431 -0.548 -0.535 -0.571 -0.603 -0.874
SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent
Time_f3
Time_f6
Time_f9
Time_f12
Time_f15
Time_f19
SmokerCurrent:Time_f3
SmokerCurrent:Time_f6 0.620
SmokerCurrent:Time_f9 0.528 0.661
SmokerCurrent:Time_f12 0.539 0.650 0.692
SmokerCurrent:Time_f15 0.532 0.600 0.579 0.724
SmokerCurrent:Time_f19 0.512 0.632 0.604 0.654 0.701
Standardized residuals:
Min Q1 Med Q3 Max
-3.008572 -0.652245 0.053685 0.646604 3.177245
Residual standard error: 0.54119
Degrees of freedom: 771 total; 757 residual
After controlling the baseline, current smoker showed worse forced expiratory volume which decreased as long as their smoking duration.
anova(m0, m1, m2, m3, m4)
Model df AIC BIC logLik Test L.Ratio p-value
m0 1 15 1359.3 1428.72 -664.64
m1 2 28 1378.0 1507.60 -660.99 1 vs 2 7.30 0.8860
m2 3 16 1360.9 1435.00 -664.46 2 vs 3 6.95 0.8612
m3 4 17 1358.2 1436.91 -662.11 3 vs 4 4.71 0.0300
m4 5 49 361.1 587.94 -131.55 4 vs 5 1061.11 <.0001
The results revealed that Compound symmetry Model may be the best fit model here.
plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
layout = c(2, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
dta_m4 <- data.frame(dta, fit = fitted(m4), fit0 = fitted(m0))
# use fitted values to get error bars and see if they cover the data points
ggplot(dta_m4, aes(Time, fit, linetype = Smoker, shape = Smoker,
group = Smoker)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(aes(Time, fit0), fun = mean, geom = "point", col = "skyblue") +
stat_summary(aes(Time, FEV1), fun.data = mean_se,
geom = "pointrange", col = "firebrick", alpha = .3) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
shape = "Smoker") +
theme(legend.position = c(0.9, 0.9),
legend.background = element_rect(fill = "white", color = "black"))
Reproduce the results of analysis with the markdown file in the reading programs example and compare them with that of an anlysis of covariance.
pacman::p_load(knitr, simstudy, tidyverse, nlme, rms)
An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.
Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).
dta <- read.csv('data/reading_g3.csv')
dtaL <- gather(data=dta, key=Time, value=Score, Pre:Post, factor_key=TRUE)
kable(head(arrange(dtaL, ID), 10))
| Group | ID | Time | Score |
|---|---|---|---|
| C | S11 | Pre | 32 |
| C | S11 | Post | 41 |
| C | S12 | Pre | 42 |
| C | S12 | Post | 46 |
| C | S13 | Pre | 45 |
| C | S13 | Post | 48 |
| C | S14 | Pre | 46 |
| C | S14 | Post | 54 |
| C | S15 | Pre | 49 |
| C | S15 | Post | 52 |
We inspect the data with a boxplot and a scatter plot.
boxp <- ggplot(dtaL ,
aes(x=Time, y=Score, col=Group))+
geom_boxplot(outlier.size=0) +
geom_point(aes(fill=Group, col=NULL),
shape=21, alpha=0.5, size=2,
position=position_jitterdodge(jitter.width=0.2)) +
theme_bw() +
xlab("")
boxp
ggplot(dta,
aes (x=Pre, y=Post, col=Group)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
labs(x="Pre-test reading score",
y="Post-test reading score")+
theme_minimal()
The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.
X <- model.matrix(~ Time * Group, data = dtaL)
colnames(X)
[1] "(Intercept)" "TimePost" "GroupT1" "GroupT2"
[5] "TimePost:GroupT1" "TimePost:GroupT2"
Xalt <- X[, c("TimePost", "TimePost:GroupT1", "TimePost:GroupT2")]
colnames(Xalt)
[1] "TimePost" "TimePost:GroupT1" "TimePost:GroupT2"
The gls() function of the ‘nlme’ package can be used for modeling data.
The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.
By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).
The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.
m0_gls <- gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
summary(m0_gls)
Generalized least squares fit by REML
Model: Score ~ Xalt
Data: dtaL
AIC BIC logLik
350.21 364.39 -168.11
Correlation Structure: General
Formula: ~1 | ID
Parameter estimate(s):
Correlation:
1
2 0.91
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time
Parameter estimates:
Pre Post
1.0000 1.0164
Coefficients:
Value Std.Error t-value p-value
(Intercept) 50.500 1.2297 41.068 0.0000
XaltTimePost 6.381 0.9027 7.069 0.0000
XaltTimePost:GroupT1 -2.803 1.2699 -2.208 0.0314
XaltTimePost:GroupT2 7.159 1.2699 5.637 0.0000
Correlation:
(Intr) XltTmP XTP:GT1
XaltTimePost -0.102
XaltTimePost:GroupT1 0.000 -0.703
XaltTimePost:GroupT2 0.000 -0.703 0.500
Standardized residuals:
Min Q1 Med Q3 Max
-2.74679 -0.38362 0.13742 0.74751 1.44937
Residual standard error: 6.7351
Degrees of freedom: 60 total; 56 residual
The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.
We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).
To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.
m0_Gls <- Gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
predictions <- cbind(dtaL, predict(m0_Gls, dtaL, conf.int=0.95))
predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"
pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)
pCI <- ggplot(predictions,
aes(y=linear.predictors,
x=Time)) +
geom_errorbar(limits,
width=0.09,
position=pd) +
geom_line(aes(group=Group,
y=linear.predictors),
position=pd)+
geom_point(aes(fill=Group2, shape=Group2),
position=pd,
size=rel(4)) +
scale_fill_manual(values=c("white", "gray80", "gray20", "black")) +
theme_minimal() +
theme(panel.grid.major=element_blank(),
legend.title=element_blank(),
text=element_text(size=14),
legend.position="bottom") +
labs(x="Time",
y="Estimated mean with corresponding 95% CI")
pCI
When we constrained their performance before the exp as baseline, we found that the T2 group exhibited better performance than other groups after the exp.
In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.
Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.
Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.