Covariance patterns models
#load package
pacman::p_load(tidyverse, nlme)
#read data
dta <- read.csv("ade.csv")
#see data
str(dta)## 'data.frame': 60 obs. of 3 variables:
## $ Dog : Factor w/ 6 levels "D11","D12","D13",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Hour: num 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
## $ ADE : num 5.7 4.7 4.8 4.9 3.88 3.51 2.8 2.6 2.5 2.5 ...
## Dog Hour ADE
## 1 D11 0.0 5.70
## 2 D11 0.5 4.70
## 3 D11 1.0 4.80
## 4 D11 1.5 4.90
## 5 D11 2.0 3.88
## 6 D11 2.5 3.51
#add a columns(hour as factor)
dta <- mutate(dta, Hour_f = factor(Hour))
#set ggplot theme
theme_set(theme_bw())
# means with error bars
ggplot(dta, aes(Hour, ADE)) +
geom_point(alpha = .3) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
labs(x = "Time (in hours)", y = "ADE")Visdualization of ADE and Time
#Generalized least squares by ADE and hours
#the effect of Hour on ADE
summary(m1 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
data = dta))## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta
## AIC BIC logLik
## 196.6523 221.3776 -86.32617
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.55447427 0.64132902 0.51921510 0.43187908 0.08245758 0.10901472
## 3.5 4 4.5
## 0.17046144 0.21770576 0.11133075
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.435552 0.27441443 19.807821 0
## Hour -0.701874 0.08386437 -8.369153 0
##
## Correlation:
## (Intr)
## Hour -0.964
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.3854719 -0.6272929 -0.1826742 0.5824007 2.3554705
##
## Residual standard error: 3.508619
## Degrees of freedom: 60 total; 58 residual
The correlation between ADE and Hour is -0.964.
# corAR1, representing an autocorrelation structure of order1
summary(m2 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corAR1(form = ~ Hour | Dog),
data = dta))## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta
## AIC BIC logLik
## 168.5913 195.3771 -71.29566
##
## Correlation Structure: ARMA(1,0)
## Formula: ~Hour | Dog
## Parameter estimate(s):
## Phi1
## 0.7553498
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.32803241 0.37194739 0.28590639 0.34018814 0.12571538 0.09896982
## 3.5 4 4.5
## 0.16894933 0.22219099 0.09978382
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.749848 0.3999065 11.877395 0
## Hour -0.599160 0.1096671 -5.463442 0
##
## Correlation:
## (Intr)
## Hour -0.94
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.43566691 -0.09530459 0.29216375 1.34205811 5.06634565
##
## Residual standard error: 3.278488
## Degrees of freedom: 60 total; 58 residual
The correlation between ADE and Hour is -0.94.
#corCompSymm representing a compound symmetry structure corresponding to uniform correlation.
summary(m3 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta))## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta
## AIC BIC logLik
## 161.6077 188.3935 -67.80387
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.7217933
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3 3.5
## 1.0000000 0.5100528 0.6140381 0.5228105 0.4479331 0.1852681 0.1129502 0.1611728
## 4 4.5
## 0.2309740 0.1131425
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.494975 0.30766482 14.609974 0
## Hour -0.529169 0.06981253 -7.579863 0
##
## Correlation:
## (Intr)
## Hour -0.957
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.66528429 -0.02390899 0.26405734 0.89073611 2.44460225
##
## Residual standard error: 3.832969
## Degrees of freedom: 60 total; 58 residual
The correlation between ADE and Hour is -0.957
#Use exponential time correlation structure
summary(m3a <- gls(ADE ~ Hour,
weights = varExp(form = ~ Hour),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta))## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta
## AIC BIC logLik
## 163.9703 174.2725 -76.98516
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.6145863
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~Hour
## Parameter estimates:
## expon
## -0.4204529
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.302798 0.6777356 7.824287 0
## Hour -0.674162 0.1346238 -5.007751 0
##
## Correlation:
## (Intr)
## Hour -0.982
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.1017685 -0.3945215 -0.1031040 0.3711706 2.7263751
##
## Residual standard error: 3.079988
## Degrees of freedom: 60 total; 58 residual
#add Dog as random effect
summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta))## Linear mixed-effects model fit by REML
## Data: dta
## AIC BIC logLik
## 153.2468 182.093 -62.62338
##
## Random effects:
## Formula: ~1 | Dog
## (Intercept) Residual
## StdDev: 0.3654553 3.928157
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.8457657
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.54618337 0.63785632 0.53997928 0.47199948 0.16064282 0.06300446
## 3.5 4 4.5
## 0.21603493 0.25337758 0.18268767
## Fixed effects: ADE ~ Hour
## Value Std.Error DF t-value p-value
## (Intercept) 4.811462 0.23464110 53 20.50562 0
## Hour -0.612618 0.05378694 53 -11.38973 0
## Correlation:
## (Intr)
## Hour -0.745
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.9223391 -0.1299528 0.1645922 0.6995419 2.4095565
##
## Number of Observations: 60
## Number of Groups: 6
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 12 196.6523 221.3776 -86.32617
## m2 2 13 168.5913 195.3771 -71.29566 1 vs 2 30.06102 <.0001
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 12 196.6523 221.3776 -86.32617
## m3 2 13 161.6077 188.3935 -67.80387 1 vs 2 37.04460 <.0001
## m3a 3 5 163.9703 174.2725 -76.98516 2 vs 3 18.36258 0.0187
The model 2 is better than model 1. The model 3 is better in predict,but m3a also has good model fit.
#load package
pacman::p_load(tidyverse, nlme,car)
#read data
dta2 <- read.csv("fev_smoker.csv", h = T)
#see
str(dta2)## 'data.frame': 771 obs. of 4 variables:
## $ ID : Factor w/ 133 levels "P1","P10","P100",..: 1 1 1 1 1 1 46 46 46 46 ...
## $ Smoker: Factor w/ 2 levels "Current","Former": 2 2 2 2 2 2 1 1 1 1 ...
## $ Time : int 0 3 6 9 15 19 0 3 6 9 ...
## $ FEV1 : num 3.4 3.4 3.45 3.2 2.95 2.4 3.1 3.15 3.5 2.95 ...
## ID Smoker Time FEV1
## 1 P1 Former 0 3.40
## 2 P1 Former 3 3.40
## 3 P1 Former 6 3.45
## 4 P1 Former 9 3.20
## 5 P1 Former 15 2.95
## 6 P1 Former 19 2.40
# save a copy of data in wide format
dtaW <- dta2 %>%
spread(key = Time, value = FEV1)
names(dtaW)[3:9] <- paste0("Year", names(dtaW)[3:9])
# make ellipses in pairwise plot
scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker, data = dtaW, smooth = F, by.group = TRUE,
ellipse = T, pch = c(1, 20),
col = c("firebrick", "forestgreen")) ## Time Smoker FEV1
## 1 0 Current 3.229294
## 2 3 Current 3.119474
## 3 6 Current 3.091461
## 4 9 Current 2.870706
## 5 12 Current 2.797901
## 6 15 Current 2.680959
## 7 19 Current 2.497703
## 8 0 Former 3.519130
## 9 3 Former 3.577778
## 10 6 Former 3.264643
## 11 9 Former 3.171667
## 12 12 Former 3.136207
## 13 15 Former 2.870000
## 14 19 Former 2.907857
## Time Smoker FEV1
## 1 0 Current 0.3402876
## 2 3 Current 0.3007072
## 3 6 Current 0.3191581
## 4 9 Current 0.3165019
## 5 12 Current 0.2969943
## 6 15 Current 0.2642588
## 7 19 Current 0.2546563
## 8 0 Former 0.2358265
## 9 3 Former 0.3941026
## 10 6 Former 0.3871962
## 11 9 Former 0.3380489
## 12 12 Former 0.3760530
## 13 15 Former 0.3709391
## 14 19 Former 0.4178767
## Year0 Year3 Year6 Year9 Year12 Year15 Year19
## Year0 0.3617433 0.3442611 0.3550833 0.2358222 0.2398333 0.3091844 0.2144011
## Year3 0.3442611 0.3566944 0.3479167 0.2434444 0.2455556 0.3014333 0.2141278
## Year6 0.3550833 0.3479167 0.3990278 0.2583333 0.2502778 0.3320000 0.2674722
## Year9 0.2358222 0.2434444 0.2583333 0.1926667 0.1827778 0.2322889 0.1953444
## Year12 0.2398333 0.2455556 0.2502778 0.1827778 0.2022222 0.2201111 0.1406667
## Year15 0.3091844 0.3014333 0.3320000 0.2322889 0.2201111 0.3046711 0.2351044
## Year19 0.2144011 0.2141278 0.2674722 0.1953444 0.1406667 0.2351044 0.3509656
## Year0 Year3 Year6 Year9 Year12 Year15 Year19
## Year0 0.2068623 0.1958723 0.1550242 0.1740866 0.1580108 0.1527199 0.1550177
## Year3 0.1958723 0.2418398 0.1676342 0.1770779 0.1647835 0.1818398 0.1675022
## Year6 0.1550242 0.1676342 0.1618623 0.1328485 0.1484870 0.1479152 0.1571320
## Year9 0.1740866 0.1770779 0.1328485 0.2496970 0.1501407 0.1508160 0.1533355
## Year12 0.1580108 0.1647835 0.1484870 0.1501407 0.1894372 0.1709978 0.1528485
## Year15 0.1527199 0.1818398 0.1479152 0.1508160 0.1709978 0.1873327 0.1654093
## Year19 0.1550177 0.1675022 0.1571320 0.1533355 0.1528485 0.1654093 0.1876729
## 'data.frame': 133 obs. of 9 variables:
## $ ID : Factor w/ 133 levels "P1","P10","P100",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Smoker: Factor w/ 2 levels "Current","Former": 2 1 1 1 1 1 1 1 1 2 ...
## $ Year0 : num 3.4 2.65 3 2.7 2.8 NA NA 4.1 2.07 3.25 ...
## $ Year3 : num 3.4 NA 2.75 3 3.35 3.1 4.4 3.65 2.2 3.3 ...
## $ Year6 : num 3.45 3.3 2.45 NA NA NA 3.5 3.9 2.35 3.2 ...
## $ Year9 : num 3.2 2.4 NA NA 2.95 2.75 3.6 2.75 2.25 3.3 ...
## $ Year12: num NA 2.15 NA 2.25 2.5 2.5 NA 3.35 2.2 2.85 ...
## $ Year15: num 2.95 2.15 2.2 1.85 2.45 1.95 3.4 2.95 NA 3.2 ...
## $ Year19: num 2.4 2.25 2.18 1.6 NA 2.65 3.2 3.17 NA 3.1 ...
# relevel factor level to produce consistent legend in the plot below
dta2$Smoker <- relevel(dta2$Smoker, ref="Former")
#
theme_set(theme_bw())
# means with error bars
ggplot(dta2, aes(Time, FEV1,
group = Smoker, shape = Smoker, linetype = Smoker)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "Mean FEV1 (liters)",
linetype = "Smoker", shape = "Smoker") +
theme(legend.justification = c(1, 1), legend.position = c(1, 1),
legend.background = element_rect(fill = "white", color = "black"))Smokers’ FEV1 declined over time.
# The interaction of FEV1 with smoking and smoking time.
summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta2))## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1359.281 1428.722 -664.6407
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1171462 30.040506 0.0000
## SmokerCurrent -0.289836 0.1320476 -2.194938 0.0285
## Time_f3 0.058647 0.1594157 0.367889 0.7131
## Time_f6 -0.254488 0.1581008 -1.609653 0.1079
## Time_f9 -0.347464 0.1557060 -2.231537 0.0259
## Time_f12 -0.382924 0.1568667 -2.441076 0.0149
## Time_f15 -0.649130 0.1639349 -3.959684 0.0001
## Time_f19 -0.611273 0.1581008 -3.866351 0.0001
## SmokerCurrent:Time_f3 -0.168468 0.1801366 -0.935222 0.3500
## SmokerCurrent:Time_f6 0.116654 0.1795986 0.649527 0.5162
## SmokerCurrent:Time_f9 -0.011124 0.1779636 -0.062510 0.9502
## SmokerCurrent:Time_f12 -0.048469 0.1794916 -0.270037 0.7872
## SmokerCurrent:Time_f15 0.100795 0.1868469 0.539454 0.5897
## SmokerCurrent:Time_f19 -0.120318 0.1815889 -0.662585 0.5078
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.887
## Time_f3 -0.735 0.652
## Time_f6 -0.741 0.657 0.544
## Time_f9 -0.752 0.667 0.553 0.557
## Time_f12 -0.747 0.663 0.549 0.553 0.562
## Time_f15 -0.715 0.634 0.525 0.529 0.538 0.534
## Time_f19 -0.741 0.657 0.544 0.549 0.557 0.553 0.529
## SmokerCurrent:Time_f3 0.650 -0.733 -0.885 -0.482 -0.489 -0.486 -0.465 -0.482
## SmokerCurrent:Time_f6 0.652 -0.735 -0.479 -0.880 -0.491 -0.487 -0.466 -0.483
## SmokerCurrent:Time_f9 0.658 -0.742 -0.484 -0.488 -0.875 -0.492 -0.470 -0.488
## SmokerCurrent:Time_f12 0.653 -0.736 -0.480 -0.484 -0.491 -0.874 -0.466 -0.484
## SmokerCurrent:Time_f15 0.627 -0.707 -0.461 -0.465 -0.472 -0.468 -0.877 -0.465
## SmokerCurrent:Time_f19 0.645 -0.727 -0.474 -0.478 -0.485 -0.482 -0.461 -0.871
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.539
## SmokerCurrent:Time_f9 0.544 0.546
## SmokerCurrent:Time_f12 0.539 0.541 0.546
## SmokerCurrent:Time_f15 0.518 0.520 0.524 0.520
## SmokerCurrent:Time_f19 0.533 0.535 0.540 0.535 0.514
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.97158077 -0.67377531 0.05043194 0.66748149 3.13946469
##
## Residual standard error: 0.5618133
## Degrees of freedom: 771 total; 757 residual
#varIdent for each level of the Smoker*Time
summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta2))## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1377.981 1507.604 -660.9907
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Time_f * Smoker
## Parameter estimates:
## 0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
## 1.000000 1.292730 1.281354 1.197259 1.254158 1.331145 1.201226
## 3*Current 6*Current 9*Current 12*Current 15*Current 19*Current 12*Former
## 1.129204 1.163337 1.158488 1.122216 1.058562 1.039166 1.262785
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1012590 34.75377 0.0000
## SmokerCurrent -0.289836 0.1194016 -2.42741 0.0154
## Time_f3 0.058647 0.1576382 0.37204 0.7100
## Time_f6 -0.254488 0.1551834 -1.63992 0.1014
## Time_f9 -0.347464 0.1467019 -2.36850 0.0181
## Time_f12 -0.382924 0.1523839 -2.51289 0.0122
## Time_f15 -0.649130 0.1603404 -4.04845 0.0001
## Time_f19 -0.611273 0.1586741 -3.85238 0.0001
## SmokerCurrent:Time_f3 -0.168468 0.1789371 -0.94149 0.3468
## SmokerCurrent:Time_f6 0.116654 0.1779643 0.65549 0.5124
## SmokerCurrent:Time_f9 -0.011124 0.1710216 -0.06505 0.9482
## SmokerCurrent:Time_f12 -0.048469 0.1757578 -0.27577 0.7828
## SmokerCurrent:Time_f15 0.100795 0.1825716 0.55209 0.5811
## SmokerCurrent:Time_f19 -0.120318 0.1806162 -0.66615 0.5055
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.848
## Time_f3 -0.642 0.545
## Time_f6 -0.653 0.553 0.419
## Time_f9 -0.690 0.585 0.443 0.450
## Time_f12 -0.664 0.564 0.427 0.434 0.459
## Time_f15 -0.632 0.536 0.406 0.412 0.436 0.420
## Time_f19 -0.638 0.541 0.410 0.416 0.440 0.424 0.403
## SmokerCurrent:Time_f3 0.566 -0.667 -0.881 -0.369 -0.391 -0.376 -0.357 -0.361
## SmokerCurrent:Time_f6 0.569 -0.671 -0.365 -0.872 -0.393 -0.378 -0.359 -0.363
## SmokerCurrent:Time_f9 0.592 -0.698 -0.380 -0.386 -0.858 -0.393 -0.374 -0.378
## SmokerCurrent:Time_f12 0.576 -0.679 -0.370 -0.376 -0.398 -0.867 -0.364 -0.368
## SmokerCurrent:Time_f15 0.555 -0.654 -0.356 -0.362 -0.383 -0.369 -0.878 -0.354
## SmokerCurrent:Time_f19 0.561 -0.661 -0.360 -0.366 -0.387 -0.373 -0.354 -0.879
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.448
## SmokerCurrent:Time_f9 0.466 0.468
## SmokerCurrent:Time_f12 0.453 0.456 0.474
## SmokerCurrent:Time_f15 0.436 0.439 0.457 0.444
## SmokerCurrent:Time_f19 0.441 0.444 0.462 0.449 0.432
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.04445773 -0.68139172 0.04873177 0.67419760 2.87621234
##
## Residual standard error: 0.4856209
## Degrees of freedom: 771 total; 757 residual
#Exponential of variance covariate
summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time),
data = dta2))## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1360.927 1434.996 -664.4633
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~Time
## Parameter estimates:
## expon
## -0.002528783
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1197648 29.383681 0.0000
## SmokerCurrent -0.289836 0.1349993 -2.146947 0.0321
## Time_f3 0.058647 0.1624138 0.361098 0.7181
## Time_f6 -0.254488 0.1605419 -1.585179 0.1133
## Time_f9 -0.347464 0.1576421 -2.204130 0.0278
## Time_f12 -0.382924 0.1582709 -2.419418 0.0158
## Time_f15 -0.649130 0.1645762 -3.944255 0.0001
## Time_f19 -0.611273 0.1582604 -3.862453 0.0001
## SmokerCurrent:Time_f3 -0.168468 0.1835206 -0.917977 0.3589
## SmokerCurrent:Time_f6 0.116654 0.1823482 0.639733 0.5225
## SmokerCurrent:Time_f9 -0.011124 0.1801132 -0.061764 0.9508
## SmokerCurrent:Time_f12 -0.048469 0.1810081 -0.267774 0.7889
## SmokerCurrent:Time_f15 0.100795 0.1874982 0.537580 0.5910
## SmokerCurrent:Time_f19 -0.120318 0.1815964 -0.662558 0.5078
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.887
## Time_f3 -0.737 0.654
## Time_f6 -0.746 0.662 0.550
## Time_f9 -0.760 0.674 0.560 0.567
## Time_f12 -0.757 0.671 0.558 0.565 0.575
## Time_f15 -0.728 0.646 0.537 0.543 0.553 0.551
## Time_f19 -0.757 0.671 0.558 0.565 0.575 0.573 0.551
## SmokerCurrent:Time_f3 0.653 -0.736 -0.885 -0.487 -0.496 -0.494 -0.475 -0.494
## SmokerCurrent:Time_f6 0.657 -0.740 -0.484 -0.880 -0.499 -0.497 -0.478 -0.497
## SmokerCurrent:Time_f9 0.665 -0.750 -0.490 -0.496 -0.875 -0.503 -0.484 -0.503
## SmokerCurrent:Time_f12 0.662 -0.746 -0.488 -0.494 -0.503 -0.874 -0.481 -0.501
## SmokerCurrent:Time_f15 0.639 -0.720 -0.471 -0.477 -0.485 -0.483 -0.878 -0.483
## SmokerCurrent:Time_f19 0.660 -0.743 -0.486 -0.492 -0.501 -0.499 -0.480 -0.871
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.545
## SmokerCurrent:Time_f9 0.551 0.555
## SmokerCurrent:Time_f12 0.549 0.552 0.559
## SmokerCurrent:Time_f15 0.530 0.533 0.540 0.537
## SmokerCurrent:Time_f19 0.547 0.550 0.557 0.554 0.535
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.92874262 -0.66677628 0.05044755 0.66745312 3.16543499
##
## Residual standard error: 0.5743718
## Degrees of freedom: 771 total; 757 residual
#Exponential time correlation structure in differ type of smoker
summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time | Smoker),
data = dta2))## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1358.216 1436.915 -662.1079
##
## Variance function:
## Structure: Exponential of variance covariate, different strata
## Formula: ~Time | Smoker
## Parameter estimates:
## Former Current
## 0.005425870 -0.006170002
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1201359 29.292914 0.0000
## SmokerCurrent -0.289836 0.1354176 -2.140315 0.0326
## Time_f3 0.058647 0.1647238 0.356034 0.7219
## Time_f6 -0.254488 0.1645770 -1.546313 0.1224
## Time_f9 -0.347464 0.1631958 -2.129122 0.0336
## Time_f12 -0.382924 0.1657444 -2.310326 0.0211
## Time_f15 -0.649130 0.1752396 -3.704246 0.0002
## Time_f19 -0.611273 0.1703016 -3.589357 0.0004
## SmokerCurrent:Time_f3 -0.168468 0.1854898 -0.908232 0.3640
## SmokerCurrent:Time_f6 0.116654 0.1856193 0.628459 0.5299
## SmokerCurrent:Time_f9 -0.011124 0.1844802 -0.060302 0.9519
## SmokerCurrent:Time_f12 -0.048469 0.1868438 -0.259411 0.7954
## SmokerCurrent:Time_f15 0.100795 0.1959415 0.514415 0.6071
## SmokerCurrent:Time_f19 -0.120318 0.1909351 -0.630152 0.5288
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.887
## Time_f3 -0.729 0.647
## Time_f6 -0.730 0.648 0.532
## Time_f9 -0.736 0.653 0.537 0.537
## Time_f12 -0.725 0.643 0.529 0.529 0.534
## Time_f15 -0.686 0.608 0.500 0.500 0.505 0.497
## Time_f19 -0.705 0.626 0.514 0.515 0.519 0.511 0.484
## SmokerCurrent:Time_f3 0.648 -0.730 -0.888 -0.473 -0.477 -0.469 -0.444 -0.457
## SmokerCurrent:Time_f6 0.647 -0.730 -0.472 -0.887 -0.476 -0.469 -0.444 -0.457
## SmokerCurrent:Time_f9 0.651 -0.734 -0.475 -0.475 -0.885 -0.472 -0.446 -0.459
## SmokerCurrent:Time_f12 0.643 -0.725 -0.469 -0.469 -0.473 -0.887 -0.441 -0.454
## SmokerCurrent:Time_f15 0.613 -0.691 -0.447 -0.448 -0.451 -0.444 -0.894 -0.433
## SmokerCurrent:Time_f19 0.629 -0.709 -0.459 -0.459 -0.463 -0.456 -0.431 -0.892
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.533
## SmokerCurrent:Time_f9 0.536 0.536
## SmokerCurrent:Time_f12 0.529 0.529 0.532
## SmokerCurrent:Time_f15 0.505 0.504 0.507 0.501
## SmokerCurrent:Time_f19 0.518 0.517 0.521 0.514 0.490
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.95176426 -0.67527408 0.04683313 0.67280125 2.86836115
##
## Residual standard error: 0.5761515
## Degrees of freedom: 771 total; 757 residual
#A general correlation structure.
summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
correlation = corSymm(form = ~ 1 | ID),
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta2))## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 361.1039 587.9427 -131.5519
##
## Correlation Structure: General
## Formula: ~1 | ID
## Parameter estimate(s):
## Correlation:
## 1 2 3 4 5 6
## 2 0.852
## 3 0.856 0.888
## 4 0.849 0.846 0.894
## 5 0.841 0.853 0.891 0.892
## 6 0.861 0.866 0.896 0.872 0.935
## 7 0.753 0.756 0.844 0.789 0.789 0.866
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Time_f * Smoker
## Parameter estimates:
## 0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
## 1.0000000 1.1548253 1.0981217 1.0095093 1.1970028 1.0796092 1.1241763
## 6*Current 9*Current 12*Current 15*Current 19*Current 3*Current 12*Former
## 1.0518756 1.1013750 1.0380913 1.0158861 0.9676265 1.0363118 1.0527522
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.484680 0.09850685 35.37500 0.0000
## SmokerCurrent -0.257911 0.11599433 -2.22348 0.0265
## Time_f3 -0.024095 0.06609965 -0.36453 0.7156
## Time_f6 -0.221219 0.06278074 -3.52368 0.0005
## Time_f9 -0.328856 0.05936269 -5.53977 0.0000
## Time_f12 -0.394901 0.06223303 -6.34552 0.0000
## Time_f15 -0.479012 0.06777450 -7.06774 0.0000
## Time_f19 -0.598701 0.06580692 -9.09784 0.0000
## SmokerCurrent:Time_f3 -0.065321 0.07460797 -0.87552 0.3816
## SmokerCurrent:Time_f6 0.063890 0.07168101 0.89130 0.3730
## SmokerCurrent:Time_f9 -0.052926 0.06933374 -0.76336 0.4455
## SmokerCurrent:Time_f12 -0.014889 0.07199623 -0.20680 0.8362
## SmokerCurrent:Time_f15 -0.103704 0.07642577 -1.35692 0.1752
## SmokerCurrent:Time_f19 -0.127043 0.07531976 -1.68672 0.0921
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.849
## Time_f3 -0.129 0.110
## Time_f6 -0.197 0.168 0.608
## Time_f9 -0.334 0.283 0.521 0.662
## Time_f12 -0.289 0.245 0.524 0.647 0.700
## Time_f15 -0.062 0.053 0.515 0.588 0.572 0.719
## Time_f19 -0.258 0.219 0.493 0.627 0.613 0.653 0.690
## SmokerCurrent:Time_f3 0.115 -0.204 -0.886 -0.539 -0.462 -0.465 -0.456 -0.437
## SmokerCurrent:Time_f6 0.173 -0.249 -0.533 -0.876 -0.579 -0.566 -0.515 -0.549
## SmokerCurrent:Time_f9 0.286 -0.332 -0.446 -0.566 -0.856 -0.600 -0.490 -0.524
## SmokerCurrent:Time_f12 0.250 -0.326 -0.453 -0.559 -0.605 -0.864 -0.622 -0.565
## SmokerCurrent:Time_f15 0.055 -0.154 -0.457 -0.522 -0.507 -0.638 -0.887 -0.612
## SmokerCurrent:Time_f19 0.225 -0.322 -0.431 -0.548 -0.535 -0.571 -0.603 -0.874
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.620
## SmokerCurrent:Time_f9 0.528 0.661
## SmokerCurrent:Time_f12 0.539 0.650 0.692
## SmokerCurrent:Time_f15 0.532 0.600 0.579 0.724
## SmokerCurrent:Time_f19 0.512 0.632 0.604 0.654 0.701
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.00857813 -0.65224622 0.05368483 0.64660517 3.17724882
##
## Residual standard error: 0.5411956
## Degrees of freedom: 771 total; 757 residual
## Model df AIC BIC logLik Test L.Ratio p-value
## m0 1 15 1359.2814 1428.7218 -664.6407
## m1 2 28 1377.9814 1507.6035 -660.9907 1 vs 2 7.3000 0.8860
## m2 3 16 1360.9267 1434.9965 -664.4633 2 vs 3 6.9453 0.8612
## m3 4 17 1358.2158 1436.9149 -662.1079 3 vs 4 4.7109 0.0300
## m4 5 49 361.1039 587.9427 -131.5519 4 vs 5 1061.1119 <.0001
## Approximate 95% confidence intervals
##
## Correlation structure:
## lower est. upper
## cor(1,2) 0.7984104 0.8523042 0.8926498
## cor(1,3) 0.8033256 0.8559242 0.8952727
## cor(1,4) 0.7947990 0.8498786 0.8910711
## cor(1,5) 0.7853424 0.8419941 0.8846604
## cor(1,6) 0.8045082 0.8615122 0.9027897
## cor(1,7) 0.5939791 0.7501549 0.8518415
## cor(2,3) 0.8467963 0.8884937 0.9193380
## cor(2,4) 0.7906006 0.8468000 0.8888527
## cor(2,5) 0.8008437 0.8539526 0.8937347
## cor(2,6) 0.8117837 0.8671055 0.9070014
## cor(2,7) 0.6008394 0.7537021 0.8533860
## cor(3,4) 0.8536486 0.8939198 0.9235665
## cor(3,5) 0.8508037 0.8914253 0.9214563
## cor(3,6) 0.8481543 0.8962899 0.9297465
## cor(3,7) 0.7174308 0.8426634 0.9151364
## cor(4,5) 0.8516577 0.8924253 0.9224583
## cor(4,6) 0.8179564 0.8724600 0.9114402
## cor(4,7) 0.6580127 0.7872924 0.8714891
## cor(5,6) 0.9056436 0.9358770 0.9566434
## cor(5,7) 0.6478585 0.7869055 0.8752062
## cor(6,7) 0.7618984 0.8638875 0.9240729
## attr(,"label")
## [1] "Correlation structure:"
##
## Variance function:
## lower est. upper
## 3*Former 0.9687025 1.1565059 1.380719
## 6*Former 0.9312166 1.0999878 1.299347
## 9*Former 0.8537168 1.0114297 1.198278
## 15*Former 0.9942021 1.1984605 1.444684
## 19*Former 0.9035607 1.0802535 1.291499
## 0*Current 0.9718621 1.1411996 1.340042
## 6*Current 0.9271746 1.0677355 1.229605
## 9*Current 0.9585674 1.1179806 1.303905
## 12*Current 0.9069306 1.0541140 1.225183
## 15*Current 0.8851592 1.0311232 1.201157
## 19*Current 0.8413179 0.9817466 1.145615
## 3*Current 0.9129852 1.0521247 1.212469
## 12*Former 0.8828825 1.0546222 1.259769
## attr(,"label")
## [1] "Variance function:"
##
## Residual standard error:
## lower est. upper
## 0.4530627 0.5310278 0.6224094
The model 4 is better.
# residual plot
plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
layout = c(2, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")#
dta_m4 <- data.frame(dta2, fit = fitted(m4), fit0 = fitted(m0))
# use fitted values to get error bars and see if they cover the data points
ggplot(dta_m4, aes(Time, fit, linetype = Smoker, shape = Smoker,
group = Smoker)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(aes(Time, fit0), fun = mean, geom = "point", col = "skyblue") +
stat_summary(aes(Time, FEV1), fun.data = mean_se,
geom = "pointrange", col = "firebrick", alpha = .3) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
shape = "Smoker") +
theme(legend.position = c(0.9, 0.9),
legend.background = element_rect(fill = "white", color = "black"))Constrained Pre-Post Analysis
package 'mvnfast' successfully unpacked and MD5 sums checked
package 'simstudy' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Tjlee\AppData\Local\Temp\RtmpiQAuBX\downloaded_packages
package 'TH.data' successfully unpacked and MD5 sums checked
package 'sandwich' successfully unpacked and MD5 sums checked
package 'survival' successfully unpacked and MD5 sums checked
package 'polspline' successfully unpacked and MD5 sums checked
package 'multcomp' successfully unpacked and MD5 sums checked
package 'rms' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Tjlee\AppData\Local\Temp\RtmpiQAuBX\downloaded_packages
An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.
Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).
dtaL <- gather(data=dta3, key=Time, value=Score, Pre:Post, factor_key=TRUE)
kable(head(arrange(dtaL, ID), 10))| Group | ID | Time | Score |
|---|---|---|---|
| C | S11 | Pre | 32 |
| C | S11 | Post | 41 |
| C | S12 | Pre | 42 |
| C | S12 | Post | 46 |
| C | S13 | Pre | 45 |
| C | S13 | Post | 48 |
| C | S14 | Pre | 46 |
| C | S14 | Post | 54 |
| C | S15 | Pre | 49 |
| C | S15 | Post | 52 |
We inspect the data with a boxplot and a scatter plot.
boxp <- ggplot(dtaL ,
aes(x=Time, y=Score, col=Group))+
geom_boxplot(outlier.size=0) +
geom_point(aes(fill=Group, col=NULL),
shape=21, alpha=0.5, size=2,
position=position_jitterdodge(jitter.width=0.2)) +
theme_bw() +
xlab("")
boxpggplot(dta3,
aes (x=Pre, y=Post, col=Group)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
labs(x="Pre-test reading score",
y="Post-test reading score")+
theme_minimal()The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.
[1] "(Intercept)" "TimePost" "GroupT1" "GroupT2"
[5] "TimePost:GroupT1" "TimePost:GroupT2"
[1] "TimePost" "TimePost:GroupT1" "TimePost:GroupT2"
The gls() function of the ‘nlme’ package can be used for modeling data.
The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.
By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).
The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.
m0_gls <- gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
summary(m0_gls)Generalized least squares fit by REML
Model: Score ~ Xalt
Data: dtaL
AIC BIC logLik
350.2136 364.391 -168.1068
Correlation Structure: General
Formula: ~1 | ID
Parameter estimate(s):
Correlation:
1
2 0.91
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time
Parameter estimates:
Pre Post
1.000000 1.016413
Coefficients:
Value Std.Error t-value p-value
(Intercept) 50.50000 1.2296621 41.06819 0.0000
XaltTimePost 6.38144 0.9027047 7.06924 0.0000
XaltTimePost:GroupT1 -2.80337 1.2699097 -2.20753 0.0314
XaltTimePost:GroupT2 7.15906 1.2699097 5.63745 0.0000
Correlation:
(Intr) XltTmP XTP:GT1
XaltTimePost -0.102
XaltTimePost:GroupT1 0.000 -0.703
XaltTimePost:GroupT2 0.000 -0.703 0.500
Standardized residuals:
Min Q1 Med Q3 Max
-2.7467890 -0.3836191 0.1374178 0.7475076 1.4493709
Residual standard error: 6.735137
Degrees of freedom: 60 total; 56 residual
The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.
We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).
To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.
m0_Gls <- Gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"pCI <- ggplot(predictions,
aes(y=linear.predictors,
x=Time)) +
geom_errorbar(limits,
width=0.09,
position=pd) +
geom_line(aes(group=Group,
y=linear.predictors),
position=pd)+
geom_point(aes(fill=Group2),
position=pd,
shape=21,
size=rel(4)) +
scale_fill_manual(values=c("white", "gray80", "gray20", "black")) +
theme_minimal() +
theme(panel.grid.major=element_blank(),
legend.title=element_blank(),
text=element_text(size=14),
legend.position="bottom") +
labs(x="Time",
y="Estimated mean with corresponding 95% CI")
pCIIn the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.
Reference
Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.
Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.