Column 1: Dog ID
Column 2: Time in hours
Column 3: Arrhythmogenic dose of epinephrine (ADE)
# Loading package
pacman::p_load(tidyverse, nlme)
# Input data
dta1 <- read.csv("ade.csv")
str(dta1)
## 'data.frame': 60 obs. of 3 variables:
## $ Dog : chr "D11" "D11" "D11" "D11" ...
## $ Hour: num 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
## $ ADE : num 5.7 4.7 4.8 4.9 3.88 3.51 2.8 2.6 2.5 2.5 ...
head(dta1)
## Dog Hour ADE
## 1 D11 0.0 5.70
## 2 D11 0.5 4.70
## 3 D11 1.0 4.80
## 4 D11 1.5 4.90
## 5 D11 2.0 3.88
## 6 D11 2.5 3.51
dta1 <- mutate(dta1, Hour_f = factor(Hour))
theme_set(theme_bw())
# means with error bars
ggplot(dta1, aes(Hour, ADE)) +
geom_point(alpha = .3) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
labs(x = "Time (in hours)", y = "ADE")
A reducing trend of ADE value and variation by time among the six dog.
car::scatterplotMatrix(~ ADE + Hour_f,
data=dta1,
col=8,
smooth=FALSE,
regLine=FALSE,
ellipse=list(levels=c(0.68, 0.975),
fill=TRUE,
fill.alpha=0.1))
It’s not compound symmetry (may not consider apply “corCompSymm”), should consider time related: AR1.
Generalized Least Squares
Generalized least squares: the effect of Hour on ADE
varIdent: When Hour_f is a factor, variance is allowed to be different for each level of the Hour_f
summary(m1 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
data = dta1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta1
## AIC BIC logLik
## 196.6523 221.3776 -86.32617
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.55447427 0.64132902 0.51921510 0.43187908 0.08245758 0.10901472
## 3.5 4 4.5
## 0.17046144 0.21770576 0.11133075
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.435552 0.27441443 19.807821 0
## Hour -0.701874 0.08386437 -8.369153 0
##
## Correlation:
## (Intr)
## Hour -0.964
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.3854719 -0.6272929 -0.1826742 0.5824007 2.3554705
##
## Residual standard error: 3.508619
## Degrees of freedom: 60 total; 58 residual
The SD of ADE in each time points are decreasing (1 to 0.11).
The correlation between ADE and Hour is -0.964. (ADE is reduced by time) Yt = β0 + β1 × hourt
The estimate value of ADE= 5.44 - 0.7 × hour.
Apply corAR1 (form = ~ Hour | Dog)
The correlation structure is assumed to apply only to observations within the same grouping level (Dog).
summary(m2 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corAR1(form = ~ Hour | Dog),
data = dta1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta1
## AIC BIC logLik
## 168.5913 195.3771 -71.29566
##
## Correlation Structure: ARMA(1,0)
## Formula: ~Hour | Dog
## Parameter estimate(s):
## Phi1
## 0.7553498
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.32803241 0.37194739 0.28590639 0.34018814 0.12571538 0.09896982
## 3.5 4 4.5
## 0.16894933 0.22219099 0.09978382
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.749848 0.3999065 11.877395 0
## Hour -0.599160 0.1096671 -5.463442 0
##
## Correlation:
## (Intr)
## Hour -0.94
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.43566691 -0.09530459 0.29216375 1.34205811 5.06634565
##
## Residual standard error: 3.278488
## Degrees of freedom: 60 total; 58 residual
knitr::kable(intervals(m2)$corStruct)
lower | est. | upper | |
---|---|---|---|
Phi1 | 0.1414353 | 0.7553498 | 0.9496501 |
knitr::kable(t(intervals(m2)$coef[1, c(2,1,3)]))
est. | lower | upper |
---|---|---|
4.749848 | 3.949348 | 5.550348 |
The fitted correlation parameter Phi=0.755.
The AIC of m2 (168.59) is better than m1 (196.65) due to consider the correlation within dogs?
The estimate value of ADE= 4.75 - 0.6 × hour.
apply corCompSymm (form = ~ 1 | Dog)
summary(m3 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta1
## AIC BIC logLik
## 161.6077 188.3935 -67.80387
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.7217933
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3 3.5
## 1.0000000 0.5100528 0.6140381 0.5228105 0.4479331 0.1852681 0.1129502 0.1611728
## 4 4.5
## 0.2309740 0.1131425
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.494975 0.30766482 14.609974 0
## Hour -0.529169 0.06981253 -7.579863 0
##
## Correlation:
## (Intr)
## Hour -0.957
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.66528429 -0.02390899 0.26405734 0.89073611 2.44460225
##
## Residual standard error: 3.832969
## Degrees of freedom: 60 total; 58 residual
The correlation between ADE and Hour is -0.957. (similar to m1)
WHY the AIC of m3 is better than m1?
apply exponential time correlation structure: ρ=exp(−t/t0)
varExp: Exponential of the variance covariate
summary(m3a <- gls(ADE ~ Hour,
weights = varExp(form = ~ Hour),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta1
## AIC BIC logLik
## 163.9703 174.2725 -76.98516
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.6145863
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~Hour
## Parameter estimates:
## expon
## -0.4204529
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.302798 0.6777356 7.824287 0
## Hour -0.674162 0.1346238 -5.007751 0
##
## Correlation:
## (Intr)
## Hour -0.982
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.1017685 -0.3945215 -0.1031040 0.3711706 2.7263751
##
## Residual standard error: 3.079988
## Degrees of freedom: 60 total; 58 residual
add Dog as random effect
summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta1))
## Linear mixed-effects model fit by REML
## Data: dta1
## AIC BIC logLik
## 153.2468 182.093 -62.62338
##
## Random effects:
## Formula: ~1 | Dog
## (Intercept) Residual
## StdDev: 0.3654553 3.928157
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.8457657
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.54618337 0.63785632 0.53997928 0.47199948 0.16064282 0.06300446
## 3.5 4 4.5
## 0.21603493 0.25337758 0.18268767
## Fixed effects: ADE ~ Hour
## Value Std.Error DF t-value p-value
## (Intercept) 4.811462 0.23464110 53 20.50562 0
## Hour -0.612618 0.05378694 53 -11.38973 0
## Correlation:
## (Intr)
## Hour -0.745
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.9223391 -0.1299528 0.1645922 0.6995419 2.4095565
##
## Number of Observations: 60
## Number of Groups: 6
anova(m1, m2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 12 196.6523 221.3776 -86.32617
## m2 2 13 168.5913 195.3771 -71.29566 1 vs 2 30.06102 <.0001
m2 is better than m1 due to consider the time related effect and the correlation structure within the same grouping level (Dog).
anova(m1, m3, m3a, m3b)
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 12 196.6523 221.3776 -86.32617
## m3 2 13 161.6077 188.3935 -67.80387 1 vs 2 37.04460 <.0001
## m3a 3 5 163.9703 174.2725 -76.98516 2 vs 3 18.36258 0.0187
## m3b 4 14 153.2468 182.0930 -62.62338 3 vs 4 28.72355 0.0007
m3b might be the best model to predict?
plot(m3, resid(., type = "pearson") ~ fitted(.) | Dog,
layout = c(6, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
Column 1: Subject ID
Column 2: Smoker status
Column 3: Time from baseline (years), 0, 3, 6, 9, 12, 15, 19
Column 4: Forced expiratory volume (FEV1)
pacman::p_load(tidyverse, nlme, car)
dta2 <- read.csv("fev_smoker.csv", h = T)
str(dta2)
## 'data.frame': 771 obs. of 4 variables:
## $ ID : chr "P1" "P1" "P1" "P1" ...
## $ Smoker: chr "Former" "Former" "Former" "Former" ...
## $ Time : int 0 3 6 9 15 19 0 3 6 9 ...
## $ FEV1 : num 3.4 3.4 3.45 3.2 2.95 2.4 3.1 3.15 3.5 2.95 ...
head(dta2)
## ID Smoker Time FEV1
## 1 P1 Former 0 3.40
## 2 P1 Former 3 3.40
## 3 P1 Former 6 3.45
## 4 P1 Former 9 3.20
## 5 P1 Former 15 2.95
## 6 P1 Former 19 2.40
# save a copy of data in wide format
dta2W <- dta2 %>%
spread(key = Time, value = FEV1)
# rename
names(dta2W)[3:9] <- paste0("Year", names(dta2W)[3:9])
# make ellipses in pairwise plot
car::scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19,
data = dta2W,
smooth = F,
by.group = TRUE,
regLine = F,
ellipse = T,
diagonal = T,
pch = c(1, 20))
Divide the smoker type into current(firebrick) and former(forestgreen)
car::scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker,
data = dta2W,
smooth = F,
by.group = TRUE,
regLine = F,
ellipse = T,
diagonal = T,
pch = c(1, 20),
col = c("firebrick", "forestgreen"))
# FEV1 means & variances of differ time and smorker
aggregate(FEV1 ~ Time + Smoker, data = dta2, mean, na.rm=T)
## Time Smoker FEV1
## 1 0 Current 3.229294
## 2 3 Current 3.119474
## 3 6 Current 3.091461
## 4 9 Current 2.870706
## 5 12 Current 2.797901
## 6 15 Current 2.680959
## 7 19 Current 2.497703
## 8 0 Former 3.519130
## 9 3 Former 3.577778
## 10 6 Former 3.264643
## 11 9 Former 3.171667
## 12 12 Former 3.136207
## 13 15 Former 2.870000
## 14 19 Former 2.907857
aggregate(FEV1 ~ Time + Smoker, data = dta2, var, na.rm=T)
## Time Smoker FEV1
## 1 0 Current 0.3402876
## 2 3 Current 0.3007072
## 3 6 Current 0.3191581
## 4 9 Current 0.3165019
## 5 12 Current 0.2969943
## 6 15 Current 0.2642588
## 7 19 Current 0.2546563
## 8 0 Former 0.2358265
## 9 3 Former 0.3941026
## 10 6 Former 0.3871962
## 11 9 Former 0.3380489
## 12 12 Former 0.3760530
## 13 15 Former 0.3709391
## 14 19 Former 0.4178767
# covariance matrices
dta2W %>%
filter( Smoker == "Former") %>%
select( -(1:2)) %>%
var(na.rm = T)
## Year0 Year3 Year6 Year9 Year12 Year15 Year19
## Year0 0.3617433 0.3442611 0.3550833 0.2358222 0.2398333 0.3091844 0.2144011
## Year3 0.3442611 0.3566944 0.3479167 0.2434444 0.2455556 0.3014333 0.2141278
## Year6 0.3550833 0.3479167 0.3990278 0.2583333 0.2502778 0.3320000 0.2674722
## Year9 0.2358222 0.2434444 0.2583333 0.1926667 0.1827778 0.2322889 0.1953444
## Year12 0.2398333 0.2455556 0.2502778 0.1827778 0.2022222 0.2201111 0.1406667
## Year15 0.3091844 0.3014333 0.3320000 0.2322889 0.2201111 0.3046711 0.2351044
## Year19 0.2144011 0.2141278 0.2674722 0.1953444 0.1406667 0.2351044 0.3509656
The lowest value in the center part?
dta2W %>%
filter(Smoker == "Current") %>%
select(-(1:2)) %>%
var(na.rm = T)
## Year0 Year3 Year6 Year9 Year12 Year15 Year19
## Year0 0.2068623 0.1958723 0.1550242 0.1740866 0.1580108 0.1527199 0.1550177
## Year3 0.1958723 0.2418398 0.1676342 0.1770779 0.1647835 0.1818398 0.1675022
## Year6 0.1550242 0.1676342 0.1618623 0.1328485 0.1484870 0.1479152 0.1571320
## Year9 0.1740866 0.1770779 0.1328485 0.2496970 0.1501407 0.1508160 0.1533355
## Year12 0.1580108 0.1647835 0.1484870 0.1501407 0.1894372 0.1709978 0.1528485
## Year15 0.1527199 0.1818398 0.1479152 0.1508160 0.1709978 0.1873327 0.1654093
## Year19 0.1550177 0.1675022 0.1571320 0.1533355 0.1528485 0.1654093 0.1876729
The highest value in the center part?
# relevel factor level to produce consistent legend in the plot below
dta2W <- dta2W %>%
mutate(Smoker = factor(Smoker))
dta2 <- dta2 %>%
mutate(Smoker = factor(Smoker))
dta2W$Smoker <- relevel(dta2W$Smoker, ref="Former")
dta2$Smoker <- relevel(dta2$Smoker, ref="Former")
theme_set(theme_bw())
# means with error bars
ggplot(dta2, aes(Time, FEV1,
group = Smoker,
shape = Smoker,
linetype = Smoker)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)",
y = "Mean FEV1 (liters)",
linetype = "Smoker", shape = "Smoker") +
theme(legend.justification = c(1, 1),
legend.position = c(1, 1),
legend.background = element_rect(fill = "white",color = "black"))
Both type of smoker’s FEV1 are decreased by years.
However, the decreasing trend of current smoker is larger than former smoker.
Generalized Least Squares
dta2 <- mutate(dta2, Time_f = as.factor(Time))
summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1359.281 1428.722 -664.6407
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1171462 30.040506 0.0000
## SmokerCurrent -0.289836 0.1320476 -2.194938 0.0285
## Time_f3 0.058647 0.1594157 0.367889 0.7131
## Time_f6 -0.254488 0.1581008 -1.609653 0.1079
## Time_f9 -0.347464 0.1557060 -2.231537 0.0259
## Time_f12 -0.382924 0.1568667 -2.441076 0.0149
## Time_f15 -0.649130 0.1639349 -3.959684 0.0001
## Time_f19 -0.611273 0.1581008 -3.866351 0.0001
## SmokerCurrent:Time_f3 -0.168468 0.1801366 -0.935222 0.3500
## SmokerCurrent:Time_f6 0.116654 0.1795986 0.649527 0.5162
## SmokerCurrent:Time_f9 -0.011124 0.1779636 -0.062510 0.9502
## SmokerCurrent:Time_f12 -0.048469 0.1794916 -0.270037 0.7872
## SmokerCurrent:Time_f15 0.100795 0.1868469 0.539454 0.5897
## SmokerCurrent:Time_f19 -0.120318 0.1815889 -0.662585 0.5078
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.887
## Time_f3 -0.735 0.652
## Time_f6 -0.741 0.657 0.544
## Time_f9 -0.752 0.667 0.553 0.557
## Time_f12 -0.747 0.663 0.549 0.553 0.562
## Time_f15 -0.715 0.634 0.525 0.529 0.538 0.534
## Time_f19 -0.741 0.657 0.544 0.549 0.557 0.553 0.529
## SmokerCurrent:Time_f3 0.650 -0.733 -0.885 -0.482 -0.489 -0.486 -0.465 -0.482
## SmokerCurrent:Time_f6 0.652 -0.735 -0.479 -0.880 -0.491 -0.487 -0.466 -0.483
## SmokerCurrent:Time_f9 0.658 -0.742 -0.484 -0.488 -0.875 -0.492 -0.470 -0.488
## SmokerCurrent:Time_f12 0.653 -0.736 -0.480 -0.484 -0.491 -0.874 -0.466 -0.484
## SmokerCurrent:Time_f15 0.627 -0.707 -0.461 -0.465 -0.472 -0.468 -0.877 -0.465
## SmokerCurrent:Time_f19 0.645 -0.727 -0.474 -0.478 -0.485 -0.482 -0.461 -0.871
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.539
## SmokerCurrent:Time_f9 0.544 0.546
## SmokerCurrent:Time_f12 0.539 0.541 0.546
## SmokerCurrent:Time_f15 0.518 0.520 0.524 0.520
## SmokerCurrent:Time_f19 0.533 0.535 0.540 0.535 0.514
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.97158077 -0.67377531 0.05043194 0.66748149 3.13946469
##
## Residual standard error: 0.5618133
## Degrees of freedom: 771 total; 757 residual
variance is allowed to be different for each level of Time_f*Smoker
summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1377.981 1507.604 -660.9907
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Time_f * Smoker
## Parameter estimates:
## 0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
## 1.000000 1.292730 1.281354 1.197259 1.254158 1.331145 1.201226
## 3*Current 6*Current 9*Current 12*Current 15*Current 19*Current 12*Former
## 1.129204 1.163337 1.158488 1.122216 1.058562 1.039166 1.262785
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1012590 34.75377 0.0000
## SmokerCurrent -0.289836 0.1194016 -2.42741 0.0154
## Time_f3 0.058647 0.1576382 0.37204 0.7100
## Time_f6 -0.254488 0.1551834 -1.63992 0.1014
## Time_f9 -0.347464 0.1467019 -2.36850 0.0181
## Time_f12 -0.382924 0.1523839 -2.51289 0.0122
## Time_f15 -0.649130 0.1603404 -4.04845 0.0001
## Time_f19 -0.611273 0.1586741 -3.85238 0.0001
## SmokerCurrent:Time_f3 -0.168468 0.1789371 -0.94149 0.3468
## SmokerCurrent:Time_f6 0.116654 0.1779643 0.65549 0.5124
## SmokerCurrent:Time_f9 -0.011124 0.1710216 -0.06505 0.9482
## SmokerCurrent:Time_f12 -0.048469 0.1757578 -0.27577 0.7828
## SmokerCurrent:Time_f15 0.100795 0.1825716 0.55209 0.5811
## SmokerCurrent:Time_f19 -0.120318 0.1806162 -0.66615 0.5055
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.848
## Time_f3 -0.642 0.545
## Time_f6 -0.653 0.553 0.419
## Time_f9 -0.690 0.585 0.443 0.450
## Time_f12 -0.664 0.564 0.427 0.434 0.459
## Time_f15 -0.632 0.536 0.406 0.412 0.436 0.420
## Time_f19 -0.638 0.541 0.410 0.416 0.440 0.424 0.403
## SmokerCurrent:Time_f3 0.566 -0.667 -0.881 -0.369 -0.391 -0.376 -0.357 -0.361
## SmokerCurrent:Time_f6 0.569 -0.671 -0.365 -0.872 -0.393 -0.378 -0.359 -0.363
## SmokerCurrent:Time_f9 0.592 -0.698 -0.380 -0.386 -0.858 -0.393 -0.374 -0.378
## SmokerCurrent:Time_f12 0.576 -0.679 -0.370 -0.376 -0.398 -0.867 -0.364 -0.368
## SmokerCurrent:Time_f15 0.555 -0.654 -0.356 -0.362 -0.383 -0.369 -0.878 -0.354
## SmokerCurrent:Time_f19 0.561 -0.661 -0.360 -0.366 -0.387 -0.373 -0.354 -0.879
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.448
## SmokerCurrent:Time_f9 0.466 0.468
## SmokerCurrent:Time_f12 0.453 0.456 0.474
## SmokerCurrent:Time_f15 0.436 0.439 0.457 0.444
## SmokerCurrent:Time_f19 0.441 0.444 0.462 0.449 0.432
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.04445773 -0.68139172 0.04873177 0.67419760 2.87621234
##
## Residual standard error: 0.4856209
## Degrees of freedom: 771 total; 757 residual
exponential time correlation structure
summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time),
data = dta2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1360.927 1434.996 -664.4633
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~Time
## Parameter estimates:
## expon
## -0.002528783
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1197648 29.383681 0.0000
## SmokerCurrent -0.289836 0.1349993 -2.146947 0.0321
## Time_f3 0.058647 0.1624138 0.361098 0.7181
## Time_f6 -0.254488 0.1605419 -1.585179 0.1133
## Time_f9 -0.347464 0.1576421 -2.204130 0.0278
## Time_f12 -0.382924 0.1582709 -2.419418 0.0158
## Time_f15 -0.649130 0.1645762 -3.944255 0.0001
## Time_f19 -0.611273 0.1582604 -3.862453 0.0001
## SmokerCurrent:Time_f3 -0.168468 0.1835206 -0.917977 0.3589
## SmokerCurrent:Time_f6 0.116654 0.1823482 0.639733 0.5225
## SmokerCurrent:Time_f9 -0.011124 0.1801132 -0.061764 0.9508
## SmokerCurrent:Time_f12 -0.048469 0.1810081 -0.267774 0.7889
## SmokerCurrent:Time_f15 0.100795 0.1874982 0.537580 0.5910
## SmokerCurrent:Time_f19 -0.120318 0.1815964 -0.662558 0.5078
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.887
## Time_f3 -0.737 0.654
## Time_f6 -0.746 0.662 0.550
## Time_f9 -0.760 0.674 0.560 0.567
## Time_f12 -0.757 0.671 0.558 0.565 0.575
## Time_f15 -0.728 0.646 0.537 0.543 0.553 0.551
## Time_f19 -0.757 0.671 0.558 0.565 0.575 0.573 0.551
## SmokerCurrent:Time_f3 0.653 -0.736 -0.885 -0.487 -0.496 -0.494 -0.475 -0.494
## SmokerCurrent:Time_f6 0.657 -0.740 -0.484 -0.880 -0.499 -0.497 -0.478 -0.497
## SmokerCurrent:Time_f9 0.665 -0.750 -0.490 -0.496 -0.875 -0.503 -0.484 -0.503
## SmokerCurrent:Time_f12 0.662 -0.746 -0.488 -0.494 -0.503 -0.874 -0.481 -0.501
## SmokerCurrent:Time_f15 0.639 -0.720 -0.471 -0.477 -0.485 -0.483 -0.878 -0.483
## SmokerCurrent:Time_f19 0.660 -0.743 -0.486 -0.492 -0.501 -0.499 -0.480 -0.871
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.545
## SmokerCurrent:Time_f9 0.551 0.555
## SmokerCurrent:Time_f12 0.549 0.552 0.559
## SmokerCurrent:Time_f15 0.530 0.533 0.540 0.537
## SmokerCurrent:Time_f19 0.547 0.550 0.557 0.554 0.535
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.92874262 -0.66677628 0.05044755 0.66745312 3.16543499
##
## Residual standard error: 0.5743718
## Degrees of freedom: 771 total; 757 residual
apply exponential time correlation structure in differ type of smoker
summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time | Smoker),
data = dta2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 1358.216 1436.915 -662.1079
##
## Variance function:
## Structure: Exponential of variance covariate, different strata
## Formula: ~Time | Smoker
## Parameter estimates:
## Former Current
## 0.005425870 -0.006170002
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.519130 0.1201359 29.292914 0.0000
## SmokerCurrent -0.289836 0.1354176 -2.140315 0.0326
## Time_f3 0.058647 0.1647238 0.356034 0.7219
## Time_f6 -0.254488 0.1645770 -1.546313 0.1224
## Time_f9 -0.347464 0.1631958 -2.129122 0.0336
## Time_f12 -0.382924 0.1657444 -2.310326 0.0211
## Time_f15 -0.649130 0.1752396 -3.704246 0.0002
## Time_f19 -0.611273 0.1703016 -3.589357 0.0004
## SmokerCurrent:Time_f3 -0.168468 0.1854898 -0.908232 0.3640
## SmokerCurrent:Time_f6 0.116654 0.1856193 0.628459 0.5299
## SmokerCurrent:Time_f9 -0.011124 0.1844802 -0.060302 0.9519
## SmokerCurrent:Time_f12 -0.048469 0.1868438 -0.259411 0.7954
## SmokerCurrent:Time_f15 0.100795 0.1959415 0.514415 0.6071
## SmokerCurrent:Time_f19 -0.120318 0.1909351 -0.630152 0.5288
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.887
## Time_f3 -0.729 0.647
## Time_f6 -0.730 0.648 0.532
## Time_f9 -0.736 0.653 0.537 0.537
## Time_f12 -0.725 0.643 0.529 0.529 0.534
## Time_f15 -0.686 0.608 0.500 0.500 0.505 0.497
## Time_f19 -0.705 0.626 0.514 0.515 0.519 0.511 0.484
## SmokerCurrent:Time_f3 0.648 -0.730 -0.888 -0.473 -0.477 -0.469 -0.444 -0.457
## SmokerCurrent:Time_f6 0.647 -0.730 -0.472 -0.887 -0.476 -0.469 -0.444 -0.457
## SmokerCurrent:Time_f9 0.651 -0.734 -0.475 -0.475 -0.885 -0.472 -0.446 -0.459
## SmokerCurrent:Time_f12 0.643 -0.725 -0.469 -0.469 -0.473 -0.887 -0.441 -0.454
## SmokerCurrent:Time_f15 0.613 -0.691 -0.447 -0.448 -0.451 -0.444 -0.894 -0.433
## SmokerCurrent:Time_f19 0.629 -0.709 -0.459 -0.459 -0.463 -0.456 -0.431 -0.892
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.533
## SmokerCurrent:Time_f9 0.536 0.536
## SmokerCurrent:Time_f12 0.529 0.529 0.532
## SmokerCurrent:Time_f15 0.505 0.504 0.507 0.501
## SmokerCurrent:Time_f19 0.518 0.517 0.521 0.514 0.490
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.95176426 -0.67527408 0.04683313 0.67280125 2.86836115
##
## Residual standard error: 0.5761515
## Degrees of freedom: 771 total; 757 residual
General correlation matrix, with no additional structure.
summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
weights = varIdent(form = ~ 1 | Time_f*Smoker),
correlation = corSymm(form = ~ 1 | ID),
data = dta2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta2
## AIC BIC logLik
## 361.1039 587.9427 -131.5519
##
## Correlation Structure: General
## Formula: ~1 | ID
## Parameter estimate(s):
## Correlation:
## 1 2 3 4 5 6
## 2 0.852
## 3 0.856 0.888
## 4 0.849 0.846 0.894
## 5 0.841 0.853 0.891 0.892
## 6 0.861 0.866 0.896 0.872 0.935
## 7 0.753 0.756 0.844 0.789 0.789 0.866
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Time_f * Smoker
## Parameter estimates:
## 0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
## 1.0000000 1.1548253 1.0981217 1.0095093 1.1970028 1.0796092 1.1241763
## 6*Current 9*Current 12*Current 15*Current 19*Current 3*Current 12*Former
## 1.0518756 1.1013750 1.0380913 1.0158861 0.9676265 1.0363118 1.0527522
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.484680 0.09850685 35.37500 0.0000
## SmokerCurrent -0.257911 0.11599433 -2.22348 0.0265
## Time_f3 -0.024095 0.06609965 -0.36453 0.7156
## Time_f6 -0.221219 0.06278074 -3.52368 0.0005
## Time_f9 -0.328856 0.05936269 -5.53977 0.0000
## Time_f12 -0.394901 0.06223303 -6.34552 0.0000
## Time_f15 -0.479012 0.06777450 -7.06774 0.0000
## Time_f19 -0.598701 0.06580692 -9.09784 0.0000
## SmokerCurrent:Time_f3 -0.065321 0.07460797 -0.87552 0.3816
## SmokerCurrent:Time_f6 0.063890 0.07168101 0.89130 0.3730
## SmokerCurrent:Time_f9 -0.052926 0.06933374 -0.76336 0.4455
## SmokerCurrent:Time_f12 -0.014889 0.07199623 -0.20680 0.8362
## SmokerCurrent:Time_f15 -0.103704 0.07642577 -1.35692 0.1752
## SmokerCurrent:Time_f19 -0.127043 0.07531976 -1.68672 0.0921
##
## Correlation:
## (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent -0.849
## Time_f3 -0.129 0.110
## Time_f6 -0.197 0.168 0.608
## Time_f9 -0.334 0.283 0.521 0.662
## Time_f12 -0.289 0.245 0.524 0.647 0.700
## Time_f15 -0.062 0.053 0.515 0.588 0.572 0.719
## Time_f19 -0.258 0.219 0.493 0.627 0.613 0.653 0.690
## SmokerCurrent:Time_f3 0.115 -0.204 -0.886 -0.539 -0.462 -0.465 -0.456 -0.437
## SmokerCurrent:Time_f6 0.173 -0.249 -0.533 -0.876 -0.579 -0.566 -0.515 -0.549
## SmokerCurrent:Time_f9 0.286 -0.332 -0.446 -0.566 -0.856 -0.600 -0.490 -0.524
## SmokerCurrent:Time_f12 0.250 -0.326 -0.453 -0.559 -0.605 -0.864 -0.622 -0.565
## SmokerCurrent:Time_f15 0.055 -0.154 -0.457 -0.522 -0.507 -0.638 -0.887 -0.612
## SmokerCurrent:Time_f19 0.225 -0.322 -0.431 -0.548 -0.535 -0.571 -0.603 -0.874
## SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerCurrent:Time_f3
## SmokerCurrent:Time_f6 0.620
## SmokerCurrent:Time_f9 0.528 0.661
## SmokerCurrent:Time_f12 0.539 0.650 0.692
## SmokerCurrent:Time_f15 0.532 0.600 0.579 0.724
## SmokerCurrent:Time_f19 0.512 0.632 0.604 0.654 0.701
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.00857813 -0.65224622 0.05368483 0.64660517 3.17724882
##
## Residual standard error: 0.5411956
## Degrees of freedom: 771 total; 757 residual
# Likelihood ratio test
anova(m0, m1, m2, m3, m4)
## Model df AIC BIC logLik Test L.Ratio p-value
## m0 1 15 1359.2814 1428.7218 -664.6407
## m1 2 28 1377.9814 1507.6035 -660.9907 1 vs 2 7.3000 0.8860
## m2 3 16 1360.9267 1434.9965 -664.4633 2 vs 3 6.9453 0.8612
## m3 4 17 1358.2158 1436.9149 -662.1079 3 vs 4 4.7109 0.0300
## m4 5 49 361.1039 587.9427 -131.5519 4 vs 5 1061.1119 <.0001
m4
# show estimated covariance
m4 <- update(m4, method="ML")
intervals(m4, which="var-cov")
## Approximate 95% confidence intervals
##
## Correlation structure:
## lower est. upper
## cor(1,2) 0.7984104 0.8523042 0.8926498
## cor(1,3) 0.8033256 0.8559242 0.8952727
## cor(1,4) 0.7947990 0.8498786 0.8910711
## cor(1,5) 0.7853424 0.8419941 0.8846604
## cor(1,6) 0.8045082 0.8615122 0.9027897
## cor(1,7) 0.5939791 0.7501549 0.8518415
## cor(2,3) 0.8467963 0.8884937 0.9193380
## cor(2,4) 0.7906006 0.8468000 0.8888527
## cor(2,5) 0.8008437 0.8539526 0.8937347
## cor(2,6) 0.8117837 0.8671055 0.9070014
## cor(2,7) 0.6008394 0.7537021 0.8533860
## cor(3,4) 0.8536486 0.8939198 0.9235665
## cor(3,5) 0.8508037 0.8914253 0.9214563
## cor(3,6) 0.8481543 0.8962899 0.9297465
## cor(3,7) 0.7174308 0.8426634 0.9151364
## cor(4,5) 0.8516577 0.8924253 0.9224583
## cor(4,6) 0.8179564 0.8724600 0.9114402
## cor(4,7) 0.6580127 0.7872924 0.8714891
## cor(5,6) 0.9056436 0.9358770 0.9566434
## cor(5,7) 0.6478585 0.7869055 0.8752062
## cor(6,7) 0.7618984 0.8638875 0.9240729
## attr(,"label")
## [1] "Correlation structure:"
##
## Variance function:
## lower est. upper
## 3*Former 0.9687025 1.1565059 1.380719
## 6*Former 0.9312166 1.0999878 1.299347
## 9*Former 0.8537168 1.0114297 1.198278
## 15*Former 0.9942021 1.1984605 1.444684
## 19*Former 0.9035607 1.0802535 1.291499
## 0*Current 0.9718621 1.1411996 1.340042
## 6*Current 0.9271746 1.0677355 1.229605
## 9*Current 0.9585674 1.1179806 1.303905
## 12*Current 0.9069306 1.0541140 1.225183
## 15*Current 0.8851592 1.0311232 1.201157
## 19*Current 0.8413179 0.9817466 1.145615
## 3*Current 0.9129852 1.0521247 1.212469
## 12*Former 0.8828825 1.0546222 1.259769
## attr(,"label")
## [1] "Variance function:"
##
## Residual standard error:
## lower est. upper
## 0.4530627 0.5310278 0.6224094
dta2_m4 <- data.frame(dta2, fit = fitted(m4), fit0 = fitted(m0))
# use fitted values to get error bars and see if they cover the data points
ggplot(dta2_m4, aes(Time, fit,
linetype = Smoker, shape = Smoker, group = Smoker)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(aes(Time, fit0), fun = mean, geom = "point", col = "skyblue") +
stat_summary(aes(Time, FEV1), fun.data = mean_se,
geom = "pointrange", col = "firebrick", alpha = .3) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
shape = "Smoker") +
theme(legend.position = c(0.9, 0.9),
legend.background = element_rect(fill = "white", color = "black"))
# residual plot
plot(m0, resid(., type = "pearson") ~ fitted(.) | Smoker,
layout = c(2, 1),
aspect = 2,
abline = 0,
pch = 20,
cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
# residual plot
plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
layout = c(2, 1),
aspect = 2,
abline = 0,
pch = 20,
cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
Question: Notice that the GLM fit above, compared with the mixed-effects analysis, underestimates the standard errors for intercept and the effect of smoking, but overestimates that for the time effect and the interaction between time and smoking. Could you explain why?
Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).
Column 1: Group ID, Control = C, Experimental 1 = T1, Experimental 2 = T2
Column 2: Reading score at the end of the study
Column 3: Reading score at the beginning of the study
Column 4: Subject ID
“Constrained Pre-Post Analysis”
dta3 <- read.csv('reading_g3.csv')
dta3L <- gather(data=dta3, key=Time, value=Score, Pre:Post, factor_key=TRUE)
kable(head(arrange(dta3L, ID), 10))
Group | ID | Time | Score |
---|---|---|---|
C | S11 | Pre | 32 |
C | S11 | Post | 41 |
C | S12 | Pre | 42 |
C | S12 | Post | 46 |
C | S13 | Pre | 45 |
C | S13 | Post | 48 |
C | S14 | Pre | 46 |
C | S14 | Post | 54 |
C | S15 | Pre | 49 |
C | S15 | Post | 52 |
We inspect the data with a boxplot and a scatter plot.
boxp <- ggplot(dta3L ,
aes(x=Time, y=Score, col=Group))+
geom_boxplot(outlier.size=0) +
geom_point(aes(fill=Group, col=NULL),
shape=21, alpha=0.5, size=2,
position=position_jitterdodge(jitter.width=0.2)) +
theme_bw() +
xlab("")
boxp
ggplot(dta3,
aes (x=Pre, y=Post, col=Group)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
labs(x="Pre-test reading score",
y="Post-test reading score")+
theme_minimal()
The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.
X <- model.matrix(~ Time * Group, data = dta3L)
colnames(X)
[1] "(Intercept)" "TimePost" "GroupT1" "GroupT2"
[5] "TimePost:GroupT1" "TimePost:GroupT2"
Xalt <- X[, c("TimePost", "TimePost:GroupT1", "TimePost:GroupT2")]
colnames(Xalt)
[1] "TimePost" "TimePost:GroupT1" "TimePost:GroupT2"
The gls() function of the ‘nlme’ package can be used for modeling data.
The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.
By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).
The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.
m0_gls <- gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dta3L)
summary(m0_gls)
Generalized least squares fit by REML
Model: Score ~ Xalt
Data: dta3L
AIC BIC logLik
350.2136 364.391 -168.1068
Correlation Structure: General
Formula: ~1 | ID
Parameter estimate(s):
Correlation:
1
2 0.91
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time
Parameter estimates:
Pre Post
1.000000 1.016413
Coefficients:
Value Std.Error t-value p-value
(Intercept) 50.50000 1.2296621 41.06819 0.0000
XaltTimePost 6.38144 0.9027047 7.06924 0.0000
XaltTimePost:GroupT1 -2.80337 1.2699097 -2.20753 0.0314
XaltTimePost:GroupT2 7.15906 1.2699097 5.63745 0.0000
Correlation:
(Intr) XltTmP XTP:GT1
XaltTimePost -0.102
XaltTimePost:GroupT1 0.000 -0.703
XaltTimePost:GroupT2 0.000 -0.703 0.500
Standardized residuals:
Min Q1 Med Q3 Max
-2.7467890 -0.3836191 0.1374178 0.7475076 1.4493709
Residual standard error: 6.735137
Degrees of freedom: 60 total; 56 residual
The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.
We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).
To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.
m0_Gls <- Gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dta3L)
predictions <- cbind(dta3L, predict(m0_Gls, dta3L, conf.int=0.95))
predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"
pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)
pCI <- ggplot(predictions,
aes(y=linear.predictors,
x=Time)) +
geom_errorbar(limits,
width=0.09,
position=pd) +
geom_line(aes(group=Group,
y=linear.predictors),
position=pd)+
geom_point(aes(fill=Group2),
position=pd,
shape=21,
size=rel(4)) +
scale_fill_manual(values=c("white", "gray80", "gray20", "black")) +
theme_minimal() +
theme(panel.grid.major=element_blank(),
legend.title=element_blank(),
text=element_text(size=14),
legend.position="bottom") +
labs(x="Time",
y="Estimated mean with corresponding 95% CI")
pCI
In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.
Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.
Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.