The arrhythmogenic dose of epinephrine (ADE) is determined by infusing epinephrine into an animal until arrhythmia criterion is reached. The arrhythmia criterion was the occurrence of four intermittent or continuous premature ventricular contractions. Once the criterion has been reached, the infusion required to produce it is recorded. The ADE is then calculated by multiplying the duration of the infusion by the infusion rate. The data give the ADE for six dogs measured at ten consecutive times.
Source: Shoukri, M.M., & Chaudhary, M.A. (2007). Analysis of Correlated Data with SAS and R. p. 218.
pacman::p_load(tidyverse, nlme)
dta_1<- read.table("ade.csv", h=T, sep = ",")
head(dta_1)
## Dog Hour ADE
## 1 D11 0.0 5.70
## 2 D11 0.5 4.70
## 3 D11 1.0 4.80
## 4 D11 1.5 4.90
## 5 D11 2.0 3.88
## 6 D11 2.5 3.51
#generate new factor variable"Hour_f"
dta_1 <- mutate(dta_1, Hour_f = factor(Hour))
# set ggplot theme as black and white
theme_set(theme_bw())
# means with error bars
ggplot(dta_1, aes(Hour, ADE)) +
geom_point(alpha = .3) +
stat_summary(fun.y = mean, geom = "line") +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
labs(x = "Time (in hours)", y = "ADE")
# within-group heteroscedasticity structure
summary(m1 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
data = dta_1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta_1
## AIC BIC logLik
## 196.6523 221.3776 -86.32617
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.55447427 0.64132902 0.51921510 0.43187908 0.08245758 0.10901472
## 3.5 4 4.5
## 0.17046144 0.21770576 0.11133075
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.435552 0.27441443 19.807821 0
## Hour -0.701874 0.08386437 -8.369153 0
##
## Correlation:
## (Intr)
## Hour -0.964
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.3854719 -0.6272929 -0.1826742 0.5824007 2.3554705
##
## Residual standard error: 3.508619
## Degrees of freedom: 60 total; 58 residual
# auto-regression
summary(m2 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corAR1(form = ~ Hour | Dog),
data = dta_1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta_1
## AIC BIC logLik
## 168.5913 195.3771 -71.29566
##
## Correlation Structure: ARMA(1,0)
## Formula: ~Hour | Dog
## Parameter estimate(s):
## Phi1
## 0.7553498
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.32803241 0.37194739 0.28590639 0.34018814 0.12571538 0.09896982
## 3.5 4 4.5
## 0.16894933 0.22219099 0.09978382
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.749848 0.3999065 11.877395 0
## Hour -0.599160 0.1096671 -5.463442 0
##
## Correlation:
## (Intr)
## Hour -0.94
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.43566691 -0.09530459 0.29216375 1.34205811 5.06634565
##
## Residual standard error: 3.278488
## Degrees of freedom: 60 total; 58 residual
# compound symmetry
summary(m3 <- gls(ADE ~ Hour,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta_1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta_1
## AIC BIC logLik
## 161.6077 188.3935 -67.80387
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.7217933
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3 3.5
## 1.0000000 0.5100528 0.6140381 0.5228105 0.4479331 0.1852681 0.1129502 0.1611728
## 4 4.5
## 0.2309740 0.1131425
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.494975 0.30766482 14.609974 0
## Hour -0.529169 0.06981253 -7.579863 0
##
## Correlation:
## (Intr)
## Hour -0.957
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.66528429 -0.02390899 0.26405734 0.89073611 2.44460225
##
## Residual standard error: 3.832969
## Degrees of freedom: 60 total; 58 residual
#
summary(m3a <- gls(ADE ~ Hour,
weights = varExp(form = ~ Hour),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta_1))
## Generalized least squares fit by REML
## Model: ADE ~ Hour
## Data: dta_1
## AIC BIC logLik
## 163.9703 174.2725 -76.98516
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.6145863
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~Hour
## Parameter estimates:
## expon
## -0.4204529
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.302798 0.6777356 7.824287 0
## Hour -0.674162 0.1346238 -5.007751 0
##
## Correlation:
## (Intr)
## Hour -0.982
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.1017685 -0.3945215 -0.1031040 0.3711706 2.7263751
##
## Residual standard error: 3.079988
## Degrees of freedom: 60 total; 58 residual
summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
weights = varIdent(form = ~ 1 | Hour_f),
correlation = corCompSymm(form = ~ 1 | Dog),
data = dta_1))
## Linear mixed-effects model fit by REML
## Data: dta_1
## AIC BIC logLik
## 153.2468 182.093 -62.62338
##
## Random effects:
## Formula: ~1 | Dog
## (Intercept) Residual
## StdDev: 0.3654553 3.928157
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | Dog
## Parameter estimate(s):
## Rho
## 0.8457657
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Hour_f
## Parameter estimates:
## 0 0.5 1 1.5 2 2.5 3
## 1.00000000 0.54618337 0.63785633 0.53997927 0.47199948 0.16064282 0.06300446
## 3.5 4 4.5
## 0.21603493 0.25337758 0.18268767
## Fixed effects: ADE ~ Hour
## Value Std.Error DF t-value p-value
## (Intercept) 4.811462 0.23464110 53 20.50562 0
## Hour -0.612618 0.05378694 53 -11.38973 0
## Correlation:
## (Intr)
## Hour -0.745
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.9223391 -0.1299528 0.1645922 0.6995419 2.4095565
##
## Number of Observations: 60
## Number of Groups: 6
# comparing models
anova(m1, m2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 12 196.6523 221.3776 -86.32617
## m2 2 13 168.5913 195.3771 -71.29566 1 vs 2 30.06102 <.0001
#
anova(m1, m3, m3a)
## Model df AIC BIC logLik Test L.Ratio p-value
## m1 1 12 196.6523 221.3776 -86.32617
## m3 2 13 161.6077 188.3935 -67.80387 1 vs 2 37.04460 <.0001
## m3a 3 5 163.9703 174.2725 -76.98516 2 vs 3 18.36258 0.0187
# residual plot
plot(m3, resid(., type = "pearson") ~ fitted(.) | Dog,
layout = c(6, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
In the Netherlands, residents in the rural area of Vlagtwedde in the north-east and residents in the urban, industrial area of Vlaardingen in the south-west, were recruited to participate in a study on chronic obstructive lung diseases. The participants, initially aged 15-44, were surveyed approximately every 3 years for up to 21 years. At each survey, information on respiratory symptoms and smoking status was collected by questionnaire and spirometry was performed. Pulmonary function was determined by spirometry and a measure of forced expiratory volume (FEV1) was obtained every three years for the first 15 years of the study, and also at year 19. The number of repeated measurements of FEV1 on each subject varied from 1 to 7.
Here the dataset is comprised of a sub-sample of 133 residents aged 36 or older at their entry into the study and whose smoking status did not change over the 19 years of follow-up. Each participant was either a current or former smoker. Current smoking was defined as smoking at least one cigarette per day.
Source: van der Lende, R., Kok, T.J., Peset, R., Quanjer, P.H., Schouten, J.P. & Orie, N.G.M. (1981). Decreases in VC and FEV1 with time: Indicators for effects of smoking and air pollution. Bulletin of European Physiopathology and Respiration, 17, 775-792.
pacman::p_load(tidyverse, nlme, car)
dta_2<-read.table("fev_smoker.csv", h=T, sep=",")
head(dta_2)
## ID Smoker Time FEV1
## 1 P1 Former 0 3.40
## 2 P1 Former 3 3.40
## 3 P1 Former 6 3.45
## 4 P1 Former 9 3.20
## 5 P1 Former 15 2.95
## 6 P1 Former 19 2.40
# save a copy of data in wide format
dtaW <- dta_2 %>%
spread(key = Time, value = FEV1)
names(dtaW)[3:9] <- paste0("Year", names(dtaW)[3:9])
# make ellipses in pairwise plot
scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker, data = dtaW, smooth = F, by.group = TRUE,
reg.line = F, ellipse = T, diag = "none", pch = c(1, 20),
col = c("firebrick", "forestgreen"))
# means & variances
aggregate(FEV1 ~ Time + Smoker, data = dta_2, mean, na.rm=T)
## Time Smoker FEV1
## 1 0 Current 3.229294
## 2 3 Current 3.119474
## 3 6 Current 3.091461
## 4 9 Current 2.870706
## 5 12 Current 2.797901
## 6 15 Current 2.680959
## 7 19 Current 2.497703
## 8 0 Former 3.519130
## 9 3 Former 3.577778
## 10 6 Former 3.264643
## 11 9 Former 3.171667
## 12 12 Former 3.136207
## 13 15 Former 2.870000
## 14 19 Former 2.907857
aggregate(FEV1 ~ Time + Smoker, data = dta_2, var, na.rm=T)
## Time Smoker FEV1
## 1 0 Current 0.3402876
## 2 3 Current 0.3007072
## 3 6 Current 0.3191581
## 4 9 Current 0.3165019
## 5 12 Current 0.2969943
## 6 15 Current 0.2642588
## 7 19 Current 0.2546563
## 8 0 Former 0.2358265
## 9 3 Former 0.3941026
## 10 6 Former 0.3871962
## 11 9 Former 0.3380489
## 12 12 Former 0.3760530
## 13 15 Former 0.3709391
## 14 19 Former 0.4178767
# covariance matrices
dtaW %>%
filter( Smoker == "Former") %>%
select( -(1:2)) %>%
var(na.rm = T)
## Year0 Year3 Year6 Year9 Year12 Year15 Year19
## Year0 0.3617433 0.3442611 0.3550833 0.2358222 0.2398333 0.3091844 0.2144011
## Year3 0.3442611 0.3566944 0.3479167 0.2434444 0.2455556 0.3014333 0.2141278
## Year6 0.3550833 0.3479167 0.3990278 0.2583333 0.2502778 0.3320000 0.2674722
## Year9 0.2358222 0.2434444 0.2583333 0.1926667 0.1827778 0.2322889 0.1953444
## Year12 0.2398333 0.2455556 0.2502778 0.1827778 0.2022222 0.2201111 0.1406667
## Year15 0.3091844 0.3014333 0.3320000 0.2322889 0.2201111 0.3046711 0.2351044
## Year19 0.2144011 0.2141278 0.2674722 0.1953444 0.1406667 0.2351044 0.3509656
# "Current" covariance matrix
dtaW %>%
filter( Smoker == "Current") %>%
select( -(1:2)) %>%
var(na.rm = T)
## Year0 Year3 Year6 Year9 Year12 Year15 Year19
## Year0 0.2068623 0.1958723 0.1550242 0.1740866 0.1580108 0.1527199 0.1550177
## Year3 0.1958723 0.2418398 0.1676342 0.1770779 0.1647835 0.1818398 0.1675022
## Year6 0.1550242 0.1676342 0.1618623 0.1328485 0.1484870 0.1479152 0.1571320
## Year9 0.1740866 0.1770779 0.1328485 0.2496970 0.1501407 0.1508160 0.1533355
## Year12 0.1580108 0.1647835 0.1484870 0.1501407 0.1894372 0.1709978 0.1528485
## Year15 0.1527199 0.1818398 0.1479152 0.1508160 0.1709978 0.1873327 0.1654093
## Year19 0.1550177 0.1675022 0.1571320 0.1533355 0.1528485 0.1654093 0.1876729
#
str(dtaW)
## 'data.frame': 133 obs. of 9 variables:
## $ ID : chr "P1" "P10" "P100" "P101" ...
## $ Smoker: chr "Former" "Current" "Current" "Current" ...
## $ Year0 : num 3.4 2.65 3 2.7 2.8 NA NA 4.1 2.07 3.25 ...
## $ Year3 : num 3.4 NA 2.75 3 3.35 3.1 4.4 3.65 2.2 3.3 ...
## $ Year6 : num 3.45 3.3 2.45 NA NA NA 3.5 3.9 2.35 3.2 ...
## $ Year9 : num 3.2 2.4 NA NA 2.95 2.75 3.6 2.75 2.25 3.3 ...
## $ Year12: num NA 2.15 NA 2.25 2.5 2.5 NA 3.35 2.2 2.85 ...
## $ Year15: num 2.95 2.15 2.2 1.85 2.45 1.95 3.4 2.95 NA 3.2 ...
## $ Year19: num 2.4 2.25 2.18 1.6 NA 2.65 3.2 3.17 NA 3.1 ...
# relevel factor level to produce consistent legend in the plot below
#dta_2$Smoker <- relevel(dta_2$Smoker, ref="Former")
# set theme as black and white
theme_set(theme_bw())
# means with error bars
ggplot(dta_2, aes(Time, FEV1,
group = Smoker, shape = Smoker, linetype = Smoker)) +
stat_summary(fun.y = mean, geom = "line") +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "Mean FEV1 (liters)",
linetype = "Smoker", shape = "Smoker") +
theme(legend.justification = c(1, 1), legend.position = c(1, 1),
legend.background = element_rect(fill = "white", color = "black"))
# add new variable "Time_f" as factor variable
dta_2 <- mutate(dta_2, Time_f = as.factor(Time))
# FEV1 與現在是否抽煙以及抽煙時間的交互作用
summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta_2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta_2
## AIC BIC logLik
## 1359.281 1428.722 -664.6407
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.229294 0.06093721 52.99380 0.0000
## SmokerFormer 0.289836 0.13204761 2.19494 0.0285
## Time_f3 -0.109820 0.08387973 -1.30926 0.1908
## Time_f6 -0.137833 0.08520443 -1.61768 0.1061
## Time_f9 -0.358588 0.08617823 -4.16101 0.0000
## Time_f12 -0.431393 0.08723567 -4.94514 0.0000
## Time_f15 -0.548335 0.08964988 -6.11641 0.0000
## Time_f19 -0.731591 0.08932341 -8.19037 0.0000
## SmokerFormer:Time_f3 0.168468 0.18013659 0.93522 0.3500
## SmokerFormer:Time_f6 -0.116654 0.17959864 -0.64953 0.5162
## SmokerFormer:Time_f9 0.011124 0.17796363 0.06251 0.9502
## SmokerFormer:Time_f12 0.048469 0.17949158 0.27004 0.7872
## SmokerFormer:Time_f15 -0.100795 0.18684687 -0.53945 0.5897
## SmokerFormer:Time_f19 0.120318 0.18158895 0.66259 0.5078
##
## Correlation:
## (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer -0.461
## Time_f3 -0.726 0.335
## Time_f6 -0.715 0.330 0.520
## Time_f9 -0.707 0.326 0.514 0.506
## Time_f12 -0.699 0.322 0.507 0.500 0.494
## Time_f15 -0.680 0.314 0.494 0.486 0.481 0.475
## Time_f19 -0.682 0.315 0.496 0.488 0.482 0.477 0.464
## SmokerFormer:Time_f3 0.338 -0.733 -0.466 -0.242 -0.239 -0.236 -0.230 -0.231
## SmokerFormer:Time_f6 0.339 -0.735 -0.246 -0.474 -0.240 -0.237 -0.231 -0.231
## SmokerFormer:Time_f9 0.342 -0.742 -0.249 -0.245 -0.484 -0.239 -0.233 -0.234
## SmokerFormer:Time_f12 0.339 -0.736 -0.247 -0.243 -0.240 -0.486 -0.231 -0.232
## SmokerFormer:Time_f15 0.326 -0.707 -0.237 -0.233 -0.231 -0.228 -0.480 -0.222
## SmokerFormer:Time_f19 0.336 -0.727 -0.244 -0.240 -0.237 -0.234 -0.228 -0.492
## SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerFormer:Time_f3
## SmokerFormer:Time_f6 0.539
## SmokerFormer:Time_f9 0.544 0.546
## SmokerFormer:Time_f12 0.539 0.541 0.546
## SmokerFormer:Time_f15 0.518 0.520 0.524 0.520
## SmokerFormer:Time_f19 0.533 0.535 0.540 0.535 0.514
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.97158077 -0.67377531 0.05043194 0.66748149 3.13946469
##
## Residual standard error: 0.5618133
## Degrees of freedom: 771 total; 757 residual
交互作用不顯著
# different variances per stratum
summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta_2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta_2
## AIC BIC logLik
## 1377.981 1507.604 -660.9907
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Time_f * Smoker
## Parameter estimates:
## 0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
## 1.000000 1.292710 1.281344 1.197259 1.254151 1.331139 1.201218
## 3*Current 6*Current 9*Current 12*Current 15*Current 19*Current 12*Former
## 1.129204 1.163335 1.158479 1.122197 1.058560 1.039136 1.262762
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.229294 0.06327225 51.03808 0.0000
## SmokerFormer 0.289836 0.11940236 2.42739 0.0154
## Time_f3 -0.109820 0.08466840 -1.29707 0.1950
## Time_f6 -0.137833 0.08711755 -1.58215 0.1140
## Time_f9 -0.358588 0.08790306 -4.07936 0.0000
## Time_f12 -0.431393 0.08757802 -4.92581 0.0000
## Time_f15 -0.548335 0.08731204 -6.28018 0.0000
## Time_f19 -0.731591 0.08628221 -8.47905 0.0000
## SmokerFormer:Time_f3 0.168468 0.17893718 0.94149 0.3468
## SmokerFormer:Time_f6 -0.116654 0.17796497 -0.65549 0.5124
## SmokerFormer:Time_f9 0.011124 0.17102269 0.06505 0.9482
## SmokerFormer:Time_f12 0.048469 0.17575742 0.27577 0.7828
## SmokerFormer:Time_f15 -0.100795 0.18257248 -0.55208 0.5811
## SmokerFormer:Time_f19 0.120318 0.18061661 0.66615 0.5055
##
## Correlation:
## (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer -0.530
## Time_f3 -0.747 0.396
## Time_f6 -0.726 0.385 0.543
## Time_f9 -0.720 0.381 0.538 0.523
## Time_f12 -0.722 0.383 0.540 0.525 0.520
## Time_f15 -0.725 0.384 0.542 0.526 0.522 0.524
## Time_f19 -0.733 0.389 0.548 0.533 0.528 0.530 0.531
## SmokerFormer:Time_f3 0.354 -0.667 -0.473 -0.257 -0.255 -0.255 -0.256 -0.259
## SmokerFormer:Time_f6 0.356 -0.671 -0.266 -0.490 -0.256 -0.257 -0.258 -0.261
## SmokerFormer:Time_f9 0.370 -0.698 -0.276 -0.269 -0.514 -0.267 -0.268 -0.271
## SmokerFormer:Time_f12 0.360 -0.679 -0.269 -0.261 -0.259 -0.498 -0.261 -0.264
## SmokerFormer:Time_f15 0.347 -0.654 -0.259 -0.252 -0.249 -0.250 -0.478 -0.254
## SmokerFormer:Time_f19 0.350 -0.661 -0.262 -0.254 -0.252 -0.253 -0.254 -0.478
## SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerFormer:Time_f3
## SmokerFormer:Time_f6 0.448
## SmokerFormer:Time_f9 0.466 0.468
## SmokerFormer:Time_f12 0.453 0.456 0.474
## SmokerFormer:Time_f15 0.436 0.439 0.457 0.444
## SmokerFormer:Time_f19 0.441 0.444 0.462 0.449 0.432
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.04443149 -0.68139604 0.04873136 0.67419703 2.87624105
##
## Residual standard error: 0.4856249
## Degrees of freedom: 771 total; 757 residual
#Exponential of the variance covariate, specifying a variance covariate
summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time),
data = dta_2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta_2
## AIC BIC logLik
## 1360.927 1434.996 -664.4633
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~Time
## Parameter estimates:
## expon
## -0.002528783
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.229294 0.06229936 51.83511 0.0000
## SmokerFormer 0.289836 0.13499932 2.14695 0.0321
## Time_f3 -0.109820 0.08544929 -1.28521 0.1991
## Time_f6 -0.137833 0.08647075 -1.59399 0.1114
## Time_f9 -0.358588 0.08711898 -4.11607 0.0000
## Time_f12 -0.431393 0.08783081 -4.91163 0.0000
## Time_f15 -0.548335 0.08983467 -6.10383 0.0000
## Time_f19 -0.731591 0.08905559 -8.21500 0.0000
## SmokerFormer:Time_f3 0.168468 0.18352063 0.91798 0.3589
## SmokerFormer:Time_f6 -0.116654 0.18234823 -0.63973 0.5225
## SmokerFormer:Time_f9 0.011124 0.18011317 0.06176 0.9508
## SmokerFormer:Time_f12 0.048469 0.18100811 0.26777 0.7889
## SmokerFormer:Time_f15 -0.100795 0.18749823 -0.53758 0.5910
## SmokerFormer:Time_f19 0.120318 0.18159639 0.66256 0.5078
##
## Correlation:
## (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer -0.461
## Time_f3 -0.729 0.336
## Time_f6 -0.720 0.332 0.525
## Time_f9 -0.715 0.330 0.521 0.515
## Time_f12 -0.709 0.327 0.517 0.511 0.507
## Time_f15 -0.693 0.320 0.506 0.500 0.496 0.492
## Time_f19 -0.700 0.323 0.510 0.504 0.500 0.496 0.485
## SmokerFormer:Time_f3 0.339 -0.736 -0.466 -0.245 -0.243 -0.241 -0.235 -0.237
## SmokerFormer:Time_f6 0.342 -0.740 -0.249 -0.474 -0.244 -0.242 -0.237 -0.239
## SmokerFormer:Time_f9 0.346 -0.750 -0.252 -0.249 -0.484 -0.245 -0.240 -0.242
## SmokerFormer:Time_f12 0.344 -0.746 -0.251 -0.248 -0.246 -0.485 -0.239 -0.241
## SmokerFormer:Time_f15 0.332 -0.720 -0.242 -0.239 -0.238 -0.236 -0.479 -0.232
## SmokerFormer:Time_f19 0.343 -0.743 -0.250 -0.247 -0.245 -0.243 -0.238 -0.490
## SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerFormer:Time_f3
## SmokerFormer:Time_f6 0.545
## SmokerFormer:Time_f9 0.551 0.555
## SmokerFormer:Time_f12 0.549 0.552 0.559
## SmokerFormer:Time_f15 0.530 0.533 0.540 0.537
## SmokerFormer:Time_f19 0.547 0.550 0.557 0.554 0.535
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.92874262 -0.66677628 0.05044755 0.66745312 3.16543499
##
## Residual standard error: 0.5743718
## Degrees of freedom: 771 total; 757 residual
#Exponential of the variance covariate, a grouping factor "Smoker" for the coefficients
summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
weights = varExp(form = ~ Time | Smoker),
data = dta_2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta_2
## AIC BIC logLik
## 1358.216 1436.915 -662.1079
##
## Variance function:
## Structure: Exponential of variance covariate, different strata
## Formula: ~Time | Smoker
## Parameter estimates:
## Former Current
## 0.005425870 -0.006170002
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.229294 0.06249240 51.67499 0.0000
## SmokerFormer 0.289836 0.13541763 2.14031 0.0326
## Time_f3 -0.109820 0.08527911 -1.28778 0.1982
## Time_f6 -0.137833 0.08584232 -1.60566 0.1088
## Time_f9 -0.358588 0.08602381 -4.16848 0.0000
## Time_f12 -0.431393 0.08625192 -5.00155 0.0000
## Time_f15 -0.548335 0.08765943 -6.25529 0.0000
## Time_f19 -0.731591 0.08633408 -8.47396 0.0000
## SmokerFormer:Time_f3 0.168468 0.18548978 0.90823 0.3640
## SmokerFormer:Time_f6 -0.116654 0.18561925 -0.62846 0.5299
## SmokerFormer:Time_f9 0.011124 0.18448022 0.06030 0.9519
## SmokerFormer:Time_f12 0.048469 0.18684375 0.25941 0.7954
## SmokerFormer:Time_f15 -0.100795 0.19594152 -0.51441 0.6071
## SmokerFormer:Time_f19 0.120318 0.19093508 0.63015 0.5288
##
## Correlation:
## (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer -0.461
## Time_f3 -0.733 0.338
## Time_f6 -0.728 0.336 0.533
## Time_f9 -0.726 0.335 0.532 0.529
## Time_f12 -0.725 0.334 0.531 0.527 0.526
## Time_f15 -0.713 0.329 0.522 0.519 0.518 0.517
## Time_f19 -0.724 0.334 0.530 0.527 0.526 0.524 0.516
## SmokerFormer:Time_f3 0.337 -0.730 -0.460 -0.245 -0.245 -0.244 -0.240 -0.244
## SmokerFormer:Time_f6 0.337 -0.730 -0.247 -0.462 -0.245 -0.244 -0.240 -0.244
## SmokerFormer:Time_f9 0.339 -0.734 -0.248 -0.247 -0.466 -0.245 -0.241 -0.245
## SmokerFormer:Time_f12 0.334 -0.725 -0.245 -0.243 -0.243 -0.462 -0.238 -0.242
## SmokerFormer:Time_f15 0.319 -0.691 -0.234 -0.232 -0.232 -0.231 -0.447 -0.231
## SmokerFormer:Time_f19 0.327 -0.709 -0.240 -0.238 -0.238 -0.237 -0.233 -0.452
## SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerFormer:Time_f3
## SmokerFormer:Time_f6 0.533
## SmokerFormer:Time_f9 0.536 0.536
## SmokerFormer:Time_f12 0.529 0.529 0.532
## SmokerFormer:Time_f15 0.505 0.504 0.507 0.501
## SmokerFormer:Time_f19 0.518 0.517 0.521 0.514 0.490
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.95176426 -0.67527408 0.04683313 0.67280125 2.86836115
##
## Residual standard error: 0.5761515
## Degrees of freedom: 771 total; 757 residual
#unstructure correalation within ID,
summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
correlation = corSymm(form = ~ 1 | ID),
weights = varIdent(form = ~ 1 | Time_f*Smoker),
data = dta_2))
## Generalized least squares fit by REML
## Model: FEV1 ~ Smoker * Time_f
## Data: dta_2
## AIC BIC logLik
## 361.1039 587.9427 -131.5519
##
## Correlation Structure: General
## Formula: ~1 | ID
## Parameter estimate(s):
## Correlation:
## 1 2 3 4 5 6
## 2 0.852
## 3 0.856 0.888
## 4 0.849 0.846 0.894
## 5 0.841 0.853 0.891 0.892
## 6 0.861 0.866 0.896 0.872 0.935
## 7 0.753 0.756 0.844 0.789 0.789 0.866
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Time_f * Smoker
## Parameter estimates:
## 0*Former 3*Former 6*Former 9*Former 15*Former 19*Former 0*Current
## 1.0000000 1.1548304 1.0981236 1.0095120 1.1970056 1.0796139 1.1241811
## 6*Current 9*Current 12*Current 15*Current 19*Current 3*Current 12*Former
## 1.0518792 1.1013787 1.0380963 1.0158854 0.9676307 1.0363139 1.0527553
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.226768 0.06124614 52.68525 0.0000
## SmokerFormer 0.257911 0.11599406 2.22349 0.0265
## Time_f3 -0.089416 0.03460035 -2.58424 0.0099
## Time_f6 -0.157329 0.03459404 -4.54788 0.0000
## Time_f9 -0.381781 0.03582235 -10.65763 0.0000
## Time_f12 -0.409790 0.03620099 -11.31984 0.0000
## Time_f15 -0.582715 0.03532020 -16.49808 0.0000
## Time_f19 -0.725744 0.03664045 -19.80719 0.0000
## SmokerFormer:Time_f3 0.065320 0.07460792 0.87551 0.3816
## SmokerFormer:Time_f6 -0.063890 0.07168090 -0.89131 0.3730
## SmokerFormer:Time_f9 0.052926 0.06933367 0.76335 0.4455
## SmokerFormer:Time_f12 0.014888 0.07199618 0.20679 0.8362
## SmokerFormer:Time_f15 0.103703 0.07642577 1.35691 0.1752
## SmokerFormer:Time_f19 0.127043 0.07531983 1.68672 0.0921
##
## Correlation:
## (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer -0.528
## Time_f3 -0.434 0.229
## Time_f6 -0.399 0.211 0.664
## Time_f9 -0.329 0.174 0.556 0.661
## Time_f12 -0.430 0.227 0.588 0.663 0.667
## Time_f15 -0.439 0.232 0.592 0.639 0.606 0.744
## Time_f19 -0.509 0.269 0.578 0.649 0.581 0.656 0.739
## SmokerFormer:Time_f3 0.201 -0.204 -0.464 -0.308 -0.258 -0.273 -0.274 -0.268
## SmokerFormer:Time_f6 0.193 -0.249 -0.320 -0.483 -0.319 -0.320 -0.309 -0.313
## SmokerFormer:Time_f9 0.170 -0.332 -0.287 -0.341 -0.517 -0.345 -0.313 -0.300
## SmokerFormer:Time_f12 0.216 -0.326 -0.296 -0.333 -0.336 -0.503 -0.374 -0.330
## SmokerFormer:Time_f15 0.203 -0.154 -0.273 -0.295 -0.280 -0.344 -0.462 -0.342
## SmokerFormer:Time_f19 0.248 -0.322 -0.281 -0.316 -0.283 -0.319 -0.360 -0.486
## SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer
## Time_f3
## Time_f6
## Time_f9
## Time_f12
## Time_f15
## Time_f19
## SmokerFormer:Time_f3
## SmokerFormer:Time_f6 0.620
## SmokerFormer:Time_f9 0.528 0.661
## SmokerFormer:Time_f12 0.539 0.650 0.692
## SmokerFormer:Time_f15 0.532 0.600 0.579 0.724
## SmokerFormer:Time_f19 0.512 0.632 0.604 0.654 0.701
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.00858236 -0.65224589 0.05368462 0.64660671 3.17725120
##
## Residual standard error: 0.5411937
## Degrees of freedom: 771 total; 757 residual
# Likelihood ratio test
anova(m0, m1, m2, m3, m4)
## Model df AIC BIC logLik Test L.Ratio p-value
## m0 1 15 1359.2814 1428.7218 -664.6407
## m1 2 28 1377.9814 1507.6035 -660.9907 1 vs 2 7.3000 0.8860
## m2 3 16 1360.9267 1434.9965 -664.4633 2 vs 3 6.9453 0.8612
## m3 4 17 1358.2158 1436.9149 -662.1079 3 vs 4 4.7109 0.0300
## m4 5 49 361.1039 587.9427 -131.5519 4 vs 5 1061.1119 <.0001
m4 model got the smallest AIC
# show estimated covariance
intervals(m4, which="var-cov")
## Approximate 95% confidence intervals
##
## Correlation structure:
## lower est. upper
## cor(1,2) 0.7967346 0.8517421 0.8927522
## cor(1,3) 0.8018409 0.8555031 0.8954764
## cor(1,4) 0.7931020 0.8492935 0.8911503
## cor(1,5) 0.7834071 0.8412936 0.8847095
## cor(1,6) 0.8022351 0.8606287 0.9027113
## cor(1,7) 0.5941630 0.7527582 0.8550134
## cor(2,3) 0.8454823 0.8879922 0.9193183
## cor(2,4) 0.7892002 0.8463215 0.8889253
## cor(2,5) 0.7990307 0.8531834 0.8936072
## cor(2,6) 0.8099224 0.8662820 0.9067911
## cor(2,7) 0.6039538 0.7559247 0.8548582
## cor(3,4) 0.8526820 0.8935462 0.9235429
## cor(3,5) 0.8494824 0.8907973 0.9212547
## cor(3,6) 0.8468453 0.8956741 0.9295293
## cor(3,7) 0.7199037 0.8441776 0.9160061
## cor(4,5) 0.8505888 0.8918408 0.9221822
## cor(4,6) 0.8167268 0.8717991 0.9111321
## cor(4,7) 0.6602163 0.7892229 0.8729961
## cor(5,6) 0.9042969 0.9351723 0.9563153
## cor(5,7) 0.6503844 0.7894474 0.8773089
## cor(6,7) 0.7630385 0.8655912 0.9256331
## attr(,"label")
## [1] "Correlation structure:"
##
## Variance function:
## lower est. upper
## 3*Former 0.9770507 1.1548304 1.364958
## 6*Former 0.9386342 1.0981236 1.284713
## 9*Former 0.8626250 1.0095120 1.181411
## 15*Former 1.0176955 1.1970056 1.407909
## 19*Former 0.9239548 1.0796139 1.261497
## 0*Current 0.9793513 1.1241811 1.290429
## 6*Current 0.9232082 1.0518792 1.198484
## 9*Current 0.9643554 1.1013787 1.257871
## 12*Current 0.9081244 1.0380963 1.186670
## 15*Current 0.8911809 1.0158854 1.158040
## 19*Current 0.8487620 0.9676307 1.103147
## 3*Current 0.9099665 1.0363139 1.180204
## 12*Former 0.9069074 1.0527553 1.222058
## attr(,"label")
## [1] "Variance function:"
##
## Residual standard error:
## lower est. upper
## 0.4662662 0.5411937 0.6281617
# residual plot
plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
layout = c(2, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
xlab = "Ftted values", ylab = "Pearson residuals")
# generate new data frame from fiitted value of m4 and m0 models
dta_m4 <- data.frame(dta_2, fit = fitted(m4), fit0 = fitted(m0))
# use fitted values to get error bars and see if they cover the data points
ggplot(dta_m4, aes(Time, fit, linetype = Smoker, shape = Smoker,
group = Smoker)) +
stat_summary(fun.y = mean, geom = "line") +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(aes(Time, fit0), fun.y = mean, geom = "point", col = "skyblue") +
stat_summary(aes(Time, FEV1), fun.data = mean_se,
geom = "pointrange", col = "firebrick", alpha = .3) +
scale_shape_manual(values = c(1, 2)) +
labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
shape = "Smoker") +
theme(legend.position = c(0.9, 0.9),
legend.background = element_rect(fill = "white", color = "black"))
Question: Notice that the GLM fit above, compared with the mixed-effects analysis, underestimates the standard errors for intercept and the effect of smoking, but overestimates that for the time effect and the interaction between time and smoking. Could you explain why?
An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.
Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).
library(knitr)
library(rmdformats)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
cache=TRUE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
opts_knit$set(width=75)
pacman::p_load(knitr, simstudy, tidyverse, nlme, rms)
dta_3 <- read.csv('reading_g3.csv')
Wide to long format
dtaL <- gather(data=dta_3, key=Time, value=Score, Pre:Post, factor_key=TRUE)
kable(head(arrange(dtaL, ID), 10))
| Group | ID | Time | Score |
|---|---|---|---|
| C | S11 | Pre | 32 |
| C | S11 | Post | 41 |
| C | S12 | Pre | 42 |
| C | S12 | Post | 46 |
| C | S13 | Pre | 45 |
| C | S13 | Post | 48 |
| C | S14 | Pre | 46 |
| C | S14 | Post | 54 |
| C | S15 | Pre | 49 |
| C | S15 | Post | 52 |
We inspect the data with a boxplot and a scatter plot.
boxp <- ggplot(dtaL ,
aes(x=Time, y=Score, col=Group))+
geom_boxplot(outlier.size=0) +
geom_point(aes(fill=Group, col=NULL),
shape=21, alpha=0.5, size=2,
position=position_jitterdodge(jitter.width=0.2)) +
theme_bw() +
xlab("")
boxp
Conditional regression
ggplot(dta_3,
aes (x=Pre, y=Post, col=Group)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
labs(x="Pre-test reading score",
y="Post-test reading score")+
theme_minimal()
The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.
X <- model.matrix(~ Time * Group, data = dtaL)
colnames(X)
[1] "(Intercept)" "TimePost" "GroupT1" "GroupT2"
[5] "TimePost:GroupT1" "TimePost:GroupT2"
Xalt <- X[, c("TimePost", "TimePost:GroupT1", "TimePost:GroupT2")]
colnames(Xalt)
[1] "TimePost" "TimePost:GroupT1" "TimePost:GroupT2"
The gls() function of the ‘nlme’ package can be used for modeling data.
The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.
By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).
The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.
m0_gls <- gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
summary(m0_gls)
Generalized least squares fit by REML
Model: Score ~ Xalt
Data: dtaL
AIC BIC logLik
350.2136 364.391 -168.1068
Correlation Structure: General
Formula: ~1 | ID
Parameter estimate(s):
Correlation:
1
2 0.91
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time
Parameter estimates:
Pre Post
1.000000 1.016413
Coefficients:
Value Std.Error t-value p-value
(Intercept) 50.50000 1.2296621 41.06819 0.0000
XaltTimePost 6.38144 0.9027047 7.06924 0.0000
XaltTimePost:GroupT1 -2.80337 1.2699097 -2.20753 0.0314
XaltTimePost:GroupT2 7.15906 1.2699097 5.63745 0.0000
Correlation:
(Intr) XltTmP XTP:GT1
XaltTimePost -0.102
XaltTimePost:GroupT1 0.000 -0.703
XaltTimePost:GroupT2 0.000 -0.703 0.500
Standardized residuals:
Min Q1 Med Q3 Max
-2.7467890 -0.3836191 0.1374178 0.7475076 1.4493709
Residual standard error: 6.735137
Degrees of freedom: 60 total; 56 residual
The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.
We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).
To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.
m0_Gls <- Gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
predictions <- cbind(dtaL, predict(m0_Gls, dtaL, conf.int=0.95))
predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"
pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)
pCI <- ggplot(predictions,
aes(y=linear.predictors,
x=Time)) +
geom_errorbar(limits,
width=0.09,
position=pd) +
geom_line(aes(group=Group,
y=linear.predictors),
position=pd)+
geom_point(aes(fill=Group2),
position=pd,
shape=21,
size=rel(4)) +
scale_fill_manual(values=c("white", "gray80", "gray20", "black")) +
theme_minimal() +
theme(panel.grid.major=element_blank(),
legend.title=element_blank(),
text=element_text(size=14),
legend.position="bottom") +
labs(x="Time",
y="Estimated mean with corresponding 95% CI")
pCI
In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.
Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.
Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.