Homework

In Class Exercise 1

The arrhythmogenic dose of epinephrine (ADE) is determined by infusing epinephrine into an animal until arrhythmia criterion is reached. The arrhythmia criterion was the occurrence of four intermittent or continuous premature ventricular contractions. Once the criterion has been reached, the infusion required to produce it is recorded. The ADE is then calculated by multiplying the duration of the infusion by the infusion rate. The data give the ADE for six dogs measured at ten consecutive times.

Source: Shoukri, M.M., & Chaudhary, M.A. (2007). Analysis of Correlated Data with SAS and R. p. 218.

Column 1: Dog ID
Column 2: Time in hours
Column 3: Arrhythmogenic dose of epinephrine (ADE)

pacman::p_load(tidyverse, nlme)
dta_1<- read.table("ade.csv", h=T, sep = ",")
head(dta_1)

##   Dog Hour  ADE
## 1 D11  0.0 5.70
## 2 D11  0.5 4.70
## 3 D11  1.0 4.80
## 4 D11  1.5 4.90
## 5 D11  2.0 3.88
## 6 D11  2.5 3.51

#generate new factor variable"Hour_f"  
dta_1 <- mutate(dta_1, Hour_f = factor(Hour))

# set ggplot theme as black and white
theme_set(theme_bw())

# means with error bars
ggplot(dta_1, aes(Hour, ADE)) +
 geom_point(alpha = .3) +
 stat_summary(fun.y = mean, geom = "line") +
 stat_summary(fun.y = mean, geom = "point") +
 stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
 labs(x = "Time (in hours)", y = "ADE")

# within-group heteroscedasticity structure
summary(m1 <- gls(ADE ~ Hour,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  data = dta_1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta_1 
##        AIC      BIC    logLik
##   196.6523 221.3776 -86.32617
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##          0        0.5          1        1.5          2        2.5          3 
## 1.00000000 0.55447427 0.64132902 0.51921510 0.43187908 0.08245758 0.10901472 
##        3.5          4        4.5 
## 0.17046144 0.21770576 0.11133075 
## 
## Coefficients:
##                 Value  Std.Error   t-value p-value
## (Intercept)  5.435552 0.27441443 19.807821       0
## Hour        -0.701874 0.08386437 -8.369153       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.964
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.3854719 -0.6272929 -0.1826742  0.5824007  2.3554705 
## 
## Residual standard error: 3.508619 
## Degrees of freedom: 60 total; 58 residual

# auto-regression
summary(m2 <- gls(ADE ~ Hour,
                   weights = varIdent(form = ~ 1 | Hour_f),
                   correlation = corAR1(form = ~ Hour | Dog),
                   data = dta_1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta_1 
##        AIC      BIC    logLik
##   168.5913 195.3771 -71.29566
## 
## Correlation Structure: ARMA(1,0)
##  Formula: ~Hour | Dog 
##  Parameter estimate(s):
##      Phi1 
## 0.7553498 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##          0        0.5          1        1.5          2        2.5          3 
## 1.00000000 0.32803241 0.37194739 0.28590639 0.34018814 0.12571538 0.09896982 
##        3.5          4        4.5 
## 0.16894933 0.22219099 0.09978382 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept)  4.749848 0.3999065 11.877395       0
## Hour        -0.599160 0.1096671 -5.463442       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.94 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -1.43566691 -0.09530459  0.29216375  1.34205811  5.06634565 
## 
## Residual standard error: 3.278488 
## Degrees of freedom: 60 total; 58 residual

# compound symmetry
summary(m3 <- gls(ADE ~ Hour,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta_1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta_1 
##        AIC      BIC    logLik
##   161.6077 188.3935 -67.80387
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Dog 
##  Parameter estimate(s):
##       Rho 
## 0.7217933 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##         0       0.5         1       1.5         2       2.5         3       3.5 
## 1.0000000 0.5100528 0.6140381 0.5228105 0.4479331 0.1852681 0.1129502 0.1611728 
##         4       4.5 
## 0.2309740 0.1131425 
## 
## Coefficients:
##                 Value  Std.Error   t-value p-value
## (Intercept)  4.494975 0.30766482 14.609974       0
## Hour        -0.529169 0.06981253 -7.579863       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.957
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -0.66528429 -0.02390899  0.26405734  0.89073611  2.44460225 
## 
## Residual standard error: 3.832969 
## Degrees of freedom: 60 total; 58 residual

#
summary(m3a <- gls(ADE ~ Hour,
                  weights = varExp(form = ~ Hour),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta_1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta_1 
##        AIC      BIC    logLik
##   163.9703 174.2725 -76.98516
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Dog 
##  Parameter estimate(s):
##       Rho 
## 0.6145863 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~Hour 
##  Parameter estimates:
##      expon 
## -0.4204529 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept)  5.302798 0.6777356  7.824287       0
## Hour        -0.674162 0.1346238 -5.007751       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.982
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.1017685 -0.3945215 -0.1031040  0.3711706  2.7263751 
## 
## Residual standard error: 3.079988 
## Degrees of freedom: 60 total; 58 residual

summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta_1))

## Linear mixed-effects model fit by REML
##  Data: dta_1 
##        AIC     BIC    logLik
##   153.2468 182.093 -62.62338
## 
## Random effects:
##  Formula: ~1 | Dog
##         (Intercept) Residual
## StdDev:   0.3654553 3.928157
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Dog 
##  Parameter estimate(s):
##       Rho 
## 0.8457657 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##          0        0.5          1        1.5          2        2.5          3 
## 1.00000000 0.54618337 0.63785633 0.53997927 0.47199948 0.16064282 0.06300446 
##        3.5          4        4.5 
## 0.21603493 0.25337758 0.18268767 
## Fixed effects: ADE ~ Hour 
##                 Value  Std.Error DF   t-value p-value
## (Intercept)  4.811462 0.23464110 53  20.50562       0
## Hour        -0.612618 0.05378694 53 -11.38973       0
##  Correlation: 
##      (Intr)
## Hour -0.745
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.9223391 -0.1299528  0.1645922  0.6995419  2.4095565 
## 
## Number of Observations: 60
## Number of Groups: 6

# comparing models
anova(m1, m2)

##    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m1     1 12 196.6523 221.3776 -86.32617                        
## m2     2 13 168.5913 195.3771 -71.29566 1 vs 2 30.06102  <.0001

#
anova(m1, m3, m3a)

##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m1      1 12 196.6523 221.3776 -86.32617                        
## m3      2 13 161.6077 188.3935 -67.80387 1 vs 2 37.04460  <.0001
## m3a     3  5 163.9703 174.2725 -76.98516 2 vs 3 18.36258  0.0187

#  residual plot
plot(m3, resid(., type = "pearson") ~ fitted(.) | Dog,
     layout = c(6, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

In class Exercise 2

In the Netherlands, residents in the rural area of Vlagtwedde in the north-east and residents in the urban, industrial area of Vlaardingen in the south-west, were recruited to participate in a study on chronic obstructive lung diseases. The participants, initially aged 15-44, were surveyed approximately every 3 years for up to 21 years. At each survey, information on respiratory symptoms and smoking status was collected by questionnaire and spirometry was performed. Pulmonary function was determined by spirometry and a measure of forced expiratory volume (FEV1) was obtained every three years for the first 15 years of the study, and also at year 19. The number of repeated measurements of FEV1 on each subject varied from 1 to 7.

Here the dataset is comprised of a sub-sample of 133 residents aged 36 or older at their entry into the study and whose smoking status did not change over the 19 years of follow-up. Each participant was either a current or former smoker. Current smoking was defined as smoking at least one cigarette per day.

Source: van der Lende, R., Kok, T.J., Peset, R., Quanjer, P.H., Schouten, J.P. & Orie, N.G.M. (1981). Decreases in VC and FEV1 with time: Indicators for effects of smoking and air pollution. Bulletin of European Physiopathology and Respiration, 17, 775-792.

Column 1: Subject ID
Column 2: Smoker status
Column 3: Time from baseline (years), 0, 3, 6, 9, 12, 15, 19
Column 4: Forced expiratory volume (FEV1)

pacman::p_load(tidyverse, nlme, car)
dta_2<-read.table("fev_smoker.csv", h=T, sep=",")
head(dta_2)

##   ID Smoker Time FEV1
## 1 P1 Former    0 3.40
## 2 P1 Former    3 3.40
## 3 P1 Former    6 3.45
## 4 P1 Former    9 3.20
## 5 P1 Former   15 2.95
## 6 P1 Former   19 2.40

# save a copy of data in wide format
dtaW <- dta_2 %>%
        spread(key = Time, value = FEV1)
        
names(dtaW)[3:9] <- paste0("Year", names(dtaW)[3:9])

# make ellipses in pairwise plot
scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker, data = dtaW, smooth = F, by.group = TRUE,
                  reg.line = F, ellipse = T, diag = "none", pch = c(1, 20),
                  col = c("firebrick", "forestgreen"))

# means & variances
aggregate(FEV1 ~ Time + Smoker, data = dta_2, mean, na.rm=T)

##    Time  Smoker     FEV1
## 1     0 Current 3.229294
## 2     3 Current 3.119474
## 3     6 Current 3.091461
## 4     9 Current 2.870706
## 5    12 Current 2.797901
## 6    15 Current 2.680959
## 7    19 Current 2.497703
## 8     0  Former 3.519130
## 9     3  Former 3.577778
## 10    6  Former 3.264643
## 11    9  Former 3.171667
## 12   12  Former 3.136207
## 13   15  Former 2.870000
## 14   19  Former 2.907857

aggregate(FEV1 ~ Time + Smoker, data = dta_2, var, na.rm=T)

##    Time  Smoker      FEV1
## 1     0 Current 0.3402876
## 2     3 Current 0.3007072
## 3     6 Current 0.3191581
## 4     9 Current 0.3165019
## 5    12 Current 0.2969943
## 6    15 Current 0.2642588
## 7    19 Current 0.2546563
## 8     0  Former 0.2358265
## 9     3  Former 0.3941026
## 10    6  Former 0.3871962
## 11    9  Former 0.3380489
## 12   12  Former 0.3760530
## 13   15  Former 0.3709391
## 14   19  Former 0.4178767

# covariance matrices
dtaW %>%
 filter( Smoker == "Former") %>%
 select( -(1:2)) %>%
 var(na.rm = T)

##            Year0     Year3     Year6     Year9    Year12    Year15    Year19
## Year0  0.3617433 0.3442611 0.3550833 0.2358222 0.2398333 0.3091844 0.2144011
## Year3  0.3442611 0.3566944 0.3479167 0.2434444 0.2455556 0.3014333 0.2141278
## Year6  0.3550833 0.3479167 0.3990278 0.2583333 0.2502778 0.3320000 0.2674722
## Year9  0.2358222 0.2434444 0.2583333 0.1926667 0.1827778 0.2322889 0.1953444
## Year12 0.2398333 0.2455556 0.2502778 0.1827778 0.2022222 0.2201111 0.1406667
## Year15 0.3091844 0.3014333 0.3320000 0.2322889 0.2201111 0.3046711 0.2351044
## Year19 0.2144011 0.2141278 0.2674722 0.1953444 0.1406667 0.2351044 0.3509656

# "Current" covariance matrix
dtaW %>%
 filter( Smoker == "Current") %>%
 select( -(1:2)) %>%
 var(na.rm = T)

##            Year0     Year3     Year6     Year9    Year12    Year15    Year19
## Year0  0.2068623 0.1958723 0.1550242 0.1740866 0.1580108 0.1527199 0.1550177
## Year3  0.1958723 0.2418398 0.1676342 0.1770779 0.1647835 0.1818398 0.1675022
## Year6  0.1550242 0.1676342 0.1618623 0.1328485 0.1484870 0.1479152 0.1571320
## Year9  0.1740866 0.1770779 0.1328485 0.2496970 0.1501407 0.1508160 0.1533355
## Year12 0.1580108 0.1647835 0.1484870 0.1501407 0.1894372 0.1709978 0.1528485
## Year15 0.1527199 0.1818398 0.1479152 0.1508160 0.1709978 0.1873327 0.1654093
## Year19 0.1550177 0.1675022 0.1571320 0.1533355 0.1528485 0.1654093 0.1876729

#
str(dtaW)

## 'data.frame':    133 obs. of  9 variables:
##  $ ID    : chr  "P1" "P10" "P100" "P101" ...
##  $ Smoker: chr  "Former" "Current" "Current" "Current" ...
##  $ Year0 : num  3.4 2.65 3 2.7 2.8 NA NA 4.1 2.07 3.25 ...
##  $ Year3 : num  3.4 NA 2.75 3 3.35 3.1 4.4 3.65 2.2 3.3 ...
##  $ Year6 : num  3.45 3.3 2.45 NA NA NA 3.5 3.9 2.35 3.2 ...
##  $ Year9 : num  3.2 2.4 NA NA 2.95 2.75 3.6 2.75 2.25 3.3 ...
##  $ Year12: num  NA 2.15 NA 2.25 2.5 2.5 NA 3.35 2.2 2.85 ...
##  $ Year15: num  2.95 2.15 2.2 1.85 2.45 1.95 3.4 2.95 NA 3.2 ...
##  $ Year19: num  2.4 2.25 2.18 1.6 NA 2.65 3.2 3.17 NA 3.1 ...

# relevel factor level to produce consistent legend in the plot below
#dta_2$Smoker <- relevel(dta_2$Smoker, ref="Former")

# set theme as black and white
theme_set(theme_bw())

# means with error bars
ggplot(dta_2, aes(Time, FEV1,
                group = Smoker, shape = Smoker, linetype = Smoker)) +
 stat_summary(fun.y = mean, geom = "line") +
 stat_summary(fun.y = mean, geom = "point") +
 stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
 scale_shape_manual(values = c(1, 2)) +
 labs(x = "Time (years since baseline)", y = "Mean FEV1 (liters)",
      linetype = "Smoker", shape = "Smoker") +
 theme(legend.justification = c(1, 1), legend.position = c(1, 1),
       legend.background = element_rect(fill = "white", color = "black"))

# add new variable "Time_f" as factor variable
dta_2 <- mutate(dta_2, Time_f = as.factor(Time))

# FEV1 與現在是否抽煙以及抽煙時間的交互作用
summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta_2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta_2 
##        AIC      BIC    logLik
##   1359.281 1428.722 -664.6407
## 
## Coefficients:
##                           Value  Std.Error  t-value p-value
## (Intercept)            3.229294 0.06093721 52.99380  0.0000
## SmokerFormer           0.289836 0.13204761  2.19494  0.0285
## Time_f3               -0.109820 0.08387973 -1.30926  0.1908
## Time_f6               -0.137833 0.08520443 -1.61768  0.1061
## Time_f9               -0.358588 0.08617823 -4.16101  0.0000
## Time_f12              -0.431393 0.08723567 -4.94514  0.0000
## Time_f15              -0.548335 0.08964988 -6.11641  0.0000
## Time_f19              -0.731591 0.08932341 -8.19037  0.0000
## SmokerFormer:Time_f3   0.168468 0.18013659  0.93522  0.3500
## SmokerFormer:Time_f6  -0.116654 0.17959864 -0.64953  0.5162
## SmokerFormer:Time_f9   0.011124 0.17796363  0.06251  0.9502
## SmokerFormer:Time_f12  0.048469 0.17949158  0.27004  0.7872
## SmokerFormer:Time_f15 -0.100795 0.18684687 -0.53945  0.5897
## SmokerFormer:Time_f19  0.120318 0.18158895  0.66259  0.5078
## 
##  Correlation: 
##                       (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer          -0.461                                                 
## Time_f3               -0.726  0.335                                          
## Time_f6               -0.715  0.330  0.520                                   
## Time_f9               -0.707  0.326  0.514  0.506                            
## Time_f12              -0.699  0.322  0.507  0.500  0.494                     
## Time_f15              -0.680  0.314  0.494  0.486  0.481  0.475              
## Time_f19              -0.682  0.315  0.496  0.488  0.482  0.477  0.464       
## SmokerFormer:Time_f3   0.338 -0.733 -0.466 -0.242 -0.239 -0.236 -0.230 -0.231
## SmokerFormer:Time_f6   0.339 -0.735 -0.246 -0.474 -0.240 -0.237 -0.231 -0.231
## SmokerFormer:Time_f9   0.342 -0.742 -0.249 -0.245 -0.484 -0.239 -0.233 -0.234
## SmokerFormer:Time_f12  0.339 -0.736 -0.247 -0.243 -0.240 -0.486 -0.231 -0.232
## SmokerFormer:Time_f15  0.326 -0.707 -0.237 -0.233 -0.231 -0.228 -0.480 -0.222
## SmokerFormer:Time_f19  0.336 -0.727 -0.244 -0.240 -0.237 -0.234 -0.228 -0.492
##                       SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer                                              
## Time_f3                                                   
## Time_f6                                                   
## Time_f9                                                   
## Time_f12                                                  
## Time_f15                                                  
## Time_f19                                                  
## SmokerFormer:Time_f3                                      
## SmokerFormer:Time_f6   0.539                              
## SmokerFormer:Time_f9   0.544  0.546                       
## SmokerFormer:Time_f12  0.539  0.541  0.546                
## SmokerFormer:Time_f15  0.518  0.520  0.524  0.520         
## SmokerFormer:Time_f19  0.533  0.535  0.540  0.535   0.514 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.97158077 -0.67377531  0.05043194  0.66748149  3.13946469 
## 
## Residual standard error: 0.5618133 
## Degrees of freedom: 771 total; 757 residual

交互作用不顯著

# different variances per stratum 
summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varIdent(form = ~ 1 | Time_f*Smoker),
                  data = dta_2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta_2 
##        AIC      BIC    logLik
##   1377.981 1507.604 -660.9907
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Time_f * Smoker 
##  Parameter estimates:
##   0*Former   3*Former   6*Former   9*Former  15*Former  19*Former  0*Current 
##   1.000000   1.292710   1.281344   1.197259   1.254151   1.331139   1.201218 
##  3*Current  6*Current  9*Current 12*Current 15*Current 19*Current  12*Former 
##   1.129204   1.163335   1.158479   1.122197   1.058560   1.039136   1.262762 
## 
## Coefficients:
##                           Value  Std.Error  t-value p-value
## (Intercept)            3.229294 0.06327225 51.03808  0.0000
## SmokerFormer           0.289836 0.11940236  2.42739  0.0154
## Time_f3               -0.109820 0.08466840 -1.29707  0.1950
## Time_f6               -0.137833 0.08711755 -1.58215  0.1140
## Time_f9               -0.358588 0.08790306 -4.07936  0.0000
## Time_f12              -0.431393 0.08757802 -4.92581  0.0000
## Time_f15              -0.548335 0.08731204 -6.28018  0.0000
## Time_f19              -0.731591 0.08628221 -8.47905  0.0000
## SmokerFormer:Time_f3   0.168468 0.17893718  0.94149  0.3468
## SmokerFormer:Time_f6  -0.116654 0.17796497 -0.65549  0.5124
## SmokerFormer:Time_f9   0.011124 0.17102269  0.06505  0.9482
## SmokerFormer:Time_f12  0.048469 0.17575742  0.27577  0.7828
## SmokerFormer:Time_f15 -0.100795 0.18257248 -0.55208  0.5811
## SmokerFormer:Time_f19  0.120318 0.18061661  0.66615  0.5055
## 
##  Correlation: 
##                       (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer          -0.530                                                 
## Time_f3               -0.747  0.396                                          
## Time_f6               -0.726  0.385  0.543                                   
## Time_f9               -0.720  0.381  0.538  0.523                            
## Time_f12              -0.722  0.383  0.540  0.525  0.520                     
## Time_f15              -0.725  0.384  0.542  0.526  0.522  0.524              
## Time_f19              -0.733  0.389  0.548  0.533  0.528  0.530  0.531       
## SmokerFormer:Time_f3   0.354 -0.667 -0.473 -0.257 -0.255 -0.255 -0.256 -0.259
## SmokerFormer:Time_f6   0.356 -0.671 -0.266 -0.490 -0.256 -0.257 -0.258 -0.261
## SmokerFormer:Time_f9   0.370 -0.698 -0.276 -0.269 -0.514 -0.267 -0.268 -0.271
## SmokerFormer:Time_f12  0.360 -0.679 -0.269 -0.261 -0.259 -0.498 -0.261 -0.264
## SmokerFormer:Time_f15  0.347 -0.654 -0.259 -0.252 -0.249 -0.250 -0.478 -0.254
## SmokerFormer:Time_f19  0.350 -0.661 -0.262 -0.254 -0.252 -0.253 -0.254 -0.478
##                       SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer                                              
## Time_f3                                                   
## Time_f6                                                   
## Time_f9                                                   
## Time_f12                                                  
## Time_f15                                                  
## Time_f19                                                  
## SmokerFormer:Time_f3                                      
## SmokerFormer:Time_f6   0.448                              
## SmokerFormer:Time_f9   0.466  0.468                       
## SmokerFormer:Time_f12  0.453  0.456  0.474                
## SmokerFormer:Time_f15  0.436  0.439  0.457  0.444         
## SmokerFormer:Time_f19  0.441  0.444  0.462  0.449   0.432 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -3.04443149 -0.68139604  0.04873136  0.67419703  2.87624105 
## 
## Residual standard error: 0.4856249 
## Degrees of freedom: 771 total; 757 residual

#Exponential of the variance covariate, specifying a variance covariate
summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varExp(form = ~ Time),
                  data = dta_2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta_2 
##        AIC      BIC    logLik
##   1360.927 1434.996 -664.4633
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~Time 
##  Parameter estimates:
##        expon 
## -0.002528783 
## 
## Coefficients:
##                           Value  Std.Error  t-value p-value
## (Intercept)            3.229294 0.06229936 51.83511  0.0000
## SmokerFormer           0.289836 0.13499932  2.14695  0.0321
## Time_f3               -0.109820 0.08544929 -1.28521  0.1991
## Time_f6               -0.137833 0.08647075 -1.59399  0.1114
## Time_f9               -0.358588 0.08711898 -4.11607  0.0000
## Time_f12              -0.431393 0.08783081 -4.91163  0.0000
## Time_f15              -0.548335 0.08983467 -6.10383  0.0000
## Time_f19              -0.731591 0.08905559 -8.21500  0.0000
## SmokerFormer:Time_f3   0.168468 0.18352063  0.91798  0.3589
## SmokerFormer:Time_f6  -0.116654 0.18234823 -0.63973  0.5225
## SmokerFormer:Time_f9   0.011124 0.18011317  0.06176  0.9508
## SmokerFormer:Time_f12  0.048469 0.18100811  0.26777  0.7889
## SmokerFormer:Time_f15 -0.100795 0.18749823 -0.53758  0.5910
## SmokerFormer:Time_f19  0.120318 0.18159639  0.66256  0.5078
## 
##  Correlation: 
##                       (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer          -0.461                                                 
## Time_f3               -0.729  0.336                                          
## Time_f6               -0.720  0.332  0.525                                   
## Time_f9               -0.715  0.330  0.521  0.515                            
## Time_f12              -0.709  0.327  0.517  0.511  0.507                     
## Time_f15              -0.693  0.320  0.506  0.500  0.496  0.492              
## Time_f19              -0.700  0.323  0.510  0.504  0.500  0.496  0.485       
## SmokerFormer:Time_f3   0.339 -0.736 -0.466 -0.245 -0.243 -0.241 -0.235 -0.237
## SmokerFormer:Time_f6   0.342 -0.740 -0.249 -0.474 -0.244 -0.242 -0.237 -0.239
## SmokerFormer:Time_f9   0.346 -0.750 -0.252 -0.249 -0.484 -0.245 -0.240 -0.242
## SmokerFormer:Time_f12  0.344 -0.746 -0.251 -0.248 -0.246 -0.485 -0.239 -0.241
## SmokerFormer:Time_f15  0.332 -0.720 -0.242 -0.239 -0.238 -0.236 -0.479 -0.232
## SmokerFormer:Time_f19  0.343 -0.743 -0.250 -0.247 -0.245 -0.243 -0.238 -0.490
##                       SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer                                              
## Time_f3                                                   
## Time_f6                                                   
## Time_f9                                                   
## Time_f12                                                  
## Time_f15                                                  
## Time_f19                                                  
## SmokerFormer:Time_f3                                      
## SmokerFormer:Time_f6   0.545                              
## SmokerFormer:Time_f9   0.551  0.555                       
## SmokerFormer:Time_f12  0.549  0.552  0.559                
## SmokerFormer:Time_f15  0.530  0.533  0.540  0.537         
## SmokerFormer:Time_f19  0.547  0.550  0.557  0.554   0.535 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.92874262 -0.66677628  0.05044755  0.66745312  3.16543499 
## 
## Residual standard error: 0.5743718 
## Degrees of freedom: 771 total; 757 residual

#Exponential of the variance covariate, a grouping factor "Smoker" for the coefficients
summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varExp(form = ~ Time | Smoker),
                  data = dta_2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta_2 
##        AIC      BIC    logLik
##   1358.216 1436.915 -662.1079
## 
## Variance function:
##  Structure: Exponential of variance covariate, different strata
##  Formula: ~Time | Smoker 
##  Parameter estimates:
##       Former      Current 
##  0.005425870 -0.006170002 
## 
## Coefficients:
##                           Value  Std.Error  t-value p-value
## (Intercept)            3.229294 0.06249240 51.67499  0.0000
## SmokerFormer           0.289836 0.13541763  2.14031  0.0326
## Time_f3               -0.109820 0.08527911 -1.28778  0.1982
## Time_f6               -0.137833 0.08584232 -1.60566  0.1088
## Time_f9               -0.358588 0.08602381 -4.16848  0.0000
## Time_f12              -0.431393 0.08625192 -5.00155  0.0000
## Time_f15              -0.548335 0.08765943 -6.25529  0.0000
## Time_f19              -0.731591 0.08633408 -8.47396  0.0000
## SmokerFormer:Time_f3   0.168468 0.18548978  0.90823  0.3640
## SmokerFormer:Time_f6  -0.116654 0.18561925 -0.62846  0.5299
## SmokerFormer:Time_f9   0.011124 0.18448022  0.06030  0.9519
## SmokerFormer:Time_f12  0.048469 0.18684375  0.25941  0.7954
## SmokerFormer:Time_f15 -0.100795 0.19594152 -0.51441  0.6071
## SmokerFormer:Time_f19  0.120318 0.19093508  0.63015  0.5288
## 
##  Correlation: 
##                       (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer          -0.461                                                 
## Time_f3               -0.733  0.338                                          
## Time_f6               -0.728  0.336  0.533                                   
## Time_f9               -0.726  0.335  0.532  0.529                            
## Time_f12              -0.725  0.334  0.531  0.527  0.526                     
## Time_f15              -0.713  0.329  0.522  0.519  0.518  0.517              
## Time_f19              -0.724  0.334  0.530  0.527  0.526  0.524  0.516       
## SmokerFormer:Time_f3   0.337 -0.730 -0.460 -0.245 -0.245 -0.244 -0.240 -0.244
## SmokerFormer:Time_f6   0.337 -0.730 -0.247 -0.462 -0.245 -0.244 -0.240 -0.244
## SmokerFormer:Time_f9   0.339 -0.734 -0.248 -0.247 -0.466 -0.245 -0.241 -0.245
## SmokerFormer:Time_f12  0.334 -0.725 -0.245 -0.243 -0.243 -0.462 -0.238 -0.242
## SmokerFormer:Time_f15  0.319 -0.691 -0.234 -0.232 -0.232 -0.231 -0.447 -0.231
## SmokerFormer:Time_f19  0.327 -0.709 -0.240 -0.238 -0.238 -0.237 -0.233 -0.452
##                       SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer                                              
## Time_f3                                                   
## Time_f6                                                   
## Time_f9                                                   
## Time_f12                                                  
## Time_f15                                                  
## Time_f19                                                  
## SmokerFormer:Time_f3                                      
## SmokerFormer:Time_f6   0.533                              
## SmokerFormer:Time_f9   0.536  0.536                       
## SmokerFormer:Time_f12  0.529  0.529  0.532                
## SmokerFormer:Time_f15  0.505  0.504  0.507  0.501         
## SmokerFormer:Time_f19  0.518  0.517  0.521  0.514   0.490 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.95176426 -0.67527408  0.04683313  0.67280125  2.86836115 
## 
## Residual standard error: 0.5761515 
## Degrees of freedom: 771 total; 757 residual

#unstructure correalation within ID, 
summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
                  correlation = corSymm(form = ~ 1 | ID),
                  weights = varIdent(form = ~ 1 | Time_f*Smoker),
                  data = dta_2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta_2 
##        AIC      BIC    logLik
##   361.1039 587.9427 -131.5519
## 
## Correlation Structure: General
##  Formula: ~1 | ID 
##  Parameter estimate(s):
##  Correlation: 
##   1     2     3     4     5     6    
## 2 0.852                              
## 3 0.856 0.888                        
## 4 0.849 0.846 0.894                  
## 5 0.841 0.853 0.891 0.892            
## 6 0.861 0.866 0.896 0.872 0.935      
## 7 0.753 0.756 0.844 0.789 0.789 0.866
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Time_f * Smoker 
##  Parameter estimates:
##   0*Former   3*Former   6*Former   9*Former  15*Former  19*Former  0*Current 
##  1.0000000  1.1548304  1.0981236  1.0095120  1.1970056  1.0796139  1.1241811 
##  6*Current  9*Current 12*Current 15*Current 19*Current  3*Current  12*Former 
##  1.0518792  1.1013787  1.0380963  1.0158854  0.9676307  1.0363139  1.0527553 
## 
## Coefficients:
##                           Value  Std.Error   t-value p-value
## (Intercept)            3.226768 0.06124614  52.68525  0.0000
## SmokerFormer           0.257911 0.11599406   2.22349  0.0265
## Time_f3               -0.089416 0.03460035  -2.58424  0.0099
## Time_f6               -0.157329 0.03459404  -4.54788  0.0000
## Time_f9               -0.381781 0.03582235 -10.65763  0.0000
## Time_f12              -0.409790 0.03620099 -11.31984  0.0000
## Time_f15              -0.582715 0.03532020 -16.49808  0.0000
## Time_f19              -0.725744 0.03664045 -19.80719  0.0000
## SmokerFormer:Time_f3   0.065320 0.07460792   0.87551  0.3816
## SmokerFormer:Time_f6  -0.063890 0.07168090  -0.89131  0.3730
## SmokerFormer:Time_f9   0.052926 0.06933367   0.76335  0.4455
## SmokerFormer:Time_f12  0.014888 0.07199618   0.20679  0.8362
## SmokerFormer:Time_f15  0.103703 0.07642577   1.35691  0.1752
## SmokerFormer:Time_f19  0.127043 0.07531983   1.68672  0.0921
## 
##  Correlation: 
##                       (Intr) SmkrFr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerFormer          -0.528                                                 
## Time_f3               -0.434  0.229                                          
## Time_f6               -0.399  0.211  0.664                                   
## Time_f9               -0.329  0.174  0.556  0.661                            
## Time_f12              -0.430  0.227  0.588  0.663  0.667                     
## Time_f15              -0.439  0.232  0.592  0.639  0.606  0.744              
## Time_f19              -0.509  0.269  0.578  0.649  0.581  0.656  0.739       
## SmokerFormer:Time_f3   0.201 -0.204 -0.464 -0.308 -0.258 -0.273 -0.274 -0.268
## SmokerFormer:Time_f6   0.193 -0.249 -0.320 -0.483 -0.319 -0.320 -0.309 -0.313
## SmokerFormer:Time_f9   0.170 -0.332 -0.287 -0.341 -0.517 -0.345 -0.313 -0.300
## SmokerFormer:Time_f12  0.216 -0.326 -0.296 -0.333 -0.336 -0.503 -0.374 -0.330
## SmokerFormer:Time_f15  0.203 -0.154 -0.273 -0.295 -0.280 -0.344 -0.462 -0.342
## SmokerFormer:Time_f19  0.248 -0.322 -0.281 -0.316 -0.283 -0.319 -0.360 -0.486
##                       SF:T_3 SF:T_6 SF:T_9 SF:T_12 SF:T_15
## SmokerFormer                                              
## Time_f3                                                   
## Time_f6                                                   
## Time_f9                                                   
## Time_f12                                                  
## Time_f15                                                  
## Time_f19                                                  
## SmokerFormer:Time_f3                                      
## SmokerFormer:Time_f6   0.620                              
## SmokerFormer:Time_f9   0.528  0.661                       
## SmokerFormer:Time_f12  0.539  0.650  0.692                
## SmokerFormer:Time_f15  0.532  0.600  0.579  0.724         
## SmokerFormer:Time_f19  0.512  0.632  0.604  0.654   0.701 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -3.00858236 -0.65224589  0.05368462  0.64660671  3.17725120 
## 
## Residual standard error: 0.5411937 
## Degrees of freedom: 771 total; 757 residual

# Likelihood ratio test
anova(m0, m1, m2, m3, m4)

##    Model df       AIC       BIC    logLik   Test   L.Ratio p-value
## m0     1 15 1359.2814 1428.7218 -664.6407                         
## m1     2 28 1377.9814 1507.6035 -660.9907 1 vs 2    7.3000  0.8860
## m2     3 16 1360.9267 1434.9965 -664.4633 2 vs 3    6.9453  0.8612
## m3     4 17 1358.2158 1436.9149 -662.1079 3 vs 4    4.7109  0.0300
## m4     5 49  361.1039  587.9427 -131.5519 4 vs 5 1061.1119  <.0001

m4 model got the smallest AIC

# show estimated covariance
intervals(m4, which="var-cov")

## Approximate 95% confidence intervals
## 
##  Correlation structure:
##              lower      est.     upper
## cor(1,2) 0.7967346 0.8517421 0.8927522
## cor(1,3) 0.8018409 0.8555031 0.8954764
## cor(1,4) 0.7931020 0.8492935 0.8911503
## cor(1,5) 0.7834071 0.8412936 0.8847095
## cor(1,6) 0.8022351 0.8606287 0.9027113
## cor(1,7) 0.5941630 0.7527582 0.8550134
## cor(2,3) 0.8454823 0.8879922 0.9193183
## cor(2,4) 0.7892002 0.8463215 0.8889253
## cor(2,5) 0.7990307 0.8531834 0.8936072
## cor(2,6) 0.8099224 0.8662820 0.9067911
## cor(2,7) 0.6039538 0.7559247 0.8548582
## cor(3,4) 0.8526820 0.8935462 0.9235429
## cor(3,5) 0.8494824 0.8907973 0.9212547
## cor(3,6) 0.8468453 0.8956741 0.9295293
## cor(3,7) 0.7199037 0.8441776 0.9160061
## cor(4,5) 0.8505888 0.8918408 0.9221822
## cor(4,6) 0.8167268 0.8717991 0.9111321
## cor(4,7) 0.6602163 0.7892229 0.8729961
## cor(5,6) 0.9042969 0.9351723 0.9563153
## cor(5,7) 0.6503844 0.7894474 0.8773089
## cor(6,7) 0.7630385 0.8655912 0.9256331
## attr(,"label")
## [1] "Correlation structure:"
## 
##  Variance function:
##                lower      est.    upper
## 3*Former   0.9770507 1.1548304 1.364958
## 6*Former   0.9386342 1.0981236 1.284713
## 9*Former   0.8626250 1.0095120 1.181411
## 15*Former  1.0176955 1.1970056 1.407909
## 19*Former  0.9239548 1.0796139 1.261497
## 0*Current  0.9793513 1.1241811 1.290429
## 6*Current  0.9232082 1.0518792 1.198484
## 9*Current  0.9643554 1.1013787 1.257871
## 12*Current 0.9081244 1.0380963 1.186670
## 15*Current 0.8911809 1.0158854 1.158040
## 19*Current 0.8487620 0.9676307 1.103147
## 3*Current  0.9099665 1.0363139 1.180204
## 12*Former  0.9069074 1.0527553 1.222058
## attr(,"label")
## [1] "Variance function:"
## 
##  Residual standard error:
##     lower      est.     upper 
## 0.4662662 0.5411937 0.6281617

#  residual plot
plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
     layout = c(2, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

# generate new data frame from fiitted value of m4 and m0 models
dta_m4 <- data.frame(dta_2, fit = fitted(m4), fit0 = fitted(m0))

# use fitted values to get error bars and see if they cover the data points
ggplot(dta_m4, aes(Time, fit, linetype = Smoker, shape = Smoker,
                 group = Smoker)) +
 stat_summary(fun.y = mean, geom = "line") +
 stat_summary(fun.y = mean, geom = "point") +
 stat_summary(aes(Time, fit0), fun.y = mean, geom = "point", col = "skyblue") +
 stat_summary(aes(Time, FEV1), fun.data = mean_se, 
              geom = "pointrange", col = "firebrick", alpha = .3) +
 scale_shape_manual(values = c(1, 2)) +
 labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
      shape = "Smoker") +
 theme(legend.position = c(0.9, 0.9),
       legend.background = element_rect(fill = "white", color = "black"))

Question: Notice that the GLM fit above, compared with the mixed-effects analysis, underestimates the standard errors for intercept and the effect of smoking, but overestimates that for the time effect and the interaction between time and smoking. Could you explain why?

In Class Exercise 3

The study

An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.

Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).

Column 1: Group ID, Control = C, Experimental 1 = T1, Experimental 2 = T2
Column 2: Reading score at the end of the study
Column 3: Reading score at the beginning of the study
Column 4: Subject ID

library(knitr)
library(rmdformats)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
                 cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
opts_knit$set(width=75)

pacman::p_load(knitr, simstudy, tidyverse, nlme, rms)

dta_3 <- read.csv('reading_g3.csv')

Wide to long format

dtaL <- gather(data=dta_3, key=Time, value=Score, Pre:Post, factor_key=TRUE)  
kable(head(arrange(dtaL, ID), 10))

Group	ID	Time	Score
C	S11	Pre	32
C	S11	Post	41
C	S12	Pre	42
C	S12	Post	46
C	S13	Pre	45
C	S13	Post	48
C	S14	Pre	46
C	S14	Post	54
C	S15	Pre	49
C	S15	Post	52

Visualization

We inspect the data with a boxplot and a scatter plot.

boxp <- ggplot(dtaL , 
               aes(x=Time, y=Score, col=Group))+ 
  geom_boxplot(outlier.size=0) + 
  geom_point(aes(fill=Group, col=NULL), 
             shape=21, alpha=0.5, size=2, 
             position=position_jitterdodge(jitter.width=0.2)) + 
  theme_bw() + 
  xlab("") 
boxp

Conditional regression

ggplot(dta_3, 
       aes (x=Pre, y=Post, col=Group)) + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE) + 
  labs(x="Pre-test reading score",
       y="Post-test reading score")+
  theme_minimal()

The constrained baseline model

The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.

X <- model.matrix(~ Time * Group, data = dtaL)
colnames(X)

[1] "(Intercept)"      "TimePost"         "GroupT1"          "GroupT2"         
[5] "TimePost:GroupT1" "TimePost:GroupT2"

Xalt <- X[, c("TimePost", "TimePost:GroupT1", "TimePost:GroupT2")] 
colnames(Xalt)

[1] "TimePost"         "TimePost:GroupT1" "TimePost:GroupT2"

Analysis

The gls() function of the ‘nlme’ package can be used for modeling data.

The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.

By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).

The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.

m0_gls <- gls(Score ~ Xalt, 
                weights = varIdent(form = ~ 1 | Time), 
                correlation=corSymm (form = ~ 1 | ID), 
                data = dtaL)
summary(m0_gls)

Generalized least squares fit by REML
  Model: Score ~ Xalt 
  Data: dtaL 
       AIC     BIC    logLik
  350.2136 364.391 -168.1068

Correlation Structure: General
 Formula: ~1 | ID 
 Parameter estimate(s):
 Correlation: 
  1   
2 0.91
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time 
 Parameter estimates:
     Pre     Post 
1.000000 1.016413 

Coefficients:
                        Value Std.Error  t-value p-value
(Intercept)          50.50000 1.2296621 41.06819  0.0000
XaltTimePost          6.38144 0.9027047  7.06924  0.0000
XaltTimePost:GroupT1 -2.80337 1.2699097 -2.20753  0.0314
XaltTimePost:GroupT2  7.15906 1.2699097  5.63745  0.0000

 Correlation: 
                     (Intr) XltTmP XTP:GT1
XaltTimePost         -0.102               
XaltTimePost:GroupT1  0.000 -0.703        
XaltTimePost:GroupT2  0.000 -0.703  0.500 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.7467890 -0.3836191  0.1374178  0.7475076  1.4493709 

Residual standard error: 6.735137 
Degrees of freedom: 60 total; 56 residual

The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.

Prediction

We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).

To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.

m0_Gls <- Gls(Score ~ Xalt, 
                weights = varIdent(form = ~ 1 | Time), 
                correlation=corSymm (form = ~ 1 | ID), 
                data = dtaL)

predictions <- cbind(dtaL, predict(m0_Gls, dtaL, conf.int=0.95))

predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"

pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)

pCI <- ggplot(predictions, 
              aes(y=linear.predictors, 
                  x=Time)) + 
  geom_errorbar(limits, 
                width=0.09, 
                position=pd) +
  geom_line(aes(group=Group, 
                y=linear.predictors), 
            position=pd)+
  geom_point(aes(fill=Group2),
             position=pd, 
             shape=21, 
             size=rel(4)) + 
  scale_fill_manual(values=c("white", "gray80", "gray20", "black")) + 
  theme_minimal() + 
  theme(panel.grid.major=element_blank(), 
        legend.title=element_blank(), 
        text=element_text(size=14), 
        legend.position="bottom") + 
  labs(x="Time", 
       y="Estimated mean with corresponding 95% CI")
pCI

Non-randomized studies

In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.

Reference

Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.

Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.

Homework_1019

yuwenchen

10/19/2020