1 In-class exercises 2:

Replicate the analysis reported in the arrhythmogenic dose of epinephrine example.

The arrhythmogenic dose of epinephrine (ADE) is determined by infusing epinephrine into an animal until arrhythmia criterion is reached. The arrhythmia criterion was the occurrence of four intermittent or continuous premature ventricular contractions. Once the criterion has been reached, the infusion required to produce it is recorded. The ADE is then calculated by multiplying the duration of the infusion by the infusion rate. The data give the ADE for six dogs measured at ten consecutive times.

Source: Shoukri, M.M., & Chaudhary, M.A. (2007). Analysis of Correlated Data with SAS and R. p. 218.

1.1 Load data file

#
pacman::p_load(tidyverse, nlme)

#
dta <- read.csv("Data/ade.csv")

#
str(dta)
'data.frame':   60 obs. of  3 variables:
 $ Dog : chr  "D11" "D11" "D11" "D11" ...
 $ Hour: num  0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
 $ ADE : num  5.7 4.7 4.8 4.9 3.88 3.51 2.8 2.6 2.5 2.5 ...
head(dta)
  Dog Hour  ADE
1 D11  0.0 5.70
2 D11  0.5 4.70
3 D11  1.0 4.80
4 D11  1.5 4.90
5 D11  2.0 3.88
6 D11  2.5 3.51
#
dta <- mutate(dta, Hour_f = factor(Hour))

1.2 Visualization

theme_set(theme_bw())

# means with error bars
ggplot(dta, aes(Hour, ADE)) +
 geom_point(alpha = .3) +
 stat_summary(fun.y = mean, geom = "line") +
 stat_summary(fun.y = mean, geom = "point") +
 stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
 labs(x = "Time (in hours)", y = "ADE")

1.3 Model 1

summary(m1 <- gls(ADE ~ Hour,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  data = dta))
Generalized least squares fit by REML
  Model: ADE ~ Hour 
  Data: dta 
     AIC    BIC  logLik
  196.65 221.38 -86.326

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Hour_f 
 Parameter estimates:
       0      0.5        1      1.5        2      2.5        3      3.5 
1.000000 0.554474 0.641329 0.519215 0.431879 0.082458 0.109015 0.170461 
       4      4.5 
0.217706 0.111331 

Coefficients:
              Value Std.Error t-value p-value
(Intercept)  5.4356  0.274414 19.8078       0
Hour        -0.7019  0.083864 -8.3692       0

 Correlation: 
     (Intr)
Hour -0.964

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-1.38547 -0.62729 -0.18267  0.58240  2.35547 

Residual standard error: 3.5086 
Degrees of freedom: 60 total; 58 residual

Time effect

1.4 Model2: ARMA

summary(m2 <- gls(ADE ~ Hour,
                   weights = varIdent(form = ~ 1 | Hour_f),
                   correlation = corAR1(form = ~ Hour | Dog),
                   data = dta))
Generalized least squares fit by REML
  Model: ADE ~ Hour 
  Data: dta 
     AIC    BIC  logLik
  168.59 195.38 -71.296

Correlation Structure: ARMA(1,0)
 Formula: ~Hour | Dog 
 Parameter estimate(s):
   Phi1 
0.75535 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Hour_f 
 Parameter estimates:
       0      0.5        1      1.5        2      2.5        3      3.5 
1.000000 0.328032 0.371947 0.285906 0.340188 0.125715 0.098970 0.168949 
       4      4.5 
0.222191 0.099784 

Coefficients:
              Value Std.Error t-value p-value
(Intercept)  4.7498   0.39991 11.8774       0
Hour        -0.5992   0.10967 -5.4634       0

 Correlation: 
     (Intr)
Hour -0.94 

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-1.435667 -0.095305  0.292164  1.342058  5.066346 

Residual standard error: 3.2785 
Degrees of freedom: 60 total; 58 residual

Time effect

1.5 Model 3: Compound Symmetry

summary(m3 <- gls(ADE ~ Hour,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta))
Generalized least squares fit by REML
  Model: ADE ~ Hour 
  Data: dta 
     AIC    BIC  logLik
  161.61 188.39 -67.804

Correlation Structure: Compound symmetry
 Formula: ~1 | Dog 
 Parameter estimate(s):
    Rho 
0.72179 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Hour_f 
 Parameter estimates:
      0     0.5       1     1.5       2     2.5       3     3.5       4     4.5 
1.00000 0.51005 0.61404 0.52281 0.44793 0.18527 0.11295 0.16117 0.23097 0.11314 

Coefficients:
              Value Std.Error t-value p-value
(Intercept)  4.4950  0.307665 14.6100       0
Hour        -0.5292  0.069813 -7.5799       0

 Correlation: 
     (Intr)
Hour -0.957

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-0.665284 -0.023909  0.264057  0.890736  2.444602 

Residual standard error: 3.833 
Degrees of freedom: 60 total; 58 residual

Time effect

1.6 Model 3a: Exponential of variance covariate

summary(m3a <- gls(ADE ~ Hour,
                  weights = varExp(form = ~ Hour),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta))
Generalized least squares fit by REML
  Model: ADE ~ Hour 
  Data: dta 
     AIC    BIC  logLik
  163.97 174.27 -76.985

Correlation Structure: Compound symmetry
 Formula: ~1 | Dog 
 Parameter estimate(s):
    Rho 
0.61459 
Variance function:
 Structure: Exponential of variance covariate
 Formula: ~Hour 
 Parameter estimates:
   expon 
-0.42045 

Coefficients:
              Value Std.Error t-value p-value
(Intercept)  5.3028   0.67774  7.8243       0
Hour        -0.6742   0.13462 -5.0078       0

 Correlation: 
     (Intr)
Hour -0.982

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-1.10177 -0.39452 -0.10310  0.37117  2.72638 

Residual standard error: 3.08 
Degrees of freedom: 60 total; 58 residual

Time effect

1.7 Model 3b: Unstructured

summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta))
Linear mixed-effects model fit by REML
 Data: dta 
     AIC    BIC  logLik
  153.25 182.09 -62.623

Random effects:
 Formula: ~1 | Dog
        (Intercept) Residual
StdDev:     0.36546   3.9282

Correlation Structure: Compound symmetry
 Formula: ~1 | Dog 
 Parameter estimate(s):
    Rho 
0.84577 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Hour_f 
 Parameter estimates:
       0      0.5        1      1.5        2      2.5        3      3.5 
1.000000 0.546183 0.637856 0.539979 0.471999 0.160643 0.063004 0.216035 
       4      4.5 
0.253378 0.182688 
Fixed effects: ADE ~ Hour 
              Value Std.Error DF t-value p-value
(Intercept)  4.8115  0.234641 53  20.506       0
Hour        -0.6126  0.053787 53 -11.390       0
 Correlation: 
     (Intr)
Hour -0.745

Standardized Within-Group Residuals:
     Min       Q1      Med       Q3      Max 
-0.92234 -0.12995  0.16459  0.69954  2.40956 

Number of Observations: 60
Number of Groups: 6 

Time effect

1.8 Model Comparison

anova(m1, m2)
   Model df    AIC    BIC  logLik   Test L.Ratio p-value
m1     1 12 196.65 221.38 -86.326                       
m2     2 13 168.59 195.38 -71.296 1 vs 2  30.061  <.0001

Simple model may not fit here.

anova(m1, m3, m3a)
    Model df    AIC    BIC  logLik   Test L.Ratio p-value
m1      1 12 196.65 221.38 -86.326                       
m3      2 13 161.61 188.39 -67.804 1 vs 2  37.045  <.0001
m3a     3  5 163.97 174.27 -76.985 2 vs 3  18.363  0.0187

Compound Symmetry may be the best fit model than Unstructured model. Also, the design is a repeated measure design.

1.9 Residual plot

plot(m3, resid(., type = "pearson") ~ fitted(.) | Dog,
     layout = c(6, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

2 In-class exercises 3:

Replicate the analysis reported in the smoker example.

In the Netherlands, residents in the rural area of Vlagtwedde in the north-east and residents in the urban, industrial area of Vlaardingen in the south-west, were recruited to participate in a study on chronic obstructive lung diseases. The participants, initially aged 15-44, were surveyed approximately every 3 years for up to 21 years. At each survey, information on respiratory symptoms and smoking status was collected by questionnaire and spirometry was performed. Pulmonary function was determined by spirometry and a measure of forced expiratory volume (FEV1) was obtained every three years for the first 15 years of the study, and also at year 19. The number of repeated measurements of FEV1 on each subject varied from 1 to 7.

Here the dataset is comprised of a sub-sample of 133 residents aged 36 or older at their entry into the study and whose smoking status did not change over the 19 years of follow-up. Each participant was either a current or former smoker. Current smoking was defined as smoking at least one cigarette per day.

Source: van der Lende, R., Kok, T.J., Peset, R., Quanjer, P.H., Schouten, J.P. & Orie, N.G.M. (1981). Decreases in VC and FEV1 with time: Indicators for effects of smoking and air pollution. Bulletin of European Physiopathology and Respiration, 17, 775-792.

2.1 Load data file

#
pacman::p_load(tidyverse, nlme, car)

#
dta <- read.csv("Data/fev_smoker.csv", h = T)

#
str(dta)
'data.frame':   771 obs. of  4 variables:
 $ ID    : chr  "P1" "P1" "P1" "P1" ...
 $ Smoker: chr  "Former" "Former" "Former" "Former" ...
 $ Time  : int  0 3 6 9 15 19 0 3 6 9 ...
 $ FEV1  : num  3.4 3.4 3.45 3.2 2.95 2.4 3.1 3.15 3.5 2.95 ...
head(dta)
  ID Smoker Time FEV1
1 P1 Former    0 3.40
2 P1 Former    3 3.40
3 P1 Former    6 3.45
4 P1 Former    9 3.20
5 P1 Former   15 2.95
6 P1 Former   19 2.40

2.2 save a copy of data in wide format

dtaW <- dta %>%
        spread(key = Time, value = FEV1)
        
names(dtaW)[3:9] <- paste0("Year", names(dtaW)[3:9])

2.3 make ellipses in pairwise plot

scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker, 
                  data = dtaW, smooth = F, by.group = TRUE,
                  ellipse = T, pch = c(1, 20),
                  col = c("firebrick", "forestgreen")) 

2.4 Means & Variances

aggregate(FEV1 ~ Time + Smoker, data = dta, mean, na.rm=T)
   Time  Smoker   FEV1
1     0 Current 3.2293
2     3 Current 3.1195
3     6 Current 3.0915
4     9 Current 2.8707
5    12 Current 2.7979
6    15 Current 2.6810
7    19 Current 2.4977
8     0  Former 3.5191
9     3  Former 3.5778
10    6  Former 3.2646
11    9  Former 3.1717
12   12  Former 3.1362
13   15  Former 2.8700
14   19  Former 2.9079
aggregate(FEV1 ~ Time + Smoker, data = dta, var, na.rm=T)
   Time  Smoker    FEV1
1     0 Current 0.34029
2     3 Current 0.30071
3     6 Current 0.31916
4     9 Current 0.31650
5    12 Current 0.29699
6    15 Current 0.26426
7    19 Current 0.25466
8     0  Former 0.23583
9     3  Former 0.39410
10    6  Former 0.38720
11    9  Former 0.33805
12   12  Former 0.37605
13   15  Former 0.37094
14   19  Former 0.41788

2.5 Covariance matrices: Former Smoker

dtaW %>%
 filter( Smoker == "Former") %>%
 select( -(1:2)) %>%
 var(na.rm = T)
         Year0   Year3   Year6   Year9  Year12  Year15  Year19
Year0  0.36174 0.34426 0.35508 0.23582 0.23983 0.30918 0.21440
Year3  0.34426 0.35669 0.34792 0.24344 0.24556 0.30143 0.21413
Year6  0.35508 0.34792 0.39903 0.25833 0.25028 0.33200 0.26747
Year9  0.23582 0.24344 0.25833 0.19267 0.18278 0.23229 0.19534
Year12 0.23983 0.24556 0.25028 0.18278 0.20222 0.22011 0.14067
Year15 0.30918 0.30143 0.33200 0.23229 0.22011 0.30467 0.23510
Year19 0.21440 0.21413 0.26747 0.19534 0.14067 0.23510 0.35097

2.6 Covariance matrices: Current Smoker

dtaW %>%
 filter( Smoker == "Current") %>%
 select( -(1:2)) %>%
 var(na.rm = T)
         Year0   Year3   Year6   Year9  Year12  Year15  Year19
Year0  0.20686 0.19587 0.15502 0.17409 0.15801 0.15272 0.15502
Year3  0.19587 0.24184 0.16763 0.17708 0.16478 0.18184 0.16750
Year6  0.15502 0.16763 0.16186 0.13285 0.14849 0.14792 0.15713
Year9  0.17409 0.17708 0.13285 0.24970 0.15014 0.15082 0.15334
Year12 0.15801 0.16478 0.14849 0.15014 0.18944 0.17100 0.15285
Year15 0.15272 0.18184 0.14792 0.15082 0.17100 0.18733 0.16541
Year19 0.15502 0.16750 0.15713 0.15334 0.15285 0.16541 0.18767

2.7 Data Visualization

dta$Smoker <- factor(dta$Smoker, levels = c('Former', 'Current'))

#
theme_set(theme_bw())

# means with error bars
ggplot(dta, aes(Time, FEV1,
                group = Smoker, shape = Smoker, linetype = Smoker)) +
 stat_summary(fun = mean, geom = "line") +
 stat_summary(fun = mean, geom = "point") +
 stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
 scale_shape_manual(values = c(1, 2)) +
 labs(x = "Time (years since baseline)", y = "Mean FEV1 (liters)",
      linetype = "Smoker", shape = "Smoker") +
 theme(legend.justification = c(1, 1), legend.position = c(1, 1),
       legend.background = element_rect(fill = "white", color = "black"))

2.8 Model 0: Simple

dta <- mutate(dta, Time_f = as.factor(Time))

# 
summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta))
Generalized least squares fit by REML
  Model: FEV1 ~ Smoker * Time_f 
  Data: dta 
     AIC    BIC  logLik
  1359.3 1428.7 -664.64

Coefficients:
                         Value Std.Error t-value p-value
(Intercept)             3.5191   0.11715 30.0405  0.0000
SmokerCurrent          -0.2898   0.13205 -2.1949  0.0285
Time_f3                 0.0586   0.15942  0.3679  0.7131
Time_f6                -0.2545   0.15810 -1.6097  0.1079
Time_f9                -0.3475   0.15571 -2.2315  0.0259
Time_f12               -0.3829   0.15687 -2.4411  0.0149
Time_f15               -0.6491   0.16393 -3.9597  0.0001
Time_f19               -0.6113   0.15810 -3.8664  0.0001
SmokerCurrent:Time_f3  -0.1685   0.18014 -0.9352  0.3500
SmokerCurrent:Time_f6   0.1167   0.17960  0.6495  0.5162
SmokerCurrent:Time_f9  -0.0111   0.17796 -0.0625  0.9502
SmokerCurrent:Time_f12 -0.0485   0.17949 -0.2700  0.7872
SmokerCurrent:Time_f15  0.1008   0.18685  0.5395  0.5897
SmokerCurrent:Time_f19 -0.1203   0.18159 -0.6626  0.5078

 Correlation: 
                       (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent          -0.887                                                 
Time_f3                -0.735  0.652                                          
Time_f6                -0.741  0.657  0.544                                   
Time_f9                -0.752  0.667  0.553  0.557                            
Time_f12               -0.747  0.663  0.549  0.553  0.562                     
Time_f15               -0.715  0.634  0.525  0.529  0.538  0.534              
Time_f19               -0.741  0.657  0.544  0.549  0.557  0.553  0.529       
SmokerCurrent:Time_f3   0.650 -0.733 -0.885 -0.482 -0.489 -0.486 -0.465 -0.482
SmokerCurrent:Time_f6   0.652 -0.735 -0.479 -0.880 -0.491 -0.487 -0.466 -0.483
SmokerCurrent:Time_f9   0.658 -0.742 -0.484 -0.488 -0.875 -0.492 -0.470 -0.488
SmokerCurrent:Time_f12  0.653 -0.736 -0.480 -0.484 -0.491 -0.874 -0.466 -0.484
SmokerCurrent:Time_f15  0.627 -0.707 -0.461 -0.465 -0.472 -0.468 -0.877 -0.465
SmokerCurrent:Time_f19  0.645 -0.727 -0.474 -0.478 -0.485 -0.482 -0.461 -0.871
                       SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent                                              
Time_f3                                                    
Time_f6                                                    
Time_f9                                                    
Time_f12                                                   
Time_f15                                                   
Time_f19                                                   
SmokerCurrent:Time_f3                                      
SmokerCurrent:Time_f6   0.539                              
SmokerCurrent:Time_f9   0.544  0.546                       
SmokerCurrent:Time_f12  0.539  0.541  0.546                
SmokerCurrent:Time_f15  0.518  0.520  0.524  0.520         
SmokerCurrent:Time_f19  0.533  0.535  0.540  0.535   0.514 

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-2.971581 -0.673775  0.050432  0.667481  3.139465 

Residual standard error: 0.56181 
Degrees of freedom: 771 total; 757 residual

2.9 Model 1: Variance Component for each type of smoker across time.

summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varIdent(form = ~ 1 | Time_f*Smoker),
                  data = dta))
Generalized least squares fit by REML
  Model: FEV1 ~ Smoker * Time_f 
  Data: dta 
   AIC    BIC  logLik
  1378 1507.6 -660.99

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time_f * Smoker 
 Parameter estimates:
  0*Former   3*Former   6*Former   9*Former  15*Former  19*Former  0*Current 
    1.0000     1.2927     1.2813     1.1972     1.2542     1.3311     1.2012 
 3*Current  6*Current  9*Current 12*Current 15*Current 19*Current  12*Former 
    1.1292     1.1633     1.1585     1.1222     1.0586     1.0392     1.2628 

Coefficients:
                         Value Std.Error t-value p-value
(Intercept)             3.5191   0.10126  34.754  0.0000
SmokerCurrent          -0.2898   0.11940  -2.427  0.0154
Time_f3                 0.0586   0.15764   0.372  0.7100
Time_f6                -0.2545   0.15518  -1.640  0.1014
Time_f9                -0.3475   0.14670  -2.368  0.0181
Time_f12               -0.3829   0.15238  -2.513  0.0122
Time_f15               -0.6491   0.16034  -4.048  0.0001
Time_f19               -0.6113   0.15868  -3.852  0.0001
SmokerCurrent:Time_f3  -0.1685   0.17894  -0.941  0.3468
SmokerCurrent:Time_f6   0.1167   0.17797   0.655  0.5124
SmokerCurrent:Time_f9  -0.0111   0.17102  -0.065  0.9482
SmokerCurrent:Time_f12 -0.0485   0.17576  -0.276  0.7828
SmokerCurrent:Time_f15  0.1008   0.18257   0.552  0.5811
SmokerCurrent:Time_f19 -0.1203   0.18062  -0.666  0.5055

 Correlation: 
                       (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent          -0.848                                                 
Time_f3                -0.642  0.545                                          
Time_f6                -0.653  0.553  0.419                                   
Time_f9                -0.690  0.585  0.443  0.450                            
Time_f12               -0.665  0.564  0.427  0.434  0.459                     
Time_f15               -0.632  0.536  0.406  0.412  0.436  0.420              
Time_f19               -0.638  0.541  0.410  0.416  0.440  0.424  0.403       
SmokerCurrent:Time_f3   0.566 -0.667 -0.881 -0.369 -0.391 -0.376 -0.357 -0.361
SmokerCurrent:Time_f6   0.569 -0.671 -0.365 -0.872 -0.393 -0.378 -0.359 -0.363
SmokerCurrent:Time_f9   0.592 -0.698 -0.380 -0.386 -0.858 -0.393 -0.374 -0.378
SmokerCurrent:Time_f12  0.576 -0.679 -0.370 -0.376 -0.398 -0.867 -0.364 -0.368
SmokerCurrent:Time_f15  0.555 -0.654 -0.356 -0.362 -0.383 -0.369 -0.878 -0.354
SmokerCurrent:Time_f19  0.561 -0.661 -0.360 -0.366 -0.387 -0.373 -0.354 -0.879
                       SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent                                              
Time_f3                                                    
Time_f6                                                    
Time_f9                                                    
Time_f12                                                   
Time_f15                                                   
Time_f19                                                   
SmokerCurrent:Time_f3                                      
SmokerCurrent:Time_f6   0.448                              
SmokerCurrent:Time_f9   0.466  0.468                       
SmokerCurrent:Time_f12  0.453  0.456  0.474                
SmokerCurrent:Time_f15  0.436  0.439  0.457  0.444         
SmokerCurrent:Time_f19  0.441  0.444  0.462  0.449   0.432 

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-3.044453 -0.681392  0.048732  0.674197  2.876223 

Residual standard error: 0.48562 
Degrees of freedom: 771 total; 757 residual

2.10 Model 2: Variance Component across time.

summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varExp(form = ~ Time),
                  data = dta))
Generalized least squares fit by REML
  Model: FEV1 ~ Smoker * Time_f 
  Data: dta 
     AIC  BIC  logLik
  1360.9 1435 -664.46

Variance function:
 Structure: Exponential of variance covariate
 Formula: ~Time 
 Parameter estimates:
     expon 
-0.0025288 

Coefficients:
                         Value Std.Error t-value p-value
(Intercept)             3.5191   0.11976 29.3837  0.0000
SmokerCurrent          -0.2898   0.13500 -2.1469  0.0321
Time_f3                 0.0586   0.16241  0.3611  0.7181
Time_f6                -0.2545   0.16054 -1.5852  0.1133
Time_f9                -0.3475   0.15764 -2.2041  0.0278
Time_f12               -0.3829   0.15827 -2.4194  0.0158
Time_f15               -0.6491   0.16458 -3.9443  0.0001
Time_f19               -0.6113   0.15826 -3.8625  0.0001
SmokerCurrent:Time_f3  -0.1685   0.18352 -0.9180  0.3589
SmokerCurrent:Time_f6   0.1167   0.18235  0.6397  0.5225
SmokerCurrent:Time_f9  -0.0111   0.18011 -0.0618  0.9508
SmokerCurrent:Time_f12 -0.0485   0.18101 -0.2678  0.7889
SmokerCurrent:Time_f15  0.1008   0.18750  0.5376  0.5910
SmokerCurrent:Time_f19 -0.1203   0.18160 -0.6626  0.5078

 Correlation: 
                       (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent          -0.887                                                 
Time_f3                -0.737  0.654                                          
Time_f6                -0.746  0.662  0.550                                   
Time_f9                -0.760  0.674  0.560  0.567                            
Time_f12               -0.757  0.671  0.558  0.565  0.575                     
Time_f15               -0.728  0.646  0.537  0.543  0.553  0.551              
Time_f19               -0.757  0.671  0.558  0.565  0.575  0.573  0.551       
SmokerCurrent:Time_f3   0.653 -0.736 -0.885 -0.487 -0.496 -0.494 -0.475 -0.494
SmokerCurrent:Time_f6   0.657 -0.740 -0.484 -0.880 -0.499 -0.497 -0.478 -0.497
SmokerCurrent:Time_f9   0.665 -0.750 -0.490 -0.496 -0.875 -0.503 -0.484 -0.503
SmokerCurrent:Time_f12  0.662 -0.746 -0.488 -0.494 -0.503 -0.874 -0.481 -0.501
SmokerCurrent:Time_f15  0.639 -0.720 -0.471 -0.477 -0.485 -0.483 -0.878 -0.483
SmokerCurrent:Time_f19  0.660 -0.743 -0.486 -0.492 -0.501 -0.499 -0.480 -0.871
                       SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent                                              
Time_f3                                                    
Time_f6                                                    
Time_f9                                                    
Time_f12                                                   
Time_f15                                                   
Time_f19                                                   
SmokerCurrent:Time_f3                                      
SmokerCurrent:Time_f6   0.545                              
SmokerCurrent:Time_f9   0.551  0.555                       
SmokerCurrent:Time_f12  0.549  0.552  0.559                
SmokerCurrent:Time_f15  0.530  0.533  0.540  0.537         
SmokerCurrent:Time_f19  0.547  0.550  0.557  0.554   0.535 

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-2.928743 -0.666776  0.050448  0.667453  3.165435 

Residual standard error: 0.57437 
Degrees of freedom: 771 total; 757 residual

2.11 Model 3: Exponential of variance covariate

summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varExp(form = ~ Time | Smoker),
                  data = dta))
Generalized least squares fit by REML
  Model: FEV1 ~ Smoker * Time_f 
  Data: dta 
     AIC    BIC  logLik
  1358.2 1436.9 -662.11

Variance function:
 Structure: Exponential of variance covariate, different strata
 Formula: ~Time | Smoker 
 Parameter estimates:
    Former    Current 
 0.0054259 -0.0061700 

Coefficients:
                         Value Std.Error t-value p-value
(Intercept)             3.5191   0.12014 29.2929  0.0000
SmokerCurrent          -0.2898   0.13542 -2.1403  0.0326
Time_f3                 0.0586   0.16472  0.3560  0.7219
Time_f6                -0.2545   0.16458 -1.5463  0.1224
Time_f9                -0.3475   0.16320 -2.1291  0.0336
Time_f12               -0.3829   0.16574 -2.3103  0.0211
Time_f15               -0.6491   0.17524 -3.7042  0.0002
Time_f19               -0.6113   0.17030 -3.5894  0.0004
SmokerCurrent:Time_f3  -0.1685   0.18549 -0.9082  0.3640
SmokerCurrent:Time_f6   0.1167   0.18562  0.6285  0.5299
SmokerCurrent:Time_f9  -0.0111   0.18448 -0.0603  0.9519
SmokerCurrent:Time_f12 -0.0485   0.18684 -0.2594  0.7954
SmokerCurrent:Time_f15  0.1008   0.19594  0.5144  0.6071
SmokerCurrent:Time_f19 -0.1203   0.19093 -0.6302  0.5288

 Correlation: 
                       (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent          -0.887                                                 
Time_f3                -0.729  0.647                                          
Time_f6                -0.730  0.648  0.532                                   
Time_f9                -0.736  0.653  0.537  0.537                            
Time_f12               -0.725  0.643  0.529  0.529  0.534                     
Time_f15               -0.686  0.608  0.500  0.500  0.505  0.497              
Time_f19               -0.705  0.626  0.514  0.515  0.519  0.511  0.484       
SmokerCurrent:Time_f3   0.648 -0.730 -0.888 -0.473 -0.477 -0.469 -0.444 -0.457
SmokerCurrent:Time_f6   0.647 -0.730 -0.472 -0.887 -0.476 -0.469 -0.444 -0.457
SmokerCurrent:Time_f9   0.651 -0.734 -0.475 -0.475 -0.885 -0.472 -0.446 -0.459
SmokerCurrent:Time_f12  0.643 -0.725 -0.469 -0.469 -0.473 -0.887 -0.441 -0.454
SmokerCurrent:Time_f15  0.613 -0.691 -0.447 -0.448 -0.451 -0.444 -0.894 -0.433
SmokerCurrent:Time_f19  0.629 -0.709 -0.459 -0.459 -0.463 -0.456 -0.431 -0.892
                       SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent                                              
Time_f3                                                    
Time_f6                                                    
Time_f9                                                    
Time_f12                                                   
Time_f15                                                   
Time_f19                                                   
SmokerCurrent:Time_f3                                      
SmokerCurrent:Time_f6   0.533                              
SmokerCurrent:Time_f9   0.536  0.536                       
SmokerCurrent:Time_f12  0.529  0.529  0.532                
SmokerCurrent:Time_f15  0.505  0.504  0.507  0.501         
SmokerCurrent:Time_f19  0.518  0.517  0.521  0.514   0.490 

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-2.951764 -0.675274  0.046833  0.672801  2.868361 

Residual standard error: 0.57615 
Degrees of freedom: 771 total; 757 residual

2.12 Model 4: Compound symmetry

summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
                  correlation = corSymm(form = ~ 1 | ID),
                  weights = varIdent(form = ~ 1 | Time_f*Smoker),
                  data = dta))
Generalized least squares fit by REML
  Model: FEV1 ~ Smoker * Time_f 
  Data: dta 
    AIC    BIC  logLik
  361.1 587.94 -131.55

Correlation Structure: General
 Formula: ~1 | ID 
 Parameter estimate(s):
 Correlation: 
  1     2     3     4     5     6    
2 0.852                              
3 0.856 0.888                        
4 0.849 0.846 0.894                  
5 0.841 0.853 0.891 0.892            
6 0.861 0.866 0.896 0.872 0.935      
7 0.753 0.756 0.844 0.789 0.789 0.866
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time_f * Smoker 
 Parameter estimates:
  0*Former   3*Former   6*Former   9*Former  15*Former  19*Former  0*Current 
   1.00000    1.15483    1.09813    1.00951    1.19701    1.07961    1.12418 
 6*Current  9*Current 12*Current 15*Current 19*Current  3*Current  12*Former 
   1.05188    1.10138    1.03810    1.01589    0.96763    1.03632    1.05276 

Coefficients:
                         Value Std.Error t-value p-value
(Intercept)             3.4847  0.098506  35.375  0.0000
SmokerCurrent          -0.2579  0.115994  -2.223  0.0265
Time_f3                -0.0241  0.066099  -0.365  0.7156
Time_f6                -0.2212  0.062781  -3.524  0.0005
Time_f9                -0.3289  0.059362  -5.540  0.0000
Time_f12               -0.3949  0.062233  -6.346  0.0000
Time_f15               -0.4790  0.067774  -7.068  0.0000
Time_f19               -0.5987  0.065807  -9.098  0.0000
SmokerCurrent:Time_f3  -0.0653  0.074608  -0.876  0.3816
SmokerCurrent:Time_f6   0.0639  0.071681   0.891  0.3730
SmokerCurrent:Time_f9  -0.0529  0.069333  -0.763  0.4455
SmokerCurrent:Time_f12 -0.0149  0.071996  -0.207  0.8362
SmokerCurrent:Time_f15 -0.1037  0.076426  -1.357  0.1752
SmokerCurrent:Time_f19 -0.1270  0.075320  -1.687  0.0921

 Correlation: 
                       (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
SmokerCurrent          -0.849                                                 
Time_f3                -0.129  0.110                                          
Time_f6                -0.197  0.168  0.608                                   
Time_f9                -0.334  0.283  0.521  0.662                            
Time_f12               -0.289  0.245  0.524  0.647  0.700                     
Time_f15               -0.062  0.053  0.515  0.588  0.572  0.719              
Time_f19               -0.258  0.219  0.493  0.627  0.613  0.653  0.690       
SmokerCurrent:Time_f3   0.115 -0.204 -0.886 -0.539 -0.462 -0.465 -0.456 -0.437
SmokerCurrent:Time_f6   0.173 -0.249 -0.533 -0.876 -0.579 -0.566 -0.515 -0.549
SmokerCurrent:Time_f9   0.286 -0.332 -0.446 -0.566 -0.856 -0.600 -0.490 -0.524
SmokerCurrent:Time_f12  0.250 -0.326 -0.453 -0.559 -0.605 -0.864 -0.622 -0.565
SmokerCurrent:Time_f15  0.055 -0.154 -0.457 -0.522 -0.507 -0.638 -0.887 -0.612
SmokerCurrent:Time_f19  0.225 -0.322 -0.431 -0.548 -0.535 -0.571 -0.603 -0.874
                       SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
SmokerCurrent                                              
Time_f3                                                    
Time_f6                                                    
Time_f9                                                    
Time_f12                                                   
Time_f15                                                   
Time_f19                                                   
SmokerCurrent:Time_f3                                      
SmokerCurrent:Time_f6   0.620                              
SmokerCurrent:Time_f9   0.528  0.661                       
SmokerCurrent:Time_f12  0.539  0.650  0.692                
SmokerCurrent:Time_f15  0.532  0.600  0.579  0.724         
SmokerCurrent:Time_f19  0.512  0.632  0.604  0.654   0.701 

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-3.008572 -0.652245  0.053685  0.646604  3.177245 

Residual standard error: 0.54119 
Degrees of freedom: 771 total; 757 residual

After controlling the baseline, current smoker showed worse forced expiratory volume which decreased as long as their smoking duration.

2.13 Model Comparison

anova(m0, m1, m2, m3, m4)
   Model df    AIC     BIC  logLik   Test L.Ratio p-value
m0     1 15 1359.3 1428.72 -664.64                       
m1     2 28 1378.0 1507.60 -660.99 1 vs 2    7.30  0.8860
m2     3 16 1360.9 1435.00 -664.46 2 vs 3    6.95  0.8612
m3     4 17 1358.2 1436.91 -662.11 3 vs 4    4.71  0.0300
m4     5 49  361.1  587.94 -131.55 4 vs 5 1061.11  <.0001

The results revealed that Compound symmetry Model may be the best fit model here.

2.14 Residual plot

plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
     layout = c(2, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

2.15

dta_m4 <- data.frame(dta, fit = fitted(m4), fit0 = fitted(m0))

# use fitted values to get error bars and see if they cover the data points
ggplot(dta_m4, aes(Time, fit, linetype = Smoker, shape = Smoker,
                 group = Smoker)) +
 stat_summary(fun = mean, geom = "line") +
 stat_summary(fun = mean, geom = "point") +
 stat_summary(aes(Time, fit0), fun = mean, geom = "point", col = "skyblue") +
 stat_summary(aes(Time, FEV1), fun.data = mean_se, 
              geom = "pointrange", col = "firebrick", alpha = .3) +
 scale_shape_manual(values = c(1, 2)) +
 labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
      shape = "Smoker") +
 theme(legend.position = c(0.9, 0.9),
       legend.background = element_rect(fill = "white", color = "black"))

3 In-class Exercise 4:

Reproduce the results of analysis with the markdown file in the reading programs example and compare them with that of an anlysis of covariance.

pacman::p_load(knitr, simstudy, tidyverse, nlme, rms)

3.1 The study

An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.

Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).

3.2 Data

dta <- read.csv('data/reading_g3.csv')

3.3 Wide to long format

dtaL <- gather(data=dta, key=Time, value=Score, Pre:Post, factor_key=TRUE)  
kable(head(arrange(dtaL, ID), 10))
Group ID Time Score
C S11 Pre 32
C S11 Post 41
C S12 Pre 42
C S12 Post 46
C S13 Pre 45
C S13 Post 48
C S14 Pre 46
C S14 Post 54
C S15 Pre 49
C S15 Post 52

3.4 Data Visualization

We inspect the data with a boxplot and a scatter plot.

boxp <- ggplot(dtaL , 
               aes(x=Time, y=Score, col=Group))+ 
  geom_boxplot(outlier.size=0) + 
  geom_point(aes(fill=Group, col=NULL), 
             shape=21, alpha=0.5, size=2, 
             position=position_jitterdodge(jitter.width=0.2)) + 
  theme_bw() + 
  xlab("") 
boxp

ggplot(dta, 
       aes (x=Pre, y=Post, col=Group)) + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE) + 
  labs(x="Pre-test reading score",
       y="Post-test reading score")+
  theme_minimal()

3.5 The constrained baseline model

The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.

X <- model.matrix(~ Time * Group, data = dtaL)
colnames(X)
[1] "(Intercept)"      "TimePost"         "GroupT1"          "GroupT2"         
[5] "TimePost:GroupT1" "TimePost:GroupT2"
Xalt <- X[, c("TimePost", "TimePost:GroupT1", "TimePost:GroupT2")] 
colnames(Xalt)
[1] "TimePost"         "TimePost:GroupT1" "TimePost:GroupT2"

3.6 Analysis

The gls() function of the ‘nlme’ package can be used for modeling data.

The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.

By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).

The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.

m0_gls <- gls(Score ~ Xalt, 
                weights = varIdent(form = ~ 1 | Time), 
                correlation=corSymm (form = ~ 1 | ID), 
                data = dtaL)
summary(m0_gls)
Generalized least squares fit by REML
  Model: Score ~ Xalt 
  Data: dtaL 
     AIC    BIC  logLik
  350.21 364.39 -168.11

Correlation Structure: General
 Formula: ~1 | ID 
 Parameter estimate(s):
 Correlation: 
  1   
2 0.91
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time 
 Parameter estimates:
   Pre   Post 
1.0000 1.0164 

Coefficients:
                      Value Std.Error t-value p-value
(Intercept)          50.500    1.2297  41.068  0.0000
XaltTimePost          6.381    0.9027   7.069  0.0000
XaltTimePost:GroupT1 -2.803    1.2699  -2.208  0.0314
XaltTimePost:GroupT2  7.159    1.2699   5.637  0.0000

 Correlation: 
                     (Intr) XltTmP XTP:GT1
XaltTimePost         -0.102               
XaltTimePost:GroupT1  0.000 -0.703        
XaltTimePost:GroupT2  0.000 -0.703  0.500 

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-2.74679 -0.38362  0.13742  0.74751  1.44937 

Residual standard error: 6.7351 
Degrees of freedom: 60 total; 56 residual

The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.

3.7 Prediction

We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).

To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.

m0_Gls <- Gls(Score ~ Xalt, 
                weights = varIdent(form = ~ 1 | Time), 
                correlation=corSymm (form = ~ 1 | ID), 
                data = dtaL)
predictions <- cbind(dtaL, predict(m0_Gls, dtaL, conf.int=0.95))
predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"
pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)
pCI <- ggplot(predictions, 
              aes(y=linear.predictors, 
                  x=Time)) + 
  geom_errorbar(limits, 
                width=0.09, 
                position=pd) +
  geom_line(aes(group=Group, 
                y=linear.predictors), 
            position=pd)+
  geom_point(aes(fill=Group2, shape=Group2),
             position=pd, 
             size=rel(4)) + 
  scale_fill_manual(values=c("white", "gray80", "gray20", "black")) + 
  theme_minimal() + 
  theme(panel.grid.major=element_blank(), 
        legend.title=element_blank(), 
        text=element_text(size=14), 
        legend.position="bottom") + 
  labs(x="Time", 
       y="Estimated mean with corresponding 95% CI")
pCI

When we constrained their performance before the exp as baseline, we found that the T2 group exhibited better performance than other groups after the exp.

3.8 Non-randomized studies

In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.

3.9 Reference

Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.

Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.