1 Inclass 1

1.1 Info

Problem: Replicate the analysis reported in the arrhythmogenic dose of epinephrine example.
Data: The arrhythmogenic dose of epinephrine (ADE) is determined by infusing epinephrine into an animal until arrhythmia criterion is reached. The arrhythmia criterion was the occurrence of four intermittent or continuous premature ventricular contractions. Once the criterion has been reached, the infusion required to produce it is recorded. The ADE is then calculated by multiplying the duration of the infusion by the infusion rate. The data give the ADE for six dogs measured at ten consecutive times.

Column 1: Dog ID
Column 2: Time in hours
Column 3: Arrhythmogenic dose of epinephrine (ADE)

1.2 Data input

# Loading package
pacman::p_load(tidyverse, nlme)

# Input data
dta1 <- read.csv("ade.csv")

str(dta1)

## 'data.frame':    60 obs. of  3 variables:
##  $ Dog : chr  "D11" "D11" "D11" "D11" ...
##  $ Hour: num  0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
##  $ ADE : num  5.7 4.7 4.8 4.9 3.88 3.51 2.8 2.6 2.5 2.5 ...

head(dta1)

##   Dog Hour  ADE
## 1 D11  0.0 5.70
## 2 D11  0.5 4.70
## 3 D11  1.0 4.80
## 4 D11  1.5 4.90
## 5 D11  2.0 3.88
## 6 D11  2.5 3.51

dta1 <- mutate(dta1, Hour_f = factor(Hour))

theme_set(theme_bw())

1.3 Plot

1.3.1 Line chart with error bar

# means with error bars
ggplot(dta1, aes(Hour, ADE)) +
 geom_point(alpha = .3) +
 stat_summary(fun = mean, geom = "line") +
 stat_summary(fun = mean, geom = "point") +
 stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
 labs(x = "Time (in hours)", y = "ADE")

A reducing trend of ADE value and variation by time among the six dog.

1.3.2 Scatterplot matrix

car::scatterplotMatrix(~ ADE + Hour_f,
                  data=dta1,
                  col=8,
                  smooth=FALSE, 
                  regLine=FALSE,
                  ellipse=list(levels=c(0.68, 0.975), 
                               fill=TRUE, 
                               fill.alpha=0.1))

It’s not compound symmetry (may not consider apply “corCompSymm”), should consider time related: AR1.

1.4 Fit Linear Model

Generalized Least Squares

1.4.1 m1

Generalized least squares: the effect of Hour on ADE
varIdent: When Hour_f is a factor, variance is allowed to be different for each level of the Hour_f

summary(m1 <- gls(ADE ~ Hour, 
                  weights = varIdent(form = ~ 1 | Hour_f), 
                  data = dta1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta1 
##        AIC      BIC    logLik
##   196.6523 221.3776 -86.32617
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##          0        0.5          1        1.5          2        2.5          3 
## 1.00000000 0.55447427 0.64132902 0.51921510 0.43187908 0.08245758 0.10901472 
##        3.5          4        4.5 
## 0.17046144 0.21770576 0.11133075 
## 
## Coefficients:
##                 Value  Std.Error   t-value p-value
## (Intercept)  5.435552 0.27441443 19.807821       0
## Hour        -0.701874 0.08386437 -8.369153       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.964
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.3854719 -0.6272929 -0.1826742  0.5824007  2.3554705 
## 
## Residual standard error: 3.508619 
## Degrees of freedom: 60 total; 58 residual

The SD of ADE in each time points are decreasing (1 to 0.11).
The correlation between ADE and Hour is -0.964. (ADE is reduced by time) Y_t = β₀ + β₁ × hour_t
The estimate value of ADE= 5.44 - 0.7 × hour.

1.4.2 m2 (AR1)

Apply corAR1 (form = ~ Hour | Dog)
The correlation structure is assumed to apply only to observations within the same grouping level (Dog).

summary(m2 <- gls(ADE ~ Hour,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corAR1(form = ~ Hour | Dog),
                  data = dta1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta1 
##        AIC      BIC    logLik
##   168.5913 195.3771 -71.29566
## 
## Correlation Structure: ARMA(1,0)
##  Formula: ~Hour | Dog 
##  Parameter estimate(s):
##      Phi1 
## 0.7553498 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##          0        0.5          1        1.5          2        2.5          3 
## 1.00000000 0.32803241 0.37194739 0.28590639 0.34018814 0.12571538 0.09896982 
##        3.5          4        4.5 
## 0.16894933 0.22219099 0.09978382 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept)  4.749848 0.3999065 11.877395       0
## Hour        -0.599160 0.1096671 -5.463442       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.94 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -1.43566691 -0.09530459  0.29216375  1.34205811  5.06634565 
## 
## Residual standard error: 3.278488 
## Degrees of freedom: 60 total; 58 residual

knitr::kable(intervals(m2)$corStruct)

	lower	est.	upper
Phi1	0.1414353	0.7553498	0.9496501

knitr::kable(t(intervals(m2)$coef[1, c(2,1,3)]))

est.	lower	upper
4.749848	3.949348	5.550348

The fitted correlation parameter Phi=0.755.
The AIC of m2 (168.59) is better than m1 (196.65) due to consider the correlation within dogs?
The estimate value of ADE= 4.75 - 0.6 × hour.

1.4.3 m3 (CompSymm)

apply corCompSymm (form = ~ 1 | Dog)

summary(m3 <- gls(ADE ~ Hour,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta1 
##        AIC      BIC    logLik
##   161.6077 188.3935 -67.80387
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Dog 
##  Parameter estimate(s):
##       Rho 
## 0.7217933 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##         0       0.5         1       1.5         2       2.5         3       3.5 
## 1.0000000 0.5100528 0.6140381 0.5228105 0.4479331 0.1852681 0.1129502 0.1611728 
##         4       4.5 
## 0.2309740 0.1131425 
## 
## Coefficients:
##                 Value  Std.Error   t-value p-value
## (Intercept)  4.494975 0.30766482 14.609974       0
## Hour        -0.529169 0.06981253 -7.579863       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.957
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -0.66528429 -0.02390899  0.26405734  0.89073611  2.44460225 
## 
## Residual standard error: 3.832969 
## Degrees of freedom: 60 total; 58 residual

The correlation between ADE and Hour is -0.957. (similar to m1)
WHY the AIC of m3 is better than m1?

1.4.4 m3a (Exponential)

apply exponential time correlation structure: ρ=exp(−t/t₀)
varExp: Exponential of the variance covariate

summary(m3a <- gls(ADE ~ Hour,
                  weights = varExp(form = ~ Hour),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta1))

## Generalized least squares fit by REML
##   Model: ADE ~ Hour 
##   Data: dta1 
##        AIC      BIC    logLik
##   163.9703 174.2725 -76.98516
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Dog 
##  Parameter estimate(s):
##       Rho 
## 0.6145863 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~Hour 
##  Parameter estimates:
##      expon 
## -0.4204529 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept)  5.302798 0.6777356  7.824287       0
## Hour        -0.674162 0.1346238 -5.007751       0
## 
##  Correlation: 
##      (Intr)
## Hour -0.982
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.1017685 -0.3945215 -0.1031040  0.3711706  2.7263751 
## 
## Residual standard error: 3.079988 
## Degrees of freedom: 60 total; 58 residual

1.4.5 m3b

add Dog as random effect

summary(m3b <- lme(ADE ~ Hour, random = ~ 1 | Dog,
                  weights = varIdent(form = ~ 1 | Hour_f),
                  correlation = corCompSymm(form = ~ 1 | Dog),
                  data = dta1))

## Linear mixed-effects model fit by REML
##  Data: dta1 
##        AIC     BIC    logLik
##   153.2468 182.093 -62.62338
## 
## Random effects:
##  Formula: ~1 | Dog
##         (Intercept) Residual
## StdDev:   0.3654553 3.928157
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | Dog 
##  Parameter estimate(s):
##       Rho 
## 0.8457657 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Hour_f 
##  Parameter estimates:
##          0        0.5          1        1.5          2        2.5          3 
## 1.00000000 0.54618337 0.63785632 0.53997928 0.47199948 0.16064282 0.06300446 
##        3.5          4        4.5 
## 0.21603493 0.25337758 0.18268767 
## Fixed effects: ADE ~ Hour 
##                 Value  Std.Error DF   t-value p-value
## (Intercept)  4.811462 0.23464110 53  20.50562       0
## Hour        -0.612618 0.05378694 53 -11.38973       0
##  Correlation: 
##      (Intr)
## Hour -0.745
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.9223391 -0.1299528  0.1645922  0.6995419  2.4095565 
## 
## Number of Observations: 60
## Number of Groups: 6

1.5 Model Comparison

1.5.1 m1 v.s. m2

anova(m1, m2)

##    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m1     1 12 196.6523 221.3776 -86.32617                        
## m2     2 13 168.5913 195.3771 -71.29566 1 vs 2 30.06102  <.0001

m2 is better than m1 due to consider the time related effect and the correlation structure within the same grouping level (Dog).

1.5.2 m1, m3, m3a

anova(m1, m3, m3a, m3b)

##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m1      1 12 196.6523 221.3776 -86.32617                        
## m3      2 13 161.6077 188.3935 -67.80387 1 vs 2 37.04460  <.0001
## m3a     3  5 163.9703 174.2725 -76.98516 2 vs 3 18.36258  0.0187
## m3b     4 14 153.2468 182.0930 -62.62338 3 vs 4 28.72355  0.0007

m3b might be the best model to predict?

1.6 Residual plot

plot(m3, resid(., type = "pearson") ~ fitted(.) | Dog,
     layout = c(6, 1), aspect = 2, abline = 0, pch = 20, cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

2 Inclass 2

2.1 Info

Problem: Replicate the analysis reported in the smoker example.
Data: In the Netherlands, residents in the rural area of Vlagtwedde in the north-east and residents in the urban, industrial area of Vlaardingen in the south-west, were recruited to participate in a study on chronic obstructive lung diseases. The participants, initially aged 15-44, were surveyed approximately every 3 years for up to 21 years. At each survey, information on respiratory symptoms and smoking status was collected by questionnaire and spirometry was performed. Pulmonary function was determined by spirometry and a measure of forced expiratory volume (FEV1) was obtained every three years for the first 15 years of the study, and also at year 19. The number of repeated measurements of FEV1 on each subject varied from 1 to 7.
Here the dataset is comprised of a sub-sample of 133 residents aged 36 or older at their entry into the study and whose smoking status did not change over the 19 years of follow-up. Each participant was either a current or former smoker. Current smoking was defined as smoking at least one cigarette per day.

Column 1: Subject ID
Column 2: Smoker status
Column 3: Time from baseline (years), 0, 3, 6, 9, 12, 15, 19
Column 4: Forced expiratory volume (FEV1)

2.2 Data input

pacman::p_load(tidyverse, nlme, car)

dta2 <- read.csv("fev_smoker.csv", h = T)

str(dta2)

## 'data.frame':    771 obs. of  4 variables:
##  $ ID    : chr  "P1" "P1" "P1" "P1" ...
##  $ Smoker: chr  "Former" "Former" "Former" "Former" ...
##  $ Time  : int  0 3 6 9 15 19 0 3 6 9 ...
##  $ FEV1  : num  3.4 3.4 3.45 3.2 2.95 2.4 3.1 3.15 3.5 2.95 ...

head(dta2)

##   ID Smoker Time FEV1
## 1 P1 Former    0 3.40
## 2 P1 Former    3 3.40
## 3 P1 Former    6 3.45
## 4 P1 Former    9 3.20
## 5 P1 Former   15 2.95
## 6 P1 Former   19 2.40

# save a copy of data in wide format
dta2W <- dta2 %>%
  spread(key = Time, value = FEV1)
# rename
names(dta2W)[3:9] <- paste0("Year", names(dta2W)[3:9])

2.3 Scatterplot matrix

2.3.1 By year

# make ellipses in pairwise plot
car::scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19,
                       data = dta2W, 
                       smooth = F, 
                       by.group = TRUE,
                       regLine = F, 
                       ellipse = T, 
                       diagonal = T, 
                       pch = c(1, 20))

2.3.2 By smoker group

Divide the smoker type into current(firebrick) and former(forestgreen)

car::scatterplotMatrix(~ Year0 + Year3 + Year6 + Year9 + Year12 + Year15 + Year19 | Smoker,
                       data = dta2W, 
                       smooth = F, 
                       by.group = TRUE,
                       regLine = F, 
                       ellipse = T, 
                       diagonal = T, 
                       pch = c(1, 20),
                       col = c("firebrick", "forestgreen"))

# FEV1 means & variances of differ time and smorker
aggregate(FEV1 ~ Time + Smoker, data = dta2, mean, na.rm=T)

##    Time  Smoker     FEV1
## 1     0 Current 3.229294
## 2     3 Current 3.119474
## 3     6 Current 3.091461
## 4     9 Current 2.870706
## 5    12 Current 2.797901
## 6    15 Current 2.680959
## 7    19 Current 2.497703
## 8     0  Former 3.519130
## 9     3  Former 3.577778
## 10    6  Former 3.264643
## 11    9  Former 3.171667
## 12   12  Former 3.136207
## 13   15  Former 2.870000
## 14   19  Former 2.907857

aggregate(FEV1 ~ Time + Smoker, data = dta2, var, na.rm=T)

##    Time  Smoker      FEV1
## 1     0 Current 0.3402876
## 2     3 Current 0.3007072
## 3     6 Current 0.3191581
## 4     9 Current 0.3165019
## 5    12 Current 0.2969943
## 6    15 Current 0.2642588
## 7    19 Current 0.2546563
## 8     0  Former 0.2358265
## 9     3  Former 0.3941026
## 10    6  Former 0.3871962
## 11    9  Former 0.3380489
## 12   12  Former 0.3760530
## 13   15  Former 0.3709391
## 14   19  Former 0.4178767

2.4 Covariance matrices

2.4.1 Former

# covariance matrices
dta2W %>%
 filter( Smoker == "Former") %>%
 select( -(1:2)) %>%
 var(na.rm = T)

##            Year0     Year3     Year6     Year9    Year12    Year15    Year19
## Year0  0.3617433 0.3442611 0.3550833 0.2358222 0.2398333 0.3091844 0.2144011
## Year3  0.3442611 0.3566944 0.3479167 0.2434444 0.2455556 0.3014333 0.2141278
## Year6  0.3550833 0.3479167 0.3990278 0.2583333 0.2502778 0.3320000 0.2674722
## Year9  0.2358222 0.2434444 0.2583333 0.1926667 0.1827778 0.2322889 0.1953444
## Year12 0.2398333 0.2455556 0.2502778 0.1827778 0.2022222 0.2201111 0.1406667
## Year15 0.3091844 0.3014333 0.3320000 0.2322889 0.2201111 0.3046711 0.2351044
## Year19 0.2144011 0.2141278 0.2674722 0.1953444 0.1406667 0.2351044 0.3509656

The lowest value in the center part?

2.4.2 Current

dta2W %>%
  filter(Smoker == "Current") %>%
  select(-(1:2)) %>%
  var(na.rm = T)

##            Year0     Year3     Year6     Year9    Year12    Year15    Year19
## Year0  0.2068623 0.1958723 0.1550242 0.1740866 0.1580108 0.1527199 0.1550177
## Year3  0.1958723 0.2418398 0.1676342 0.1770779 0.1647835 0.1818398 0.1675022
## Year6  0.1550242 0.1676342 0.1618623 0.1328485 0.1484870 0.1479152 0.1571320
## Year9  0.1740866 0.1770779 0.1328485 0.2496970 0.1501407 0.1508160 0.1533355
## Year12 0.1580108 0.1647835 0.1484870 0.1501407 0.1894372 0.1709978 0.1528485
## Year15 0.1527199 0.1818398 0.1479152 0.1508160 0.1709978 0.1873327 0.1654093
## Year19 0.1550177 0.1675022 0.1571320 0.1533355 0.1528485 0.1654093 0.1876729

The highest value in the center part?

2.5 Line chart

# relevel factor level to produce consistent legend in the plot below
dta2W <- dta2W %>%
  mutate(Smoker = factor(Smoker))

dta2 <- dta2 %>%
  mutate(Smoker = factor(Smoker))

dta2W$Smoker <- relevel(dta2W$Smoker, ref="Former")

dta2$Smoker <- relevel(dta2$Smoker, ref="Former")

theme_set(theme_bw())

# means with error bars
ggplot(dta2, aes(Time, FEV1,
                 group = Smoker, 
                 shape = Smoker, 
                 linetype = Smoker)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  scale_shape_manual(values = c(1, 2)) +
  labs(x = "Time (years since baseline)", 
       y = "Mean FEV1 (liters)",
       linetype = "Smoker", shape = "Smoker") +
  theme(legend.justification = c(1, 1), 
        legend.position = c(1, 1),
        legend.background = element_rect(fill = "white",color = "black"))

Both type of smoker’s FEV1 are decreased by years.
However, the decreasing trend of current smoker is larger than former smoker.

2.6 Fit Linear Model

Generalized Least Squares

2.6.1 m0

dta2 <- mutate(dta2, Time_f = as.factor(Time))

summary(m0 <- gls(FEV1 ~ Smoker*Time_f, data = dta2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta2 
##        AIC      BIC    logLik
##   1359.281 1428.722 -664.6407
## 
## Coefficients:
##                            Value Std.Error   t-value p-value
## (Intercept)             3.519130 0.1171462 30.040506  0.0000
## SmokerCurrent          -0.289836 0.1320476 -2.194938  0.0285
## Time_f3                 0.058647 0.1594157  0.367889  0.7131
## Time_f6                -0.254488 0.1581008 -1.609653  0.1079
## Time_f9                -0.347464 0.1557060 -2.231537  0.0259
## Time_f12               -0.382924 0.1568667 -2.441076  0.0149
## Time_f15               -0.649130 0.1639349 -3.959684  0.0001
## Time_f19               -0.611273 0.1581008 -3.866351  0.0001
## SmokerCurrent:Time_f3  -0.168468 0.1801366 -0.935222  0.3500
## SmokerCurrent:Time_f6   0.116654 0.1795986  0.649527  0.5162
## SmokerCurrent:Time_f9  -0.011124 0.1779636 -0.062510  0.9502
## SmokerCurrent:Time_f12 -0.048469 0.1794916 -0.270037  0.7872
## SmokerCurrent:Time_f15  0.100795 0.1868469  0.539454  0.5897
## SmokerCurrent:Time_f19 -0.120318 0.1815889 -0.662585  0.5078
## 
##  Correlation: 
##                        (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent          -0.887                                                 
## Time_f3                -0.735  0.652                                          
## Time_f6                -0.741  0.657  0.544                                   
## Time_f9                -0.752  0.667  0.553  0.557                            
## Time_f12               -0.747  0.663  0.549  0.553  0.562                     
## Time_f15               -0.715  0.634  0.525  0.529  0.538  0.534              
## Time_f19               -0.741  0.657  0.544  0.549  0.557  0.553  0.529       
## SmokerCurrent:Time_f3   0.650 -0.733 -0.885 -0.482 -0.489 -0.486 -0.465 -0.482
## SmokerCurrent:Time_f6   0.652 -0.735 -0.479 -0.880 -0.491 -0.487 -0.466 -0.483
## SmokerCurrent:Time_f9   0.658 -0.742 -0.484 -0.488 -0.875 -0.492 -0.470 -0.488
## SmokerCurrent:Time_f12  0.653 -0.736 -0.480 -0.484 -0.491 -0.874 -0.466 -0.484
## SmokerCurrent:Time_f15  0.627 -0.707 -0.461 -0.465 -0.472 -0.468 -0.877 -0.465
## SmokerCurrent:Time_f19  0.645 -0.727 -0.474 -0.478 -0.485 -0.482 -0.461 -0.871
##                        SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent                                              
## Time_f3                                                    
## Time_f6                                                    
## Time_f9                                                    
## Time_f12                                                   
## Time_f15                                                   
## Time_f19                                                   
## SmokerCurrent:Time_f3                                      
## SmokerCurrent:Time_f6   0.539                              
## SmokerCurrent:Time_f9   0.544  0.546                       
## SmokerCurrent:Time_f12  0.539  0.541  0.546                
## SmokerCurrent:Time_f15  0.518  0.520  0.524  0.520         
## SmokerCurrent:Time_f19  0.533  0.535  0.540  0.535   0.514 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.97158077 -0.67377531  0.05043194  0.66748149  3.13946469 
## 
## Residual standard error: 0.5618133 
## Degrees of freedom: 771 total; 757 residual

2.6.2 m1

variance is allowed to be different for each level of Time_f*Smoker

summary(m1 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varIdent(form = ~ 1 | Time_f*Smoker),
                  data = dta2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta2 
##        AIC      BIC    logLik
##   1377.981 1507.604 -660.9907
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Time_f * Smoker 
##  Parameter estimates:
##   0*Former   3*Former   6*Former   9*Former  15*Former  19*Former  0*Current 
##   1.000000   1.292730   1.281354   1.197259   1.254158   1.331145   1.201226 
##  3*Current  6*Current  9*Current 12*Current 15*Current 19*Current  12*Former 
##   1.129204   1.163337   1.158488   1.122216   1.058562   1.039166   1.262785 
## 
## Coefficients:
##                            Value Std.Error  t-value p-value
## (Intercept)             3.519130 0.1012590 34.75377  0.0000
## SmokerCurrent          -0.289836 0.1194016 -2.42741  0.0154
## Time_f3                 0.058647 0.1576382  0.37204  0.7100
## Time_f6                -0.254488 0.1551834 -1.63992  0.1014
## Time_f9                -0.347464 0.1467019 -2.36850  0.0181
## Time_f12               -0.382924 0.1523839 -2.51289  0.0122
## Time_f15               -0.649130 0.1603404 -4.04845  0.0001
## Time_f19               -0.611273 0.1586741 -3.85238  0.0001
## SmokerCurrent:Time_f3  -0.168468 0.1789371 -0.94149  0.3468
## SmokerCurrent:Time_f6   0.116654 0.1779643  0.65549  0.5124
## SmokerCurrent:Time_f9  -0.011124 0.1710216 -0.06505  0.9482
## SmokerCurrent:Time_f12 -0.048469 0.1757578 -0.27577  0.7828
## SmokerCurrent:Time_f15  0.100795 0.1825716  0.55209  0.5811
## SmokerCurrent:Time_f19 -0.120318 0.1806162 -0.66615  0.5055
## 
##  Correlation: 
##                        (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent          -0.848                                                 
## Time_f3                -0.642  0.545                                          
## Time_f6                -0.653  0.553  0.419                                   
## Time_f9                -0.690  0.585  0.443  0.450                            
## Time_f12               -0.664  0.564  0.427  0.434  0.459                     
## Time_f15               -0.632  0.536  0.406  0.412  0.436  0.420              
## Time_f19               -0.638  0.541  0.410  0.416  0.440  0.424  0.403       
## SmokerCurrent:Time_f3   0.566 -0.667 -0.881 -0.369 -0.391 -0.376 -0.357 -0.361
## SmokerCurrent:Time_f6   0.569 -0.671 -0.365 -0.872 -0.393 -0.378 -0.359 -0.363
## SmokerCurrent:Time_f9   0.592 -0.698 -0.380 -0.386 -0.858 -0.393 -0.374 -0.378
## SmokerCurrent:Time_f12  0.576 -0.679 -0.370 -0.376 -0.398 -0.867 -0.364 -0.368
## SmokerCurrent:Time_f15  0.555 -0.654 -0.356 -0.362 -0.383 -0.369 -0.878 -0.354
## SmokerCurrent:Time_f19  0.561 -0.661 -0.360 -0.366 -0.387 -0.373 -0.354 -0.879
##                        SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent                                              
## Time_f3                                                    
## Time_f6                                                    
## Time_f9                                                    
## Time_f12                                                   
## Time_f15                                                   
## Time_f19                                                   
## SmokerCurrent:Time_f3                                      
## SmokerCurrent:Time_f6   0.448                              
## SmokerCurrent:Time_f9   0.466  0.468                       
## SmokerCurrent:Time_f12  0.453  0.456  0.474                
## SmokerCurrent:Time_f15  0.436  0.439  0.457  0.444         
## SmokerCurrent:Time_f19  0.441  0.444  0.462  0.449   0.432 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -3.04445773 -0.68139172  0.04873177  0.67419760  2.87621234 
## 
## Residual standard error: 0.4856209 
## Degrees of freedom: 771 total; 757 residual

2.6.3 m2

exponential time correlation structure

summary(m2 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varExp(form = ~ Time),
                  data = dta2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta2 
##        AIC      BIC    logLik
##   1360.927 1434.996 -664.4633
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~Time 
##  Parameter estimates:
##        expon 
## -0.002528783 
## 
## Coefficients:
##                            Value Std.Error   t-value p-value
## (Intercept)             3.519130 0.1197648 29.383681  0.0000
## SmokerCurrent          -0.289836 0.1349993 -2.146947  0.0321
## Time_f3                 0.058647 0.1624138  0.361098  0.7181
## Time_f6                -0.254488 0.1605419 -1.585179  0.1133
## Time_f9                -0.347464 0.1576421 -2.204130  0.0278
## Time_f12               -0.382924 0.1582709 -2.419418  0.0158
## Time_f15               -0.649130 0.1645762 -3.944255  0.0001
## Time_f19               -0.611273 0.1582604 -3.862453  0.0001
## SmokerCurrent:Time_f3  -0.168468 0.1835206 -0.917977  0.3589
## SmokerCurrent:Time_f6   0.116654 0.1823482  0.639733  0.5225
## SmokerCurrent:Time_f9  -0.011124 0.1801132 -0.061764  0.9508
## SmokerCurrent:Time_f12 -0.048469 0.1810081 -0.267774  0.7889
## SmokerCurrent:Time_f15  0.100795 0.1874982  0.537580  0.5910
## SmokerCurrent:Time_f19 -0.120318 0.1815964 -0.662558  0.5078
## 
##  Correlation: 
##                        (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent          -0.887                                                 
## Time_f3                -0.737  0.654                                          
## Time_f6                -0.746  0.662  0.550                                   
## Time_f9                -0.760  0.674  0.560  0.567                            
## Time_f12               -0.757  0.671  0.558  0.565  0.575                     
## Time_f15               -0.728  0.646  0.537  0.543  0.553  0.551              
## Time_f19               -0.757  0.671  0.558  0.565  0.575  0.573  0.551       
## SmokerCurrent:Time_f3   0.653 -0.736 -0.885 -0.487 -0.496 -0.494 -0.475 -0.494
## SmokerCurrent:Time_f6   0.657 -0.740 -0.484 -0.880 -0.499 -0.497 -0.478 -0.497
## SmokerCurrent:Time_f9   0.665 -0.750 -0.490 -0.496 -0.875 -0.503 -0.484 -0.503
## SmokerCurrent:Time_f12  0.662 -0.746 -0.488 -0.494 -0.503 -0.874 -0.481 -0.501
## SmokerCurrent:Time_f15  0.639 -0.720 -0.471 -0.477 -0.485 -0.483 -0.878 -0.483
## SmokerCurrent:Time_f19  0.660 -0.743 -0.486 -0.492 -0.501 -0.499 -0.480 -0.871
##                        SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent                                              
## Time_f3                                                    
## Time_f6                                                    
## Time_f9                                                    
## Time_f12                                                   
## Time_f15                                                   
## Time_f19                                                   
## SmokerCurrent:Time_f3                                      
## SmokerCurrent:Time_f6   0.545                              
## SmokerCurrent:Time_f9   0.551  0.555                       
## SmokerCurrent:Time_f12  0.549  0.552  0.559                
## SmokerCurrent:Time_f15  0.530  0.533  0.540  0.537         
## SmokerCurrent:Time_f19  0.547  0.550  0.557  0.554   0.535 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.92874262 -0.66677628  0.05044755  0.66745312  3.16543499 
## 
## Residual standard error: 0.5743718 
## Degrees of freedom: 771 total; 757 residual

2.6.4 m3

apply exponential time correlation structure in differ type of smoker

summary(m3 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varExp(form = ~ Time | Smoker),
                  data = dta2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta2 
##        AIC      BIC    logLik
##   1358.216 1436.915 -662.1079
## 
## Variance function:
##  Structure: Exponential of variance covariate, different strata
##  Formula: ~Time | Smoker 
##  Parameter estimates:
##       Former      Current 
##  0.005425870 -0.006170002 
## 
## Coefficients:
##                            Value Std.Error   t-value p-value
## (Intercept)             3.519130 0.1201359 29.292914  0.0000
## SmokerCurrent          -0.289836 0.1354176 -2.140315  0.0326
## Time_f3                 0.058647 0.1647238  0.356034  0.7219
## Time_f6                -0.254488 0.1645770 -1.546313  0.1224
## Time_f9                -0.347464 0.1631958 -2.129122  0.0336
## Time_f12               -0.382924 0.1657444 -2.310326  0.0211
## Time_f15               -0.649130 0.1752396 -3.704246  0.0002
## Time_f19               -0.611273 0.1703016 -3.589357  0.0004
## SmokerCurrent:Time_f3  -0.168468 0.1854898 -0.908232  0.3640
## SmokerCurrent:Time_f6   0.116654 0.1856193  0.628459  0.5299
## SmokerCurrent:Time_f9  -0.011124 0.1844802 -0.060302  0.9519
## SmokerCurrent:Time_f12 -0.048469 0.1868438 -0.259411  0.7954
## SmokerCurrent:Time_f15  0.100795 0.1959415  0.514415  0.6071
## SmokerCurrent:Time_f19 -0.120318 0.1909351 -0.630152  0.5288
## 
##  Correlation: 
##                        (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent          -0.887                                                 
## Time_f3                -0.729  0.647                                          
## Time_f6                -0.730  0.648  0.532                                   
## Time_f9                -0.736  0.653  0.537  0.537                            
## Time_f12               -0.725  0.643  0.529  0.529  0.534                     
## Time_f15               -0.686  0.608  0.500  0.500  0.505  0.497              
## Time_f19               -0.705  0.626  0.514  0.515  0.519  0.511  0.484       
## SmokerCurrent:Time_f3   0.648 -0.730 -0.888 -0.473 -0.477 -0.469 -0.444 -0.457
## SmokerCurrent:Time_f6   0.647 -0.730 -0.472 -0.887 -0.476 -0.469 -0.444 -0.457
## SmokerCurrent:Time_f9   0.651 -0.734 -0.475 -0.475 -0.885 -0.472 -0.446 -0.459
## SmokerCurrent:Time_f12  0.643 -0.725 -0.469 -0.469 -0.473 -0.887 -0.441 -0.454
## SmokerCurrent:Time_f15  0.613 -0.691 -0.447 -0.448 -0.451 -0.444 -0.894 -0.433
## SmokerCurrent:Time_f19  0.629 -0.709 -0.459 -0.459 -0.463 -0.456 -0.431 -0.892
##                        SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent                                              
## Time_f3                                                    
## Time_f6                                                    
## Time_f9                                                    
## Time_f12                                                   
## Time_f15                                                   
## Time_f19                                                   
## SmokerCurrent:Time_f3                                      
## SmokerCurrent:Time_f6   0.533                              
## SmokerCurrent:Time_f9   0.536  0.536                       
## SmokerCurrent:Time_f12  0.529  0.529  0.532                
## SmokerCurrent:Time_f15  0.505  0.504  0.507  0.501         
## SmokerCurrent:Time_f19  0.518  0.517  0.521  0.514   0.490 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.95176426 -0.67527408  0.04683313  0.67280125  2.86836115 
## 
## Residual standard error: 0.5761515 
## Degrees of freedom: 771 total; 757 residual

2.6.5 m4

General correlation matrix, with no additional structure.

summary(m4 <- gls(FEV1 ~ Smoker*Time_f,
                  weights = varIdent(form = ~ 1 | Time_f*Smoker),
                  correlation = corSymm(form = ~ 1 | ID),
                  data = dta2))

## Generalized least squares fit by REML
##   Model: FEV1 ~ Smoker * Time_f 
##   Data: dta2 
##        AIC      BIC    logLik
##   361.1039 587.9427 -131.5519
## 
## Correlation Structure: General
##  Formula: ~1 | ID 
##  Parameter estimate(s):
##  Correlation: 
##   1     2     3     4     5     6    
## 2 0.852                              
## 3 0.856 0.888                        
## 4 0.849 0.846 0.894                  
## 5 0.841 0.853 0.891 0.892            
## 6 0.861 0.866 0.896 0.872 0.935      
## 7 0.753 0.756 0.844 0.789 0.789 0.866
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Time_f * Smoker 
##  Parameter estimates:
##   0*Former   3*Former   6*Former   9*Former  15*Former  19*Former  0*Current 
##  1.0000000  1.1548253  1.0981217  1.0095093  1.1970028  1.0796092  1.1241763 
##  6*Current  9*Current 12*Current 15*Current 19*Current  3*Current  12*Former 
##  1.0518756  1.1013750  1.0380913  1.0158861  0.9676265  1.0363118  1.0527522 
## 
## Coefficients:
##                            Value  Std.Error  t-value p-value
## (Intercept)             3.484680 0.09850685 35.37500  0.0000
## SmokerCurrent          -0.257911 0.11599433 -2.22348  0.0265
## Time_f3                -0.024095 0.06609965 -0.36453  0.7156
## Time_f6                -0.221219 0.06278074 -3.52368  0.0005
## Time_f9                -0.328856 0.05936269 -5.53977  0.0000
## Time_f12               -0.394901 0.06223303 -6.34552  0.0000
## Time_f15               -0.479012 0.06777450 -7.06774  0.0000
## Time_f19               -0.598701 0.06580692 -9.09784  0.0000
## SmokerCurrent:Time_f3  -0.065321 0.07460797 -0.87552  0.3816
## SmokerCurrent:Time_f6   0.063890 0.07168101  0.89130  0.3730
## SmokerCurrent:Time_f9  -0.052926 0.06933374 -0.76336  0.4455
## SmokerCurrent:Time_f12 -0.014889 0.07199623 -0.20680  0.8362
## SmokerCurrent:Time_f15 -0.103704 0.07642577 -1.35692  0.1752
## SmokerCurrent:Time_f19 -0.127043 0.07531976 -1.68672  0.0921
## 
##  Correlation: 
##                        (Intr) SmkrCr Tim_f3 Tim_f6 Tim_f9 Tm_f12 Tm_f15 Tm_f19
## SmokerCurrent          -0.849                                                 
## Time_f3                -0.129  0.110                                          
## Time_f6                -0.197  0.168  0.608                                   
## Time_f9                -0.334  0.283  0.521  0.662                            
## Time_f12               -0.289  0.245  0.524  0.647  0.700                     
## Time_f15               -0.062  0.053  0.515  0.588  0.572  0.719              
## Time_f19               -0.258  0.219  0.493  0.627  0.613  0.653  0.690       
## SmokerCurrent:Time_f3   0.115 -0.204 -0.886 -0.539 -0.462 -0.465 -0.456 -0.437
## SmokerCurrent:Time_f6   0.173 -0.249 -0.533 -0.876 -0.579 -0.566 -0.515 -0.549
## SmokerCurrent:Time_f9   0.286 -0.332 -0.446 -0.566 -0.856 -0.600 -0.490 -0.524
## SmokerCurrent:Time_f12  0.250 -0.326 -0.453 -0.559 -0.605 -0.864 -0.622 -0.565
## SmokerCurrent:Time_f15  0.055 -0.154 -0.457 -0.522 -0.507 -0.638 -0.887 -0.612
## SmokerCurrent:Time_f19  0.225 -0.322 -0.431 -0.548 -0.535 -0.571 -0.603 -0.874
##                        SC:T_3 SC:T_6 SC:T_9 SC:T_12 SC:T_15
## SmokerCurrent                                              
## Time_f3                                                    
## Time_f6                                                    
## Time_f9                                                    
## Time_f12                                                   
## Time_f15                                                   
## Time_f19                                                   
## SmokerCurrent:Time_f3                                      
## SmokerCurrent:Time_f6   0.620                              
## SmokerCurrent:Time_f9   0.528  0.661                       
## SmokerCurrent:Time_f12  0.539  0.650  0.692                
## SmokerCurrent:Time_f15  0.532  0.600  0.579  0.724         
## SmokerCurrent:Time_f19  0.512  0.632  0.604  0.654   0.701 
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -3.00857813 -0.65224622  0.05368483  0.64660517  3.17724882 
## 
## Residual standard error: 0.5411956 
## Degrees of freedom: 771 total; 757 residual

2.7 Model Comparison

# Likelihood ratio test
anova(m0, m1, m2, m3, m4)

##    Model df       AIC       BIC    logLik   Test   L.Ratio p-value
## m0     1 15 1359.2814 1428.7218 -664.6407                         
## m1     2 28 1377.9814 1507.6035 -660.9907 1 vs 2    7.3000  0.8860
## m2     3 16 1360.9267 1434.9965 -664.4633 2 vs 3    6.9453  0.8612
## m3     4 17 1358.2158 1436.9149 -662.1079 3 vs 4    4.7109  0.0300
## m4     5 49  361.1039  587.9427 -131.5519 4 vs 5 1061.1119  <.0001

# show estimated covariance
m4 <- update(m4, method="ML")
intervals(m4, which="var-cov")

## Approximate 95% confidence intervals
## 
##  Correlation structure:
##              lower      est.     upper
## cor(1,2) 0.7984104 0.8523042 0.8926498
## cor(1,3) 0.8033256 0.8559242 0.8952727
## cor(1,4) 0.7947990 0.8498786 0.8910711
## cor(1,5) 0.7853424 0.8419941 0.8846604
## cor(1,6) 0.8045082 0.8615122 0.9027897
## cor(1,7) 0.5939791 0.7501549 0.8518415
## cor(2,3) 0.8467963 0.8884937 0.9193380
## cor(2,4) 0.7906006 0.8468000 0.8888527
## cor(2,5) 0.8008437 0.8539526 0.8937347
## cor(2,6) 0.8117837 0.8671055 0.9070014
## cor(2,7) 0.6008394 0.7537021 0.8533860
## cor(3,4) 0.8536486 0.8939198 0.9235665
## cor(3,5) 0.8508037 0.8914253 0.9214563
## cor(3,6) 0.8481543 0.8962899 0.9297465
## cor(3,7) 0.7174308 0.8426634 0.9151364
## cor(4,5) 0.8516577 0.8924253 0.9224583
## cor(4,6) 0.8179564 0.8724600 0.9114402
## cor(4,7) 0.6580127 0.7872924 0.8714891
## cor(5,6) 0.9056436 0.9358770 0.9566434
## cor(5,7) 0.6478585 0.7869055 0.8752062
## cor(6,7) 0.7618984 0.8638875 0.9240729
## attr(,"label")
## [1] "Correlation structure:"
## 
##  Variance function:
##                lower      est.    upper
## 3*Former   0.9687025 1.1565059 1.380719
## 6*Former   0.9312166 1.0999878 1.299347
## 9*Former   0.8537168 1.0114297 1.198278
## 15*Former  0.9942021 1.1984605 1.444684
## 19*Former  0.9035607 1.0802535 1.291499
## 0*Current  0.9718621 1.1411996 1.340042
## 6*Current  0.9271746 1.0677355 1.229605
## 9*Current  0.9585674 1.1179806 1.303905
## 12*Current 0.9069306 1.0541140 1.225183
## 15*Current 0.8851592 1.0311232 1.201157
## 19*Current 0.8413179 0.9817466 1.145615
## 3*Current  0.9129852 1.0521247 1.212469
## 12*Former  0.8828825 1.0546222 1.259769
## attr(,"label")
## [1] "Variance function:"
## 
##  Residual standard error:
##     lower      est.     upper 
## 0.4530627 0.5310278 0.6224094

2.8 Fitted Line chart

dta2_m4 <- data.frame(dta2, fit = fitted(m4), fit0 = fitted(m0))
# use fitted values to get error bars and see if they cover the data points
ggplot(dta2_m4, aes(Time, fit, 
                    linetype = Smoker, shape = Smoker, group = Smoker)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(aes(Time, fit0), fun = mean, geom = "point", col = "skyblue") +
  stat_summary(aes(Time, FEV1), fun.data = mean_se, 
               geom = "pointrange", col = "firebrick", alpha = .3) +
  scale_shape_manual(values = c(1, 2)) +
  labs(x = "Time (years since baseline)", y = "FEV1", linetype = "Smoker",
       shape = "Smoker") +
  theme(legend.position = c(0.9, 0.9),
        legend.background = element_rect(fill = "white", color = "black"))

2.9 Residual plot

2.9.1 m0

#  residual plot
plot(m0, resid(., type = "pearson") ~ fitted(.) | Smoker,
     layout = c(2, 1), 
     aspect = 2, 
     abline = 0, 
     pch = 20, 
     cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

2.9.2 m4

#  residual plot
plot(m4, resid(., type = "pearson") ~ fitted(.) | Smoker,
     layout = c(2, 1), 
     aspect = 2, 
     abline = 0, 
     pch = 20, 
     cex = .8,
     xlab = "Ftted values", ylab = "Pearson residuals")

Question: Notice that the GLM fit above, compared with the mixed-effects analysis, underestimates the standard errors for intercept and the effect of smoking, but overestimates that for the time effect and the interaction between time and smoking. Could you explain why?

3 Inclass 3

3.1 Info

Problem: Reproduce the results of analysis with the markdown file in the reading programs example and compare them with that of an anlysis of covariance.
Data: An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.

Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).

Column 1: Group ID, Control = C, Experimental 1 = T1, Experimental 2 = T2
Column 2: Reading score at the end of the study
Column 3: Reading score at the beginning of the study
Column 4: Subject ID

“Constrained Pre-Post Analysis”

3.2 Data input

dta3 <- read.csv('reading_g3.csv')

3.3 Wide to long format

dta3L <- gather(data=dta3, key=Time, value=Score, Pre:Post, factor_key=TRUE)  
kable(head(arrange(dta3L, ID), 10))

Group	ID	Time	Score
C	S11	Pre	32
C	S11	Post	41
C	S12	Pre	42
C	S12	Post	46
C	S13	Pre	45
C	S13	Post	48
C	S14	Pre	46
C	S14	Post	54
C	S15	Pre	49
C	S15	Post	52

3.4 Visualization

We inspect the data with a boxplot and a scatter plot.

boxp <- ggplot(dta3L , 
               aes(x=Time, y=Score, col=Group))+ 
  geom_boxplot(outlier.size=0) + 
  geom_point(aes(fill=Group, col=NULL), 
             shape=21, alpha=0.5, size=2, 
             position=position_jitterdodge(jitter.width=0.2)) + 
  theme_bw() + 
  xlab("") 
boxp

ggplot(dta3, 
       aes (x=Pre, y=Post, col=Group)) + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE) + 
  labs(x="Pre-test reading score",
       y="Post-test reading score")+
  theme_minimal()

3.5 The constrained baseline model

The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.

X <- model.matrix(~ Time * Group, data = dta3L)
colnames(X)

[1] "(Intercept)"      "TimePost"         "GroupT1"          "GroupT2"         
[5] "TimePost:GroupT1" "TimePost:GroupT2"

Xalt <- X[, c("TimePost", "TimePost:GroupT1", "TimePost:GroupT2")] 
colnames(Xalt)

[1] "TimePost"         "TimePost:GroupT1" "TimePost:GroupT2"

3.6 Analysis

The gls() function of the ‘nlme’ package can be used for modeling data.

The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.

By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).

The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.

m0_gls <- gls(Score ~ Xalt, 
                weights = varIdent(form = ~ 1 | Time), 
                correlation=corSymm (form = ~ 1 | ID), 
                data = dta3L)
summary(m0_gls)

Generalized least squares fit by REML
  Model: Score ~ Xalt 
  Data: dta3L 
       AIC     BIC    logLik
  350.2136 364.391 -168.1068

Correlation Structure: General
 Formula: ~1 | ID 
 Parameter estimate(s):
 Correlation: 
  1   
2 0.91
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time 
 Parameter estimates:
     Pre     Post 
1.000000 1.016413 

Coefficients:
                        Value Std.Error  t-value p-value
(Intercept)          50.50000 1.2296621 41.06819  0.0000
XaltTimePost          6.38144 0.9027047  7.06924  0.0000
XaltTimePost:GroupT1 -2.80337 1.2699097 -2.20753  0.0314
XaltTimePost:GroupT2  7.15906 1.2699097  5.63745  0.0000

 Correlation: 
                     (Intr) XltTmP XTP:GT1
XaltTimePost         -0.102               
XaltTimePost:GroupT1  0.000 -0.703        
XaltTimePost:GroupT2  0.000 -0.703  0.500 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.7467890 -0.3836191  0.1374178  0.7475076  1.4493709 

Residual standard error: 6.735137 
Degrees of freedom: 60 total; 56 residual

The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.

3.6.1 Prediction

We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).

To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.

m0_Gls <- Gls(Score ~ Xalt, 
                weights = varIdent(form = ~ 1 | Time), 
                correlation=corSymm (form = ~ 1 | ID), 
                data = dta3L)

predictions <- cbind(dta3L, predict(m0_Gls, dta3L, conf.int=0.95))

predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"

pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)

pCI <- ggplot(predictions, 
              aes(y=linear.predictors, 
                  x=Time)) + 
  geom_errorbar(limits, 
                width=0.09, 
                position=pd) +
  geom_line(aes(group=Group, 
                y=linear.predictors), 
            position=pd)+
  geom_point(aes(fill=Group2),
             position=pd, 
             shape=21, 
             size=rel(4)) + 
  scale_fill_manual(values=c("white", "gray80", "gray20", "black")) + 
  theme_minimal() + 
  theme(panel.grid.major=element_blank(), 
        legend.title=element_blank(), 
        text=element_text(size=14), 
        legend.position="bottom") + 
  labs(x="Time", 
       y="Estimated mean with corresponding 95% CI")
pCI

3.7 Non-randomized studies

In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.

3.8 Reference

Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.

Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.

In-class Exercises: Covariance patterns models

Chang-Chun Chen

2020-10-25