BIO 226: Homework Assignment 3 in R

Analysis of Response Profiles: Study of effects of treatment on rheumatoid arthritis

A randomized clinical trial was completed to compare the effectiveness of 2 rheumatoid arthritis treatments. The grip strength was measured on each of the patients at 4 time points: week 0, week 1, week 2 and week 3. Grip strength is a continuous outcome. The data set is complete and balanced. Note that only a subset of patients is included in the data set for this assignment. We are most interested in determining the association between treatment and grip strength.

## Parallel backend
library(doMC)           # parallel backend to foreach/plyr
registerDoMC()          # Turn on multicore processing

## Prepare dataset
compgrip <- read.table("http://isites.harvard.edu/fs/docs/icb.topic1100055.files/compgrip.txt")
names(compgrip) <- c("id","tx","y0","y1","y2","y3")

## Wide to long
compgrip.long <- reshape(data      = compgrip,
                         varying   = c("y0","y1","y2","y3"),
                         timevar   = "time",
                         idvar     = "id",
                         direction = "long",
                         sep       = ""
                         )

compgrip.long <- compgrip.long[with(compgrip.long, order(id, time)),]

compgrip.long <- within(compgrip.long, {

    time.cat <- factor(time)
    tx <- factor(tx, 1:2, c("A","B"))
})

The data are stored in an ASCII file: compgrip.txt. Each row of the data set contains the following six variables: subject ID number, treatment indicator (1=treatment A and 2=treatment B), Y0, Y1, Y2, Y3.

1. Obtain the sample size, and the sample means and standard deviations of the grip strengths at each occasion for each treatment group. On the same graph, plot the mean grip strength versus time (in weeks) for each of the two treatment groups. Describe the general characteristics of the time trends for the two groups.

## Summarize
library(plyr)
summary.compgrip.long <- ddply(.data = compgrip.long,
                               .variables = c("tx","time"),
                               .fun = summarize,
                               n = length(!is.na(y)),
                               mean = mean(y),
                               sd = sd(y))
summary.compgrip.long

  tx time  n  mean    sd
1  A    0 26 134.6 71.29
2  A    1 26 145.5 68.94
3  A    2 26 153.8 70.94
4  A    3 26 159.2 71.41
5  B    0 32 134.8 72.71
6  B    1 32 137.1 75.03
7  B    2 32 138.6 71.45
8  B    3 32 138.6 73.55


## Plot
library(ggplot2)
ggplot(data = summary.compgrip.long,
       mapping = aes(x = time, y = mean, group = tx, color = tx)) +
    layer(geom = "point") +
    layer(geom = "line") +
    ## layer(geom = "errorbar", mapping = aes(ymax = mean + (sd)/sqrt(n), ymin = mean - (sd)/sqrt(n)), width = 0.1) +
    theme_bw() +
    theme(legend.key = element_blank())

plot of chunk unnamed-chunk-3

The mean grip strength improved in both groups. The trend appears curvilinear (slightly decreasing slope), but it is almost straight. The response appears more prominent in treatment group A.

2. With baseline (month 0) and the treatment A as the reference group, write out the complete definition of the regression model for the analysis of response profiles for mean grip strength. In this model, let β denote the vector of parameters in the model for the means and assume an unstructured variance- covariance structure.

$ Y_{ij} = β_1 + β_2 I(Time_{ij} = 1) + β_3 I(Time_{ij} = 2) + β_4 I(Time_{ij} = 3) + β_5 I(Treatment_{i} = 2) + β_6 I(Time_{ij} = 1) \times I(Treatment_{i} = 2) + β_7 I(Time_{ij} = 2) \times I(Treatment_{i} = 2) + β_8 I(Time_{ij} = 3) \times I(Treatment_{i} = 2) + e_{ij} $

where $ I(Time_{ij} = x) $ is an indicator variable for $ Time = x $ and $ I(Treatment = 2) $ is an indicator variable for treatment B. $ i $ denotes $i$th individual while $ j $ denotes $j$th observation.

3. Using PROC MIXED, fit the model described in question 2. Include the SOLUTION (or S) option on the MODEL statement to obtain estimates and standard errors for the components of β. Include the CHISQ option on the MODEL statement and hence evaluate whether there is evidence of a difference in the pattern of change over time in mean serum cholesterol between the two treatment groups. Justify your answer.

library(nlme)
res.gls <- gls(model       = y ~ time.cat * tx,
               data        = compgrip.long,
               ## Covariance structure: unstructured
               correlation = corSymm(form = ~ as.numeric( time.cat) |  id),
               ## Variance structure: different for each time point
               weights     = varIdent(form = ~ 1 |  time.cat),
               method      = "REML"
               )
summary(res.gls)

Generalized least squares fit by REML
  Model: y ~ time.cat * tx 
  Data: compgrip.long 
   AIC  BIC logLik
  2070 2131  -1017

Correlation Structure: General
 Formula: ~as.numeric(time.cat) | id 
 Parameter estimate(s):
 Correlation: 
  1     2     3    
2 0.977            
3 0.970 0.977      
4 0.962 0.967 0.983
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time.cat 
 Parameter estimates:
     0      1      2      3 
1.0000 1.0041 0.9881 1.0073 

Coefficients:
               Value Std.Error t-value p-value
(Intercept)   134.62    14.136   9.523  0.0000
time.cat1      10.88     3.053   3.565  0.0004
time.cat2      19.15     3.457   5.540  0.0000
time.cat3      24.54     3.926   6.251  0.0000
txB             0.13    19.031   0.007  0.9944
time.cat1:txB  -8.51     4.110  -2.070  0.0396
time.cat2:txB -15.28     4.654  -3.283  0.0012
time.cat3:txB -20.69     5.285  -3.916  0.0001

 Correlation: 
              (Intr) tm.ct1 tm.ct2 tm.ct3 txB    tm.1:B tm.2:B
time.cat1     -0.089                                          
time.cat2     -0.171  0.579                                   
time.cat3     -0.112  0.479  0.755                            
txB           -0.743  0.066  0.127  0.084                     
time.cat1:txB  0.066 -0.743 -0.430 -0.356 -0.089              
time.cat2:txB  0.127 -0.430 -0.743 -0.561 -0.171  0.579       
time.cat3:txB  0.084 -0.356 -0.561 -0.743 -0.112  0.479  0.755

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-1.50338 -0.89565 -0.05848  0.86025  2.31950 

Residual standard error: 72.08 
Degrees of freedom: 232 total; 224 residual

intervals(res.gls)

Approximate 95% confidence intervals

 Coefficients:
                lower     est.    upper
(Intercept)   106.759 134.6154 162.4719
time.cat1       4.868  10.8846  16.9012
time.cat2      12.341  19.1538  25.9667
time.cat3      16.803  24.5385  32.2741
txB           -37.368   0.1346  37.6376
time.cat1:txB -16.610  -8.5096  -0.4096
time.cat2:txB -24.451 -15.2788  -6.1067
time.cat3:txB -31.109 -20.6947 -10.2802
attr(,"label")
[1] "Coefficients:"

 Correlation structure:
          lower   est.  upper
cor(1,2) 0.9611 0.9768 0.9862
cor(1,3) 0.9495 0.9698 0.9820
cor(1,4) 0.9363 0.9617 0.9772
cor(2,3) 0.9620 0.9773 0.9865
cor(2,4) 0.9453 0.9672 0.9804
cor(3,4) 0.9713 0.9829 0.9898
attr(,"label")
[1] "Correlation structure:"

 Variance function:
   lower   est. upper
1 0.9491 1.0041 1.062
2 0.9264 0.9881 1.054
3 0.9370 1.0073 1.083
attr(,"label")
[1] "Variance function:"

 Residual standard error:
lower  est. upper 
59.89 72.08 86.75

getVarCov(res.gls)

Marginal variance covariance matrix
     [,1] [,2] [,3] [,4]
[1,] 5196 5096 4978 5033
[2,] 5096 5238 5038 5083
[3,] 4978 5038 5072 5083
[4,] 5033 5083 5083 5272
  Standard Deviations: 72.08 72.38 71.22 72.61


## Likelihood ratio test
res.gls.int.ML <- gls(model       = y ~ time.cat * tx,
                      data        = compgrip.long,
                      ## Covariance structure
                      correlation = corSymm(form = ~ as.numeric( time.cat) |  id),
                      ## Variance structure
                      weights     = varIdent(form = ~ 1 |  time.cat),
                      method      = "ML"
                      )

res.gls.noint.ML <- gls(model       = y ~ time.cat + tx,
                        data        = compgrip.long,
                        ## Covariance structure
                        correlation = corSymm(form = ~ as.numeric( time.cat) |  id),
                        ## Variance structure
                        weights     = varIdent(form = ~ 1 |  time.cat),
                        method      = "ML"
                        )

anova(res.gls.int.ML, res.gls.noint.ML)

                 Model df  AIC  BIC logLik   Test L.Ratio p-value
res.gls.int.ML       1 18 2107 2169  -1035                       
res.gls.noint.ML     2 15 2115 2167  -1042 1 vs 2   14.24  0.0026

drop1(res.gls.int.ML, test = "Chisq")

Single term deletions

Model:
y ~ time.cat * tx
            Df  AIC  LRT Pr(>Chi)   
<none>         2107                 
time.cat:tx  3 2115 14.2   0.0026 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The likelihood ratio test for the interaction term (3 degrees of freedom) is statistically significant, meaning the effect of time is not the same in the treatment groups. Specifically, the interaction terms between the treatment and each one of the multiple dichotomous time variables are significant. This indicates that the treatment group differences at times 1, 2, and 3 are all different from the treatment group difference at time 0.

4. Provide an interpretation for the estimate of each component of β.

$ β_1 $, the intercept indicates the mean at Time 0 fortreatment group A. It is 134.6.
$ β_2 $, the main effect of Time 1 indicates the mean difference from the intercept at Time 1 fortreatment group A. Thus, at Time 1, the mean in treatment group A is 134.6 + 10.9 = 145.5.
$ β_3 $, the main effect of Time 2 indicates the mean difference from the intercept at Time 2 fortreatment group A. Thus, at Time 2, the mean in treatment group A is 134.6 + 19.2 = 153.8.
$ β_4 $, the main effect of Time 3 indicates the mean difference from the intercept at Time 3 fortreatment group A. Thus, at Time 3, the mean in treatment group A is 134.6 + 24.5 = 159.2.
$ β_5 $, the main effect of Treatment B indicates the mean difference from the intercept to the mean at Time 0 fortreatment group B. Thus, at Time 0, the mean in treatment group B is 134.6 + 0.134 = 134.8.
$ β_6 $, the interaction effect between Treatment B and Time 1 indicates the difference from $ β_5 $ in the mean difference between the treatment groups at Time 1. Thus, at Time 1, the mean in treatment group B is 134.6 + 0.134 -8.5 = 137.1.
$ β_7 $, the interaction effect between Treatment B and Time 2 indicates the difference from $ β_5 $ in the mean difference between the treatment groups at Time 2. Thus, at Time 2, the mean in treatment group B is 134.6 + 0.134 -15.3 = 138.6.
$ β_8 $, the interaction effect between Treatment B and Time 3 indicates the difference from $ β_5 $ in the mean difference between the treatment groups at Time 3. Thus, at Time 3, the mean in treatment group B is 134.6 + 0.134 -20.7 = 138.6.

5. Show how the estimates for the components of β can be used to construct the sample means at each measurement time in each treatment group. Compare these estimated means with the sample means that you obtained in question 1. Do they differ? If so, can you suggest why?

Construction of predicted values from estimates was already mentioned in Question 4. The predicted values are as follows.

newdat <- expand.grid(time.cat = c("0","1","2","3"),
                      tx       = c("A","B"))

newdat$predicted <- predict(res.gls, newdata = newdat, type = "response")
print(newdat, digits = 4, row.names = F)

 time.cat tx predicted
        0  A     134.6
        1  A     145.5
        2  A     153.8
        3  A     159.2
        0  B     134.8
        1  B     137.1
        2  B     138.6
        3  B     138.6

These values are the same as the sample means. The identity is due to the fact that the model used to estimate these values was a saturated model and the design was balanced, i.e., eight parameters were used to estimate eight estimates.

6. Now conduct a profile analysis (week, treatment, week*treatment) on the difference vector D = (Y1-Y0, Y2-Y0, Y3-Y0). Show that this analysis yields the same conclusions as those obtained from the full profile analysis fit in question 3.

## Create differences
compgrip2 <- within(compgrip, {
    y1 <- y1 - y0
    y2 <- y2 - y0
    y3 <- y3 - y0
})
## Discard time 0
compgrip2 <- compgrip2[,-3]

## Wide to long
compgrip2.long <- reshape(data      = compgrip2,
                          varying   = c("y1", "y2", "y3"),
                          timevar   = "time",
                          idvar     = "id",
                          direction = "long",
                          sep       = ""
                          )

## Create categorical time and group assignment
compgrip2.long <- within(compgrip2.long, {

     time.cat <- factor(time)
     tx <- factor(tx, 1:2, c("A","B"))
 })

## Reoder dataset
compgrip2.long <- compgrip2.long[with(compgrip2.long, order(id, time)),]

## summarize
summary.compgrip2.long <- ddply(.data = compgrip2.long,
                                .variables = c("tx","time"),
                                .fun = summarize,
                                n = length(!is.na(y)),
                                mean = mean(y),
                                sd = sd(y))
summary.compgrip2.long

  tx time  n   mean    sd
1  A    1 26 10.885 16.99
2  A    2 26 19.154 21.71
3  A    3 26 24.538 26.93
4  B    1 32  2.375 14.32
5  B    2 32  3.875 13.47
6  B    3 32  3.844 11.79


## Analyze
res.gls2 <- gls(model       = y ~ time.cat * tx,
                data        = compgrip2.long,
                ## Covariance structure: unstructured
                correlation = corSymm(form = ~ as.numeric( time.cat) |  id),
                ## Variance structure: different for each time point
                weights     = varIdent(form = ~ 1 |  time.cat),
                method      = "REML"
                )
summary(res.gls2)

Generalized least squares fit by REML
  Model: y ~ time.cat * tx 
  Data: compgrip2.long 
   AIC  BIC logLik
  1415 1452 -695.5

Correlation Structure: General
 Formula: ~as.numeric(time.cat) | id 
 Parameter estimate(s):
 Correlation: 
  1     2    
2 0.579      
3 0.479 0.755
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time.cat 
 Parameter estimates:
    1     2     3 
1.000 1.132 1.286 

Coefficients:
                Value Std.Error t-value p-value
(Intercept)    10.885     3.053   3.565  0.0005
time.cat2       8.269     3.009   2.748  0.0066
time.cat3      13.654     3.641   3.750  0.0002
txB            -8.510     4.110  -2.070  0.0400
time.cat2:txB  -6.769     4.051  -1.671  0.0966
time.cat3:txB -12.185     4.902  -2.486  0.0139

 Correlation: 
              (Intr) tm.ct2 tm.ct3 txB    tm.2:B
time.cat2     -0.350                            
time.cat3     -0.322  0.705                     
txB           -0.743  0.260  0.239              
time.cat2:txB  0.260 -0.743 -0.524 -0.350       
time.cat3:txB  0.239 -0.524 -0.743 -0.322  0.705

Standardized residuals:
    Min      Q1     Med      Q3     Max 
-4.3234 -0.5736 -0.1526  0.4898  3.7700 

Residual standard error: 15.57 
Degrees of freedom: 174 total; 168 residual

intervals(res.gls2)

Approximate 95% confidence intervals

 Coefficients:
                lower    est.  upper
(Intercept)     4.857  10.885 16.912
time.cat2       2.329   8.269 14.209
time.cat3       6.466  13.654 20.842
txB           -16.624  -8.510 -0.395
time.cat2:txB -14.766  -6.769  1.228
time.cat3:txB -21.862 -12.185 -2.508
attr(,"label")
[1] "Coefficients:"

 Correlation structure:
          lower   est.  upper
cor(1,2) 0.3790 0.5789 0.7272
cor(1,3) 0.2537 0.4787 0.6546
cor(2,3) 0.6184 0.7551 0.8474
attr(,"label")
[1] "Correlation structure:"

 Variance function:
   lower  est. upper
2 0.9138 1.132 1.403
3 1.0209 1.286 1.619
attr(,"label")
[1] "Variance function:"

 Residual standard error:
lower  est. upper 
12.93 15.57 18.74

getVarCov(res.gls2)

Marginal variance covariance matrix
      [,1]  [,2]  [,3]
[1,] 242.4 158.9 149.2
[2,] 158.9 310.8 266.4
[3,] 149.2 266.4 400.6
  Standard Deviations: 15.57 17.63 20.02


## Prediction
newdat2 <- expand.grid(time.cat = c("1","2","3"),
                       tx       = c("A","B"))
newdat2$predicted <- predict(res.gls2, newdata = newdat2, type = "response")
print(newdat2, digits = 4, row.names = F)

 time.cat tx predicted
        1  A    10.885
        2  A    19.154
        3  A    24.538
        1  B     2.375
        2  B     3.875
        3  B     3.844

The intercept (10.9) here indicates the difference between Time 1 and Time 0 in treatment group A. This is the same as the main effect of Time 1 in Question 4.
The main effect of the Time 2 - Time 0 difference (8.3) is the difference between Time 2 and Time 0 in treatment group A in addition to the intercept. Thus, the difference is 19.2 in total from Time 0. This is the same as the main effect of Time 2 in Question 4.
The main effect of the Time 3 - Time 0 difference (13.7) is the difference between Time 3 and Time 0 in treatment group A in addition to the intercept. Thus, the difference is 24.5 in total from Time 0. This is the same as the main effect of Time 3 in Question 4.
The main effect of the treatment group (-8.5) here indicates the treatment group difference in the difference between Time 1 and Time 0. This is the same as the effect of the Time 1 and treatment group interaction in Question 4. This term is significant and compatible with the result for Question 4.
The interaction effect between the Time 2 - Time 0 difference and the treatment group (-6.8) is the difference in the treatment group difference in addition to the main effect of the treatment group. Thus, the difference in the group difference at time 2 is -15.3 in total. This is the same as the effect of the Time 2 and treatment group interaction in Question 4. This term by itself is not significant, but the Time 2 and treatment group interaction is represented by -15.3, which has a larger absolute value and should be significant.
The interaction effect between the Time 3 - Time 0 difference and the treatment group (-12.2) is the difference in the treatment group difference in addition to the main effect of the treatment group. Thus, the difference in the group diffence at time 3 is -20.7 in total. This is the same as the effect of the Time 3 and treatment group interaction in Question 4. This term by itself is significant, and the Time 3 and treatment group interaction is represented by -20.7, which has a larger absolute value and should also be significant.

Therefore, the equivalent terms for the three interaction terms presented in Question 4 are all considered significant. Thus, the conclusion is also the same, i.e., there is a treatment group difference in the response over the follow up period.

7. Use the ESTIMATE statement in SAS to test the hypothesis that the mean AUCMB is the same in the two treatment groups. Show, using the E option, that you have the appropriate contrast applied to the means. Provide the estimated mean AUCMB with units of measurement for each treatment group and the p-value for the test. What do you conclude?

As the time points are at weeks 0, 1, 2, 3, the contrast is defined as follows.

$ L = \frac {1}{2} (t_1 + t_2 - 2t_n, t_3 - t_1, t_4 - t_2, t_4 - t_3) = \frac {1}{2} (0+1-2*3, 2-0, 3-1, 3-2) = (-2.5, 1, 1, 0.5) $

Contrast $ = (-L, L) = (2.5, -1, -1, -0.5, -2.5, 1, 1, 0.5) $

## Define a function to obtain AUCMB contrast
GetAucmbContrast <- function(t) {
    n <- length(t)

    first <- t[1] + t[2] - 2*t[n]
    last  <- t[n] - t[n-1]

    middle <- sapply(seq_len(n-2),
                     FUN = function(i) {                       
                         t[i+2] - t[i]
                     })

    1/2 * c(first, middle, last)
}

## AUC-MB contrast
## This is doing 
## 1/2 * (t_1 + t_2 - 2t_n, t_3 - t_1, t_4 - t2, ..., t_j+1 - t_j-1, ..., t_n - t_n-1)
## L <- 1/2 * c(0 + 1 - 2 * 3, 2 - 0, 3 - 1, 3 - 2)
L <- GetAucmbContrast(c(0,1,2,3))
L

[1] -2.5  1.0  1.0  0.5


## Apply contrast to means at time points in each group for AUCMB in each group
res.aucmb <- ddply(.data = summary.compgrip.long,
                   .variables = c("tx"),                
                   .fun = function(DF) {

                       data.frame(AUCMB = sum(DF$mean * L))
                   })
res.aucmb

  tx  AUCMB
1  A 42.308
2  B  8.172


## AUCMB difference
sum(summary.compgrip.long$mean * c(L, -L))

[1] 34.14