A randomized clinical trial was completed to compare the effectiveness of 2 rheumatoid arthritis treatments. The grip strength was measured on each of the patients at 4 time points: week 0, week 1, week 2 and week 3. Grip strength is a continuous outcome. The data set is complete and balanced. Note that only a subset of patients is included in the data set for this assignment. We are most interested in determining the association between treatment and grip strength.
## Parallel backend
library(doMC) # parallel backend to foreach/plyr
registerDoMC() # Turn on multicore processing
## Prepare dataset
compgrip <- read.table("http://isites.harvard.edu/fs/docs/icb.topic1100055.files/compgrip.txt")
names(compgrip) <- c("id","tx","y0","y1","y2","y3")
## Wide to long
compgrip.long <- reshape(data = compgrip,
varying = c("y0","y1","y2","y3"),
timevar = "time",
idvar = "id",
direction = "long",
sep = ""
)
compgrip.long <- compgrip.long[with(compgrip.long, order(id, time)),]
compgrip.long <- within(compgrip.long, {
time.cat <- factor(time)
tx <- factor(tx, 1:2, c("A","B"))
})
The data are stored in an ASCII file: compgrip.txt. Each row of the data set contains the following six variables: subject ID number, treatment indicator (1=treatment A and 2=treatment B), Y0, Y1, Y2, Y3.
1. Obtain the sample size, and the sample means and standard deviations of the grip strengths at each occasion for each treatment group. On the same graph, plot the mean grip strength versus time (in weeks) for each of the two treatment groups. Describe the general characteristics of the time trends for the two groups.
## Summarize
library(plyr)
summary.compgrip.long <- ddply(.data = compgrip.long,
.variables = c("tx","time"),
.fun = summarize,
n = length(!is.na(y)),
mean = mean(y),
sd = sd(y))
summary.compgrip.long
tx time n mean sd
1 A 0 26 134.6 71.29
2 A 1 26 145.5 68.94
3 A 2 26 153.8 70.94
4 A 3 26 159.2 71.41
5 B 0 32 134.8 72.71
6 B 1 32 137.1 75.03
7 B 2 32 138.6 71.45
8 B 3 32 138.6 73.55
## Plot
library(ggplot2)
ggplot(data = summary.compgrip.long,
mapping = aes(x = time, y = mean, group = tx, color = tx)) +
layer(geom = "point") +
layer(geom = "line") +
## layer(geom = "errorbar", mapping = aes(ymax = mean + (sd)/sqrt(n), ymin = mean - (sd)/sqrt(n)), width = 0.1) +
theme_bw() +
theme(legend.key = element_blank())
The mean grip strength improved in both groups. The trend appears curvilinear (slightly decreasing slope), but it is almost straight. The response appears more prominent in treatment group A.
2. With baseline (month 0) and the treatment A as the reference group, write out the complete definition of the regression model for the analysis of response profiles for mean grip strength. In this model, let β denote the vector of parameters in the model for the means and assume an unstructured variance- covariance structure.
\( Y_{ij} = β_1 + β_2 I(Time_{ij} = 1) + β_3 I(Time_{ij} = 2) + β_4 I(Time_{ij} = 3) + β_5 I(Treatment_{i} = 2) + β_6 I(Time_{ij} = 1) \times I(Treatment_{i} = 2) + β_7 I(Time_{ij} = 2) \times I(Treatment_{i} = 2) + β_8 I(Time_{ij} = 3) \times I(Treatment_{i} = 2) + e_{ij} \)
where \( I(Time_{ij} = x) \) is an indicator variable for \( Time = x \) and \( I(Treatment = 2) \) is an indicator variable for treatment B. \( i \) denotes $i$th individual while \( j \) denotes $j$th observation.
3. Using PROC MIXED, fit the model described in question 2. Include the SOLUTION (or S) option on the MODEL statement to obtain estimates and standard errors for the components of β. Include the CHISQ option on the MODEL statement and hence evaluate whether there is evidence of a difference in the pattern of change over time in mean serum cholesterol between the two treatment groups. Justify your answer.
library(nlme)
res.gls <- gls(model = y ~ time.cat * tx,
data = compgrip.long,
## Covariance structure: unstructured
correlation = corSymm(form = ~ as.numeric( time.cat) | id),
## Variance structure: different for each time point
weights = varIdent(form = ~ 1 | time.cat),
method = "REML"
)
summary(res.gls)
Generalized least squares fit by REML
Model: y ~ time.cat * tx
Data: compgrip.long
AIC BIC logLik
2070 2131 -1017
Correlation Structure: General
Formula: ~as.numeric(time.cat) | id
Parameter estimate(s):
Correlation:
1 2 3
2 0.977
3 0.970 0.977
4 0.962 0.967 0.983
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | time.cat
Parameter estimates:
0 1 2 3
1.0000 1.0041 0.9881 1.0073
Coefficients:
Value Std.Error t-value p-value
(Intercept) 134.62 14.136 9.523 0.0000
time.cat1 10.88 3.053 3.565 0.0004
time.cat2 19.15 3.457 5.540 0.0000
time.cat3 24.54 3.926 6.251 0.0000
txB 0.13 19.031 0.007 0.9944
time.cat1:txB -8.51 4.110 -2.070 0.0396
time.cat2:txB -15.28 4.654 -3.283 0.0012
time.cat3:txB -20.69 5.285 -3.916 0.0001
Correlation:
(Intr) tm.ct1 tm.ct2 tm.ct3 txB tm.1:B tm.2:B
time.cat1 -0.089
time.cat2 -0.171 0.579
time.cat3 -0.112 0.479 0.755
txB -0.743 0.066 0.127 0.084
time.cat1:txB 0.066 -0.743 -0.430 -0.356 -0.089
time.cat2:txB 0.127 -0.430 -0.743 -0.561 -0.171 0.579
time.cat3:txB 0.084 -0.356 -0.561 -0.743 -0.112 0.479 0.755
Standardized residuals:
Min Q1 Med Q3 Max
-1.50338 -0.89565 -0.05848 0.86025 2.31950
Residual standard error: 72.08
Degrees of freedom: 232 total; 224 residual
intervals(res.gls)
Approximate 95% confidence intervals
Coefficients:
lower est. upper
(Intercept) 106.759 134.6154 162.4719
time.cat1 4.868 10.8846 16.9012
time.cat2 12.341 19.1538 25.9667
time.cat3 16.803 24.5385 32.2741
txB -37.368 0.1346 37.6376
time.cat1:txB -16.610 -8.5096 -0.4096
time.cat2:txB -24.451 -15.2788 -6.1067
time.cat3:txB -31.109 -20.6947 -10.2802
attr(,"label")
[1] "Coefficients:"
Correlation structure:
lower est. upper
cor(1,2) 0.9611 0.9768 0.9862
cor(1,3) 0.9495 0.9698 0.9820
cor(1,4) 0.9363 0.9617 0.9772
cor(2,3) 0.9620 0.9773 0.9865
cor(2,4) 0.9453 0.9672 0.9804
cor(3,4) 0.9713 0.9829 0.9898
attr(,"label")
[1] "Correlation structure:"
Variance function:
lower est. upper
1 0.9491 1.0041 1.062
2 0.9264 0.9881 1.054
3 0.9370 1.0073 1.083
attr(,"label")
[1] "Variance function:"
Residual standard error:
lower est. upper
59.89 72.08 86.75
getVarCov(res.gls)
Marginal variance covariance matrix
[,1] [,2] [,3] [,4]
[1,] 5196 5096 4978 5033
[2,] 5096 5238 5038 5083
[3,] 4978 5038 5072 5083
[4,] 5033 5083 5083 5272
Standard Deviations: 72.08 72.38 71.22 72.61
## Likelihood ratio test
res.gls.int.ML <- gls(model = y ~ time.cat * tx,
data = compgrip.long,
## Covariance structure
correlation = corSymm(form = ~ as.numeric( time.cat) | id),
## Variance structure
weights = varIdent(form = ~ 1 | time.cat),
method = "ML"
)
res.gls.noint.ML <- gls(model = y ~ time.cat + tx,
data = compgrip.long,
## Covariance structure
correlation = corSymm(form = ~ as.numeric( time.cat) | id),
## Variance structure
weights = varIdent(form = ~ 1 | time.cat),
method = "ML"
)
anova(res.gls.int.ML, res.gls.noint.ML)
Model df AIC BIC logLik Test L.Ratio p-value
res.gls.int.ML 1 18 2107 2169 -1035
res.gls.noint.ML 2 15 2115 2167 -1042 1 vs 2 14.24 0.0026
drop1(res.gls.int.ML, test = "Chisq")
Single term deletions
Model:
y ~ time.cat * tx
Df AIC LRT Pr(>Chi)
<none> 2107
time.cat:tx 3 2115 14.2 0.0026 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood ratio test for the interaction term (3 degrees of freedom) is statistically significant, meaning the effect of time is not the same in the treatment groups. Specifically, the interaction terms between the treatment and each one of the multiple dichotomous time variables are significant. This indicates that the treatment group differences at times 1, 2, and 3 are all different from the treatment group difference at time 0.
4. Provide an interpretation for the estimate of each component of β.
\( β_1 \), the intercept indicates the mean at Time 0 fortreatment group A. It is 134.6.
\( β_2 \), the main effect of Time 1 indicates the mean difference from the intercept at Time 1 fortreatment group A. Thus, at Time 1, the mean in treatment group A is 134.6 + 10.9 = 145.5.
\( β_3 \), the main effect of Time 2 indicates the mean difference from the intercept at Time 2 fortreatment group A. Thus, at Time 2, the mean in treatment group A is 134.6 + 19.2 = 153.8.
\( β_4 \), the main effect of Time 3 indicates the mean difference from the intercept at Time 3 fortreatment group A. Thus, at Time 3, the mean in treatment group A is 134.6 + 24.5 = 159.2.
\( β_5 \), the main effect of Treatment B indicates the mean difference from the intercept to the mean at Time 0 fortreatment group B. Thus, at Time 0, the mean in treatment group B is 134.6 + 0.134 = 134.8.
\( β_6 \), the interaction effect between Treatment B and Time 1 indicates the difference from \( β_5 \) in the mean difference between the treatment groups at Time 1. Thus, at Time 1, the mean in treatment group B is 134.6 + 0.134 -8.5 = 137.1.
\( β_7 \), the interaction effect between Treatment B and Time 2 indicates the difference from \( β_5 \) in the mean difference between the treatment groups at Time 2. Thus, at Time 2, the mean in treatment group B is 134.6 + 0.134 -15.3 = 138.6.
\( β_8 \), the interaction effect between Treatment B and Time 3 indicates the difference from \( β_5 \) in the mean difference between the treatment groups at Time 3. Thus, at Time 3, the mean in treatment group B is 134.6 + 0.134 -20.7 = 138.6.
5. Show how the estimates for the components of β can be used to construct the sample means at each measurement time in each treatment group. Compare these estimated means with the sample means that you obtained in question 1. Do they differ? If so, can you suggest why?
Construction of predicted values from estimates was already mentioned in Question 4. The predicted values are as follows.
newdat <- expand.grid(time.cat = c("0","1","2","3"),
tx = c("A","B"))
newdat$predicted <- predict(res.gls, newdata = newdat, type = "response")
print(newdat, digits = 4, row.names = F)
time.cat tx predicted
0 A 134.6
1 A 145.5
2 A 153.8
3 A 159.2
0 B 134.8
1 B 137.1
2 B 138.6
3 B 138.6
These values are the same as the sample means. The identity is due to the fact that the model used to estimate these values was a saturated model and the design was balanced, i.e., eight parameters were used to estimate eight estimates.
6. Now conduct a profile analysis (week, treatment, week*treatment) on the difference vector D = (Y1-Y0, Y2-Y0, Y3-Y0). Show that this analysis yields the same conclusions as those obtained from the full profile analysis fit in question 3.
## Create differences
compgrip2 <- within(compgrip, {
y1 <- y1 - y0
y2 <- y2 - y0
y3 <- y3 - y0
})
## Discard time 0
compgrip2 <- compgrip2[,-3]
## Wide to long
compgrip2.long <- reshape(data = compgrip2,
varying = c("y1", "y2", "y3"),
timevar = "time",
idvar = "id",
direction = "long",
sep = ""
)
## Create categorical time and group assignment
compgrip2.long <- within(compgrip2.long, {
time.cat <- factor(time)
tx <- factor(tx, 1:2, c("A","B"))
})
## Reoder dataset
compgrip2.long <- compgrip2.long[with(compgrip2.long, order(id, time)),]
## summarize
summary.compgrip2.long <- ddply(.data = compgrip2.long,
.variables = c("tx","time"),
.fun = summarize,
n = length(!is.na(y)),
mean = mean(y),
sd = sd(y))
summary.compgrip2.long
tx time n mean sd
1 A 1 26 10.885 16.99
2 A 2 26 19.154 21.71
3 A 3 26 24.538 26.93
4 B 1 32 2.375 14.32
5 B 2 32 3.875 13.47
6 B 3 32 3.844 11.79
## Analyze
res.gls2 <- gls(model = y ~ time.cat * tx,
data = compgrip2.long,
## Covariance structure: unstructured
correlation = corSymm(form = ~ as.numeric( time.cat) | id),
## Variance structure: different for each time point
weights = varIdent(form = ~ 1 | time.cat),
method = "REML"
)
summary(res.gls2)
Generalized least squares fit by REML
Model: y ~ time.cat * tx
Data: compgrip2.long
AIC BIC logLik
1415 1452 -695.5
Correlation Structure: General
Formula: ~as.numeric(time.cat) | id
Parameter estimate(s):
Correlation:
1 2
2 0.579
3 0.479 0.755
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | time.cat
Parameter estimates:
1 2 3
1.000 1.132 1.286
Coefficients:
Value Std.Error t-value p-value
(Intercept) 10.885 3.053 3.565 0.0005
time.cat2 8.269 3.009 2.748 0.0066
time.cat3 13.654 3.641 3.750 0.0002
txB -8.510 4.110 -2.070 0.0400
time.cat2:txB -6.769 4.051 -1.671 0.0966
time.cat3:txB -12.185 4.902 -2.486 0.0139
Correlation:
(Intr) tm.ct2 tm.ct3 txB tm.2:B
time.cat2 -0.350
time.cat3 -0.322 0.705
txB -0.743 0.260 0.239
time.cat2:txB 0.260 -0.743 -0.524 -0.350
time.cat3:txB 0.239 -0.524 -0.743 -0.322 0.705
Standardized residuals:
Min Q1 Med Q3 Max
-4.3234 -0.5736 -0.1526 0.4898 3.7700
Residual standard error: 15.57
Degrees of freedom: 174 total; 168 residual
intervals(res.gls2)
Approximate 95% confidence intervals
Coefficients:
lower est. upper
(Intercept) 4.857 10.885 16.912
time.cat2 2.329 8.269 14.209
time.cat3 6.466 13.654 20.842
txB -16.624 -8.510 -0.395
time.cat2:txB -14.766 -6.769 1.228
time.cat3:txB -21.862 -12.185 -2.508
attr(,"label")
[1] "Coefficients:"
Correlation structure:
lower est. upper
cor(1,2) 0.3790 0.5789 0.7272
cor(1,3) 0.2537 0.4787 0.6546
cor(2,3) 0.6184 0.7551 0.8474
attr(,"label")
[1] "Correlation structure:"
Variance function:
lower est. upper
2 0.9138 1.132 1.403
3 1.0209 1.286 1.619
attr(,"label")
[1] "Variance function:"
Residual standard error:
lower est. upper
12.93 15.57 18.74
getVarCov(res.gls2)
Marginal variance covariance matrix
[,1] [,2] [,3]
[1,] 242.4 158.9 149.2
[2,] 158.9 310.8 266.4
[3,] 149.2 266.4 400.6
Standard Deviations: 15.57 17.63 20.02
## Prediction
newdat2 <- expand.grid(time.cat = c("1","2","3"),
tx = c("A","B"))
newdat2$predicted <- predict(res.gls2, newdata = newdat2, type = "response")
print(newdat2, digits = 4, row.names = F)
time.cat tx predicted
1 A 10.885
2 A 19.154
3 A 24.538
1 B 2.375
2 B 3.875
3 B 3.844
The intercept (10.9) here indicates the difference between Time 1 and Time 0 in treatment group A. This is the same as the main effect of Time 1 in Question 4.
The main effect of the Time 2 - Time 0 difference (8.3) is the difference between Time 2 and Time 0 in treatment group A in addition to the intercept. Thus, the difference is 19.2 in total from Time 0. This is the same as the main effect of Time 2 in Question 4.
The main effect of the Time 3 - Time 0 difference (13.7) is the difference between Time 3 and Time 0 in treatment group A in addition to the intercept. Thus, the difference is 24.5 in total from Time 0. This is the same as the main effect of Time 3 in Question 4.
The main effect of the treatment group (-8.5) here indicates the treatment group difference in the difference between Time 1 and Time 0. This is the same as the effect of the Time 1 and treatment group interaction in Question 4. This term is significant and compatible with the result for Question 4.
The interaction effect between the Time 2 - Time 0 difference and the treatment group (-6.8) is the difference in the treatment group difference in addition to the main effect of the treatment group. Thus, the difference in the group difference at time 2 is -15.3 in total. This is the same as the effect of the Time 2 and treatment group interaction in Question 4. This term by itself is not significant, but the Time 2 and treatment group interaction is represented by -15.3, which has a larger absolute value and should be significant.
The interaction effect between the Time 3 - Time 0 difference and the treatment group (-12.2) is the difference in the treatment group difference in addition to the main effect of the treatment group. Thus, the difference in the group diffence at time 3 is -20.7 in total. This is the same as the effect of the Time 3 and treatment group interaction in Question 4. This term by itself is significant, and the Time 3 and treatment group interaction is represented by -20.7, which has a larger absolute value and should also be significant.
Therefore, the equivalent terms for the three interaction terms presented in Question 4 are all considered significant. Thus, the conclusion is also the same, i.e., there is a treatment group difference in the response over the follow up period.
7. Use the ESTIMATE statement in SAS to test the hypothesis that the mean AUCMB is the same in the two treatment groups. Show, using the E option, that you have the appropriate contrast applied to the means. Provide the estimated mean AUCMB with units of measurement for each treatment group and the p-value for the test. What do you conclude?
As the time points are at weeks 0, 1, 2, 3, the contrast is defined as follows.
\( L = \frac {1}{2} (t_1 + t_2 - 2t_n, t_3 - t_1, t_4 - t_2, t_4 - t_3) = \frac {1}{2} (0+1-2*3, 2-0, 3-1, 3-2) = (-2.5, 1, 1, 0.5) \)
Contrast \( = (-L, L) = (2.5, -1, -1, -0.5, -2.5, 1, 1, 0.5) \)
## Define a function to obtain AUCMB contrast
GetAucmbContrast <- function(t) {
n <- length(t)
first <- t[1] + t[2] - 2*t[n]
last <- t[n] - t[n-1]
middle <- sapply(seq_len(n-2),
FUN = function(i) {
t[i+2] - t[i]
})
1/2 * c(first, middle, last)
}
## AUC-MB contrast
## This is doing
## 1/2 * (t_1 + t_2 - 2t_n, t_3 - t_1, t_4 - t2, ..., t_j+1 - t_j-1, ..., t_n - t_n-1)
## L <- 1/2 * c(0 + 1 - 2 * 3, 2 - 0, 3 - 1, 3 - 2)
L <- GetAucmbContrast(c(0,1,2,3))
L
[1] -2.5 1.0 1.0 0.5
## Apply contrast to means at time points in each group for AUCMB in each group
res.aucmb <- ddply(.data = summary.compgrip.long,
.variables = c("tx"),
.fun = function(DF) {
data.frame(AUCMB = sum(DF$mean * L))
})
res.aucmb
tx AUCMB
1 A 42.308
2 B 8.172
## AUCMB difference
sum(summary.compgrip.long$mean * c(L, -L))
[1] 34.14