An experiment was conducted to study two new reading programs in the fourth grade. Thirty subjects were randomly assigned to three conditions: a control (traditional program), and two experimental programs. Before the study took place, a test was administered to obtain grade-equivalent reading levels for the subjects. At the end of the study, six months later, similar reading evaluation scores were obtained.
Each subject has equal chance to be assigned to either the Control group or the Experimental groups. The assignment has no influence on the pre measurement. The post measurement depends on group and on the baseline measurement (Pre).
We inspect the data with a boxplot and a scatter plot.
boxp <- ggplot(dtaL ,
aes(x=Time, y=Score, col=Group))+
geom_boxplot(outlier.size=0) +
geom_point(aes(fill=Group, col=NULL),
shape=21, alpha=0.5, size=2,
position=position_jitterdodge(jitter.width=0.2)) +
theme_bw() +
xlab("")
boxp
ggplot(dta,
aes (x=Pre, y=Post, col=Group)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
labs(x="Pre-test reading score",
y="Post-test reading score")+
theme_minimal()
The terms of the most basic constrained model are ‘Time’ and ‘Time * Group’. By omitting ‘Group’, there is no term allowing for a difference in means at baseline between the groups. Consequently, the baseline means are assumed to be equal. However, a general rule in regression modeling is that when there is an interaction in the model, the lower ranked effects should also be included in the model. For this reason, R automatically includes Time and Group as main effects in the model when including the interaction term Time*Group. To avoid to have Group as the main effect in our model, we will create an alternative model matrix: Xalt.
[1] "(Intercept)" "TimePost" "GroupT1" "GroupT2"
[5] "TimePost:GroupT1" "TimePost:GroupT2"
[1] "TimePost" "TimePost:GroupT1" "TimePost:GroupT2"
The gls() function of the ‘nlme’ package can be used for modeling data.
The standard deviations and correlations that should be estimated are defined by the constant variance function varIdent(). By setting weights = varIdent(form = ~ 1 | Time) a separate standard deviation will be estimated for each time point and a separate correlation will be estimated for each pair of time points (= unstructured variance-covariance matrix). In this example, we expect an estimated standard deviation for Time=Pre, an estimated standard deviation for Time=Post and one estimated correlation between the pre- and post reading measurements of the same subject.
By setting weights = varIdent(form = ~ 1 | Time:Group), a separate variance is estimated for each combination of Group and Time (Pre-C Post-C Pre-T1 Post-T1 Pre-T2 Post-T2).
The argument correlation=corSymm (form = ~ 1 | ID) defines the subject levels. The correlation structure is assumed to apply only to observations within the same subject (here: ID); observations from different subjects (a different value for ‘ID’) are assumed to be uncorrelated.
m0_gls <- gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
summary(m0_gls)
Generalized least squares fit by REML
Model: Score ~ Xalt
Data: dtaL
AIC BIC logLik
350.2136 364.391 -168.1068
Correlation Structure: General
Formula: ~1 | ID
Parameter estimate(s):
Correlation:
1
2 0.91
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time
Parameter estimates:
Pre Post
1.000000 1.016413
Coefficients:
Value Std.Error t-value p-value
(Intercept) 50.50000 1.2296621 41.06819 0.0000
XaltTimePost 6.38144 0.9027047 7.06924 0.0000
XaltTimePost:GroupT1 -2.80337 1.2699097 -2.20753 0.0314
XaltTimePost:GroupT2 7.15906 1.2699097 5.63745 0.0000
Correlation:
(Intr) XltTmP XTP:GT1
XaltTimePost -0.102
XaltTimePost:GroupT1 0.000 -0.703
XaltTimePost:GroupT2 0.000 -0.703 0.500
Standardized residuals:
Min Q1 Med Q3 Max
-2.7467890 -0.3836191 0.1374178 0.7475076 1.4493709
Residual standard error: 6.735137
Degrees of freedom: 60 total; 56 residual
The estimated correlation between Pre and Post is 0.91. The estimated standard deviation for the Pre measurement is 6.735 and the estimated standard deviation for the Post measurement is 1.016 * 6.735.
We can obtain predicted means using the gls() function. As expected, the predicted value is the same for three groups at baseline (Time == Pre).
To obtain the confidence intervals, the Gls() function of the rms package can be used exactly as the gls() function, leading to exactly the same predictions but with confidence intervals.
m0_Gls <- Gls(Score ~ Xalt,
weights = varIdent(form = ~ 1 | Time),
correlation=corSymm (form = ~ 1 | ID),
data = dtaL)
predictions$Group2 <- factor(predictions$Group, levels=c("All", "C", "T1", "T2"))
predictions$Group2[predictions$Time=="Pre"] <- "All"
pCI <- ggplot(predictions,
aes(y=linear.predictors,
x=Time)) +
geom_errorbar(limits,
width=0.09,
position=pd) +
geom_line(aes(group=Group,
y=linear.predictors),
position=pd)+
geom_point(aes(fill=Group2),
position=pd,
shape=21,
size=rel(4)) +
scale_fill_manual(values=c("white", "gray80", "gray20", "black")) +
theme_minimal() +
theme(panel.grid.major=element_blank(),
legend.title=element_blank(),
text=element_text(size=14),
legend.position="bottom") +
labs(x="Time",
y="Estimated mean with corresponding 95% CI")
pCI
In the case of non-randomized experiments or studies an equal mean at baseline cannot be assumed. Consequently, applying constrained model is not appropriate.
Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology, p. 490.
Liu, G.F., Lu, K., Mogg, R., et al. (2009). Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Statistics in Medicine, 28, 2509-2530.