Processing math: 100%

1. Repeated measurement and clinical tiral

Repeated measures are quite common in clinical trials. A repeated measures study involves measuring outcomes at different times; it does not necessarily mean that covariates or predictor variables were also measured repeatedly. For the design of repeated measures studies, general linear mixed models, generalized linear mixed models, or GEE models are commonly used to analyze the data.

Mixed linear regression models or GEE regression models are true multivariate regression models because they analyze more than one outcome variable simultaneously. This should not be confused with multiple or multivariable regression, where only one outcome and several predictor variables are analyzed.

For example, suppose we conduct a three-arm clinical trial among 15 patients, with treatments 1 and 2 involving drug 1 and drug 2, respectively, and treatment 3 involving a placebo. We randomize 5 patients to each of the three arms and measure the primary outcome variable at baseline (time 0, before taking any drug/placebo) and follow up at 30 minutes, 60 minutes, and 90 minutes after taking the drug/placebo.

The wide form of the first 9 patients’ data looks like:

If we use linear mixed model to analyse the data, we should make the data as long form which look like

In this format, each row represents a single measurement for a patient at a specific time point, under a specific treatment arm, with the corresponding outcome value

2 Use R and SAS to analyse the data

I will use a the dataset phlebitis.csv from the PennState STATA 510 to show how to analyse a three-arm clinical trial with repeated measures.

I copy the following paragraph from the website for the introduction of the dataset.

An experiment is done to examine ways to detect phlebitis during the intravenous administration of a particular drug. Phlebitis is an in ammation of a blood vein. Three intravenous treatments were administered. 15 test animals were randomly divided into three groups of n = 5. Each group is given a different treatment. The treatments are:

Treatment 1: The drug in a solution designed to carry the drug

Treatment 2:* The carrier solution only (no drug)

Treatment 3: Saline solution

The treatments were administered to one ear of the test animal. The y variable = difference in temperature between the treated ear and the untreated ear. This is measured at times 0, 30 minutes, 60 minutes, and 90 minutes after the treatment is administered. It is believed that increased temperature in the treated ear may be an early sign of phlebitis.

We use the following R code to show what the data looks like

library(nlme)
phlebitisdata<- read.csv("https://online.stat.psu.edu/stat510/sites/stat510/files/data/phlebitis.csv", header=T, sep=",")
interaction.plot (phlebitisdata$Time, factor(phlebitisdata$Treatment), phlebitisdata$Y, lty=c(1:3),lwd=2,ylab="mean of Y", xlab="time", trace.label="Treatment")

The STATA 510 uses Generalized Least Squares (GLS) to analyze the data. Since the dataset is balanced (i.e., with equal sample sizes for each group and no missing data), both the GLS method and the mixed method will yield the same results.

For this study design, the outcome variable is a continuous variable. Many people believe that the baseline value of the outcome variable should be treated as a covariate variable in the mixed model. However, in STATA 510, no adjustment was made for the baseline value of the outcome variable. For the linear mixed model we will adjust the baseline outcome values in the mixed model, therefore, the covariates are “Treatment,” “Time,”, “Treatment*Time” interaction and baseline outcome value. We will perform a random-intercept model that allows the intercept to vary between each animal (subject).

We use the following R code to merge outcome baseline values to the dataset.

baseline<-subset(phlebitisdata,Time==0,select=c("Animal","Y")) # subet phlebitis data
names(baseline)[2] <- "base_Y" #change variable name "Y" to "base_Y"
ph_with_base<-merge(phlebitisdata,baseline,by="Animal",all.x=TRUE) #left joint to the original dataset.
head(ph_with_base,10)
##    Animal Treatment Time    Y base_Y
## 1       1         1    0 -0.3   -0.3
## 2       1         1   30 -0.2   -0.3
## 3       1         1   60  1.2   -0.3
## 4       1         1   90  3.1   -0.3
## 5       2         1    0 -0.5   -0.5
## 6       2         1   30  2.2   -0.5
## 7       2         1   60  3.3   -0.5
## 8       2         1   90  3.7   -0.5
## 9       3         1    0 -1.1   -1.1
## 10      3         1   30  2.4   -1.1

For a repeated measures trial to be analysed using a random intercept mixed model, we can specify the linear mixed model as follow:

yijk=μ+αj+dij+τk+(ατ)jk+eijk where yijk is the ith subject’s response for treatment j at time k, for the above experiment, i=1,2,...15, since there are 15 animals in total, j=1,2,3 since there are three treatment groups, and there are four time points k=0,30,60,90.

μ is the intercept stands for an overall mean.

αj is a fixed effect of treatment j

dij is a random effect of subject i in treatment group j, note we can put (μ+dij) together, then we think the intercept has some randomness.

τk is a fixed effect of time k.

(ατ)jk is a fixed interaction effect of treatment j with time k.

In term of matrix form of the general linear mixed model, the vector β contains the fixed effect parameters μ,αj,τk and (ατ)jk. The random vector U contains the between-subject residual errors djk and e contains the within-subject residual eijk

For the above study design, there were 3 treatment groups and four time points. We treat both treatment and time point as categorical variables, so there will be two dummy variables for treatment and three dummy variables for time point. In R, we use the factor() function to create these dummy variables. Therefore, our linear mixed model should produce 13 coefficients: one for the intercept, 2 for the two dummy treatment groups, 3 dummy variables for the 4 time points, 6 (2 × 3) coefficients for the treatment and time interaction, and one extra coefficient for the baseline outcome score.

We use the following R code to perform the linear mixed model analysis:

library(lme4)
library(lmerTest)
library(lsmeans)
m1 <- lmer(Y ~ factor(Treatment)*factor(Time) + base_Y +(1|Animal),data=ph_with_base)
summary(m1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Y ~ factor(Treatment) * factor(Time) + base_Y + (1 | Animal)
##    Data: ph_with_base
## 
## REML criterion at convergence: 134.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9249 -0.5582  0.1390  0.6393  1.6488 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Animal   (Intercept) 0.07472  0.2734  
##  Residual             0.57831  0.7605  
## Number of obs: 60, groups:  Animal, 15
## 
## Fixed effects:
##                                   Estimate Std. Error      df t value Pr(>|t|)
## (Intercept)                        -0.1865     0.3641 43.9502  -0.512  0.61112
## factor(Treatment)2                 -0.2641     0.5149 43.9485  -0.513  0.61052
## factor(Treatment)3                 -0.3418     0.5175 43.6120  -0.661  0.51234
## factor(Time)30                      1.6200     0.4810 36.0000   3.368  0.00181
## factor(Time)60                      2.1200     0.4810 36.0000   4.408 9.02e-05
## factor(Time)90                      2.8200     0.4810 36.0000   5.863 1.06e-06
## base_Y                              0.2231     0.1839 11.0000   1.214  0.25035
## factor(Treatment)2:factor(Time)30  -1.5600     0.6802 36.0000  -2.294  0.02776
## factor(Treatment)3:factor(Time)30  -0.7000     0.6802 36.0000  -1.029  0.31028
## factor(Treatment)2:factor(Time)60  -1.6000     0.6802 36.0000  -2.352  0.02424
## factor(Treatment)3:factor(Time)60  -1.8600     0.6802 36.0000  -2.735  0.00963
## factor(Treatment)2:factor(Time)90  -2.2400     0.6802 36.0000  -3.293  0.00223
## factor(Treatment)3:factor(Time)90  -1.9200     0.6802 36.0000  -2.823  0.00771
##                                      
## (Intercept)                          
## factor(Treatment)2                   
## factor(Treatment)3                   
## factor(Time)30                    ** 
## factor(Time)60                    ***
## factor(Time)90                    ***
## base_Y                               
## factor(Treatment)2:factor(Time)30 *  
## factor(Treatment)3:factor(Time)30    
## factor(Treatment)2:factor(Time)60 *  
## factor(Treatment)3:factor(Time)60 ** 
## factor(Treatment)2:factor(Time)90 ** 
## factor(Treatment)3:factor(Time)90 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lsmeans(m1,pairwise~Treatment|Time, ddf="Satterthwaite",adjust = "none")
## $lsmeans
## Time =  0:
##  Treatment  lsmean    SE   df lower.CL upper.CL
##          1 -0.2980 0.365 43.9   -1.033   0.4367
##          2 -0.5622 0.362 44.4   -1.291   0.1666
##          3 -0.6398 0.363 44.2   -1.371   0.0915
## 
## Time = 30:
##  Treatment  lsmean    SE   df lower.CL upper.CL
##          1  1.3220 0.365 43.9    0.587   2.0567
##          2 -0.5022 0.362 44.4   -1.231   0.2266
##          3  0.2802 0.363 44.2   -0.451   1.0115
## 
## Time = 60:
##  Treatment  lsmean    SE   df lower.CL upper.CL
##          1  1.8220 0.365 43.9    1.087   2.5567
##          2 -0.0422 0.362 44.4   -0.771   0.6866
##          3 -0.3798 0.363 44.2   -1.111   0.3515
## 
## Time = 90:
##  Treatment  lsmean    SE   df lower.CL upper.CL
##          1  2.5220 0.365 43.9    1.787   3.2567
##          2  0.0178 0.362 44.4   -0.711   0.7466
##          3  0.2602 0.363 44.2   -0.471   0.9915
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## Time =  0:
##  contrast                estimate    SE   df t.ratio p.value
##  Treatment1 - Treatment2   0.2641 0.515 44.0   0.513  0.6105
##  Treatment1 - Treatment3   0.3418 0.517 43.6   0.661  0.5123
##  Treatment2 - Treatment3   0.0777 0.511 44.4   0.152  0.8799
## 
## Time = 30:
##  contrast                estimate    SE   df t.ratio p.value
##  Treatment1 - Treatment2   1.8241 0.515 44.0   3.543  0.0010
##  Treatment1 - Treatment3   1.0418 0.517 43.6   2.013  0.0503
##  Treatment2 - Treatment3  -0.7823 0.511 44.4  -1.530  0.1332
## 
## Time = 60:
##  contrast                estimate    SE   df t.ratio p.value
##  Treatment1 - Treatment2   1.8641 0.515 44.0   3.620  0.0008
##  Treatment1 - Treatment3   2.2018 0.517 43.6   4.255  0.0001
##  Treatment2 - Treatment3   0.3377 0.511 44.4   0.660  0.5125
## 
## Time = 90:
##  contrast                estimate    SE   df t.ratio p.value
##  Treatment1 - Treatment2   2.5041 0.515 44.0   4.863  <.0001
##  Treatment1 - Treatment3   2.2618 0.517 43.6   4.371  0.0001
##  Treatment2 - Treatment3  -0.2423 0.511 44.4  -0.474  0.6380
## 
## Degrees-of-freedom method: kenward-roger

The following SAS code will produce the same results as the R code above

filename phleb url 'https://online.stat.psu.edu/stat510/sites/stat510/files/data/phlebitis.csv';
proc import datafile=phleb dbms=csv out=phleb2 replace;
run;

data base(keep=animal y rename=(y=base_score));
set phleb2;
where time=0;
run;
proc sort data=phleb2;
by animal;
run;
proc sort data=base;
by animal;
run;

data pleb_with_base;
merge phleb2 base;
by animal;
run;
proc mixed data=pleb_with_base method=REML ;
  class animal treatment(ref='1') time(ref='0');
  model Y = treatment time treatment*time base_score/solution DDFM=SATTERTHWAITE;
  random intercept / subject=animal;  /*we use default variance structure which is Variance Components*/
  lsmeans treatment*time/diff /*ADJUST=TUKEY, not to use*/;
run;

We obtained the following SAS results, which are the same as the R results above.

Based on above results we can write the linear mixed effect model as follow:

Y=β0+β1Treatment2+β2Treatment3+β3Time30+β4Time60+β5Time90+β6(Treatment2Time30)+β7(Treatment3Time30)+β8(Treatment2Time60)+β9(Treatment3Time60)+β10(Treatment2Time90)+β11(Treatment3Time90)+β12BaselineY=0.18650.2641Treatment20.3418Treatment3+1.6200Time30+2.1200Time60+2.8200Time901.5600(Treatment2Time30)0.7000(Treatment3Time30)1.6000(Treatment2Time60)1.8600(Treatment3Time60)2.2400(Treatment2Time90)1.9200(Treatment3Time90)+0.2231BaselineY For the above equation, Treatment2, Treatment3, Time30, Time60, Time90, (Treatment2Time30), (Treatment3Time30), (Treatment2Time60), (Treatment3Time60), (Treatment2Time90), (Treatment3Time90) are all dummy variables i.e have values 0 or 1. The reference group are Treatment1 and Time0.

3. Explain the results from the mixed model

Next, I will show how to use the equation (2) to calculate the lsmeans at different times and mean different between treatment groups at different time points manually, by this way we could have a better understanding of R or SAS results.

Least square means (lsmean) are model predicted means. From R results, we can see the lsmeans at time 0 are :-0.2980,-0.5622 and -0.6398 for treatment group 1, 2 and 3, respectively. These values are not equal to means of actual values observed at time 0 (when you try to do two sample t test). We can use the follow code to calculate the actual means at time 0 for treatment 1, 2 and 3.

actural_mean01<-mean(ph_with_base[ph_with_base$Treatment==1 & ph_with_base$Time==0,]$Y)
actural_mean02<-mean(ph_with_base[ph_with_base$Treatment==2& ph_with_base$Time==0,]$Y)
actural_mean03<-mean(ph_with_base[ph_with_base$Treatment==3& ph_with_base$Time==0,]$Y)
c(actural_mean01,actural_mean02,actural_mean03)
## [1] -0.24 -0.58 -0.68

We see the lsmeans (-0.2980,-0.5622 and -0.6398) and actual means (-0.24 -0.58 -0.68) are not exactly the same. Next let me show how these lsmeans are calcualted from the equation (2)

In clinical trials we are more interested in the treatment differences between treatment groups at different time point. SAS and R give those results above, here we will calculate these differences manually, through equation (2).

Notes for the mixed model

  1. For the lsmeans, we set the degree of freedom (df) be estimate by “Satterthwaite” method, there are other methods to estimate the df in the mixed model, such as ““Kenward-Roger”. In SAS we use DDF=“SATTERTHWAITE” option.

  2. Since there are three arms, R will do multiple comparisons adjustment using Tukey method, we set adjust =‘none’, not to adjust multiple comparison. The default setting for SAS is not to adjust the multiple comparison.