In this topic, we will consider two different methods to carry out analyses of repeated measures data. Repeated measures data is data for which we have more than one measurement for each individual.
In Topic 7 and the associated Computer Lab, we learnt about one-way ANOVA. Since one-way ANOVA allows us to test for a difference in means between two or more independent groups, we can think of it as an extension of the independent samples \(t\)-test, which tests for differences in means between two independent groups.
In a similar way, Repeated Measures ANOVA can be thought of as an extension of the paired \(t\)-test. Whereas the paired \(t\)-test tests for mean differences between two dependent groups, Repeated Measures ANOVA allows us to test for differences in means between two or more dependent groups.
In this question, we will work through an example in R.
The data set we will be using today is from the datarium package (Kassambara 2019). We will also need to use the tidyr package to convert the data set into the correct format. If you have not installed the datarium and tidyr packages on your computer before, run the following code to install them:
install.packages("datarium")
install.packages("tidyr")
Run the following code to load the datarium and tidyr packages into your current R session:
library(datarium)
library(tidyr)
We will be using a data set called selfesteem. This data set records the self-esteem score of ten individuals three times each, i.e., at three different time points. Later on, we will be using repeated measures ANOVA to see whether there was a significant change in the average self-esteem scores over time.
First of all, run the following code to view the selfesteem data set:
selfesteem
## # A tibble: 10 x 4
##       id    t1    t2    t3
##    <int> <dbl> <dbl> <dbl>
##  1     1  4.01  5.18  7.11
##  2     2  2.56  6.91  6.31
##  3     3  3.24  4.44  9.78
##  4     4  3.42  4.71  8.35
##  5     5  2.87  3.91  6.46
##  6     6  2.05  5.34  6.65
##  7     7  3.53  5.58  6.84
##  8     8  3.18  4.37  7.82
##  9     9  3.51  4.40  8.47
## 10    10  3.04  4.49  8.58
As we can see, the data set contains the following variables:
- id : the ID of the individual (ranges from 1 to 10)
- t1 : self-esteem score at time-point 1
- t2 : self-esteem score at time-point 2
- t3 : self-esteem score at time-point 3

So, there is one row for each individual in the data set. This means the data set is currently in "wide format."
In order to carry out a repeated measures ANOVA analysis in R, we will need to convert the data set from “wide format” to “long format.” To do so, run the following code:
# Convert data from wide to long
selfesteem.long <- gather(selfesteem, time, score, t1, t2, t3)
# Convert id variable to factor: this is required for the aov function later.
selfesteem.long$id <- as.factor(selfesteem.long$id)
# Sort by id first, then by time
selfesteem.long <- selfesteem.long[order(selfesteem.long$id, selfesteem.long$time), ]
# Convert to data frame
selfesteem.long <- as.data.frame(selfesteem.long)
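As an aside, if you have a newer version of the tidyr package (version 1.0.0 or later), the same reshaping can also be done with the pivot_longer function, which has superseded gather. Here is a sketch; the object name selfesteem.long2 is just for illustration:

# Equivalent wide-to-long conversion using pivot_longer (tidyr >= 1.0.0).
# pivot_longer keeps the rows grouped by id (each original row becomes three
# consecutive rows, one per time-point), so no separate sorting step is needed.
selfesteem.long2 <- pivot_longer(selfesteem, cols = t1:t3,
                                 names_to = "time", values_to = "score")
selfesteem.long2$id <- as.factor(selfesteem.long2$id)
selfesteem.long2 <- as.data.frame(selfesteem.long2)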
We have now created a data set called selfesteem.long, which is the data set we will be using for the remainder of this computer lab. To view the first six rows of the data set, run the following code:
head(selfesteem.long)
##   id time    score
## 1  1   t1 4.005027
## 2  1   t2 5.182286
## 3  1   t3 7.107831
## 4  2   t1 2.558124
## 5  2   t2 6.912915
## 6  2   t3 6.308434
As we can see, there are now three columns in the long version of the data set:
- id : The ID of the individual (ranges from 1 to 10)
- time : the time-point at which the self-esteem score was measured. Possible values are t1 (time-point 1), t2 (time-point 2), and t3 (time-point 3)
- score : self-esteem score

Now, there is more than one row per individual (in fact, there are three rows per individual), but importantly, one row per self-esteem score measurement.
Now that the data has been correctly formatted, a useful first step will be to visualise the data. Create a boxplot of the self-esteem scores, separated by time.
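One possible approach is sketched below using base R's boxplot function (another plotting package, such as ggplot2, would work equally well):

# Boxplots of self-esteem score, one box per time-point
boxplot(score ~ time, data = selfesteem.long,
        xlab = "Time-point", ylab = "Self-esteem score")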
The hypotheses for a repeated measures ANOVA can be set up as follows:
\[ H_0 : \mu_1 = \mu_2 = \ldots = \mu_k \text{ versus } H_1 : \text{not all } \mu_i \text{'s are equal,}\] where:

- \(\mu_i\) is the population mean of the \(i\)th (dependent) group, and
- \(k\) is the number of groups.

In our example, we wish to test for a difference in average self-esteem scores across time-points. In this example,

- \(\mu_1\), \(\mu_2\) and \(\mu_3\) are the mean self-esteem scores at time-points 1, 2 and 3 respectively, and
- \(k = 3\), since there are three time-points.
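As an informal first look at these means, one option (a quick sketch using base R's aggregate function) is to compute the sample mean score at each time-point:

# Sample mean self-esteem score at each time-point
aggregate(score ~ time, data = selfesteem.long, FUN = mean)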
Adapt the following code to carry out a repeated measures ANOVA analysis in R. You will need to enter in the dependent and independent variables in the spots indicated.
anova.selfesteem <- aov(**ENTER DEPENDENT VARIABLE HERE** ~ **ENTER INDEPENDENT VARIABLE HERE** + Error(id), data = selfesteem.long)
summary(anova.selfesteem)
In the above code, we have done the following:
- Used the aov function to carry out the ANOVA analysis
- Added + Error(id) to tell R that this is a repeated measures ANOVA. This syntax tells R that for each value of id, there is more than one self-esteem measurement
- Stored the results in an object called anova.selfesteem
- Used summary(anova.selfesteem) to display a summary of the results

Once you have successfully run the repeated measures analysis, your output should look like this:
## 
## Error: id
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9   4.57  0.5078
## 
## Error: Within
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## time       2 102.46   51.23   55.47 2.01e-08 ***
## Residuals 18  16.62    0.92
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
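For reference, the output above can be reproduced with one possible completion of the placeholders, taking score as the dependent variable and time as the independent variable:

# One possible completion: score is the dependent variable, time is the
# independent variable, and Error(id) identifies the repeated measurements
anova.selfesteem <- aov(score ~ time + Error(id), data = selfesteem.long)
summary(anova.selfesteem)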
From the repeated measures analysis, we can note the following:
- The \(p\)-value (given in the Pr(>F) column) is close to 0, i.e. \(p < .001\)
- The value of the test statistic (given in the F value column) is \(F = 55.47\)

Using this information, answer the following questions:
In the previous question, we established that there was a significant difference in the mean self-esteem score across time. However, we do not know which time-point(s), or how many time-points, are significantly different from each other. In this question, we will carry out post-hoc tests to find out.
Run the following code to carry out pairwise \(t\)-tests:
pairwise.t.test(selfesteem.long$score, selfesteem.long$time, paired = TRUE,
p.adjust.method = "bonferroni")
In the above code, we have done the following:
- Used the pairwise.t.test function to carry out the pairwise \(t\)-tests
- Entered selfesteem.long$score as our dependent variable
- Entered selfesteem.long$time as our independent variable
- Used paired = TRUE to tell R that this was a repeated measures analysis (i.e., we are working with dependent rather than independent groups)
- Used the "bonferroni" adjustment to adjust the \(p\)-values to account for the increased chance of making a Type I error due to multiple \(t\)-tests

Based on your output, which time-points were significantly different from each other? Justify your answer with appropriate \(p\)-values.
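As an aside, to see what pairwise.t.test is doing behind the scenes, here is a sketch of the unadjusted paired \(t\)-test for a single pair of time-points (t1 versus t2), run on the original wide-format data. With the Bonferroni method, each unadjusted \(p\)-value is multiplied by the number of comparisons (three here), with adjusted values capped at 1.

# Unadjusted paired t-test comparing time-points 1 and 2 directly;
# pairwise.t.test runs one of these for each pair of time-points and then
# applies the Bonferroni adjustment to the resulting p-values
t.test(selfesteem$t1, selfesteem$t2, paired = TRUE)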
Instead of using ANOVA, another way to carry out a repeated measures analysis is to use a linear mixed effects model. In this question, we will consider a brief introduction to this topic.
In this question, we will be using the nlme package. If you have not installed the nlme package on your computer before, run the following code to install it:
install.packages("nlme")
Run the following code to load the nlme package into your current R session:
library(nlme)
In Computer Lab 9, we learnt how to fit a simple linear regression model using the lm function. We can think of a linear mixed effects model as an extension of a simple linear regression model. The difference is that it allows for repeated measurements of individuals. That is, it allows for some dependency of observations.
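One common way to write a random-intercept model of this kind (the notation here is just one conventional choice) is
\[ y_{ij} = \beta_0 + \beta_1 x^{(2)}_{ij} + \beta_2 x^{(3)}_{ij} + b_i + \varepsilon_{ij}, \]
where \(y_{ij}\) is the self-esteem score of individual \(i\) at time-point \(j\), \(x^{(2)}_{ij}\) and \(x^{(3)}_{ij}\) are indicator variables for time-points 2 and 3, \(b_i \sim N(0, \sigma_b^2)\) is a random intercept that captures the dependency between measurements made on the same individual, and \(\varepsilon_{ij} \sim N(0, \sigma^2)\) is the residual error.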
We will carry out a linear mixed effects analysis to test for a difference in mean self-esteem scores across time. Rather than using the lm function, we use the lme function. Adapt the following code to estimate the linear mixed effects model:
lme.selfesteem <- lme(**ENTER DEPENDENT VARIABLE HERE** ~ **ENTER INDEPENDENT VARIABLE HERE**, random = ~1|id, data = selfesteem.long)
summary(lme.selfesteem)
In the above code, we have done the following:
- Used the lme function to carry out the linear mixed effects (LME) analysis
- Used random = ~1|id to tell R that this is a repeated measures analysis. This syntax tells R that for each value of id, there is more than one self-esteem measurement (this means that id is what's known as a "random effect")
- Stored the results in an object called lme.selfesteem
- Used summary(lme.selfesteem) to display a summary of the results

Once you have successfully run the LME analysis, your output should look like this:
## Linear mixed-effects model fit by REML
##  Data: selfesteem.long 
##        AIC      BIC    logLik
##   86.99346 93.47264 -38.49673
## 
## Random effects:
##  Formula: ~1 | id
##          (Intercept)  Residual
## StdDev: 2.772272e-05 0.8859851
## 
## Fixed effects: score ~ time 
##                Value Std.Error DF   t-value p-value
## (Intercept) 3.140122 0.2801731 18 11.207793   0e+00
## timet2      1.793820 0.3962246 18  4.527281   3e-04
## timet3      4.496220 0.3962246 18 11.347655   0e+00
##  Correlation: 
##        (Intr) timet2
## timet2 -0.707       
## timet3 -0.707  0.500
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4987927 -0.6279054 -0.0321792  0.4530803  2.4177254 
## 
## Number of Observations: 30
## Number of Groups: 10
The part we are interested in is:
##                Value Std.Error DF   t-value      p-value
## (Intercept) 3.140122 0.2801731 18 11.207793 1.502987e-09
## timet2      1.793820 0.3962246 18  4.527281 2.608300e-04
## timet3      4.496220 0.3962246 18 11.347655 1.234335e-09
Note that although the \(p\)-values in the second display are shown to a higher degree of accuracy, both outputs contain the same information.
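For reference, the output above can be produced with one possible completion of the placeholders, again taking score as the dependent variable and time as the independent variable:

# One possible completion: a fixed effect of time on score, with a random
# intercept for each individual (id)
lme.selfesteem <- lme(score ~ time, random = ~1|id, data = selfesteem.long)
summary(lme.selfesteem)

# If you would like an overall F-test for time (comparable to the repeated
# measures ANOVA result), one option is:
anova(lme.selfesteem)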
To understand the above output, we will consider one row at a time.
- The (Intercept) row corresponds to our 'base' time-point: time-point 1
  - The Value column tells us that the estimated mean self-esteem score at time-point 1 is 3.14
  - The p-value column tells us whether or not this value is significantly different from zero.
- The timet2 row corresponds to time-point 2 (t2)
  - The Value column tells us the estimated difference in mean self-esteem score between time-point 1 and time-point 2
  - The p-value column tells us whether or not this difference is significantly different from zero.
- The timet3 row corresponds to time-point 3 (t3)
  - The Value column tells us the estimated difference in mean self-esteem score between time-point 1 and time-point 3
  - The p-value column tells us whether or not this difference is significantly different from zero.

This week, we considered a brief introduction to repeated measures analysis using two different but related methods: Repeated Measures ANOVA, and Linear Mixed Effects Models. Although we don't have time to go further in this subject, there is still much more that can be done with these techniques, such as checking the relevant assumptions, including additional independent variables, and making use of the added flexibility that linear mixed effects models in particular afford us. If you use statistics in your future studies or career, you may have a chance to learn more about these methods. But for now, that's all for this week!
These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.