Welcome to this R demo session! Here, I will demonstrate how to use R to perform ANCOVA.
Here, let’s use the puppy example from Field.
dose <- factor(c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3))
happy <- c(3,2,5,2,2,2,7,2,4,7,5,3,4,4,7,5,4,9,2,6,3,4,4,4,6,4,6,2,8,5)
puppylove <- c(4,1,5,1,2,2,7,4,5,5,3,1,2,2,6,4,2,1,3,5,4,3,3,2,0,1,3,0,1,0)
puppy<-data.frame(dose,happy,puppylove)
Run the basic ANOVA first.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Compute the descriptive statistics
group_by(puppy,dose) %>%
summarize(count = n(),
mean = mean(happy, na.rm = TRUE),
sd = sd(happy, na.rm = TRUE))
## # A tibble: 3 × 4
## dose count mean sd
## <fct> <int> <dbl> <dbl>
## 1 1 9 3.22 1.79
## 2 2 8 4.88 1.46
## 3 3 13 4.85 2.12
# Run the ANOVA model
output1<-aov(happy~dose,data=puppy)
summary(output1)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 16.84 8.422 2.416 0.108
## Residuals 27 94.12 3.486
These results indicate no evidence of an effect of dose. Let’s take a look at some post-hoc tests.
pairwise.t.test(puppy$happy,puppy$dose,p.adjust.method = "BH")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: puppy$happy and puppy$dose
##
## 1 2
## 2 0.12 -
## 3 0.12 0.97
##
## P value adjustment method: BH
pairwise.t.test(puppy$happy,puppy$dose,p.adjust.method = "none")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: puppy$happy and puppy$dose
##
## 1 2
## 2 0.080 -
## 3 0.055 0.973
##
## P value adjustment method: none
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
ph<-glht(output1,linfct=mcp(dose="Tukey"))
summary(ph)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = happy ~ dose, data = puppy)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 2 - 1 == 0 1.65278 0.90724 1.822 0.181
## 3 - 1 == 0 1.62393 0.80963 2.006 0.130
## 3 - 2 == 0 -0.02885 0.83899 -0.034 0.999
## (Adjusted p values reported -- single-step method)
Still nothing there. But maybe puppy love is a confounder, or at least it is associated with the outcome so including it as a covariate may reduce within variance.
First, let’s check if the covariate is related to the outcome, which is necessary for including the covariate to be worth it.
cor.test(puppy$happy,puppy$puppylove)
##
## Pearson's product-moment correlation
##
## data: puppy$happy and puppy$puppylove
## t = 1.345, df = 28, p-value = 0.1894
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1250150 0.5571688
## sample estimates:
## cor
## 0.2463496
group_by(puppy,dose) %>%
summarize(count = n(),
mean = mean(puppylove, na.rm = TRUE),
sd = sd(puppylove, na.rm = TRUE))
## # A tibble: 3 × 4
## dose count mean sd
## <fct> <int> <dbl> <dbl>
## 1 1 9 3.44 2.07
## 2 2 8 3.12 1.73
## 3 3 13 2 1.63
The correlation is not significant, but the correlation is not trivial and the sample size is low.
There may also be a relation between puppy love and the independent variable, dose. This is not necessary for ANCOVA to be useful, but it is more likely to be useful if the covariate is related to the IV. Let’s check if there is a statistical significant relation using an ANOVA with the covariate as the outcome.
output2<-aov(puppylove~dose,data=puppy)
summary(output2)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 12.77 6.385 1.979 0.158
## Residuals 27 87.10 3.226
Nope.
These explorations of our data suggest the covariate might be useful,
but there isn’t strong evidence. Let’s run with it anyway. To add a
covariate, just add the variable to the right side of the ~ formula in
aov, with a + separating predictors.
output3<-aov(happy~puppylove+dose,data=puppy)
summary(output3)
## Df Sum Sq Mean Sq F value Pr(>F)
## puppylove 1 6.73 6.734 2.215 0.1487
## dose 2 25.19 12.593 4.142 0.0274 *
## Residuals 26 79.05 3.040
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After controlling for puppylove, dose is a significant predictor of happiness. The covariate did its job!
ph<-glht(output3,linfct=mcp(dose="Tukey"))
summary(ph)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = happy ~ puppylove + dose, data = puppy)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 2 - 1 == 0 1.7857 0.8494 2.102 0.1090
## 3 - 1 == 0 2.2249 0.8028 2.771 0.0265 *
## 3 - 2 == 0 0.4392 0.8112 0.541 0.8516
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
You can do post-hoc tests with ANCOVA. They work just like post-hoc tests we saw before. Here, there is a significant difference between dose = 1 and dose = 3 when using the Tukey post-hoc test.
Note that the order of variables in the aov function
matters.
output4<-aov(happy~dose+puppylove,data=puppy)
summary(output4)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 16.84 8.422 2.770 0.0812 .
## puppylove 1 15.08 15.076 4.959 0.0348 *
## Residuals 26 79.05 3.040
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The order of predictors doesn’t affect post-hoc tests.
ph<-glht(output4,linfct=mcp(dose="Tukey"))
summary(ph)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = happy ~ dose + puppylove, data = puppy)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 2 - 1 == 0 1.7857 0.8494 2.102 0.1089
## 3 - 1 == 0 2.2249 0.8028 2.771 0.0267 *
## 3 - 2 == 0 0.4392 0.8112 0.541 0.8516
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Effect sizes in ANOVA and ANCOVA are specific to predictor.
library(sjstats)
knitr::kable(anova_stats(output3)) %>%
kableExtra::kable_styling()
| etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f | term | sumsq | df | meansq | statistic | p.value | power | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| puppylove | 0.061 | 0.079 | 0.032 | 0.039 | 0.033 | 0.292 | puppylove | 6.734 | 1 | 6.734 | 2.215 | 0.149 | 0.319 |
| dose | 0.227 | 0.242 | 0.168 | 0.173 | 0.172 | 0.564 | dose | 25.185 | 2 | 12.593 | 4.142 | 0.027 | 0.730 |
| 1 | NA | NA | NA | NA | NA | NA | Residuals | 79.047 | 26 | 3.040 | NA | NA | NA |
\(\omega^2\) for dose is .168, which is a medium to large effect.
To check homogeneity of regression, include the interaction between
the predictor and the covariate by changing the + to a
*. We’ll learn more about interactions soon.
output5<-aov(happy~dose*puppylove,data=puppy)
summary(output5)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 16.84 8.422 3.448 0.0483 *
## puppylove 1 15.08 15.076 6.172 0.0203 *
## dose:puppylove 2 20.43 10.213 4.181 0.0277 *
## Residuals 24 58.62 2.443
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction is marked with a : between the two
variables from which the interaction is formed.
Here, the interaction between dose and puppylove is significant, which means the effect of puppylove on happiness depends on dose. This is a violation of homogeneity of regression. We should not really use ANCOVA because there is no isolated effect of the group variable (dose). That is, if the effect of puppylove on happiness depends on dose, then it is equivalently true that the effect of dose on happiness depends on puppylove. This means that there is no single effect of dose; rather, it depends on level of puppylove.