Introduction

Welcome to this R demo session! Here, I will demonstrate how to use R to perform ANCOVA.

ANCOVA

Here, let’s use the puppy example from Field.

dose <- factor(c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3))
happy <- c(3,2,5,2,2,2,7,2,4,7,5,3,4,4,7,5,4,9,2,6,3,4,4,4,6,4,6,2,8,5)
puppylove <- c(4,1,5,1,2,2,7,4,5,5,3,1,2,2,6,4,2,1,3,5,4,3,3,2,0,1,3,0,1,0)
puppy<-data.frame(dose,happy,puppylove)

Run the basic ANOVA first.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Compute the descriptive statistics
group_by(puppy,dose) %>%
  summarize(count = n(),
            mean = mean(happy, na.rm = TRUE),
            sd = sd(happy, na.rm = TRUE))
## # A tibble: 3 × 4
##   dose  count  mean    sd
##   <fct> <int> <dbl> <dbl>
## 1 1         9  3.22  1.79
## 2 2         8  4.88  1.46
## 3 3        13  4.85  2.12
# Run the ANOVA model
output1<-aov(happy~dose,data=puppy)
summary(output1)
##             Df Sum Sq Mean Sq F value Pr(>F)
## dose         2  16.84   8.422   2.416  0.108
## Residuals   27  94.12   3.486

These results indicate no evidence of an effect of dose. Let’s take a look at some post-hoc tests.

pairwise.t.test(puppy$happy,puppy$dose,p.adjust.method = "BH")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  puppy$happy and puppy$dose 
## 
##   1    2   
## 2 0.12 -   
## 3 0.12 0.97
## 
## P value adjustment method: BH
pairwise.t.test(puppy$happy,puppy$dose,p.adjust.method = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  puppy$happy and puppy$dose 
## 
##   1     2    
## 2 0.080 -    
## 3 0.055 0.973
## 
## P value adjustment method: none
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
ph<-glht(output1,linfct=mcp(dose="Tukey"))
summary(ph)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = happy ~ dose, data = puppy)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)
## 2 - 1 == 0  1.65278    0.90724   1.822    0.181
## 3 - 1 == 0  1.62393    0.80963   2.006    0.130
## 3 - 2 == 0 -0.02885    0.83899  -0.034    0.999
## (Adjusted p values reported -- single-step method)

Still nothing there. But maybe puppy love is a confounder, or at least it is associated with the outcome so including it as a covariate may reduce within variance.

First, let’s check if the covariate is related to the outcome, which is necessary for including the covariate to be worth it.

cor.test(puppy$happy,puppy$puppylove)
## 
##  Pearson's product-moment correlation
## 
## data:  puppy$happy and puppy$puppylove
## t = 1.345, df = 28, p-value = 0.1894
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1250150  0.5571688
## sample estimates:
##       cor 
## 0.2463496
group_by(puppy,dose) %>%
  summarize(count = n(),
            mean = mean(puppylove, na.rm = TRUE),
            sd = sd(puppylove, na.rm = TRUE))
## # A tibble: 3 × 4
##   dose  count  mean    sd
##   <fct> <int> <dbl> <dbl>
## 1 1         9  3.44  2.07
## 2 2         8  3.12  1.73
## 3 3        13  2     1.63

The correlation is not significant, but the correlation is not trivial and the sample size is low.

There may also be a relation between puppy love and the independent variable, dose. This is not necessary for ANCOVA to be useful, but it is more likely to be useful if the covariate is related to the IV. Let’s check if there is a statistical significant relation using an ANOVA with the covariate as the outcome.

output2<-aov(puppylove~dose,data=puppy)
summary(output2)
##             Df Sum Sq Mean Sq F value Pr(>F)
## dose         2  12.77   6.385   1.979  0.158
## Residuals   27  87.10   3.226

Nope.

These explorations of our data suggest the covariate might be useful, but there isn’t strong evidence. Let’s run with it anyway. To add a covariate, just add the variable to the right side of the ~ formula in aov, with a + separating predictors.

output3<-aov(happy~puppylove+dose,data=puppy)
summary(output3)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## puppylove    1   6.73   6.734   2.215 0.1487  
## dose         2  25.19  12.593   4.142 0.0274 *
## Residuals   26  79.05   3.040                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After controlling for puppylove, dose is a significant predictor of happiness. The covariate did its job!

ph<-glht(output3,linfct=mcp(dose="Tukey"))
summary(ph)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = happy ~ puppylove + dose, data = puppy)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)  
## 2 - 1 == 0   1.7857     0.8494   2.102   0.1090  
## 3 - 1 == 0   2.2249     0.8028   2.771   0.0265 *
## 3 - 2 == 0   0.4392     0.8112   0.541   0.8516  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

You can do post-hoc tests with ANCOVA. They work just like post-hoc tests we saw before. Here, there is a significant difference between dose = 1 and dose = 3 when using the Tukey post-hoc test.

Note that the order of variables in the aov function matters.

output4<-aov(happy~dose+puppylove,data=puppy)
summary(output4)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## dose         2  16.84   8.422   2.770 0.0812 .
## puppylove    1  15.08  15.076   4.959 0.0348 *
## Residuals   26  79.05   3.040                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The order of predictors doesn’t affect post-hoc tests.

ph<-glht(output4,linfct=mcp(dose="Tukey"))
summary(ph)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = happy ~ dose + puppylove, data = puppy)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)  
## 2 - 1 == 0   1.7857     0.8494   2.102   0.1089  
## 3 - 1 == 0   2.2249     0.8028   2.771   0.0267 *
## 3 - 2 == 0   0.4392     0.8112   0.541   0.8516  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Effect sizes in ANOVA and ANCOVA are specific to predictor.

library(sjstats)
knitr::kable(anova_stats(output3)) %>%
  kableExtra::kable_styling()
etasq partial.etasq omegasq partial.omegasq epsilonsq cohens.f term sumsq df meansq statistic p.value power
puppylove 0.061 0.079 0.032 0.039 0.033 0.292 puppylove 6.734 1 6.734 2.215 0.149 0.319
dose 0.227 0.242 0.168 0.173 0.172 0.564 dose 25.185 2 12.593 4.142 0.027 0.730
1 NA NA NA NA NA NA Residuals 79.047 26 3.040 NA NA NA

\(\omega^2\) for dose is .168, which is a medium to large effect.

To check homogeneity of regression, include the interaction between the predictor and the covariate by changing the + to a *. We’ll learn more about interactions soon.

output5<-aov(happy~dose*puppylove,data=puppy)
summary(output5)
##                Df Sum Sq Mean Sq F value Pr(>F)  
## dose            2  16.84   8.422   3.448 0.0483 *
## puppylove       1  15.08  15.076   6.172 0.0203 *
## dose:puppylove  2  20.43  10.213   4.181 0.0277 *
## Residuals      24  58.62   2.443                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction is marked with a : between the two variables from which the interaction is formed.

Here, the interaction between dose and puppylove is significant, which means the effect of puppylove on happiness depends on dose. This is a violation of homogeneity of regression. We should not really use ANCOVA because there is no isolated effect of the group variable (dose). That is, if the effect of puppylove on happiness depends on dose, then it is equivalently true that the effect of dose on happiness depends on puppylove. This means that there is no single effect of dose; rather, it depends on level of puppylove.