This is an example of how to run an ANCOVA

First of all, we load our packages and the survey dataset

library(haven)#pull data
library(tidyverse)#basic data manipulation
library(psych)#descriptive stats
library(ggpubr)#quick plots
library(sjstats)#summarise ancova
library(emmeans)#calculate adj means
library(car)# levene test
library(knitr)# nice tables
library(rsconnect)
survey = read_sav("~/Psicologia - UFMG/Mestrado/20191/Métodos Quantitativos em Psicologia/survey.sav")

For this example, we will only work with a few variables to make the example easier. In addition, for simplicity, we will only take the complete dataset, removing missing data. For this, we used the na.omit() function. We also created labels for the age group variable. For this we used the factor() function.

survey = survey %>% 
  select(id, tpstress, agegp3, tmast)
survey=na.omit(survey)
survey$agegp3 = factor(survey$agegp3,levels = c(1,2,3),
                       labels = c("18 - 29", "30 - 44", "45+"))

Quick descriptive analysis

Let’s see how the data is distributed.

describe(survey) %>% 
  select(-vars, -n, -trimmed, - mad, -range) %>% 
  data.frame(.) %>% 
  mutate(Variables = rownames(.)) %>% 
  select(Variables, everything()) %>% 
  filter(Variables != "id" & Variables != "agegp3*") %>% 
  kable(., digits = 2)
Variables mean sd median min max skew kurtosis se
tpstress 26.73 5.85 26 12 46 0.24 0.15 0.28
tmast 21.74 3.97 22 8 28 -0.61 0.25 0.19

Now for the categorical variable (i.e., the age group variable).

summary(survey$agegp3)%>% 
  data.frame(.) %>% 
  rename(N = '.') %>% 
  mutate('Age interval' = rownames(.)) %>% 
  select('Age interval', N) %>% 
  kable(., digits = 2)
Age interval N
18 - 29 147
30 - 44 152
45+ 134

For this analysis, we will use the age group as our independent variable, the perceived stress tpstress as the dependent, and the perceived competence tmast as the covariate.

So, let’s describe our data using the age group and the raw mean of the perceveid stress

ggerrorplot(data = survey,
       x = "agegp3",
       y = "tpstress",
       desc_stat = "mean_ci",
       error.plot = "errorbar",
       add = c("mean"),
       add.params = list(color = "black"))

It seems like the main effect of age in the perceived stress is not significant, since all error bars are “touching” each other. But this will be further explored.

ggdensity(data = survey,
       color = "agegp3",
       x = "tpstress",
       fill = "agegp3")

Formal ANOVA

Before running an ANCOVA, we will explore only the main effect of age in the stress.

leveneTest(tpstress ~ agegp3, data = survey) %>% 
  data.frame(.) %>% 
  kable(., digits = 3)
Df F.value Pr..F.
group 2 0.393 0.675
430 NA NA

Levene’s test is not significant (yay!)

Now we run the ANOVA. We will use the lm() function, since it uses the type 3 sum of squares. We could also use other popular packages to this, such as the ‘Afex’ or the ‘ez’ packages. However, we will keep it simple.

ano = lm(tpstress ~ agegp3, data = survey)
anova_stats(ano) %>% 
  data.frame(.) %>% 
  select(-partial.etasq, -omegasq, 
         -partial.omegasq, -power,
         -epsilonsq, -cohens.f) %>% 
  kable(.)
term df sumsq meansq statistic p.value etasq
agegp3 agegp3 2 159.945 79.973 2.353 0.096 0.011
…2 Residuals 430 14611.898 33.981 NA NA NA

As we saw in the descriptives, the main effect was not significant.

Formal ANCOVA

Now we will run the ANCOVA model with competence as the covariate. We simply repeat the code above adding the new variable in the model.

anc = lm(tpstress ~ tmast +agegp3, data = survey)
anova_stats(anc) %>% 
  data.frame(.) %>% 
  select(-partial.etasq, -omegasq, 
         -partial.omegasq, -power,
         -epsilonsq, -cohens.f) %>% 
  kable(.)
term df sumsq meansq statistic p.value etasq
tmast tmast 1 5526.475 5526.475 263.400 0.000 0.374
agegp3 agegp3 2 244.379 122.189 5.824 0.003 0.017
…3 Residuals 429 9000.989 20.981 NA NA NA

Now we can see that, after controlling for competence, there is a main effect of age in the perceived stress.

Post hoc with the adjusted means

Now we can estimate the means for stress in each age group controlling for the covariate (competence). For this, we will use the emmeans package.

emmeans(anc, ~ agegp3) %>% 
  data.frame(.) %>% 
  rename('Age group' = agegp3,
         'Estimated mean' = emmean) %>% 
  kable(., digits = 2)
Age group Estimated mean SE df lower.CL upper.CL
18 - 29 27.67 0.38 429 26.92 28.41
30 - 44 26.63 0.37 429 25.90 27.36
45+ 25.81 0.40 429 25.03 26.58

Using the same package, we can run a simple post hoc test

estimated_means = emmeans(anc, ~ agegp3)
pairs(estimated_means, adjust = "bonferroni") %>% 
  data.frame(.) %>% 
  rename(Comparison = contrast,
         'Difference' = estimate) %>% 
  kable(., digits = 3)
Comparison Difference SE df t.ratio p.value
(18 - 29) - (30 - 44) 1.036 0.530 429 1.955 0.154
(18 - 29) - (45+) 1.860 0.548 429 3.397 0.002
(30 - 44) - (45+) 0.825 0.543 429 1.518 0.389

We can (and should) also provide the effect sizes

eff_size(estimated_means, sigma = sigma(anc), edf = 429) %>% 
  data.frame(.) %>% 
  rename(Comparison = contrast,
         "Std effect size (Cohen's d)" = effect.size) %>% 
  kable(., digits = 2)
Comparison Std effect size (Cohen’s d) SE df lower.CL upper.CL
(18 - 29) - (30 - 44) 0.23 0.12 429 0.00 0.45
(18 - 29) - (45+) 0.41 0.12 429 0.17 0.64
(30 - 44) - (45+) 0.18 0.12 429 -0.05 0.41

Plot the results

Finally, we can also plot the results adjusting for the effects of the covariate.

estimated_means %>% data.frame(.) %>% 
  ggplot(aes(x=agegp3, y = emmean)) +
  geom_line(aes(group = 1))+
  geom_point(size=3)+
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                width = 0.2,position = position_dodge(.2))+
  theme_classic2()

Trying something really cool

We could also make an interactive plot. That’s all.

final_plot = estimated_means %>% data.frame(.) %>% 
  mutate(emmean = round(emmean, digits = 2)) %>% 
  ggplot(aes(x=agegp3, y = emmean, color = agegp3)) +
  geom_line(aes(group = 1))+
  geom_point(size=3)+
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                width = 0.2,position = position_dodge(.2))+
  theme_classic2()
library(plotly)
ggplotly(final_plot, tooltip = c("agegp3", "emmean"))