Example ancova

This is an example of how to run an ANCOVA

First of all, we load our packages and the survey dataset

library(haven)#pull data
library(tidyverse)#basic data manipulation
library(psych)#descriptive stats
library(ggpubr)#quick plots
library(sjstats)#summarise ancova
library(emmeans)#calculate adj means
library(car)# levene test
library(knitr)# nice tables
library(rsconnect)
survey = read_sav("~/Psicologia - UFMG/Mestrado/20191/Métodos Quantitativos em Psicologia/survey.sav")

For this example, we will only work with a few variables to make the example easier. In addition, for simplicity, we will only take the complete dataset, removing missing data. For this, we used the na.omit() function. We also created labels for the age group variable. For this we used the factor() function.

survey = survey %>% 
  select(id, tpstress, agegp3, tmast)
survey=na.omit(survey)
survey$agegp3 = factor(survey$agegp3,levels = c(1,2,3),
                       labels = c("18 - 29", "30 - 44", "45+"))

Quick descriptive analysis

Let’s see how the data is distributed.

describe(survey) %>% 
  select(-vars, -n, -trimmed, - mad, -range) %>% 
  data.frame(.) %>% 
  mutate(Variables = rownames(.)) %>% 
  select(Variables, everything()) %>% 
  filter(Variables != "id" & Variables != "agegp3*") %>% 
  kable(., digits = 2)

Variables	mean	sd	median	min	max	skew	kurtosis	se
tpstress	26.73	5.85	26	12	46	0.24	0.15	0.28
tmast	21.74	3.97	22	8	28	-0.61	0.25	0.19

Now for the categorical variable (i.e., the age group variable).

summary(survey$agegp3)%>% 
  data.frame(.) %>% 
  rename(N = '.') %>% 
  mutate('Age interval' = rownames(.)) %>% 
  select('Age interval', N) %>% 
  kable(., digits = 2)

Age interval	N
18 - 29	147
30 - 44	152
45+	134

For this analysis, we will use the age group as our independent variable, the perceived stress tpstress as the dependent, and the perceived competence tmast as the covariate.

So, let’s describe our data using the age group and the raw mean of the perceveid stress

ggerrorplot(data = survey,
       x = "agegp3",
       y = "tpstress",
       desc_stat = "mean_ci",
       error.plot = "errorbar",
       add = c("mean"),
       add.params = list(color = "black"))

It seems like the main effect of age in the perceived stress is not significant, since all error bars are “touching” each other. But this will be further explored.

ggdensity(data = survey,
       color = "agegp3",
       x = "tpstress",
       fill = "agegp3")

Formal ANOVA

Before running an ANCOVA, we will explore only the main effect of age in the stress.

leveneTest(tpstress ~ agegp3, data = survey) %>% 
  data.frame(.) %>% 
  kable(., digits = 3)

	Df	F.value	Pr..F.
group	2	0.393	0.675
	430	NA	NA

Levene’s test is not significant (yay!)

Now we run the ANOVA. We will use the lm() function, since it uses the type 3 sum of squares. We could also use other popular packages to this, such as the ‘Afex’ or the ‘ez’ packages. However, we will keep it simple.

ano = lm(tpstress ~ agegp3, data = survey)
anova_stats(ano) %>% 
  data.frame(.) %>% 
  select(-partial.etasq, -omegasq, 
         -partial.omegasq, -power,
         -epsilonsq, -cohens.f) %>% 
  kable(.)

	term	df	sumsq	meansq	statistic	p.value	etasq
agegp3	agegp3	2	159.945	79.973	2.353	0.096	0.011
…2	Residuals	430	14611.898	33.981	NA	NA	NA

As we saw in the descriptives, the main effect was not significant.

Formal ANCOVA

Now we will run the ANCOVA model with competence as the covariate. We simply repeat the code above adding the new variable in the model.

anc = lm(tpstress ~ tmast +agegp3, data = survey)
anova_stats(anc) %>% 
  data.frame(.) %>% 
  select(-partial.etasq, -omegasq, 
         -partial.omegasq, -power,
         -epsilonsq, -cohens.f) %>% 
  kable(.)

	term	df	sumsq	meansq	statistic	p.value	etasq
tmast	tmast	1	5526.475	5526.475	263.400	0.000	0.374
agegp3	agegp3	2	244.379	122.189	5.824	0.003	0.017
…3	Residuals	429	9000.989	20.981	NA	NA	NA

Now we can see that, after controlling for competence, there is a main effect of age in the perceived stress.

Post hoc with the adjusted means

Now we can estimate the means for stress in each age group controlling for the covariate (competence). For this, we will use the emmeans package.

emmeans(anc, ~ agegp3) %>% 
  data.frame(.) %>% 
  rename('Age group' = agegp3,
         'Estimated mean' = emmean) %>% 
  kable(., digits = 2)

Age group	Estimated mean	SE	df	lower.CL	upper.CL
18 - 29	27.67	0.38	429	26.92	28.41
30 - 44	26.63	0.37	429	25.90	27.36
45+	25.81	0.40	429	25.03	26.58

Using the same package, we can run a simple post hoc test

estimated_means = emmeans(anc, ~ agegp3)
pairs(estimated_means, adjust = "bonferroni") %>% 
  data.frame(.) %>% 
  rename(Comparison = contrast,
         'Difference' = estimate) %>% 
  kable(., digits = 3)

Comparison	Difference	SE	df	t.ratio	p.value
(18 - 29) - (30 - 44)	1.036	0.530	429	1.955	0.154
(18 - 29) - (45+)	1.860	0.548	429	3.397	0.002
(30 - 44) - (45+)	0.825	0.543	429	1.518	0.389

We can (and should) also provide the effect sizes

eff_size(estimated_means, sigma = sigma(anc), edf = 429) %>% 
  data.frame(.) %>% 
  rename(Comparison = contrast,
         "Std effect size (Cohen's d)" = effect.size) %>% 
  kable(., digits = 2)

Comparison	Std effect size (Cohen’s d)	SE	df	lower.CL	upper.CL
(18 - 29) - (30 - 44)	0.23	0.12	429	0.00	0.45
(18 - 29) - (45+)	0.41	0.12	429	0.17	0.64
(30 - 44) - (45+)	0.18	0.12	429	-0.05	0.41

Plot the results

Finally, we can also plot the results adjusting for the effects of the covariate.

estimated_means %>% data.frame(.) %>% 
  ggplot(aes(x=agegp3, y = emmean)) +
  geom_line(aes(group = 1))+
  geom_point(size=3)+
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                width = 0.2,position = position_dodge(.2))+
  theme_classic2()

Trying something really cool

We could also make an interactive plot. That’s all.

final_plot = estimated_means %>% data.frame(.) %>% 
  mutate(emmean = round(emmean, digits = 2)) %>% 
  ggplot(aes(x=agegp3, y = emmean, color = agegp3)) +
  geom_line(aes(group = 1))+
  geom_point(size=3)+
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                width = 0.2,position = position_dodge(.2))+
  theme_classic2()
library(plotly)
ggplotly(final_plot, tooltip = c("agegp3", "emmean"))