First of all, we load our packages and the survey dataset
library(haven)#pull data
library(tidyverse)#basic data manipulation
library(psych)#descriptive stats
library(ggpubr)#quick plots
library(sjstats)#summarise ancova
library(emmeans)#calculate adj means
library(car)# levene test
library(knitr)# nice tables
library(rsconnect)
survey = read_sav("~/Psicologia - UFMG/Mestrado/20191/Métodos Quantitativos em Psicologia/survey.sav")
For this example, we will only work with a few variables to make the example easier. In addition, for simplicity, we will only take the complete dataset, removing missing data. For this, we used the na.omit() function. We also created labels for the age group variable. For this we used the factor() function.
survey = survey %>%
select(id, tpstress, agegp3, tmast)
survey=na.omit(survey)
survey$agegp3 = factor(survey$agegp3,levels = c(1,2,3),
labels = c("18 - 29", "30 - 44", "45+"))
Let’s see how the data is distributed.
describe(survey) %>%
select(-vars, -n, -trimmed, - mad, -range) %>%
data.frame(.) %>%
mutate(Variables = rownames(.)) %>%
select(Variables, everything()) %>%
filter(Variables != "id" & Variables != "agegp3*") %>%
kable(., digits = 2)
| Variables | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|
| tpstress | 26.73 | 5.85 | 26 | 12 | 46 | 0.24 | 0.15 | 0.28 |
| tmast | 21.74 | 3.97 | 22 | 8 | 28 | -0.61 | 0.25 | 0.19 |
Now for the categorical variable (i.e., the age group variable).
summary(survey$agegp3)%>%
data.frame(.) %>%
rename(N = '.') %>%
mutate('Age interval' = rownames(.)) %>%
select('Age interval', N) %>%
kable(., digits = 2)
| Age interval | N |
|---|---|
| 18 - 29 | 147 |
| 30 - 44 | 152 |
| 45+ | 134 |
For this analysis, we will use the age group as our independent variable, the perceived stress tpstress as the dependent, and the perceived competence tmast as the covariate.
So, let’s describe our data using the age group and the raw mean of the perceveid stress
ggerrorplot(data = survey,
x = "agegp3",
y = "tpstress",
desc_stat = "mean_ci",
error.plot = "errorbar",
add = c("mean"),
add.params = list(color = "black"))
It seems like the main effect of age in the perceived stress is not significant, since all error bars are “touching” each other. But this will be further explored.
ggdensity(data = survey,
color = "agegp3",
x = "tpstress",
fill = "agegp3")
Before running an ANCOVA, we will explore only the main effect of age in the stress.
leveneTest(tpstress ~ agegp3, data = survey) %>%
data.frame(.) %>%
kable(., digits = 3)
| Df | F.value | Pr..F. | |
|---|---|---|---|
| group | 2 | 0.393 | 0.675 |
| 430 | NA | NA |
Levene’s test is not significant (yay!)
Now we run the ANOVA. We will use the lm() function, since it uses the type 3 sum of squares. We could also use other popular packages to this, such as the ‘Afex’ or the ‘ez’ packages. However, we will keep it simple.
ano = lm(tpstress ~ agegp3, data = survey)
anova_stats(ano) %>%
data.frame(.) %>%
select(-partial.etasq, -omegasq,
-partial.omegasq, -power,
-epsilonsq, -cohens.f) %>%
kable(.)
| term | df | sumsq | meansq | statistic | p.value | etasq | |
|---|---|---|---|---|---|---|---|
| agegp3 | agegp3 | 2 | 159.945 | 79.973 | 2.353 | 0.096 | 0.011 |
| …2 | Residuals | 430 | 14611.898 | 33.981 | NA | NA | NA |
As we saw in the descriptives, the main effect was not significant.
Now we will run the ANCOVA model with competence as the covariate. We simply repeat the code above adding the new variable in the model.
anc = lm(tpstress ~ tmast +agegp3, data = survey)
anova_stats(anc) %>%
data.frame(.) %>%
select(-partial.etasq, -omegasq,
-partial.omegasq, -power,
-epsilonsq, -cohens.f) %>%
kable(.)
| term | df | sumsq | meansq | statistic | p.value | etasq | |
|---|---|---|---|---|---|---|---|
| tmast | tmast | 1 | 5526.475 | 5526.475 | 263.400 | 0.000 | 0.374 |
| agegp3 | agegp3 | 2 | 244.379 | 122.189 | 5.824 | 0.003 | 0.017 |
| …3 | Residuals | 429 | 9000.989 | 20.981 | NA | NA | NA |
Now we can see that, after controlling for competence, there is a main effect of age in the perceived stress.
Now we can estimate the means for stress in each age group controlling for the covariate (competence). For this, we will use the emmeans package.
emmeans(anc, ~ agegp3) %>%
data.frame(.) %>%
rename('Age group' = agegp3,
'Estimated mean' = emmean) %>%
kable(., digits = 2)
| Age group | Estimated mean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| 18 - 29 | 27.67 | 0.38 | 429 | 26.92 | 28.41 |
| 30 - 44 | 26.63 | 0.37 | 429 | 25.90 | 27.36 |
| 45+ | 25.81 | 0.40 | 429 | 25.03 | 26.58 |
Using the same package, we can run a simple post hoc test
estimated_means = emmeans(anc, ~ agegp3)
pairs(estimated_means, adjust = "bonferroni") %>%
data.frame(.) %>%
rename(Comparison = contrast,
'Difference' = estimate) %>%
kable(., digits = 3)
| Comparison | Difference | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| (18 - 29) - (30 - 44) | 1.036 | 0.530 | 429 | 1.955 | 0.154 |
| (18 - 29) - (45+) | 1.860 | 0.548 | 429 | 3.397 | 0.002 |
| (30 - 44) - (45+) | 0.825 | 0.543 | 429 | 1.518 | 0.389 |
We can (and should) also provide the effect sizes
eff_size(estimated_means, sigma = sigma(anc), edf = 429) %>%
data.frame(.) %>%
rename(Comparison = contrast,
"Std effect size (Cohen's d)" = effect.size) %>%
kable(., digits = 2)
| Comparison | Std effect size (Cohen’s d) | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| (18 - 29) - (30 - 44) | 0.23 | 0.12 | 429 | 0.00 | 0.45 |
| (18 - 29) - (45+) | 0.41 | 0.12 | 429 | 0.17 | 0.64 |
| (30 - 44) - (45+) | 0.18 | 0.12 | 429 | -0.05 | 0.41 |
Finally, we can also plot the results adjusting for the effects of the covariate.
estimated_means %>% data.frame(.) %>%
ggplot(aes(x=agegp3, y = emmean)) +
geom_line(aes(group = 1))+
geom_point(size=3)+
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
width = 0.2,position = position_dodge(.2))+
theme_classic2()
We could also make an interactive plot. That’s all.
final_plot = estimated_means %>% data.frame(.) %>%
mutate(emmean = round(emmean, digits = 2)) %>%
ggplot(aes(x=agegp3, y = emmean, color = agegp3)) +
geom_line(aes(group = 1))+
geom_point(size=3)+
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
width = 0.2,position = position_dodge(.2))+
theme_classic2()
library(plotly)
ggplotly(final_plot, tooltip = c("agegp3", "emmean"))