Useful resource
https://towardsdatascience.com/doing-and-reporting-your-first-anova-and-ancova-in-r-1d820940f2ef
library(tidyverse)
library(emmeans)
library(car)
library(afex)
library(rstatix)
library(datarium)
library(report)
library(kableExtra)
options(scipen = 999) # no scientific notation
The anxiety dataset (from datarium) has a between subjects factor for exercise group and a within subjects factor for time of testing. The DV is anxiety score.
anxiety <- anxiety
glimpse(anxiety)
## Rows: 45
## Columns: 5
## $ id <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
## $ group <fct> grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1…
## $ t1 <dbl> 14.1, 14.5, 15.7, 16.0, 16.5, 16.9, 17.0, 17.0, 17.3, 17.3, 17.8…
## $ t2 <dbl> 14.4, 14.6, 15.2, 15.5, 15.8, 16.5, 16.8, 17.1, 16.9, 17.1, 17.7…
## $ t3 <dbl> 14.1, 14.3, 14.9, 15.3, 15.7, 16.2, 16.5, 16.6, 16.5, 16.7, 17.3…
car packageI am interested in whether there is a main effect of group on post test anxiety, controlling for pre test anxiety.
First selecting just the pre and post test. Checking data types.
anx <- anxiety %>%
select(id, group, pre = t1, post = t3)
glimpse(anx)
## Rows: 45
## Columns: 4
## $ id <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
## $ group <fct> grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1…
## $ pre <dbl> 14.1, 14.5, 15.7, 16.0, 16.5, 16.9, 17.0, 17.0, 17.3, 17.3, 17.8…
## $ post <dbl> 14.1, 14.3, 14.9, 15.3, 15.7, 16.2, 16.5, 16.6, 16.5, 16.7, 17.3…
Here I am using the aov() function along with summary() to get p values, and emmeans to get contrasts.
There is a main effect of group on post test anxiety, but we want to control for potential differences at baseline.
test1 <- aov(post ~ group, data = anx)
summary(test1)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 67.56 33.78 13.8 0.0000248 ***
## Residuals 42 102.83 2.45
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(test1, pairwise ~ group)
## $emmeans
## group emmean SE df lower.CL upper.CL
## grp1 16.5 0.404 42 15.7 17.3
## grp2 15.5 0.404 42 14.7 16.3
## grp3 13.6 0.404 42 12.7 14.4
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## grp1 - grp2 0.98 0.571 42 1.715 0.2115
## grp1 - grp3 2.95 0.571 42 5.157 <.0001
## grp2 - grp3 1.97 0.571 42 3.442 0.0037
##
## P value adjustment: tukey method for comparing a family of 3 estimates
We can use the same aov () function and add the additional
covariates, but we need to use car::Anova() rather than
stats::summary(). By default base R and the summary() function uses Type
I errors. That is fine for simple anova, but when there is more than 1
variable, it is important to use Type III errors, so we need to use
Anova() from car.
# adding covariates before the factor we care about
test2 <- aov(post ~ pre + group, data = anx)
Anova(test2, type = "III")
## Anova Table (Type III tests)
##
## Response: post
## Sum Sq Df F value Pr(>F)
## (Intercept) 0.040 1 0.1739 0.6789
## pre 93.366 1 404.2903 <0.0000000000000002 ***
## group 69.865 2 151.2636 <0.0000000000000002 ***
## Residuals 9.468 41
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we would say that there were significant effects of pre test anx on post test anx, and that there was an effect of group on post test, even after controlling for pre test.
Note that running summary() which uses Type I errors gives different (incorrect) results.
summary(test2)
## Df Sum Sq Mean Sq F value Pr(>F)
## pre 1 91.06 91.06 394.3 <0.0000000000000002 ***
## group 2 69.87 34.93 151.3 <0.0000000000000002 ***
## Residuals 41 9.47 0.23
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
One blog I found also suggested using lm() so lets check whether that gives the same result.
test3 <- lm(post ~ pre + group , data = anx)
test3result <- Anova(test3, type= "III")
kable(test3result)
| Sum Sq | Df | F value | Pr(>F) | |
|---|---|---|---|---|
| (Intercept) | 0.0401564 | 1 | 0.1738835 | 0.6788586 |
| pre | 93.3661886 | 1 | 404.2902904 | 0.0000000 |
| group | 69.8651634 | 2 | 151.2635763 | 0.0000000 |
| Residuals | 9.4684780 | 41 | NA | NA |
Ahhh, so it doesn’t matter whether you use aov() or lm() initially.
afex packagehttps://github.com/singmann/afexc https://cran.r-project.org/web/packages/afex/vignettes/afex_anova_example.html
aov_ez with covariate- need to include factorize = FALSE
# template code
aov_ez("id", "dv", data = df,
within = c("w1"),
between = c("b1", "b2"),
covariate = "c1",
observed = "b2", "c1"), factorize = FALSE)
ez_test <- aov_ez(id = "id", "post", data = anx,
between = "group",
observed = "pre", # observed are variables that are not manipulated
covariate = "pre", factorize = FALSE)
## Warning: Numerical variables NOT centered on 0 (matters if variable in interaction):
## pre
## Contrasts set to contr.sum for the following variables: group
ez_test
## Anova Table (Type 3 tests)
##
## Response: post
## Effect df MSE F ges p.value
## 1 group 2, 41 0.23 151.26 *** .405 <.001
## 2 pre 1, 41 0.23 404.29 *** .908 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
Ok for Fs and ps for ex_test are the same as for Anova() above. Seems it doesn’t matter about the order of variables.
Other options from ez, make a nice() table that is a dataframe. Use kable for nice formatting in your report. pvalues print as <.001 rather than the actual values (which has MANY zeros)
df_ez_test <- nice(ez_test) # make a dataframe
kable(df_ez_test) %>%
kable_styling() # print a nice table
| Effect | df | MSE | F | ges | p.value |
|---|---|---|---|---|---|
| group | 2, 41 | 0.23 | 151.26 *** | .405 | <.001 |
| pre | 1, 41 | 0.23 | 404.29 *** | .908 | <.001 |
rstatix packageOrder matters, put covariate in first, no interaction. Interesting that output is Type II tests (same as TypeIII?).
test1 <- anx %>%
anova_test(post ~ pre + group)
get_anova_table(test1) %>%
kable()
| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| pre | 1 | 41 | 404.290 | 0 | * | 0.908 |
| group | 2 | 41 | 151.264 | 0 | * | 0.881 |
pwc <- anx %>%
emmeans_test(
post ~ group, covariate = pre,
p.adjust.method = "bonferroni"
)
kable(pwc)
| term | .y. | group1 | group2 | df | statistic | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| pre*group | post | grp1 | grp2 | 41 | 3.087303 | 0.0036146 | 0.0108439 | * |
| pre*group | post | grp1 | grp3 | 41 | 16.376613 | 0.0000000 | 0.0000000 | **** |
| pre*group | post | grp2 | grp3 | 41 | 13.200247 | 0.0000000 | 0.0000000 | **** |
get_emmeans(pwc) %>%
kable()
| pre | group | emmean | se | df | conf.low | conf.high | method |
|---|---|---|---|---|---|---|---|
| 16.91556 | grp1 | 16.33782 | 0.1243640 | 41 | 16.08667 | 16.58898 | Emmeans test |
| 16.91556 | grp2 | 15.79199 | 0.1247799 | 41 | 15.53999 | 16.04399 | Emmeans test |
| 16.91556 | grp3 | 13.46352 | 0.1241730 | 41 | 13.21275 | 13.71429 | Emmeans test |
Plot scatter for each group
anx %>%
ggplot(aes(x = pre, y = post, colour = group)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~group)
## `geom_smooth()` using formula = 'y ~ x'
Include interaction term between covariate and group- if interaction is not significant, slopes are equal.
int_model <- aov(post ~ group * pre, data = anx)
summary(int_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 67.56 33.78 145.541 <0.0000000000000002 ***
## pre 1 93.37 93.37 402.297 <0.0000000000000002 ***
## group:pre 2 0.42 0.21 0.899 0.415
## Residuals 39 9.05 0.23
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Shapiro Wilk test and Q-Q plot
model <- aov(post ~ pre + group, data = anx)
#shapiro.test(model) # not sure why this doesn't work
qqnorm(residuals(model))
#qqline(residuals(model)) # also this not working
Levene test non sig means equal variance
leveneTest(post ~ group, data = anx)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.3351 0.7172
## 42
Test with ANOVA effect of group on covariate, non sig indicates independence
covariate <- aov(pre ~ group, data = anx)
summary(covariate)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 1.67 0.8336 0.365 0.696
## Residuals 42 95.89 2.2831
Look at standardized residuals, more than 3 indicates outlier
standardized_residuals <- rstandard(model)
plot(standardized_residuals)
plot scatter, there is a linear relation between outcome and covariate for each group.
anx %>%
ggplot(aes(x = pre, y = post, colour = group)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ group)
## `geom_smooth()` using formula = 'y ~ x'
checking no sig interaction between covariate and grouping variable
anx %>%
anova_test(post ~ group*pre)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 group 2 39 150.518 0.000000000000000000458000 * 0.885
## 2 pre 1 39 402.297 0.000000000000000000000377 * 0.912
## 3 group:pre 2 39 0.899 0.414999999999999980015986 0.044
Fit model with covariate first, use augment() from broom to add fitted values/residuals to model_metrics. Use shapiro_test to check normality of residuals, not sig = normal.
# fit the model covariate first
model <- lm(post ~ pre + group, data = anx)
# look at model diagnostic metrics
model_metrics <- augment(model) %>%
select(-.hat, -.sigma, -.fitted) # remove columns
head(model_metrics, 3)
## # A tibble: 3 × 6
## post pre group .resid .cooksd .std.resid
## <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 14.1 14.1 grp1 0.540 0.0715 1.23
## 2 14.3 14.5 grp1 0.346 0.0237 0.774
## 3 14.9 15.7 grp1 -0.238 0.00640 -0.519
# check normality of residuals with shapiro wilk test
shapiro_test(model_metrics$.resid)
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 model_metrics$.resid 0.961 0.136
Use Levenes to check variance of residuals is equal for all groups
model_metrics %>%
levene_test(.resid ~ group)
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 2 42 0.155 0.857
Standardised residuals greater than 3 in absolute value might indicate outlier
model_metrics %>%
mutate(abs_std_resid = abs(.std.resid)) %>%
filter(abs_std_resid > 3)
## # A tibble: 1 × 7
## post pre group .resid .cooksd .std.resid abs_std_resid
## <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 17.3 19.4 grp1 -1.49 0.382 -3.31 3.31