Useful resource

https://towardsdatascience.com/doing-and-reporting-your-first-anova-and-ancova-in-r-1d820940f2ef

library(tidyverse)
library(emmeans)
library(car)
library(afex)
library(rstatix)
library(datarium)
library(report)
library(kableExtra)

options(scipen = 999) # no scientific notation

The anxiety dataset (from datarium) has a between subjects factor for exercise group and a within subjects factor for time of testing. The DV is anxiety score.

group (1 = low, 2 = moderate, 3 = high)
time (t1 = pre, t2 = mid, t3 = post)

anxiety <- anxiety 

glimpse(anxiety)

## Rows: 45
## Columns: 5
## $ id    <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
## $ group <fct> grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1…
## $ t1    <dbl> 14.1, 14.5, 15.7, 16.0, 16.5, 16.9, 17.0, 17.0, 17.3, 17.3, 17.8…
## $ t2    <dbl> 14.4, 14.6, 15.2, 15.5, 15.8, 16.5, 16.8, 17.1, 16.9, 17.1, 17.7…
## $ t3    <dbl> 14.1, 14.3, 14.9, 15.3, 15.7, 16.2, 16.5, 16.6, 16.5, 16.7, 17.3…

Option 1 `car` package

I am interested in whether there is a main effect of group on post test anxiety, controlling for pre test anxiety.

First selecting just the pre and post test. Checking data types.

anx <- anxiety %>%
  select(id, group, pre = t1, post = t3)

glimpse(anx)

## Rows: 45
## Columns: 4
## $ id    <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
## $ group <fct> grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1, grp1…
## $ pre   <dbl> 14.1, 14.5, 15.7, 16.0, 16.5, 16.9, 17.0, 17.0, 17.3, 17.3, 17.8…
## $ post  <dbl> 14.1, 14.3, 14.9, 15.3, 15.7, 16.2, 16.5, 16.6, 16.5, 16.7, 17.3…

Here I am using the aov() function along with summary() to get p values, and emmeans to get contrasts.

There is a main effect of group on post test anxiety, but we want to control for potential differences at baseline.

test1 <- aov(post ~ group, data = anx)

summary(test1)

##             Df Sum Sq Mean Sq F value    Pr(>F)    
## group        2  67.56   33.78    13.8 0.0000248 ***
## Residuals   42 102.83    2.45                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

emmeans(test1, pairwise ~ group)

## $emmeans
##  group emmean    SE df lower.CL upper.CL
##  grp1    16.5 0.404 42     15.7     17.3
##  grp2    15.5 0.404 42     14.7     16.3
##  grp3    13.6 0.404 42     12.7     14.4
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast    estimate    SE df t.ratio p.value
##  grp1 - grp2     0.98 0.571 42   1.715  0.2115
##  grp1 - grp3     2.95 0.571 42   5.157  <.0001
##  grp2 - grp3     1.97 0.571 42   3.442  0.0037
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

We can use the same aov () function and add the additional covariates, but we need to use car::Anova() rather than stats::summary(). By default base R and the summary() function uses Type I errors. That is fine for simple anova, but when there is more than 1 variable, it is important to use Type III errors, so we need to use Anova() from car.

# adding covariates before the factor we care about

test2 <- aov(post ~ pre + group, data = anx) 
 Anova(test2, type = "III")

## Anova Table (Type III tests)
## 
## Response: post
##             Sum Sq Df  F value              Pr(>F)    
## (Intercept)  0.040  1   0.1739              0.6789    
## pre         93.366  1 404.2903 <0.0000000000000002 ***
## group       69.865  2 151.2636 <0.0000000000000002 ***
## Residuals    9.468 41                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we would say that there were significant effects of pre test anx on post test anx, and that there was an effect of group on post test, even after controlling for pre test.

Note that running summary() which uses Type I errors gives different (incorrect) results.

summary(test2)

##             Df Sum Sq Mean Sq F value              Pr(>F)    
## pre          1  91.06   91.06   394.3 <0.0000000000000002 ***
## group        2  69.87   34.93   151.3 <0.0000000000000002 ***
## Residuals   41   9.47    0.23                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

One blog I found also suggested using lm() so lets check whether that gives the same result.

test3 <- lm(post ~ pre + group , data = anx)

test3result <- Anova(test3, type= "III")



kable(test3result)

	Sum Sq	Df	F value	Pr(>F)
(Intercept)	0.0401564	1	0.1738835	0.6788586
pre	93.3661886	1	404.2902904	0.0000000
group	69.8651634	2	151.2635763	0.0000000
Residuals	9.4684780	41	NA	NA

Ahhh, so it doesn’t matter whether you use aov() or lm() initially.

Option 2 `afex` package

https://github.com/singmann/afexc https://cran.r-project.org/web/packages/afex/vignettes/afex_anova_example.html

aov_ez with covariate- need to include factorize = FALSE

# template code 

aov_ez("id", "dv", data = df, 
        within = c("w1"), 
        between = c("b1", "b2"), 
        covariate = "c1", 
        observed = "b2", "c1"), factorize = FALSE)

ez_test <- aov_ez(id = "id", "post", data = anx, 
       between = "group", 
       observed = "pre", # observed are variables that are not manipulated
        covariate = "pre", factorize = FALSE)

## Warning: Numerical variables NOT centered on 0 (matters if variable in interaction):
##    pre

## Contrasts set to contr.sum for the following variables: group

ez_test

## Anova Table (Type 3 tests)
## 
## Response: post
##   Effect    df  MSE          F  ges p.value
## 1  group 2, 41 0.23 151.26 *** .405   <.001
## 2    pre 1, 41 0.23 404.29 *** .908   <.001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Ok for Fs and ps for ex_test are the same as for Anova() above. Seems it doesn’t matter about the order of variables.

Other options from ez, make a nice() table that is a dataframe. Use kable for nice formatting in your report. pvalues print as <.001 rather than the actual values (which has MANY zeros)

df_ez_test <- nice(ez_test) # make a dataframe

kable(df_ez_test) %>%
  kable_styling() # print a nice table

Effect	df	MSE	F	ges	p.value
group	2, 41	0.23	151.26 ***	.405	<.001
pre	1, 41	0.23	404.29 ***	.908	<.001

Option 3 `rstatix` package

Order matters, put covariate in first, no interaction. Interesting that output is Type II tests (same as TypeIII?).

test1 <- anx %>% 
  anova_test(post ~ pre + group)

get_anova_table(test1) %>%
  kable()

Effect	DFn	DFd	F	p	p<.05	ges
pre	1	41	404.290	0	*	0.908
group	2	41	151.264	0	*	0.881

Pairwise comparisons

pwc <- anx %>%
  emmeans_test(
    post ~ group, covariate = pre, 
    p.adjust.method = "bonferroni"
  )

kable(pwc)

term	.y.	group1	group2	df	statistic	p	p.adj	p.adj.signif
pre*group	post	grp1	grp2	41	3.087303	0.0036146	0.0108439	*
pre*group	post	grp1	grp3	41	16.376613	0.0000000	0.0000000	****
pre*group	post	grp2	grp3	41	13.200247	0.0000000	0.0000000	****

get_emmeans(pwc) %>%
  kable()

pre	group	emmean	se	df	conf.low	conf.high	method
16.91556	grp1	16.33782	0.1243640	41	16.08667	16.58898	Emmeans test
16.91556	grp2	15.79199	0.1247799	41	15.53999	16.04399	Emmeans test
16.91556	grp3	13.46352	0.1241730	41	13.21275	13.71429	Emmeans test

Assumption testing (aov)

1. Linearity between covariate and dependent variable

Plot scatter for each group

anx %>%
  ggplot(aes(x = pre, y = post, colour = group)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~group)

## `geom_smooth()` using formula = 'y ~ x'

2. Homogeneity of regression slopes

Include interaction term between covariate and group- if interaction is not significant, slopes are equal.

int_model <- aov(post ~ group * pre, data = anx)

summary(int_model)

##             Df Sum Sq Mean Sq F value              Pr(>F)    
## group        2  67.56   33.78 145.541 <0.0000000000000002 ***
## pre          1  93.37   93.37 402.297 <0.0000000000000002 ***
## group:pre    2   0.42    0.21   0.899               0.415    
## Residuals   39   9.05    0.23                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3. Normality of residuals

Shapiro Wilk test and Q-Q plot

model <- aov(post ~ pre + group, data = anx)

#shapiro.test(model) # not sure why this doesn't work

qqnorm(residuals(model))

#qqline(residuals(model)) # also this not working

4. Homogenity of variances

Levene test non sig means equal variance

leveneTest(post ~ group, data = anx)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.3351 0.7172
##       42

5. Independence of covariate and treatment

Test with ANOVA effect of group on covariate, non sig indicates independence

covariate <- aov(pre ~ group, data = anx)

summary(covariate)

##             Df Sum Sq Mean Sq F value Pr(>F)
## group        2   1.67  0.8336   0.365  0.696
## Residuals   42  95.89  2.2831

6. No outliers

Look at standardized residuals, more than 3 indicates outlier

standardized_residuals <- rstandard(model)

plot(standardized_residuals)

Assumption testing (rstatix)

linearity

plot scatter, there is a linear relation between outcome and covariate for each group.

anx %>%
  ggplot(aes(x = pre, y = post, colour = group)) +
           geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ group)

## `geom_smooth()` using formula = 'y ~ x'

homogeneity of regression slopes

checking no sig interaction between covariate and grouping variable

anx %>%
  anova_test(post ~ group*pre)

## ANOVA Table (type II tests)
## 
##      Effect DFn DFd       F                          p p<.05   ges
## 1     group   2  39 150.518 0.000000000000000000458000     * 0.885
## 2       pre   1  39 402.297 0.000000000000000000000377     * 0.912
## 3 group:pre   2  39   0.899 0.414999999999999980015986       0.044

normality of residuals

Fit model with covariate first, use augment() from broom to add fitted values/residuals to model_metrics. Use shapiro_test to check normality of residuals, not sig = normal.

# fit the model covariate first
model <- lm(post ~ pre + group, data = anx)

# look at model diagnostic metrics

model_metrics <- augment(model) %>%
  select(-.hat, -.sigma, -.fitted) # remove columns

head(model_metrics, 3)

## # A tibble: 3 × 6
##    post   pre group .resid .cooksd .std.resid
##   <dbl> <dbl> <fct>  <dbl>   <dbl>      <dbl>
## 1  14.1  14.1 grp1   0.540 0.0715       1.23 
## 2  14.3  14.5 grp1   0.346 0.0237       0.774
## 3  14.9  15.7 grp1  -0.238 0.00640     -0.519

# check normality of residuals with shapiro wilk test

shapiro_test(model_metrics$.resid)

## # A tibble: 1 × 3
##   variable             statistic p.value
##   <chr>                    <dbl>   <dbl>
## 1 model_metrics$.resid     0.961   0.136

homogeneity of variances

Use Levenes to check variance of residuals is equal for all groups

model_metrics %>%
  levene_test(.resid ~ group)

## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     2    42     0.155 0.857

outliers

Standardised residuals greater than 3 in absolute value might indicate outlier

model_metrics %>%
  mutate(abs_std_resid = abs(.std.resid)) %>%
filter(abs_std_resid > 3)

## # A tibble: 1 × 7
##    post   pre group .resid .cooksd .std.resid abs_std_resid
##   <dbl> <dbl> <fct>  <dbl>   <dbl>      <dbl>         <dbl>
## 1  17.3  19.4 grp1   -1.49   0.382      -3.31          3.31

ANCOVA options

2024-07-01

Option 1 `car` package

Option 2 `afex` package

Option 3 `rstatix` package

Pairwise comparisons

Assumption testing (aov)

1. Linearity between covariate and dependent variable

2. Homogeneity of regression slopes

3. Normality of residuals

4. Homogenity of variances

5. Independence of covariate and treatment

6. No outliers

Assumption testing (rstatix)

linearity

homogeneity of regression slopes

normality of residuals

homogeneity of variances

outliers

ANCOVA options

2024-07-01

Option 1 car package

Option 2 afex package

Option 3 rstatix package

Pairwise comparisons

Assumption testing (aov)

1. Linearity between covariate and dependent variable

2. Homogeneity of regression slopes

3. Normality of residuals

4. Homogenity of variances

5. Independence of covariate and treatment

6. No outliers

Assumption testing (rstatix)

linearity

homogeneity of regression slopes

normality of residuals

homogeneity of variances

outliers

Option 1 `car` package

Option 2 `afex` package

Option 3 `rstatix` package