Example #1: ANCOVA with an experimental design and pretest as covariate

Load in some helpful packages

library(tidyverse)
library(haven)

Load in the dataset

write_ancova <- read_dta("Writing_Tech.dta")

Explore your data

glimpse(write_ancova)
Rows: 30
Columns: 3
$ Treatment  <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,...
$ Cog_Engage <dbl> 3, 2, 5, 2, 2, 2, 7, 2, 4, 7, 5, 3, 4, 4, 7, 5, 4, 9, 2, ...
$ Pre_Engage <dbl> 4, 1, 5, 1, 2, 2, 7, 4, 5, 5, 3, 1, 2, 2, 6, 4, 2, 1, 3, ...

Clean your data

write_ancova.clean <- write_ancova %>%
mutate(.,
  treat.fac = as_factor(Treatment),
  cog.fac = as_factor(Cog_Engage),
  pre.fac = as_factor(Pre_Engage))

Check out descriptive statistics for the treatment factor and continuous variables

summary(write_ancova.clean)
   Treatment       Cog_Engage      Pre_Engage            treat.fac     cog.fac 
 Min.   :1.000   Min.   :2.000   Min.   :0.000   Paper/ Pencil: 9   4      :8  
 1st Qu.:1.000   1st Qu.:3.000   1st Qu.:1.000   Laptop       : 8   2      :7  
 Median :2.000   Median :4.000   Median :2.500   Tablet       :13   5      :4  
 Mean   :2.133   Mean   :4.367   Mean   :2.733                      3      :3  
 3rd Qu.:3.000   3rd Qu.:5.750   3rd Qu.:4.000                      6      :3  
 Max.   :3.000   Max.   :9.000   Max.   :7.000                      7      :3  
                                                                    (Other):2  
    pre.fac 
 1      :6  
 2      :6  
 3      :5  
 4      :4  
 5      :4  
 0      :3  
 (Other):2  

Correlate pre and post test results- how similar are they?

Hopefully, somewhat similar! If not, there may not be systematic change over time. The cor.test function in base R works well for simple correlations.

cor.test(write_ancova.clean$Cog_Engage, write_ancova.clean$Pre_Engage)

    Pearson's product-moment correlation

data:  write_ancova.clean$Cog_Engage and write_ancova.clean$Pre_Engage
t = 1.345, df = 28, p-value = 0.1894
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1250150  0.5571688
sample estimates:
      cor 
0.2463496 

Run a basic ANCOVA

Here, we have hap_post as the outcome, treat as the factor, and hap_pre as the covariate. Since we cleaned our data at the beginning, R recognizes that treat is a factor and hap_pre is continuous so it will run as an ANCOVA and not a two-way ANOVA.

ancova1<-aov(Cog_Engage ~ treat.fac + Pre_Engage, data = write_ancova.clean)
summary(ancova1)
            Df Sum Sq Mean Sq F value Pr(>F)  
treat.fac    2  16.84   8.422   2.770 0.0812 .
Pre_Engage   1  15.08  15.076   4.959 0.0348 *
Residuals   26  79.05   3.040                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If you had more than 2 groups, you could do post-hoc comparisons:

TukeyHSD(ancova1)
non-factors ignored: Pre_EngageError in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep3'

Get measures of effect size:

library(effectsize)
package 㤼㸱effectsize㤼㸲 was built under R version 4.0.3
eta_squared(ancova1)
Parameter  | Eta2 (partial) |       90% CI
------------------------------------------
treat.fac  |           0.18 | [0.00, 0.37]
Pre_Engage |           0.16 | [0.01, 0.37]
omega_squared(ancova1)
Parameter  | Omega2 (partial) |       90% CI
--------------------------------------------
treat.fac  |             0.11 | [0.00, 0.28]
Pre_Engage |             0.12 | [0.00, 0.32]

What if we didn’t have the pretest score?

Now, see the difference without the covariate- just an ANOVA for treat:

anova1<-aov(Cog_Engage ~ treat.fac, write_ancova.clean)
summary(anova1)
            Df Sum Sq Mean Sq F value Pr(>F)
treat.fac    2  16.84   8.422   2.416  0.108
Residuals   27  94.12   3.486               

And, of course- how to run this as a multiple regression!

regression1 <- lm(hap_post ~ treat.fac + hap_pre, data = hap_ancova.clean)
summary(regression1)

Example #3: Repeated Measures ANOVA

Reshaping data from wide to long

Hint for this week’s content review!

sleep <- read_dta("sleep_wide.dta")
glimpse(sleep)
summary(sleep)

Using pivot_longer to reshape from wide to long

sleep.long <- sleep %>%
pivot_longer(.,
              cols = starts_with("sleep"),
              names_to = "month",
              values_to = "sleep",
              names_prefix = "sleep",
              )
glimpse(sleep.long)

Reshape back to wide using pivot_wider

sleep.wide <- sleep.long %>%
  pivot_wider(.,
              id_cols = c("id"),
              names_from = month,
              values_from = sleep,
              names_prefix = "sleep",
              )

glimpse(sleep.wide)

Density Plot for Month 1

kdensity1 <- ggplot(sleep, aes(x=sleep1)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 350, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 1",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity1

Density Plot for Month 2

kdensity2 <- ggplot(sleep, aes(x=sleep2)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 2",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity2

Density Plot for Month 3

kdensity3 <- ggplot(sleep, aes(x=sleep3)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 3",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity3

Using data in long form, we can facet by month to include all three graphs in one table

all_sleep<-ggplot(sleep.long, aes(x=sleep))+
  geom_density()+facet_wrap(month ~ ., ncol = 1)
all_sleep

Summary Statistics of Sleep by Month

library(rstatix)
library(ggpubr)
sleep.long %>%
  group_by(month) %>%
  get_summary_stats(sleep, type = "mean_sd")

Code for basic repeated measures ANOVA

Note:id is participant id in this case.

sleep.aov <- anova_test(data = sleep.long, dv = sleep, wid = id, within = month)
summary(sleep.aov)

Here is the basic ANOVA table:

get_anova_table(sleep.aov)

Here are the corrected p-values with the sphericity adjustments:

sleep.aov$`Sphericity Corrections`

Post-hoc comparisons by month, using Tukey’s method:

pairwise_t_test(
    formula = sleep ~ month, paired = TRUE,
    p.adjust.method = "bonferroni",
    data = sleep.long
    )

Check for sphericity and multivariate normality

Start with univariate normality, plots and statistics

library(MVN)
sleep_univariate <- mvn(data = sleep.wide, mvnTest = NULL, univariatePlot = "qqplot")

sleep_univariate2 <- mvn(data = sleep.wide, mvnTest = NULL, univariateTest = "SW", desc = TRUE)
sleep_univariate2$univariateNormality

Tests for MVN

Doornik-Hansen Test

The last column indicates whether dataset follows a multivariate normality or not (i.e, YES or NO) at significance level 0.05.

sleep_mvn <- mvn(data = sleep.wide, mvnTest = "dh")
sleep_mvn$multivariateNormality

Mardia’s MVN Test

This function performs multivariate skewness and kurtosis tests at the same time and combines test results for multivariate normality. If both tests indicate multivariate normality, then data follows a multivariate normal distribution at the 0.05 significance level.

sleep_mvn2 <- mvn(data = sleep.wide, mvnTest = "mardia")
sleep_mvn2$multivariateNormality

Check the sphericity assumption

sleep.aov$`Mauchly's Test for Sphericity`
---
title: "MVS Module 4: Advanced ANOVA"
author: "Jake Reynolds"
output: html_notebook
---

# Example #1: ANCOVA with an experimental design and pretest as covariate
## Load in some helpful packages
```{r}
library(tidyverse)
library(haven)
```

## Load in the dataset
```{r}
write_ancova <- read_dta("Writing_Tech.dta")
```

## Explore your data
```{r}
glimpse(write_ancova)
```
## Clean your data
```{r}
write_ancova.clean <- write_ancova %>%
mutate(.,
  treat.fac = as_factor(Treatment),
  cog.fac = as_factor(Cog_Engage),
  pre.fac = as_factor(Pre_Engage))
```

## Check out descriptive statistics for the treatment factor and continuous variables 
```{r}
summary(write_ancova.clean)
```

## Correlate pre and post test results- how similar are they? 
Hopefully, somewhat similar! If not, there may not be systematic change over time. The `cor.test` function in base `R` works well for simple correlations.
```{r}
cor.test(write_ancova.clean$Cog_Engage, write_ancova.clean$Pre_Engage)
```

## Run a basic ANCOVA 
Here, we have `hap_post` as the outcome, `treat` as the factor, and `hap_pre` as the covariate. Since we cleaned our data at the beginning, `R` recognizes that `treat` is a factor and `hap_pre` is continuous so it will run as an ANCOVA and not a two-way ANOVA.
```{r}
ancova1<-aov(Cog_Engage ~ treat.fac + Pre_Engage, data = write_ancova.clean)
summary(ancova1)
```

## If you had more than 2 groups, you could do post-hoc comparisons:
```{r}
TukeyHSD(ancova1)
```

## Get measures of effect size:
```{r}
library(effectsize)
eta_squared(ancova1)
omega_squared(ancova1)
```

## What if we didn't have the pretest score?
Now, see the difference without the covariate- just an ANOVA for `treat`:
```{r}
anova1<-aov(Cog_Engage ~ treat.fac, write_ancova.clean)
summary(anova1)
```

## And, of course- how to run this as a multiple regression!
```{r}
regression1 <- lm(hap_post ~ treat.fac + hap_pre, data = hap_ancova.clean)
summary(regression1)
```

# Example #2: ANCOVA with experimental design and related covariates (not pretest)
It might be better for external validity to use related measures as covariates, instead of a pretest.
## Load in the dataset:
```{r}
hap_ancova2 <- read_dta("hap-ancova2.dta")
```

## Explore your data:
```{r}
glimpse(hap_ancova2)
```

## Clean your data:
```{r}
hap_ancova2.clean <- hap_ancova2 %>%
  mutate(.,
         treat.fac = as_factor(treat))
```

## Explore your data
```{r}
summary(hap_ancova2.clean)
```

## How much variance does our set of covariates explain?
```{r}
regression2 <- lm(hap ~ satfam + health + socfriend, data = hap_ancova2.clean)
summary(regression2)
```

## Now, plug them into the ANCOVA model:
```{r}
ancova2 <- aov(hap ~ treat.fac + satfam + health + socfriend, data = hap_ancova2.clean)
summary(ancova2)
```

## Regression model with the variable `treat.fac`
```{r}
regression3 <- lm(hap ~ treat.fac + satfam + health + socfriend, data = hap_ancova2.clean)
summary(regression3)
```

# Example #3: Repeated Measures ANOVA
## Reshaping data from wide to long 
Hint for this week's content review!
```{r}
sleep <- read_dta("sleep_wide.dta")
```

```{r}
glimpse(sleep)
```

```{r}
summary(sleep)
```

## Using `pivot_longer` to reshape from *wide* to *long*
```{r}
sleep.long <- sleep %>%
pivot_longer(.,
              cols = starts_with("sleep"),
              names_to = "month",
              values_to = "sleep",
              names_prefix = "sleep",
              )
glimpse(sleep.long)
```

## Reshape back to wide using `pivot_wider`
```{r}
sleep.wide <- sleep.long %>%
  pivot_wider(.,
              id_cols = c("id"),
              names_from = month,
              values_from = sleep,
              names_prefix = "sleep",
              )

glimpse(sleep.wide)
```

## Density Plot for Month 1
```{r}
kdensity1 <- ggplot(sleep, aes(x=sleep1)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 350, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 1",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity1
```
## Density Plot for Month 2
```{r}
kdensity2 <- ggplot(sleep, aes(x=sleep2)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 2",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity2
```

## Density Plot for Month 3
```{r}
kdensity3 <- ggplot(sleep, aes(x=sleep3)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 3",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity3
```

## Using data in *long* form, we can facet by month to include all three graphs in one table
```{r}
all_sleep<-ggplot(sleep.long, aes(x=sleep))+
  geom_density()+facet_wrap(month ~ ., ncol = 1)
all_sleep
```

## Summary Statistics of Sleep by Month
```{r}
library(rstatix)
library(ggpubr)
sleep.long %>%
  group_by(month) %>%
  get_summary_stats(sleep, type = "mean_sd")
```

## Code for basic repeated measures ANOVA 
Note:`id` is participant id in this case.
```{r}
sleep.aov <- anova_test(data = sleep.long, dv = sleep, wid = id, within = month)
summary(sleep.aov)
```
Here is the basic ANOVA table:
```{r}
get_anova_table(sleep.aov)
```
Here are the corrected p-values with the sphericity adjustments:
```{r}
sleep.aov$`Sphericity Corrections`
```

## Post-hoc comparisons by month, using Tukey's method:
```{r}
pairwise_t_test(
    formula = sleep ~ month, paired = TRUE,
    p.adjust.method = "bonferroni",
    data = sleep.long
    )
```

# Check for sphericity and multivariate normality
###  Start with univariate normality, plots and statistics
```{r}
library(MVN)
sleep_univariate <- mvn(data = sleep.wide, mvnTest = NULL, univariatePlot = "qqplot")

sleep_univariate2 <- mvn(data = sleep.wide, mvnTest = NULL, univariateTest = "SW", desc = TRUE)
sleep_univariate2$univariateNormality
```
### Tests for MVN
#### Doornik-Hansen Test 
The last column indicates whether dataset follows a multivariate normality or not (i.e, YES or NO) at significance level 0.05.
```{r}
sleep_mvn <- mvn(data = sleep.wide, mvnTest = "dh")
sleep_mvn$multivariateNormality
```

#### Mardia's MVN Test 
This function performs multivariate skewness and kurtosis tests at the same time and combines test results for multivariate normality. If both tests indicate multivariate normality, then data follows a multivariate normal distribution at the 0.05 significance level.
```{r}
sleep_mvn2 <- mvn(data = sleep.wide, mvnTest = "mardia")
sleep_mvn2$multivariateNormality
```
## Check the sphericity assumption
```{r}
sleep.aov$`Mauchly's Test for Sphericity`
```
