Example #1: ANCOVA with an experimental design and pretest as covariate
Load in some helpful packages
library(tidyverse)
library(haven)
Load in the dataset
write_ancova <- read_dta("Writing_Tech.dta")
Explore your data
glimpse(write_ancova)
Rows: 30
Columns: 3
$ Treatment <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,...
$ Cog_Engage <dbl> 3, 2, 5, 2, 2, 2, 7, 2, 4, 7, 5, 3, 4, 4, 7, 5, 4, 9, 2, ...
$ Pre_Engage <dbl> 4, 1, 5, 1, 2, 2, 7, 4, 5, 5, 3, 1, 2, 2, 6, 4, 2, 1, 3, ...
Clean your data
write_ancova.clean <- write_ancova %>%
mutate(.,
treat.fac = as_factor(Treatment),
cog.fac = as_factor(Cog_Engage),
pre.fac = as_factor(Pre_Engage))
Check out descriptive statistics for the treatment factor and continuous variables
summary(write_ancova.clean)
Treatment Cog_Engage Pre_Engage treat.fac cog.fac
Min. :1.000 Min. :2.000 Min. :0.000 Paper/ Pencil: 9 4 :8
1st Qu.:1.000 1st Qu.:3.000 1st Qu.:1.000 Laptop : 8 2 :7
Median :2.000 Median :4.000 Median :2.500 Tablet :13 5 :4
Mean :2.133 Mean :4.367 Mean :2.733 3 :3
3rd Qu.:3.000 3rd Qu.:5.750 3rd Qu.:4.000 6 :3
Max. :3.000 Max. :9.000 Max. :7.000 7 :3
(Other):2
pre.fac
1 :6
2 :6
3 :5
4 :4
5 :4
0 :3
(Other):2
Correlate pre and post test results- how similar are they?
Hopefully, somewhat similar! If not, there may not be systematic change over time. The cor.test function in base R works well for simple correlations.
cor.test(write_ancova.clean$Cog_Engage, write_ancova.clean$Pre_Engage)
Pearson's product-moment correlation
data: write_ancova.clean$Cog_Engage and write_ancova.clean$Pre_Engage
t = 1.345, df = 28, p-value = 0.1894
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1250150 0.5571688
sample estimates:
cor
0.2463496
Run a basic ANCOVA
Here, we have hap_post as the outcome, treat as the factor, and hap_pre as the covariate. Since we cleaned our data at the beginning, R recognizes that treat is a factor and hap_pre is continuous so it will run as an ANCOVA and not a two-way ANOVA.
ancova1<-aov(Cog_Engage ~ treat.fac + Pre_Engage, data = write_ancova.clean)
summary(ancova1)
Df Sum Sq Mean Sq F value Pr(>F)
treat.fac 2 16.84 8.422 2.770 0.0812 .
Pre_Engage 1 15.08 15.076 4.959 0.0348 *
Residuals 26 79.05 3.040
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If you had more than 2 groups, you could do post-hoc comparisons:
TukeyHSD(ancova1)
non-factors ignored: Pre_EngageError in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep3'
Get measures of effect size:
library(effectsize)
package 㤼㸱effectsize㤼㸲 was built under R version 4.0.3
eta_squared(ancova1)
Parameter | Eta2 (partial) | 90% CI
------------------------------------------
treat.fac | 0.18 | [0.00, 0.37]
Pre_Engage | 0.16 | [0.01, 0.37]
omega_squared(ancova1)
Parameter | Omega2 (partial) | 90% CI
--------------------------------------------
treat.fac | 0.11 | [0.00, 0.28]
Pre_Engage | 0.12 | [0.00, 0.32]
What if we didn’t have the pretest score?
Now, see the difference without the covariate- just an ANOVA for treat:
anova1<-aov(Cog_Engage ~ treat.fac, write_ancova.clean)
summary(anova1)
Df Sum Sq Mean Sq F value Pr(>F)
treat.fac 2 16.84 8.422 2.416 0.108
Residuals 27 94.12 3.486
And, of course- how to run this as a multiple regression!
regression1 <- lm(hap_post ~ treat.fac + hap_pre, data = hap_ancova.clean)
summary(regression1)
Example #3: Repeated Measures ANOVA
Reshaping data from wide to long
Hint for this week’s content review!
sleep <- read_dta("sleep_wide.dta")
glimpse(sleep)
summary(sleep)
Using pivot_longer to reshape from wide to long
sleep.long <- sleep %>%
pivot_longer(.,
cols = starts_with("sleep"),
names_to = "month",
values_to = "sleep",
names_prefix = "sleep",
)
glimpse(sleep.long)
Reshape back to wide using pivot_wider
sleep.wide <- sleep.long %>%
pivot_wider(.,
id_cols = c("id"),
names_from = month,
values_from = sleep,
names_prefix = "sleep",
)
glimpse(sleep.wide)
Density Plot for Month 1
kdensity1 <- ggplot(sleep, aes(x=sleep1)) +
geom_density() +
stat_function(fun = dnorm, n = 100, args = list(mean = 350, sd = 30), color = "red", linetype = "dashed") +
labs(title="Distribution of Sleep in Month 1",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity1
Density Plot for Month 2
kdensity2 <- ggplot(sleep, aes(x=sleep2)) +
geom_density() +
stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
labs(title="Distribution of Sleep in Month 2",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity2
Density Plot for Month 3
kdensity3 <- ggplot(sleep, aes(x=sleep3)) +
geom_density() +
stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
labs(title="Distribution of Sleep in Month 3",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity3
Summary Statistics of Sleep by Month
library(rstatix)
library(ggpubr)
sleep.long %>%
group_by(month) %>%
get_summary_stats(sleep, type = "mean_sd")
Code for basic repeated measures ANOVA
Note:id is participant id in this case.
sleep.aov <- anova_test(data = sleep.long, dv = sleep, wid = id, within = month)
summary(sleep.aov)
Here is the basic ANOVA table:
get_anova_table(sleep.aov)
Here are the corrected p-values with the sphericity adjustments:
sleep.aov$`Sphericity Corrections`
Post-hoc comparisons by month, using Tukey’s method:
pairwise_t_test(
formula = sleep ~ month, paired = TRUE,
p.adjust.method = "bonferroni",
data = sleep.long
)
---
title: "MVS Module 4: Advanced ANOVA"
author: "Jake Reynolds"
output: html_notebook
---

# Example #1: ANCOVA with an experimental design and pretest as covariate
## Load in some helpful packages
```{r}
library(tidyverse)
library(haven)
```

## Load in the dataset
```{r}
write_ancova <- read_dta("Writing_Tech.dta")
```

## Explore your data
```{r}
glimpse(write_ancova)
```
## Clean your data
```{r}
write_ancova.clean <- write_ancova %>%
mutate(.,
  treat.fac = as_factor(Treatment),
  cog.fac = as_factor(Cog_Engage),
  pre.fac = as_factor(Pre_Engage))
```

## Check out descriptive statistics for the treatment factor and continuous variables 
```{r}
summary(write_ancova.clean)
```

## Correlate pre and post test results- how similar are they? 
Hopefully, somewhat similar! If not, there may not be systematic change over time. The `cor.test` function in base `R` works well for simple correlations.
```{r}
cor.test(write_ancova.clean$Cog_Engage, write_ancova.clean$Pre_Engage)
```

## Run a basic ANCOVA 
Here, we have `hap_post` as the outcome, `treat` as the factor, and `hap_pre` as the covariate. Since we cleaned our data at the beginning, `R` recognizes that `treat` is a factor and `hap_pre` is continuous so it will run as an ANCOVA and not a two-way ANOVA.
```{r}
ancova1<-aov(Cog_Engage ~ treat.fac + Pre_Engage, data = write_ancova.clean)
summary(ancova1)
```

## If you had more than 2 groups, you could do post-hoc comparisons:
```{r}
TukeyHSD(ancova1)
```

## Get measures of effect size:
```{r}
library(effectsize)
eta_squared(ancova1)
omega_squared(ancova1)
```

## What if we didn't have the pretest score?
Now, see the difference without the covariate- just an ANOVA for `treat`:
```{r}
anova1<-aov(Cog_Engage ~ treat.fac, write_ancova.clean)
summary(anova1)
```

## And, of course- how to run this as a multiple regression!
```{r}
regression1 <- lm(hap_post ~ treat.fac + hap_pre, data = hap_ancova.clean)
summary(regression1)
```

# Example #2: ANCOVA with experimental design and related covariates (not pretest)
It might be better for external validity to use related measures as covariates, instead of a pretest.
## Load in the dataset:
```{r}
hap_ancova2 <- read_dta("hap-ancova2.dta")
```

## Explore your data:
```{r}
glimpse(hap_ancova2)
```

## Clean your data:
```{r}
hap_ancova2.clean <- hap_ancova2 %>%
  mutate(.,
         treat.fac = as_factor(treat))
```

## Explore your data
```{r}
summary(hap_ancova2.clean)
```

## How much variance does our set of covariates explain?
```{r}
regression2 <- lm(hap ~ satfam + health + socfriend, data = hap_ancova2.clean)
summary(regression2)
```

## Now, plug them into the ANCOVA model:
```{r}
ancova2 <- aov(hap ~ treat.fac + satfam + health + socfriend, data = hap_ancova2.clean)
summary(ancova2)
```

## Regression model with the variable `treat.fac`
```{r}
regression3 <- lm(hap ~ treat.fac + satfam + health + socfriend, data = hap_ancova2.clean)
summary(regression3)
```

# Example #3: Repeated Measures ANOVA
## Reshaping data from wide to long 
Hint for this week's content review!
```{r}
sleep <- read_dta("sleep_wide.dta")
```

```{r}
glimpse(sleep)
```

```{r}
summary(sleep)
```

## Using `pivot_longer` to reshape from *wide* to *long*
```{r}
sleep.long <- sleep %>%
pivot_longer(.,
              cols = starts_with("sleep"),
              names_to = "month",
              values_to = "sleep",
              names_prefix = "sleep",
              )
glimpse(sleep.long)
```

## Reshape back to wide using `pivot_wider`
```{r}
sleep.wide <- sleep.long %>%
  pivot_wider(.,
              id_cols = c("id"),
              names_from = month,
              values_from = sleep,
              names_prefix = "sleep",
              )

glimpse(sleep.wide)
```

## Density Plot for Month 1
```{r}
kdensity1 <- ggplot(sleep, aes(x=sleep1)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 350, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 1",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity1
```
## Density Plot for Month 2
```{r}
kdensity2 <- ggplot(sleep, aes(x=sleep2)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 2",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity2
```

## Density Plot for Month 3
```{r}
kdensity3 <- ggplot(sleep, aes(x=sleep3)) + 
  geom_density() + 
  stat_function(fun = dnorm, n = 100, args = list(mean = 370, sd = 30), color = "red", linetype = "dashed") +
  labs(title="Distribution of Sleep in Month 3",x="Sleep (minutes per night)", y = "Density", caption = "N = 100 participants. The red line indicates a normal distribution. The black lines represents the observed distribution.")
kdensity3
```

## Using data in *long* form, we can facet by month to include all three graphs in one table
```{r}
all_sleep<-ggplot(sleep.long, aes(x=sleep))+
  geom_density()+facet_wrap(month ~ ., ncol = 1)
all_sleep
```

## Summary Statistics of Sleep by Month
```{r}
library(rstatix)
library(ggpubr)
sleep.long %>%
  group_by(month) %>%
  get_summary_stats(sleep, type = "mean_sd")
```

## Code for basic repeated measures ANOVA 
Note:`id` is participant id in this case.
```{r}
sleep.aov <- anova_test(data = sleep.long, dv = sleep, wid = id, within = month)
summary(sleep.aov)
```
Here is the basic ANOVA table:
```{r}
get_anova_table(sleep.aov)
```
Here are the corrected p-values with the sphericity adjustments:
```{r}
sleep.aov$`Sphericity Corrections`
```

## Post-hoc comparisons by month, using Tukey's method:
```{r}
pairwise_t_test(
    formula = sleep ~ month, paired = TRUE,
    p.adjust.method = "bonferroni",
    data = sleep.long
    )
```

# Check for sphericity and multivariate normality
###  Start with univariate normality, plots and statistics
```{r}
library(MVN)
sleep_univariate <- mvn(data = sleep.wide, mvnTest = NULL, univariatePlot = "qqplot")

sleep_univariate2 <- mvn(data = sleep.wide, mvnTest = NULL, univariateTest = "SW", desc = TRUE)
sleep_univariate2$univariateNormality
```
### Tests for MVN
#### Doornik-Hansen Test 
The last column indicates whether dataset follows a multivariate normality or not (i.e, YES or NO) at significance level 0.05.
```{r}
sleep_mvn <- mvn(data = sleep.wide, mvnTest = "dh")
sleep_mvn$multivariateNormality
```

#### Mardia's MVN Test 
This function performs multivariate skewness and kurtosis tests at the same time and combines test results for multivariate normality. If both tests indicate multivariate normality, then data follows a multivariate normal distribution at the 0.05 significance level.
```{r}
sleep_mvn2 <- mvn(data = sleep.wide, mvnTest = "mardia")
sleep_mvn2$multivariateNormality
```
## Check the sphericity assumption
```{r}
sleep.aov$`Mauchly's Test for Sphericity`
```
