ANOVA in R Cookbook

class: center, middle, inverse, title-slide

# ANOVA in R Cookbook
### Chris Ives

---

# Contents

* Foundations
  + Importing Data
  + Descriptives
  + Plots

* ANOVA
  + One-way ANOVA
  + Pairwise Comparisons
  + Planned Contrasts

---
## Installing the `needs` Package

- Can install packages and load them
- No longer need `install.packages()` and `library()`
- It will ask if you would like to load it every time RStudio opens. Select yes.

```r
install.packages("needs")
```

---
## Importing the Data
Normally we would use the `import()` function to import our data. However, with the SPSS `.sav` files we don't get factor labels.

```r
data_import <- import(here("Lab 2/Lab2_Vocab.sav"))
head(data_import)
```

```
##   idnum vocab instruct
## 1     1    53        1
## 2     2    49        1
## 3     3    47        1
## 4     4    42        1
## 5     5    51        1
## 6     6    34        1
```

Notice how we have 1s down the instruct column.

---
## The Fix
For now, when working with `.sav` files we will use the `read.sav` function from the `misty` package. `use.value.labels = TRUE` tells it to use the labels as the cell values.

```r
needs(misty)
l2_data <- misty::read.sav(here("Lab 2/Lab2_Vocab.sav"), 
                        use.value.labels = TRUE)
head(l2_data)
```

```
##   idnum vocab         instruct
## 1     1    53 physical science
## 2     2    49 physical science
## 3     3    47 physical science
## 4     4    42 physical science
## 5     5    51 physical science
## 6     6    34 physical science
```

---
## Checking The Assumptions of ANOVA

- Independence of observations
- Normality
- Homogeneity of variance

---
## Descriptive Statistics

Make sure the `psych` package is installed and loaded:

```r
needs(psych)
```

`describe` from the `psych` package is one of the more popular functions for descriptive statistics.

```r
describe(l2_data)
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["vars"],"name":[1],"type":["int"],"align":["right"]},{"label":["n"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mean"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["sd"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["median"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["trimmed"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["mad"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["min"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["max"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["range"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["skew"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["kurtosis"],"name":[12],"type":["dbl"],"align":["right"]},{"label":["se"],"name":[13],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"36","3":"18.5","4":"10.5356538","5":"18.5","6":"18.5","7":"13.3434","8":"1","9":"36","10":"35","11":"0.0000000","12":"-1.3003629","13":"1.7559423","_rn_":"idnum"},{"1":"2","2":"36","3":"33.5","4":"12.8652355","5":"35.5","6":"34.2","7":"12.6021","8":"6","9":"53","10":"47","11":"-0.5359932","12":"-0.8726857","13":"2.1442059","_rn_":"vocab"},{"1":"3","2":"36","3":"2.0","4":"0.8280787","5":"2.0","6":"2.0","7":"1.4826","8":"1","9":"3","10":"2","11":"0.0000000","12":"-1.5821759","13":"0.1380131","_rn_":"instruct*"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
## Independence of Observations
 - Do participants cross groups?
 - We know this if we know the study design
 - In this study, each student experienced one and only one lecture

---
## Normality
- We can check the skew and kurtosis values from our `describe()` output.
- 0 ± 2 is a good rule of thumb for a tenable assumption of normality.

- Instruct is a categorical variable, so we can ignore the skew and kurtosis on that.

---
## Visual Inspection for Normality
- For the most part, plots will be wrapped using the `ggplot()` function.

### Histogram

```r
ggplot(data = l2_data, aes(x = vocab)) +
  geom_histogram()
```

- `aes()` refers to aesthetics. What are the variables we want represented in our plots? Since we just want counts of a single continuous variable, we just need to specify our `x` (i.e., `x = vocab`).

---
## Histograms

```r
ggplot(data = l2_data, aes(x = vocab)) +
  geom_histogram()
```

---
## Boxplot
- To get boxplots, we just substitute `geom_histogram()` for `geom_boxplot()` and modify our aesthetics.

```r
ggplot(data = l2_data, aes(x = instruct, y = vocab)) +
  geom_boxplot()
```

---
## Homogeneity of Variance
Homogeneity test is a separate analysis. We'll just use the `leveneTest` function for the `car` package. Formula is the same as for the ANOVA we want to run. Specify `center = "mean"` (function's default is median) to match SPSS results.

```r
needs(car)
car::leveneTest(vocab ~ instruct, data = l2_data, center = "mean")
```

```
## Levene's Test for Homogeneity of Variance (center = "mean")
##       Df F value   Pr(>F)   
## group  2  7.5054 0.002058 **
##       33                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Our significant result shows error variance around the *mean* is not equal across groups.
---
class: inverse-blue middle
# Running ANOVAs

---

Example here is from Lab 1 data.

ANOVAs will be run using the `anova_test()` function. Formula is specified as `DV ~ IV`. My convention is to number my model objects (e.g., m1, m2, etc.)

```r
needs(rstatix)
m1 <- anova_test(data = l1_data, formula = vocab ~ instruct, detailed = TRUE)
```

```
## Coefficient covariances computed by hccm()
```

```r
m1
```

```
## ANOVA Table (type II tests)
## 
##     Effect  SSn  SSd DFn DFd     F     p p<.05   ges
## 1 instruct 1176 3824   1  22 6.766 0.016     * 0.235
```

---
## Effect Sizes
Effect sizes can be specified using `effect.size = ____` in the `anova.test()` function. Use `effect.size = "ges"` for generalized eta squared or `effect.size = "pes"` for partial eta squared. Default is "`ges"`.

```r
anova_test(data = l1_data, formula = vocab ~ instruct,
           effect.size = "ges")
```

---

```r
anova_test(data = l1_data, formula = vocab ~ instruct,
           effect.size = "ges")
```

```
## Coefficient covariances computed by hccm()
```

```
## ANOVA Table (type II tests)
## 
##     Effect DFn DFd     F     p p<.05   ges
## 1 instruct   1  22 6.766 0.016     * 0.235
```

```r
anova_test(data = l1_data, formula = vocab ~ instruct,
           effect.size = "pes")
```

```
## Coefficient covariances computed by hccm()
```

```
## ANOVA Table (type II tests)
## 
##     Effect DFn DFd     F     p p<.05   pes
## 1 instruct   1  22 6.766 0.016     * 0.235
```

Notice they are the same for a one-way between subjects ANOVA.
---
## Posthoc Comparisons | Tukey
If you write out `"data = l2data"` the function will produce an error since it designed for multiple types of objects. Here we just type "l2data".

```r
rstatix::tukey_hsd(l2_data, formula = vocab ~ instruct)
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["group1"],"name":[2],"type":["chr"],"align":["left"]},{"label":["group2"],"name":[3],"type":["chr"],"align":["left"]},{"label":["null.value"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["estimate"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["conf.low"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["conf.high"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["p.adj"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["p.adj.signif"],"name":[9],"type":["chr"],"align":["left"]}],"data":[{"1":"instruct","2":"physical science","3":"social science","4":"0","5":"-14.0","6":"-25.825984","7":"-2.174016","8":"0.0174","9":"*","_rn_":"1"},{"1":"instruct","2":"physical science","3":"history","4":"0","5":"-5.5","6":"-17.325984","7":"6.325984","8":"0.4960","9":"ns","_rn_":"2"},{"1":"instruct","2":"social science","3":"history","4":"0","5":"8.5","6":"-3.325984","7":"20.325984","8":"0.1970","9":"ns","_rn_":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
## Posthoc Comparisons | Games-Howell
Remember, Games-Howell is used if the assumption of variance homogeneity is violated.

```r
rstatix::games_howell_test(vocab ~ instruct, data = l2_data, 
                           conf.level = 0.95, 
                           detailed = FALSE)
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":[".y."],"name":[1],"type":["chr"],"align":["left"]},{"label":["group1"],"name":[2],"type":["chr"],"align":["left"]},{"label":["group2"],"name":[3],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["conf.low"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["conf.high"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["p.adj"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["p.adj.signif"],"name":[8],"type":["chr"],"align":["left"]}],"data":[{"1":"vocab","2":"physical science","3":"social science","4":"-14.0","5":"-27.625565","6":"-0.3744354","7":"0.043","8":"*","_rn_":"1"},{"1":"vocab","2":"physical science","3":"history","4":"-5.5","5":"-15.459428","6":"4.4594280","7":"0.363","8":"ns","_rn_":"2"},{"1":"vocab","2":"social science","3":"history","4":"8.5","5":"-4.350238","6":"21.3502380","7":"0.235","8":"ns","_rn_":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
## Posthoc Comparisons | Bonferroni

```r
pairwise_t_test(data = l2_data, vocab ~ instruct, 
                p.adjust.method = "bonferroni")
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":[".y."],"name":[1],"type":["chr"],"align":["left"]},{"label":["group1"],"name":[2],"type":["chr"],"align":["left"]},{"label":["group2"],"name":[3],"type":["chr"],"align":["left"]},{"label":["n1"],"name":[4],"type":["int"],"align":["right"]},{"label":["n2"],"name":[5],"type":["int"],"align":["right"]},{"label":["p"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["p.signif"],"name":[7],"type":["chr"],"align":["left"]},{"label":["p.adj"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["p.adj.signif"],"name":[9],"type":["chr"],"align":["left"]}],"data":[{"1":"vocab","2":"physical science","3":"social science","4":"12","5":"12","6":"0.00651","7":"**","8":"0.0195","9":"*","_rn_":"1"},{"1":"vocab","2":"physical science","3":"history","4":"12","5":"12","6":"0.26200","7":"ns","8":"0.7860","9":"ns","_rn_":"2"},{"1":"vocab","2":"social science","3":"history","4":"12","5":"12","6":"0.08700","7":"ns","8":"0.2610","9":"ns","_rn_":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>

---
class: inverse-green middle
# Specified Contrasts
* They are not automatic in R and are a bit difficult. It is not important that you understand the reasoning behind all of the following steps.
---
## Specified Contrasts

Check the order of your levels.

```r
levels(l3_data$instruct)
```

```
## [1] "physical science" "social science"   "history"
```

Make sure they are in the order that you would like and code your contrasts accordingly. As an example, If they need to be reordered use:

```r
l3_data$instruct <- factor(l3_data$instruct, levels = c("social science", "history", "physical science"))
```
Note: We will be using the original order in the following contrasts.

---
**Step 1**: Set your contrasts. We'll do Helmert here.

```r
contrast1 <- c(1, -.5, -.5) # physical science vs others
contrast2 <- c(0, 1, -1) #social science vs history
```

**Step 2**: Bind the vectors into a temporary matrix. Constant should be equal to 1/(length of your vectors).

```r
mat.temp <- rbind(constant=1/3, contrast1, contrast2)
mat.temp
```

```
##                [,1]       [,2]       [,3]
## constant  0.3333333  0.3333333  0.3333333
## contrast1 1.0000000 -0.5000000 -0.5000000
## contrast2 0.0000000  1.0000000 -1.0000000
```

---
**Step 3**: Take the inverse of the matrix using the `solve()` function. Then we are dropping the first column with the constants.

```r
mat <- solve(mat.temp)
mat
```

```
##      constant  contrast1 contrast2
## [1,]        1  0.6666667       0.0
## [2,]        1 -0.3333333       0.5
## [3,]        1 -0.3333333      -0.5
```

```r
mat<- mat[ , -1]
mat
```

```
##       contrast1 contrast2
## [1,]  0.6666667       0.0
## [2,] -0.3333333       0.5
## [3,] -0.3333333      -0.5
```

---
**Step 4**: Run your model formula using `lm()` and set contrasts. Here we are linking our "instruct" variable with our contrast matrix. Remember:
* contrast1 = physical science vs others
* contrast2 = social science vs history

```r
m_contrasts <- lm(vocab ~ instruct, data=l3_data, contrasts = list(instruct = mat))

summary(m_contrasts)
```

```
## 
## Call:
## lm(formula = vocab ~ instruct, data = l3_data, contrasts = list(instruct = mat))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.000 -10.000   1.750   9.375  21.000 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         33.500      1.968  17.026   <2e-16 ***
## instructcontrast1    9.750      4.174   2.336   0.0257 *  
## instructcontrast2   -8.500      4.819  -1.764   0.0870 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.81 on 33 degrees of freedom
## Multiple R-squared:  0.2061,	Adjusted R-squared:  0.158 
## F-statistic: 4.284 on 2 and 33 DF,  p-value: 0.02218
```

---
## What about my ANOVA results?
To get the anova output, just run `rstatix::anova_test()` on your `m_contrasts` object. SPSS uses Type III SS by default, so I am matching it here.

```r
m4 <- anova_test(m_contrasts, 
                 type="III",
                 effect.size = "ges",
                 detailed = TRUE)
```

```
## Coefficient covariances computed by hccm()
```

```r
m4
```

```
## ANOVA Table (type tests)
## 
##        Effect   SSn  SSd DFn DFd       F        p p<.05   ges
## 1 (Intercept) 40401 4599   1  33 289.896 6.57e-18     * 0.898
## 2    instruct  1194 4599   2  33   4.284 2.20e-02     * 0.206
```