In this guide, we’ll explore Factorial ANOVA using R, a method used to examine the effect of two or more independent variables (also known as factors) on a dependent variable, particularly when these factors are categorical. It’s especially valuable when you’re interested not just in the individual effects of these factors, but also in how they interact with each other.
cones <- c(0,1,0,2,1,2,3,0,3,1,4,6,3,2,3,0,0,3,2,0,4,5,2,3,2,7,9,6,4,6)
speed <- factor(c(rep("slow",15),rep("fast",15)))
alcohol <- factor(c(rep("1",5),rep("3",5),rep("5",5),rep("1",5),rep("3",5),rep("5",5)))
alcdata <- data.frame(cones,speed,alcohol)
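Before fitting anything, it can help to eyeball the 2 x 3 design and its cell means. A minimal sketch in base R (my addition, not part of the original script):
# Mean cones hit in each speed x alcohol cell
with(alcdata, tapply(cones, list(speed, alcohol), mean))
# A quick visual check for a possible interaction
with(alcdata, interaction.plot(alcohol, speed, cones,
                               xlab = "alcohol", ylab = "mean cones hit"))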
In aov - and in most analyses in R - you can put in variables and all of their interactions by connecting them with a *. By default, aov uses Type I SS, so the order you enter predictors matters. Take a look at these two aov calls and their outputs.
output<-aov(cones ~ speed*alcohol,data=alcdata)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## speed 1 16.13 16.13 8.274 0.00831 **
## alcohol 2 85.40 42.70 21.897 3.87e-06 ***
## speed:alcohol 2 8.47 4.23 2.171 0.13595
## Residuals 24 46.80 1.95
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
output<-aov(cones ~ alcohol*speed,data=alcdata)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## alcohol 2 85.40 42.70 21.897 3.87e-06 ***
## speed 1 16.13 16.13 8.274 0.00831 **
## alcohol:speed 2 8.47 4.23 2.171 0.13595
## Residuals 24 46.80 1.95
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results are exactly the same. Wait, what’s going on here? This is Type I SS, so the order should matter. Why are the results identical? Think about it for a minute. If you come up with an answer, go to the end of this document to read my answer. If you don’t, go through the rest of the script to see the results for the writing score example before jumping to the end for the answer.
No matter what, the interaction is considered after the main effects, so the results for the interaction are accounting for the (main) effects of alcohol and speed.
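You can see that sequential logic directly with a nested model comparison; the interaction’s Type I SS (8.47) is exactly the drop in residual SS when the interaction is added to the additive model. A quick sketch (my addition):
# Residual SS: 55.27 (additive) vs 46.80 (with interaction); the
# difference, 8.47, is the interaction's Type I sum of squares
anova(aov(cones ~ speed + alcohol, data = alcdata),
      aov(cones ~ speed * alcohol, data = alcdata))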
You can change the results to Type III with the Anova function from the car package. This might make more sense, depending on the particular analysis and scientific context. Certainly Type III is the most common form.
Note: capital A Anova. The lowercase anova function is different. There might even be an ANOVA function; I don’t know. Don’t forget - R is case sensitive.
car::Anova(output,type="III")
## Anova Table (Type III tests)
##
## Response: cones
## Sum Sq Df F value Pr(>F)
## (Intercept) 5.000 1 2.5641 0.1224
## alcohol 73.733 2 18.9060 1.174e-05 ***
## speed 0.100 1 0.0513 0.8228
## alcohol:speed 8.467 2 2.1709 0.1360
## Residuals 46.800 24
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now with Type III SS, speed is no longer significant. That is, after
accounting for the interaction, speed is no longer significant. But
neither is the interaction. Maybe you choose to drop the interaction?
There isn’t an obviously correct answer here. Make your choices, state
why you made them, and, if appropriate, state how the conclusions are
impacted by your choices. You can also switch to Type II with the Anova function.
car::Anova(output,type="II")
## Anova Table (Type II tests)
##
## Response: cones
## Sum Sq Df F value Pr(>F)
## alcohol 85.400 2 21.8974 3.874e-06 ***
## speed 16.133 1 8.2735 0.008308 **
## alcohol:speed 8.467 2 2.1709 0.135954
## Residuals 46.800 24
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results here are the same as for Type I SS, for the same reason that the Type I SS results were identical regardless of order.
The effect size measures for factorial ANOVA are the same as for one-way ANOVA.
library(sjstats)
library(knitr)
library(kableExtra)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
kable(anova_stats(output)) %>%
kable_styling()
term | etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f | sumsq | df | meansq | statistic | p.value | power
---|---|---|---|---|---|---|---|---|---|---|---|---
alcohol | 0.545 | 0.646 | 0.513 | 0.582 | 0.520 | 1.351 | 85.400 | 2 | 42.700 | 21.897 | 0.000 | 1.000
speed | 0.103 | 0.256 | 0.089 | 0.195 | 0.090 | 0.587 | 16.133 | 1 | 16.133 | 8.274 | 0.008 | 0.819
alcohol:speed | 0.054 | 0.153 | 0.029 | 0.072 | 0.029 | 0.425 | 8.467 | 2 | 4.233 | 2.171 | 0.136 | 0.443
Residuals | NA | NA | NA | NA | NA | NA | 46.800 | 24 | 1.950 | NA | NA | NA
Looking at \(\omega^2\), alcohol has a very large effect. Speed has a medium effect. The interaction has a small effect. This offers further justification for dropping the interaction.
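If you prefer, the effectsize package computes the same measures one at a time; a minimal sketch (my addition), assuming effectsize is installed:
# Partial omega-squared and (non-partial) eta-squared, term by term
library(effectsize)
omega_squared(output, partial = TRUE)
eta_squared(output, partial = FALSE)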
You will get a warning message with post-hoc tests because they are of questionable validity when there are interactions.
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
ph<-glht(output,linfct=mcp(alcohol="Tukey"))
## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
## default contrast might be inappropriate
summary(ph)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = cones ~ alcohol * speed, data = alcdata)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 3 - 1 == 0 2.2000 0.8832 2.491 0.05076 .
## 5 - 1 == 0 5.4000 0.8832 6.114 < 0.001 ***
## 5 - 3 == 0 3.2000 0.8832 3.623 0.00361 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Let’s try dropping the interaction since it was not significant and the effect size was small.
output<-aov(cones ~ alcohol + speed,data=alcdata)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## alcohol 2 85.40 42.70 20.09 5.32e-06 ***
## speed 1 16.13 16.13 7.59 0.0106 *
## Residuals 26 55.27 2.13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
But by default, aov uses Type I SS. That doesn’t make sense here; the predictors have no natural order.
car::Anova(output,type="3")
## Anova Table (Type III tests)
##
## Response: cones
## Sum Sq Df F value Pr(>F)
## (Intercept) 20.008 1 9.4128 0.004986 **
## alcohol 85.400 2 20.0881 5.315e-06 ***
## speed 16.133 1 7.5899 0.010576 *
## Residuals 55.267 26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Without the interaction, post-hoc tests make more sense.
ph<-glht(output,linfct=mcp(alcohol="Tukey"))
summary(ph)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = cones ~ alcohol + speed, data = alcdata)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 3 - 1 == 0 1.600 0.652 2.454 0.05352 .
## 5 - 1 == 0 4.100 0.652 6.288 < 0.001 ***
## 5 - 3 == 0 2.500 0.652 3.834 0.00197 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Why are we only getting post-hoc tests for alcohol? Because speed only has two levels, so the F test for speed is a pairwise comparison. Post-hoc tests don’t have anything to add.
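If you request post-hoc tests for speed anyway, you get a single comparison that just reproduces its F test (for a one-df comparison, t squared equals F). A sketch (my addition):
# With two levels, the "post-hoc" for speed is one comparison, equivalent
# to its F test above (t^2 = F)
summary(glht(output, linfct = mcp(speed = "Tukey")))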
# Load the dataset
dta_rw <- read.csv("readwrite2.csv")
# Add one more IV: gender
dta_rw$gender <- sample(c("female", "male"), nrow(dta_rw), replace = TRUE)
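# Note: sample() is random, so the gender column - and every result below
# that uses it - will differ from run to run unless you set a seed first
# (e.g., set.seed(1) before the sample() call).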
library(tidyverse)
library(dplyr)
# Convert all IVs to factor variables
dta_rw <- dta_rw %>%
mutate(gender = as.factor(gender),
level = as.factor(level))
# by proficiency levels
dta_rw %>%
group_by(level) %>%
summarize(count = n(),
mean = mean(writing),
sd = sd(writing))
## # A tibble: 5 × 4
## level count mean sd
## <fct> <int> <dbl> <dbl>
## 1 1 226 10.3 2.13
## 2 2 342 13.3 2.19
## 3 3 524 16.5 2.28
## 4 4 853 20.3 2.32
## 5 5 52 23.6 0.721
# by gender
dta_rw %>%
group_by(gender) %>%
summarize(count = n(),
mean = mean(writing),
sd = sd(writing))
## # A tibble: 2 × 4
## gender count mean sd
## <fct> <int> <dbl> <dbl>
## 1 female 982 17.1 4.22
## 2 male 1015 17.0 4.31
# by proficiency level & gender
dta_rw %>%
group_by(level, gender) %>%
summarize(count = n(),
mean = mean(writing),
sd = sd(writing))
## `summarise()` has grouped output by 'level'. You can override using the
## `.groups` argument.
## # A tibble: 10 × 5
## # Groups: level [5]
## level gender count mean sd
## <fct> <fct> <int> <dbl> <dbl>
## 1 1 female 100 10.2 2.20
## 2 1 male 126 10.4 2.08
## 3 2 female 169 13.3 2.13
## 4 2 male 173 13.2 2.24
## 5 3 female 262 16.5 2.29
## 6 3 male 262 16.4 2.27
## 7 4 female 419 20.2 2.33
## 8 4 male 434 20.4 2.32
## 9 5 female 32 23.6 0.712
## 10 5 male 20 23.6 0.754
A couple of preliminary analyses before the factorial ANOVA. First, let’s get the sample sizes in each cell.
table(dta_rw$level,dta_rw$gender)
##
## female male
## 1 100 126
## 2 169 173
## 3 262 262
## 4 419 434
## 5 32 20
Let’s check if there is any relation between the two categorical predictors. This uses a chi-square test of independence. We will learn more about this analysis method later this semester.
chisq.test(table(dta_rw$level,dta_rw$gender))
##
## Pearson's Chi-squared test
##
## data: table(dta_rw$level, dta_rw$gender)
## X-squared = 5.5271, df = 4, p-value = 0.2374
The chi-square value (labeled X-squared) is not zero, so there is an observed relation between the predictors. However, it is not significant, which means there is no evidence that these variables are related in the population. This makes it easier to interpret the ANOVA results, because we can interpret the effect of proficiency level without worrying about gender and vice versa.
A friendly reminder: the CrossTable function in the gmodels package gives more detailed information about categorical variables.
library(gmodels)
CrossTable(dta_rw$level,dta_rw$gender,fisher=TRUE,chisq = TRUE,expected=TRUE)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 1997
##
##
## | dta_rw$gender
## dta_rw$level | female | male | Row Total |
## -------------|-----------|-----------|-----------|
## 1 | 100 | 126 | 226 |
## | 111.133 | 114.867 | |
## | 1.115 | 1.079 | |
## | 0.442 | 0.558 | 0.113 |
## | 0.102 | 0.124 | |
## | 0.050 | 0.063 | |
## -------------|-----------|-----------|-----------|
## 2 | 169 | 173 | 342 |
## | 168.174 | 173.826 | |
## | 0.004 | 0.004 | |
## | 0.494 | 0.506 | 0.171 |
## | 0.172 | 0.170 | |
## | 0.085 | 0.087 | |
## -------------|-----------|-----------|-----------|
## 3 | 262 | 262 | 524 |
## | 257.671 | 266.329 | |
## | 0.073 | 0.070 | |
## | 0.500 | 0.500 | 0.262 |
## | 0.267 | 0.258 | |
## | 0.131 | 0.131 | |
## -------------|-----------|-----------|-----------|
## 4 | 419 | 434 | 853 |
## | 419.452 | 433.548 | |
## | 0.000 | 0.000 | |
## | 0.491 | 0.509 | 0.427 |
## | 0.427 | 0.428 | |
## | 0.210 | 0.217 | |
## -------------|-----------|-----------|-----------|
## 5 | 32 | 20 | 52 |
## | 25.570 | 26.430 | |
## | 1.617 | 1.564 | |
## | 0.615 | 0.385 | 0.026 |
## | 0.033 | 0.020 | |
## | 0.016 | 0.010 | |
## -------------|-----------|-----------|-----------|
## Column Total | 982 | 1015 | 1997 |
## | 0.492 | 0.508 | |
## -------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 5.527131 d.f. = 4 p = 0.2373549
##
##
##
## Fisher's Exact Test for Count Data
## ------------------------------------------------------------
## Alternative hypothesis: two.sided
## p = 0.2389056
##
##
aov uses Type I SS by default, so order matters.
output <- aov(writing ~ level*gender, data = dta_rw)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## level 4 26340 6585 1309.427 <2e-16 ***
## gender 1 1 1 0.192 0.661
## level:gender 4 10 3 0.512 0.727
## Residuals 1987 9993 5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
output <- aov(writing ~ gender*level, data = dta_rw)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 15 15 3.065 0.0802 .
## level 4 26326 6581 1308.709 <2e-16 ***
## gender:level 4 10 3 0.512 0.7272
## Residuals 1987 9993 5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, the order impacts the results. Again, see the end of the document for why order matters here but not in the alcohol example.
Type I SS are not appropriate here because there is no scientifically meaningful ordering of the predictors. There are plausible reasons why a particular order might be chosen - maybe we’re controlling for gender before considering proficiency levels - but here I don’t have any such order.
Type II would therefore be more appropriate: simultaneous for main effects, sequential for interaction. That is, the effect of the interaction is considered only after main effects are accounted for.
car::Anova(output,type="2")
## Anova Table (Type II tests)
##
## Response: writing
## Sum Sq Df F value Pr(>F)
## gender 1.0 1 0.1918 0.6615
## level 26326.0 4 1308.7092 <2e-16 ***
## gender:level 10.3 4 0.5116 0.7272
## Residuals 9992.6 1987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now there is a significant difference in writing scores across proficiency levels, but no significant difference across gender. There is no evidence of an interaction, so the main effects are directly interpretable.
We might also go with the most common SS, Type III.
car::Anova(output,type="3")
## Anova Table (Type III tests)
##
## Response: writing
## Sum Sq Df F value Pr(>F)
## (Intercept) 10465.3 1 2080.9898 <2e-16 ***
## gender 2.4 1 0.4728 0.4918
## level 12582.4 4 625.4949 <2e-16 ***
## gender:level 10.3 4 0.5116 0.7272
## Residuals 9992.6 1987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now only the proficiency level effect is significant. What’s the best answer? There isn’t a clear right or wrong, and you can justify Type II or Type III.
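One caveat worth flagging (my addition, not in the original script): with R’s default treatment contrasts, the Type III “main effect” tests evaluate each factor at the reference level of the other factor, which is usually not what’s intended; sum-to-zero contrasts give the conventional Type III tests. A minimal sketch, with output_sum as a name I’m introducing:
# Refit with sum-to-zero contrasts before asking for Type III tests
output_sum <- aov(writing ~ gender*level, data = dta_rw,
                  contrasts = list(gender = contr.sum, level = contr.sum))
car::Anova(output_sum, type = "3")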
kable(anova_stats(output)) %>%
kable_styling()
term | etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f | sumsq | df | meansq | statistic | p.value | power
---|---|---|---|---|---|---|---|---|---|---|---|---
gender | 0.000 | 0.002 | 0.000 | 0.001 | 0.000 | 0.039 | 15.414 | 1 | 15.414 | 3.065 | 0.080 | 0.417
level | 0.724 | 0.725 | 0.724 | 0.724 | 0.724 | 1.623 | 26325.975 | 4 | 6581.494 | 1308.709 | 0.000 | 1.000
gender:level | 0.000 | 0.001 | 0.000 | -0.001 | 0.000 | 0.032 | 10.292 | 4 | 2.573 | 0.512 | 0.727 | 0.175
Residuals | NA | NA | NA | NA | NA | NA | 9992.616 | 1987 | 5.029 | NA | NA | NA
The gender and interaction effect sizes are negligible; the proficiency level effect is large.
Given all of this evidence, I would follow this up by dropping the interaction, since it is non-significant and of negligible magnitude.
output <- aov(writing ~ level + gender, data = dta_rw)
car::Anova(output,type="3")
## Anova Table (Type III tests)
##
## Response: writing
## Sum Sq Df F value Pr(>F)
## (Intercept) 21095 1 4198.801 <2e-16 ***
## level 26326 4 1309.995 <2e-16 ***
## gender 1 1 0.192 0.6613
## Residuals 10003 1991
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
OK, now comes decision time. Which result provides the best scientific information? I would go with the main effects only model (i.e., with no interaction) using Type III SS. I don’t have a strong reason for this, except that it offers the most parsimonious description of what’s going on.
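As with the cones example, you could follow up the chosen model with Tukey comparisons across proficiency levels; a sketch (my addition; output not shown, and because gender was generated at random above, your numbers would differ anyway):
# Pairwise Tukey comparisons among the 5 proficiency levels
ph <- glht(output, linfct = mcp(level = "Tukey"))
summary(ph)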
We should have tested for homogeneity of variance before doing the analyses. Let’s do it now. If we do the obvious test - e.g., bartlett.test(writing~level*gender,data = dta_rw) - we’ll get an error. You can’t test homogeneity of variance with more than one grouping predictor, so we’re going to need to do some manipulation of the categories. A nice package for this is forcats. The fct_cross function creates a new variable that combines two categorical variables. Then we can look at homogeneity of variance across all 10 gender x level combinations.
library(forcats)
# Note that all IVs need to be a factor variable for the fct_cross() to work properly
dta_rw$levelgen <- fct_cross(dta_rw$level,
dta_rw$gender,
sep="-")
unique(dta_rw$levelgen)
## [1] 3-male 3-female 2-female 4-female 2-male 4-male 1-female 5-female
## [9] 1-male 5-male
## 10 Levels: 1-female 2-female 3-female 4-female 5-female 1-male ... 5-male
Now we have 10 groups in a single factor variable (5 proficiency levels x 2 genders) instead of two factor variables. This allows for testing homogeneity of variance.
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
bartlett.test(writing~levelgen,data = dta_rw)
##
## Bartlett test of homogeneity of variances
##
## data: writing by levelgen
## Bartlett's K-squared = 72.209, df = 9, p-value = 5.608e-12
leveneTest(writing~levelgen,data = dta_rw)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 4.9026 1.591e-06 ***
## 1987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fligner.test(writing~levelgen,data = dta_rw)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: writing by levelgen
## Fligner-Killeen:med chi-squared = 39.779, df = 9, p-value = 8.331e-06
All of the tests are significant. This means that the assumption of homogeneity of variance does not hold, and all our results so far cannot be trusted.
In such a case, the WRS2 package offers a two-way ANOVA that is robust to heterogeneity of variance, but it isn’t well-established in the statistical literature. In other words, we should not be 100% confident in the results from this package, but if there are major concerns with heterogeneity of variance, then these results should be considered more trustworthy than the ones that assume homogeneity of variance.
library(WRS2)
t2way(writing ~ level*gender, data=dta_rw)
## Call:
## t2way(formula = writing ~ level * gender, data = dta_rw)
##
## value p.value
## level 7380.2014 0.001
## gender 0.5752 0.449
## level:gender 3.7356 0.451
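WRS2 also has a post-hoc companion for t2way, the mcp2atm function, if you need robust follow-up contrasts; a sketch (my addition):
# Robust post-hoc contrasts matching the t2way() fit
mcp2atm(writing ~ level*gender, data = dta_rw)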
Another option is to do this as a regression using heteroskedasticity-robust standard errors.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(sandwich)
output<-lm(writing ~ level*gender, data=dta_rw)
summary(output)
##
## Call:
## lm(formula = writing ~ level * gender, data = dta_rw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.364 -1.364 0.400 1.603 6.603
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.23000 0.22425 45.618 <2e-16 ***
## level2 3.09544 0.28293 10.941 <2e-16 ***
## level3 6.30053 0.26360 23.902 <2e-16 ***
## level4 9.96809 0.24958 39.939 <2e-16 ***
## level5 13.36375 0.45546 29.341 <2e-16 ***
## gendermale 0.20651 0.30034 0.688 0.492
## level2:gendermale -0.29496 0.38604 -0.764 0.445
## level3:gendermale -0.34010 0.35860 -0.948 0.343
## level4:gendermale -0.04054 0.33733 -0.120 0.904
## level5:gendermale -0.20026 0.70626 -0.284 0.777
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.243 on 1987 degrees of freedom
## Multiple R-squared: 0.7251, Adjusted R-squared: 0.7238
## F-statistic: 582.2 on 9 and 1987 DF, p-value: < 2.2e-16
coeftest(output, vcov = vcovHC(output, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.230000 0.218566 46.8051 <2e-16 ***
## level2 3.095444 0.273055 11.3363 <2e-16 ***
## level3 6.300534 0.260283 24.2065 <2e-16 ***
## level4 9.968091 0.246290 40.4730 <2e-16 ***
## level5 13.363750 0.251240 53.1913 <2e-16 ***
## gendermale 0.206508 0.286309 0.7213 0.4708
## level2:gendermale -0.294958 0.370952 -0.7951 0.4266
## level3:gendermale -0.340096 0.348750 -0.9752 0.3296
## level4:gendermale -0.040543 0.327481 -0.1238 0.9015
## level5:gendermale -0.200258 0.352595 -0.5680 0.5701
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This approach works very differently from the t2way function approach. I have no idea which is better. It may be that one approach works better in some circumstances, and the other works better in other circumstances.
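If you want factor-level F tests rather than coefficient-level t tests, car::Anova can apply the same kind of heteroskedasticity correction through its white.adjust argument; a sketch (my addition):
# Heteroskedasticity-corrected F tests for the factors as a whole
car::Anova(output, type = "2", white.adjust = "hc0")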
You’ve reached the end. Let’s answer the question of why the order of predictors with Type I SS didn’t matter for the alcohol example but did (albeit not strongly) for the writing scores example.
The reason for the alcohol example is that the design was balanced: the sample sizes were the same for all cells. That means the alcohol and speed factors are independent. That is, if you know which level of alcohol a participant is in, it offers no information about which level of speed the participant is in.
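You can verify the balance directly (my addition):
# Every cell has n = 5, so knowing alcohol tells you nothing about speed
table(alcdata$speed, alcdata$alcohol)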
This has a small interpretational benefit: the effect of alcohol on cones hit is interpretable without considering the level of speed. Normally, we interpret an effect of a variable, whether in an ANOVA or a regression, as “the effect of X on Y, controlling for all other predictors” which is the same as saying “the effect of X on Y, holding all other predictors constant”, although we usually abbreviate this interpretation by just saying “the effect of X on Y” with controlling for all other predictors implied.
But here, the other predictors (speed) don’t matter for X (alcohol), so the interpretation doesn’t involve ‘holding all other predictors constant’.
In my opinion, this is a very minor benefit - interpreting a predictor while controlling for other, correlated predictors is no big deal.
Since alcohol and speed are independent, the SS for alcohol is the same whether we control for speed or not - although it’s important to note that the F-ratio will be different for alcohol if we include speed in the ANOVA because the SSwithin and dfwithin will be different.
output<-aov(cones ~ speed + alcohol,data=alcdata)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## speed 1 16.13 16.13 7.59 0.0106 *
## alcohol 2 85.40 42.70 20.09 5.32e-06 ***
## Residuals 26 55.27 2.13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
output<-aov(cones ~ alcohol,data=alcdata)
summary(output)
## Df Sum Sq Mean Sq F value Pr(>F)
## alcohol 2 85.4 42.70 16.15 2.44e-05 ***
## Residuals 27 71.4 2.64
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the writing score example, the order mattered.
This is because gender and proficiency level are observed rather than assigned by the experimenter (unlike the experimental assignment in the alcohol example), so the cell sizes are not forced to be equal. Within the general population, there’s no inherent link between gender and proficiency levels. However, it’s important to remember that our data is a sample, not the entire population, and random sampling can produce unexpected patterns. In this case, our sample displayed a slight, though not statistically significant (as indicated by the chi-square test of independence), relationship between gender and proficiency level. This relationship, it should be noted, is likely a product of random chance rather than any underlying dependency between these variables.