Chapter 1. Introduction to Factorial ANOVA

  • Conditions: (a) two categorical independent variables (factors) and (b) one continuous dependent variable

(1) Data Import and Exploration with a barplot

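# stringsAsFactors = TRUE keeps conversation and driving as factors on R >= 4.0
driving_errors <- read.csv("Data Files/driving_errors.csv", stringsAsFactors = TRUE)
# Rename the columns right after import; attach() is not used here because it
# would capture the old column names and lead to masked objects later on
colnames(driving_errors) <- c("subject", "conversation", "driving", "errors")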
str(driving_errors)
## 'data.frame':    120 obs. of  4 variables:
##  $ subject     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ conversation: Factor w/ 3 levels "High demand",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ driving     : Factor w/ 2 levels "Difficult","Easy": 2 2 2 2 2 2 2 2 2 2 ...
##  $ errors      : int  20 19 31 27 31 17 23 26 11 15 ...
  • Variables
    • subject: unique identifier for each subject (int)
    • conversation: level of conversation difficulty (factor - independent)
    • driving: level of driving difficulty in the simulator (factor - independent)
    • errors: number of driving errors made (int - dependent)
# Total errors in each of the 6 subgroups (2 driving levels x 3 conversation levels)
subgroups <- tapply(driving_errors$errors, list(driving_errors$driving, driving_errors$conversation), sum)

# Make the required barplot
barplot(subgroups, beside = TRUE, 
        col = c("orange", "blue"), 
        main = "Driving Errors", 
        xlab = "Conversation Demands", 
        ylab = "Errors")

# Add the legend
legend("topright", c("Difficult", "Easy"), 
    title = "Driving",
    fill = c("orange", "blue"))

(2) Interpretation of Barplot

  • Do you expect that the driving errors made during different driving conditions are influenced by the level of conversation demand? In other words, do you think that the driving conditions will have a different effect on the number of errors made, depending on the level of conversation demand?

  • The answer is “Yes”. It is indeed visually clear that subjects tend to make more driving errors in difficult driving conditions. These effects seem to get stronger when subjects are having a difficult conversation. In what follows, you will learn how to formally assess the statistical significance of these intuitive observations.
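One way to make this visual impression more explicit is an interaction plot of cell means, where non-parallel lines hint at an interaction. The sketch below is only an illustration: it does not read the real file but builds a hypothetical data frame `sim` with invented cell means (`cell_mu`), so the picture it draws says nothing about the actual data.

```r
# Hypothetical stand-in data: 2 driving levels x 3 conversation levels,
# 20 simulated subjects per cell. All values are invented for illustration.
set.seed(1)
sim <- expand.grid(
  subject_in_cell = 1:20,
  conversation = c("None", "Low demand", "High demand"),
  driving = c("Easy", "Difficult")
)

# Assumed cell means, in the row order produced by expand.grid():
# Easy/None, Easy/Low, Easy/High, Difficult/None, Difficult/Low, Difficult/High
cell_mu <- c(20, 20, 26, 25, 39, 49)
sim$errors <- rpois(nrow(sim), lambda = rep(cell_mu, each = 20))

# Cell means instead of cell totals
cell_means <- tapply(sim$errors, list(sim$driving, sim$conversation), mean)
print(cell_means)

# One line per driving level; non-parallel lines suggest an interaction
interaction.plot(x.factor = sim$conversation,
                 trace.factor = sim$driving,
                 response = sim$errors,
                 fun = mean,
                 xlab = "Conversation Demands",
                 ylab = "Mean errors",
                 trace.label = "Driving")
```

With the real data, the same two calls could be pointed at driving_errors instead of `sim`.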

Chapter 2. Hypotheses, F-Ratios, and Effects

(1) Concept 1. Hypotheses

  • With two independent variables, three hypotheses can be tested:
    • Are there more errors in the difficult simulator? (main effect of driving)
    • Are there more errors in more demanding conversations? (main effect of conversation)
    • Does the effect of driving difficulty on errors depend on conversation demand? (interaction, driving x conversation)

(2) Concept 2. F-Ratios

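  • Statistically, each hypothesis is tested with its own F-ratio:
    • F(A): main effect of the 1st independent variable, driving difficulty
    • F(B): main effect of the 2nd independent variable, conversation demand
    • F(A x B): interaction of driving difficulty and conversation demand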
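Written out, each F-ratio divides the mean square of its effect by the same pooled within-cell error mean square. The subscript S/AB (subjects within the A x B cells) is notation assumed here for the error term; it does not appear in the original notes.

```latex
F_A = \frac{MS_A}{MS_{S/AB}}, \qquad
F_B = \frac{MS_B}{MS_{S/AB}}, \qquad
F_{A \times B} = \frac{MS_{A \times B}}{MS_{S/AB}}
```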

(3) Concept 3. Effects

  • Main Effect: the effect of one independent variable, averaged over the levels of the other.
  • Interaction Effect: the effect of one independent variable depends on the level of the other.
  • Simple Effect: the effect of one independent variable at a particular level of the other.
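The three kinds of effect can be read off a table of cell means. The sketch below uses a small matrix of hypothetical cell means (invented for illustration, not taken from the data) to show which averages or differences each definition refers to.

```r
# Hypothetical cell means, invented for illustration only
cell_means <- matrix(
  c(20, 25,   # None:        Easy, Difficult
    20, 40,   # Low demand:  Easy, Difficult
    26, 50),  # High demand: Easy, Difficult
  nrow = 2,
  dimnames = list(driving = c("Easy", "Difficult"),
                  conversation = c("None", "Low demand", "High demand"))
)

# Main effects: average over the levels of the other variable
main_driving <- rowMeans(cell_means)
main_conversation <- colMeans(cell_means)

# Simple effects of conversation: one row at a time (a single driving level)
simple_easy <- cell_means["Easy", ]
simple_difficult <- cell_means["Difficult", ]

# Interaction: the High-vs-None difference is not the same at both driving levels
high_vs_none <- cell_means[, "High demand"] - cell_means[, "None"]

print(main_driving)
print(main_conversation)
print(high_vs_none)  # Easy: 6, Difficult: 25
```

In this made-up table the conversation effect is about four times larger under difficult driving, which is the pattern an interaction describes.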

Chapter 3. Actual Test, Example

(1) Step 1. The homogeneity of variance

  • Before performing the factorial ANOVA, test the homogeneity of variance assumption.
  • If the p-value is greater than alpha, the null hypothesis of equal variances is not rejected, and the assumption is treated as tenable.
  • In R, the test is run with the leveneTest() function from the car package.
    • The formula looks like this: “leveneTest(dependent_var ~ independent_var1 * independent_var2)”
library(car)
# Pass the data frame explicitly instead of relying on attach()
leveneTest(errors ~ conversation * driving, data = driving_errors)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   5  0.5206 0.7602
##       114
  • Interpretation 1. The p-value (0.7602) is much higher than the alpha level (0.05), so the homogeneity of variance assumption holds.
  • Interpretation 2. There is no significant difference in the variances across the 6 groups. This means that it is valid to pool the error variances from all groups into one estimate of error, and to use that estimate in the three F-ratios of the factorial ANOVA.
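To see what the test is doing, Levene's test with center = median can be reproduced in base R as a one-way ANOVA on the absolute deviations of each score from its cell median. The sketch below runs on a hypothetical simulated data frame `sim` (invented values, equal variances by construction), so its F and p-value are not those of the real data.

```r
# Hypothetical stand-in data: 6 cells, 20 simulated subjects each
set.seed(42)
sim <- expand.grid(
  subject_in_cell = 1:20,
  conversation = c("None", "Low demand", "High demand"),
  driving = c("Easy", "Difficult")
)
sim$errors <- rnorm(nrow(sim), mean = 30, sd = 8)

# One grouping factor with 6 levels (one per driving x conversation cell)
cell <- interaction(sim$driving, sim$conversation)

# Absolute deviation of each score from the median of its own cell
cell_median <- ave(sim$errors, cell, FUN = median)
abs_dev <- abs(sim$errors - cell_median)

# One-way ANOVA on the absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ cell))
print(levene_by_hand)
```

The degrees of freedom in this table (5 and 114) depend only on the design, so they match the Df column of the leveneTest() output.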

  • df(a) = a - 1, where a is the number of groups (here, the 6 driving x conversation cells).
  • df(s/a) = a(n - 1), where n is the number of subjects in each group. These are the two Df values in the Levene output above.
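As a quick arithmetic check, the degrees of freedom can be computed directly. The value n = 20 is an assumption of a balanced design (120 observations spread evenly over 6 cells), which is consistent with the Df values in the output.

```r
a <- 6    # number of groups (2 driving levels x 3 conversation levels)
n <- 20   # subjects per group, assuming a balanced design

df_groups <- a - 1        # 5
df_within <- a * (n - 1)  # 114

# In the factorial ANOVA, the 5 between-group df split into three parts
df_driving <- 2 - 1                             # 1
df_conversation <- 3 - 1                        # 2
df_interaction <- df_driving * df_conversation  # 2

print(c(groups = df_groups, within = df_within))
print(df_driving + df_conversation + df_interaction == df_groups)  # TRUE
```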

(2) Step 2. The Factorial ANOVA: Model, Main Effect.

  • Once the homogeneity of variance assumption has been checked, it’s time to run the factorial ANOVA.
  • The formula is as follows:
    • aov(dependent_variable ~ independent_var1 * independent_var2, data = your_data)
# Factorial ANOVA; the data frame is passed explicitly instead of using attach()
anova.error_model <- aov(errors ~ driving * conversation, data = driving_errors)

# Get the summary table
summary(anova.error_model)
##                       Df Sum Sq Mean Sq F value   Pr(>F)    
## driving                1   5782    5782   94.64  < 2e-16 ***
## conversation           2   4416    2208   36.14 6.98e-13 ***
## driving:conversation   2   1639     820   13.41 5.86e-06 ***
## Residuals            114   6965      61                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Interpretation: All three p-values are below 0.05, so the main effect of driving difficulty, the main effect of conversation demand, and the interaction effect are all significant.
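The F values in the table can be rebuilt from its other columns: each mean square is the sum of squares divided by its Df, and each F is that mean square divided by the residual mean square. The sketch below copies the printed numbers from the summary table. Because those numbers are rounded, the results should agree with the table only approximately.

```r
# Values copied from the summary(anova.error_model) table
sum_sq <- c(driving = 5782, conversation = 4416, interaction = 1639)
df_effect <- c(driving = 1, conversation = 2, interaction = 2)
ss_resid <- 6965
df_resid <- 114

# Mean squares: sum of squares divided by degrees of freedom
mean_sq <- sum_sq / df_effect
ms_resid <- ss_resid / df_resid

# F-ratios: effect mean square over the pooled residual mean square
f_ratio <- mean_sq / ms_resid

# Upper-tail p-values from the F distribution
p_value <- pf(f_ratio, df1 = df_effect, df2 = df_resid, lower.tail = FALSE)

print(round(f_ratio, 1))
print(signif(p_value, 2))
```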

(3-1) Step 3. The interaction effect

  • Because the interaction is significant, follow up with a simple effects analysis: test the effect of “conversation” on the outcome variable “errors” separately at each level of “driving” (easy and difficult).
# step 1. create easy & difficult tables using subset()
easy.driving <- subset(driving_errors, driving == "Easy")
difficult.driving <- subset(driving_errors, driving == "Difficult")


# step 2. Perform one-way ANOVA for both subsets
easy.aov <- aov(errors ~ conversation, data = easy.driving)
difficult.aov <- aov(errors ~ conversation, data = difficult.driving)

# step 3. call summary
summary(easy.aov)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## conversation  2  504.7   252.3   4.928 0.0106 *
## Residuals    57 2918.5    51.2                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(difficult.aov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## conversation  2   5551    2776   39.09 2.05e-11 ***
## Residuals    57   4047      71                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Interpretation of easy.aov: The p-value is about 0.011, which is below 0.05, so conversation has a significant effect even in the easy driving condition.

  • Interpretation of difficult.aov: The p-value is almost zero. Notice that the F-value here is much larger than in the easy driving condition. Because both ANOVAs have the same degrees of freedom, this suggests that the effect is larger in the difficult driving condition. However, the F-ratio is a test statistic, not a measure of effect size. Let’s investigate this further…

(3-2) Step 4. The effect sizes

  • By definition, an interaction effect means that the effect of one variable changes across the levels of the other variable.
  • The F-ratios alone do not show how much the effect changes. To compare the effect sizes, use the etaSquared() function from the lsr package:
library(lsr)
# Calculate the etaSquared for the easy driving case
etaSquared(easy.aov, anova = TRUE)
##                eta.sq eta.sq.part      SS df        MS        F          p
## conversation 0.147433    0.147433  504.70  2 252.35000 4.928458 0.01061116
## Residuals    0.852567          NA 2918.55 57  51.20263       NA         NA
# Calculate the etaSquared for the difficult driving case
etaSquared(difficult.aov, anova = TRUE)
##                 eta.sq eta.sq.part       SS df         MS        F
## conversation 0.5783571   0.5783571 5551.033  2 2775.51667 39.09275
## Residuals    0.4216429          NA 4046.900 57   70.99825       NA
##                         p
## conversation 2.046097e-11
## Residuals              NA
  • Interpretation of easy.aov: Almost 15% of the variance is explained by the conversation variable.
  • Interpretation of difficult.aov: Almost 58% of the variance is explained by the conversation variable.
  • This is the interaction effect in concrete terms: the effect size of conversation in the difficult driving case (58%) is substantially larger than in the easy driving case (15%).
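For a one-way ANOVA, eta squared is simply the effect sum of squares divided by the total sum of squares. The sketch below recomputes it from the SS column of the two etaSquared() outputs; the data frame `ss` is just a hypothetical container for those printed values.

```r
# Sums of squares copied from the etaSquared() outputs
ss <- data.frame(
  condition = c("easy", "difficult"),
  ss_conversation = c(504.70, 5551.033),
  ss_residual = c(2918.55, 4046.900)
)

# eta squared = SS_effect / SS_total, where SS_total = SS_effect + SS_residual
ss$eta_sq <- ss$ss_conversation / (ss$ss_conversation + ss$ss_residual)

print(ss)
```

With a single factor in each model, eta squared and partial eta squared coincide, which is why the eta.sq and eta.sq.part columns show the same value.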

(4) Step 5. Pairwise comparisons, Simple Effect

  • To see which conversation levels differ within each simple effect, run pairwise comparisons with the Tukey post-hoc test: TukeyHSD(anova_object).
# Tukey for easy driving
TukeyHSD(easy.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = errors ~ conversation, data = easy.driving)
## 
## $conversation
##                         diff        lwr        upr     p adj
## Low demand-High demand -6.05 -11.495243 -0.6047574 0.0260458
## None-High demand       -6.25 -11.695243 -0.8047574 0.0207614
## None-Low demand        -0.20  -5.645243  5.2452426 0.9957026
# Tukey for difficult driving
TukeyHSD(difficult.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = errors ~ conversation, data = difficult.driving)
## 
## $conversation
##                          diff       lwr        upr     p adj
## Low demand-High demand  -9.75 -16.16202  -3.337979 0.0015849
## None-High demand       -23.45 -29.86202 -17.037979 0.0000000
## None-Low demand        -13.70 -20.11202  -7.287979 0.0000103
  • The question here is: how many of the mean differences in number of errors are significant?

  • Interpretation of easy.aov: The mean difference between no conversation and low demand is not significantly different from zero (p adj = 0.996). The two comparisons against high demand are significant.

  • Interpretation of difficult.aov: All mean differences are significantly different from zero, and all of them are larger in magnitude than in the easy driving condition. This again indicates that an interaction effect is present.
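The lwr and upr columns can be understood as diff plus or minus one half-width, and with equal group sizes that half-width is the same for every pair within a model. The sketch below rebuilds it from the residual mean squares printed by etaSquared(), assuming a balanced design with 20 subjects per conversation level within each driving condition (an assumption, though it is consistent with the 57 residual Df).

```r
n_per_group <- 20  # assumed: 60 subjects per driving level, 3 equal groups
k <- 3             # number of conversation levels being compared
df_resid <- 57     # residual Df from each one-way ANOVA

# Residual mean squares copied from the etaSquared() outputs (MS column)
ms_resid <- c(easy = 51.20263, difficult = 70.99825)

# Critical value of the studentized range at the 95% family-wise level
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)

# Half-width of each Tukey interval when group sizes are equal
half_width <- q_crit * sqrt(ms_resid / n_per_group)
print(half_width)

# A difference is significant when its absolute value exceeds the half-width
diffs_easy <- c(low_high = -6.05, none_high = -6.25, none_low = -0.20)
sig_easy <- abs(diffs_easy) > half_width[["easy"]]
print(sig_easy)
```

The half-widths should come out close to upr minus diff in the two TukeyHSD() tables, and only the None vs Low demand comparison in the easy condition falls inside its half-width.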