ANOVA Lab

Author

Elliot Wilson

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests

Importing Data

 d <- read.csv(file="Data/labdata.csv", header=T)

# new code! this adds a column with a number for each row. it acts as an identifier for our participations
d$row_id <- 1:nrow(d)

State Your Hypothesis - PART OF YOUR WRITEUP

Note: You can chose to run either a one-way ANOVA (a single IV with at least 3 levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes.

One-way ANOVA: Cat owners will be significantly higher in depression (measured by the PHQ) when compared to dog owners and people without pets.

Two-way ANOVA: Pet owners will have significantly lower depression scores (as measured by the PHQ) than non-pet owners. Participants without mental health diagnoses will have significantly lower depression scores than participantswith mental health diagnoses. Pet owners with mental health diagnoses will report significantly lower depression scores than non-pet owners with mental health diagnoses.

Pet ownership IV: Pet ownder vs non-pet owners Mental health IV: People with diagnosis vs people without diagnosis

State your hypotheses. Remember, your DV will be a continuous variable. For your IV, you need either one categorical variable with three levels, or two categorical variables with at least two levels each.

Check Your Assumptions

ANOVA Assumptions

  • Independence of observations (confirmed by data report)
  • All levels of the IVs should have equal number of cases (ideally; in the real world, this varies) and there should be no empty cells. Cells with low numbers increase chance of Type II error. (we will check this below)
  • Homogeneity of variance should be assured (we will check this below)
  • Outliers should be identified and removed (we will check this below)
  • DV should be normally distributed for each level of the IV (we will check this below)

Check levels of IVs

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
table(d$pet, useNA = "always")

                 bird                   cat           cat and dog 
                    5                   211                   136 
                  dog                  fish multiple types of pet 
                  246                    35                   104 
              no pets                 other                  <NA> 
                  396                    68                     0 
d2 <- d

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
cross_cases(d2, pet, mhealth)
 mhealth 
 anxiety disorder   bipolar   depression   eating disorders   none or NA   obsessive compulsive disorder   other   ptsd 
 pet 
   bird  1 1 3
   cat  29 1 4 3 164 4 3 3
   cat and dog  19 1 6 99 4 5 2
   dog  32 3 12 6 176 4 5 8
   fish  3 1 30 1
   multiple types of pet  12 3 3 76 2 4 4
   no pets  25 1 8 9 326 9 14 4
   other  7 2 53 2 3 1
   #Total cases  128 5 31 28 927 26 34 22
# if you need to recode
# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d <- subset(d, pet != "bird")
d <- subset(d, pet != "cat and dog")
d <- subset(d, pet != "fish")
d <- subset(d, pet != "multiple types of pet")
d <- subset(d, pet != "other")


# 
# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d2$pet_rc[d2$pet == "cat"] <- "pet owner"
d2$pet_rc[d2$pet == "bird"] <- "pet owner"
d2$pet_rc[d2$pet == "cat and dog"] <- "pet owner"
d2$pet_rc[d2$pet == "dog"] <- "pet owner"
d2$pet_rc[d2$pet == "multiple types of pet"] <- "pet owner"
d2$pet_rc[d2$pet == "other"] <- "pet owner"
d2$pet_rc[d2$pet == "fish"] <- "pet owner"

d2$pet_rc[d2$pet == "no pets"] <- "no pets"

d2$mhealth_rc <- "diagnosis"
d2$mhealth_rc[d2$mhealth == "none or NA"] <- "no diagnosis"
# 
# # preview your changes and make sure everything is correct
table(d$pet, useNA = "always")

    cat     dog no pets    <NA> 
    211     246     396       0 
# # or
cross_cases(d2, pet_rc, mhealth_rc)
 mhealth_rc 
 diagnosis   no diagnosis 
 pet_rc 
   no pets  70 326
   pet owner  204 601
   #Total cases  274 927
# 
# # check your variable types
str(d)
'data.frame':   853 obs. of  7 variables:
 $ pet    : chr  "cat" "cat" "dog" "no pets" ...
 $ mhealth: chr  "none or NA" "anxiety disorder" "none or NA" "none or NA" ...
 $ iou    : num  3.19 4 1.59 3.37 1.11 ...
 $ rse    : num  2.3 1.6 3.9 1.7 1.8 2.6 3 2.5 3.4 2 ...
 $ phq    : num  1.33 3.33 1 2.33 2.33 ...
 $ pss    : num  3.25 3.75 1 3.25 4 2.5 2.5 3.75 2.75 3 ...
 $ row_id : int  1 2 3 4 6 7 8 10 11 12 ...
str(d2)
'data.frame':   1201 obs. of  9 variables:
 $ pet       : chr  "cat" "cat" "dog" "no pets" ...
 $ mhealth   : chr  "none or NA" "anxiety disorder" "none or NA" "none or NA" ...
 $ iou       : num  3.19 4 1.59 3.37 1.7 ...
 $ rse       : num  2.3 1.6 3.9 1.7 3.9 1.8 2.6 3 3.5 2.5 ...
 $ phq       : num  1.33 3.33 1 2.33 1.11 ...
 $ pss       : num  3.25 3.75 1 3.25 2 4 2.5 2.5 2 3.75 ...
 $ row_id    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ pet_rc    : chr  "pet owner" "pet owner" "pet owner" "no pets" ...
 $ mhealth_rc: chr  "no diagnosis" "diagnosis" "no diagnosis" "no diagnosis" ...
# 
# # make sure that your IV is recognized as a factor by R
d$pet <- as.factor(d$pet)
d2$pet_rc <- as.factor(d2$pet_rc)
d2$mhealth_rc <- as.factor(d2$mhealth_rc)

Check homogeneity of variance

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
leveneTest(phq~pet, data = d)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  0.8269 0.4378
      850               
# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
leveneTest(phq~pet_rc*mhealth_rc, data = d2)
Levene's Test for Homogeneity of Variance (center = median)
        Df F value Pr(>F)
group    3  1.9668 0.1172
      1197               

Check for outliers using Cook’s distance and Residuals vs Leverage plot

Run a Regression

# # use this commented out section only if you need to remove outliers
# # to drop a single outlier, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108))
# 
# # to drop multiple outliers, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108) & row_id!=c(602))
# 
# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
reg_model <- lm(phq ~ pet, data = d) #for one-way

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
reg_model2 <- lm(phq ~ pet_rc*mhealth_rc, data = d2) #for two-way

Check for outliers

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# Cook's distance (.5 is the cutoff)
plot(reg_model, 4)

# Residuals vs Leverage (we want the red line to be close to the dotted line)
plot(reg_model, 5)

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# Cook's distance
plot(reg_model2, 4)

# Residuals vs Leverage
plot(reg_model2, 5)

Check Your Variables

# we'll use the describeBy() command to view skew and kurtosis across our IVs and make sure the DV is normally distributed across all of the levels
describeBy(d$phq, group = d$pet)

 Descriptive statistics by group 
group: cat
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 211 2.21 0.87   2.11    2.15 0.99   1   4     3 0.46    -0.98 0.06
------------------------------------------------------------ 
group: dog
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 246  2.1 0.88   1.94    2.03 0.91   1   4     3 0.61    -0.88 0.06
------------------------------------------------------------ 
group: no pets
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 396 2.01 0.84   1.78    1.93 0.82   1   4     3 0.67    -0.59 0.04
describeBy(d2$phq, group = d2$pet_rc)

 Descriptive statistics by group 
group: no pets
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 396 2.01 0.84   1.78    1.93 0.82   1   4     3 0.67    -0.59 0.04
------------------------------------------------------------ 
group: pet owner
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 805 2.13 0.87      2    2.05 0.99   1   4     3 0.58    -0.81 0.03
describeBy(d2$phq, group = d2$mhealth_rc)

 Descriptive statistics by group 
group: diagnosis
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 274 2.62 0.84   2.67    2.64 0.99   1   4     3 -0.09    -1.07 0.05
------------------------------------------------------------ 
group: no diagnosis
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 927 1.93 0.81   1.67    1.83 0.66   1   4     3 0.88    -0.17 0.03

Issues with My Data - PART OF YOUR WRITEUP

One-way ANOVA - Confirmed independence of observations by checking the data report - Checked cell sizes and describe any dropping or combining - Checked homogenaity of variance using Levine’s test and it was non-significant - We checked for outliers using Cook’s distance scores and visual analysis of residuals vs leverage plot (no outliers) - Checked normality of our DV by the levels of our IV and all was good (between -2 and +2)

Briefly describe any issues with your data and how they might impact the interpretation of your results. As usual, this should be written in an appropriate scientific tone.

Run an ANOVA

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
aov_model <- aov_ez(data = d,
                    id = "row_id",
                    between = c("pet"),
                    dv = "phq",
                    anova_table = list(es = "pes"))
Contrasts set to contr.sum for the following variables: pet
# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
aov_model2 <- aov_ez(data = d2,
                    id = "row_id",
                    between = c("pet_rc","mhealth_rc"),
                    dv = "phq",
                    anova_table = list(es = "pes"))
Contrasts set to contr.sum for the following variables: pet_rc, mhealth_rc

View Output

Effect size cutoffs from Cohen (1988):

  • η2 = 0.01 indicates a small effect
  • η2 = 0.06 indicates a medium effect
  • η2 = 0.14 indicates a large effect
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
nice(aov_model)
Anova Table (Type 3 tests)

Response: phq
  Effect     df  MSE      F  pes p.value
1    pet 2, 850 0.74 3.61 * .008    .028
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
nice(aov_model2)
Anova Table (Type 3 tests)

Response: phq
             Effect      df  MSE          F   pes p.value
1            pet_rc 1, 1197 0.67       0.71 <.001    .400
2        mhealth_rc 1, 1197 0.67 120.95 ***  .092   <.001
3 pet_rc:mhealth_rc 1, 1197 0.67       0.06 <.001    .800
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Visualize Results

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
afex_plot(aov_model, x = "pet")

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
afex_plot(aov_model2, x = "pet_rc", trace = "mhealth_rc")

afex_plot(aov_model2, x = "mhealth_rc", trace = "pet_rc")

Run Posthoc Tests

Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
emmeans(aov_model, specs="pet", adjust="tukey")
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
 pet     emmean     SE  df lower.CL upper.CL
 cat       2.21 0.0594 850     2.06     2.35
 dog       2.10 0.0550 850     1.97     2.23
 no pets   2.01 0.0433 850     1.91     2.11

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 3 estimates 
pairs(emmeans(aov_model, specs="pet", adjust="tukey"))
 contrast      estimate     SE  df t.ratio p.value
 cat - dog        0.103 0.0809 850   1.279  0.4077
 cat - no pets    0.195 0.0735 850   2.660  0.0217
 dog - no pets    0.092 0.0700 850   1.315  0.3872

P value adjustment: tukey method for comparing a family of 3 estimates 
# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
emmeans(aov_model2, specs="pet_rc", adjust="tukey")
NOTE: Results may be misleading due to involvement in interactions
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
 pet_rc    emmean     SE   df lower.CL upper.CL
 no pets     2.24 0.0537 1197     2.12     2.36
 pet owner   2.29 0.0331 1197     2.22     2.37

Results are averaged over the levels of: mhealth_rc 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 2 estimates 
pairs(emmeans(aov_model2, specs="pet_rc", adjust="tukey"))
NOTE: Results may be misleading due to involvement in interactions
 contrast            estimate     SE   df t.ratio p.value
 no pets - pet owner  -0.0531 0.0631 1197  -0.841  0.4003

Results are averaged over the levels of: mhealth_rc 
emmeans(aov_model2, specs="mhealth_rc", adjust="tukey")
NOTE: Results may be misleading due to involvement in interactions
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
 mhealth_rc   emmean     SE   df lower.CL upper.CL
 diagnosis      2.61 0.0565 1197     2.49     2.74
 no diagnosis   1.92 0.0281 1197     1.86     1.98

Results are averaged over the levels of: pet_rc 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 2 estimates 
pairs(emmeans(aov_model2, specs="mhealth_rc", adjust="tukey"))
NOTE: Results may be misleading due to involvement in interactions
 contrast                 estimate     SE   df t.ratio p.value
 diagnosis - no diagnosis    0.694 0.0631 1197  10.998  <.0001

Results are averaged over the levels of: pet_rc 
emmeans(aov_model2, specs="pet_rc", by="mhealth_rc", adjust="sidak")
mhealth_rc = diagnosis:
 pet_rc    emmean     SE   df lower.CL upper.CL
 no pets     2.60 0.0975 1197     2.38     2.81
 pet owner   2.63 0.0571 1197     2.50     2.76

mhealth_rc = no diagnosis:
 pet_rc    emmean     SE   df lower.CL upper.CL
 no pets     1.89 0.0452 1197     1.78     1.99
 pet owner   1.95 0.0333 1197     1.88     2.03

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 2 estimates 
pairs(emmeans(aov_model2, specs="pet_rc", by="mhealth_rc", adjust="sidak"))
mhealth_rc = diagnosis:
 contrast            estimate     SE   df t.ratio p.value
 no pets - pet owner  -0.0371 0.1130 1197  -0.328  0.7426

mhealth_rc = no diagnosis:
 contrast            estimate     SE   df t.ratio p.value
 no pets - pet owner  -0.0690 0.0561 1197  -1.230  0.2188
emmeans(aov_model2, specs="mhealth_rc", by="pet_rc", adjust="sidak")
pet_rc = no pets:
 mhealth_rc   emmean     SE   df lower.CL upper.CL
 diagnosis      2.60 0.0975 1197     2.38     2.81
 no diagnosis   1.89 0.0452 1197     1.78     1.99

pet_rc = pet owner:
 mhealth_rc   emmean     SE   df lower.CL upper.CL
 diagnosis      2.63 0.0571 1197     2.50     2.76
 no diagnosis   1.95 0.0333 1197     1.88     2.03

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 2 estimates 
pairs(emmeans(aov_model2, specs="mhealth_rc", by="pet_rc", adjust="sidak"))
pet_rc = no pets:
 contrast                 estimate     SE   df t.ratio p.value
 diagnosis - no diagnosis    0.710 0.1070 1197   6.604  <.0001

pet_rc = pet owner:
 contrast                 estimate     SE   df t.ratio p.value
 diagnosis - no diagnosis    0.678 0.0661 1197  10.254  <.0001

Write Up Results

One-Way ANOVA

Write up your results. Again, make sure to maintain an appropriate tone, and follow APA guidelines for reporting statistical results. I recommend following the below outline:

  • Briefly restate your hypothesis
  • Describe any issues with your data (you can copy/paste from above, just make sure everything flows).
  • Report your results. Make sure to include your F-value, degrees of freedom, p-value, and effect size. Since we are showing our means and standard deviations for the levels of our IV in our plot, you do NOT have to report them in the text (normally, you would report it like this: (M = #.##, SD = .##), repeating for each level).
  • Specify where the differences occur. For instance, one level of the IV may be significantly different than the other two, or there may be multiple differences.
  • Describe your interaction, if you ran a two-way ANOVA and there is one.
  • Interpret your effect size (trivial, small, medium, or large) and include the citation.
  • Make sure to include a reference to Figure 1 (created using the code below)

*F(2, 850) = 3.61, p<.028, ηp2 > .01

  • We did not find a significant main effect for pet ownership (p = .400)
  • We did find a significant main effect of mental health diagnosis on depression score, F(1, 1197) = 120.95, p<.001, ηp2 = .09
  • We did not find a significant interaction for pet ownership and mental health diagnosis (p = .800)

References

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.