ANOVA HW

Author

Heather Perkins

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests

Importing Data

d <- read.csv(file="Data/mydata.csv", header=T)
 
# new code! this adds a column with a number for each row. it acts as an identifier for our participations
d$row_id <- 1:nrow(d)

State Your Hypothesis - PART OF YOUR WRITEUP

Note: You can chose to run either a one-way ANOVA (a single IV with more than 3 levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes.

One-way ANOVA: Females will have significantly higher stress levels (measured by the PSS) when compared to males, those who use another term, and those who prefer not to say.

Check Your Assumptions

ANOVA Assumptions

Independence of observations (confirmed by data report)
All levels of the IVs should have equal number of cases (ideally; in the real world, this varies) and there should be no empty cells. Cells with low numbers increase chance of Type II error. (we will check this below)
Homogeneity of variance should be assured (we will check this below)
Outliers should be identified and removed (we will check this below)
DV should be normally distributed for each level of the IV (we will check this below)

Check levels of IVs

# for a one-way ANOVA
# COMMENT THIS SECTION OUT FROM THE HW IF YOU DO NOT USE IT
table(d$gender, useNA = "always")


            female I use another term               male  Prefer not to say 
                97                  2                 17                  2 
              <NA> 
                 0

# d2 <- d

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# cross_cases(d2, pet, mhealth)

# if you need to recode
# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# if you don't need this for the homework, comment it out (add a # at the beginning of the line)
# d <- subset(d, pet != "bird")
# d <- subset(d, pet != "cat and dog")
# d <- subset(d, pet != "fish")
# d <- subset(d, pet != "multiple types of pet")
# d <- subset(d, pet != "other")
 
# to combine levels
# this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d$gender_rc[d$gender == "I use another term"] <- "other"
d$gender_rc[d$gender == "Prefer not to say"] <- "other"
# d2$pet_rc[d2$pet == "cat and dog"] <- "pet owner"
# d2$pet_rc[d2$pet == "dog"] <- "pet owner"
# d2$pet_rc[d2$pet == "fish"] <- "pet owner"
# d2$pet_rc[d2$pet == "multiple types of pet"] <- "pet owner"
# d2$pet_rc[d2$pet == "other"] <- "pet owner"

d$gender_rc[d$gender == "female"] <- "female"
d$gender_rc[d$gender == "male"] <- "male"

# d2$mhealth_rc <- "diagnosis"
# d2$mhealth_rc[d2$mhealth == "none or NA"] <- "no diagnosis"

# # preview your changes and make sure everything is correct
table(d$gender_rc, useNA = "always")


female   male  other   <NA> 
    97     17      4      0

# # or
# cross_cases(d2, pet_rc, mhealth_rc)

# check your variable types
str(d)

'data.frame':   118 obs. of  8 variables:
 $ gender              : chr  "female" "male" "female" "female" ...
 $ age                 : chr  "1 under 18" "1 under 18" "1 under 18" "1 under 18" ...
 $ mfq_26              : num  5.1 3.7 4.5 4.05 3.9 4.25 3.45 3.55 4.95 4.1 ...
 $ school_covid_support: num  1.25 1.2 1 1.6 1.33 ...
 $ pss                 : num  4.5 3.25 3.75 3.5 5 4 4.25 4.5 3.75 2.75 ...
 $ support             : num  3.5 1.5 4.17 4.5 1.5 ...
 $ row_id              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ gender_rc           : chr  "female" "male" "female" "female" ...

# str(d2)
 
# make sure that your IV is recognized as a factor by R
d$gender_rc <- as.factor(d$gender_rc)

# d2$pet_rc <- as.factor(d2$pet_rc)
# d2$mhealth_rc <- as.factor(d2$mhealth_rc)

Check homogeneity of variance

# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
leveneTest(pss~gender_rc, data = d)

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  1.3904 0.2531
      115

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# leveneTest(phq~pet_rc*mhealth_rc, data = d2)

Check for outliers using Cook’s distance and Residuals vs Leverage plot

Run a Regression

# # use this commented out section only if you need to remove outliers
# # to drop a single outlier, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108))
# 
# # to drop multiple outliers, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108) & row_id!=c(602))
# 
# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
reg_model <- lm(pss ~ gender_rc, data = d) #for one-way
 
# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# reg_model2 <- lm(phq ~ pet_rc*mhealth_rc, data = d2) #for two-way

Check for outliers

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# Cook's distance
plot(reg_model, 4)

# Residuals vs Leverage
plot(reg_model, 5)

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# Cook's distance
# plot(reg_model2, 4)
 
# Residuals vs Leverage
# plot(reg_model2, 5)

Check Your Variables

# we'll use the describeBy() command to view skew and kurtosis across our IVs and make sure the DV is normally distributed across all of the levels
describeBy(d$pss, group = d$gender_rc)


 Descriptive statistics by group 
group: female
   vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 97  3.4 0.88    3.5    3.41 1.11 1.5   5   3.5 -0.05    -0.88 0.09
------------------------------------------------------------ 
group: male
   vars  n mean   sd median trimmed  mad  min max range skew kurtosis   se
X1    1 17 2.82 0.97      3    2.82 1.11 1.25 4.5  3.25 0.13    -1.28 0.23
------------------------------------------------------------ 
group: other
   vars n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 4 3.94 0.47   4.12    3.94 0.19 3.25 4.25     1 -0.62    -1.79 0.24

# describeBy(d2$phq, group = d2$pet_rc)
# describeBy(d2$phq, group = d2$mhealth_rc)

Issues with My Data - PART OF YOUR WRITEUP

One-way ANOVA: Based on the data report, there was independence of observations with 8 different variables and 118 observations. Two gender categories were combined, “Prefer not to say” and “I use another term” merged into “Other.”

Levene’s Test showed non-significance with a p-score of 0.2531. On the Cook’s distance graph, one outlier was shown with a leverage ~1, but no other outliers were present. There was also normality with all DVs with skew and kurtosis between -2 and +2 for all 3 variables.

Run an ANOVA

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
 aov_model <- aov_ez(data = d,
                     id = "row_id",
                     between = c("gender_rc"),
                     dv = "pss",
                     anova_table = list(es = "pes"))

Contrasts set to contr.sum for the following variables: gender_rc

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# aov_model2 <- aov_ez(data = d2,
                     # id = "row_id",
                     # between = c("pet_rc","mhealth_rc"),
                     # dv = "phq",
                     # anova_table = list(es = "pes"))

View Output

Effect size cutoffs from Cohen (1988):

ηp2 = 0.01 indicates a small effect
ηp2 = 0.06 indicates a medium effect
ηp2 = 0.14 indicates a large effect

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
nice(aov_model)

Anova Table (Type 3 tests)

Response: pss
     Effect     df  MSE      F  pes p.value
1 gender_rc 2, 115 0.78 4.06 * .066    .020
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# nice(aov_model2)

Visualize Results

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
afex_plot(aov_model, x = "gender_rc")

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# afex_plot(aov_model2, x = "pet_rc", trace = "mhealth_rc")
# afex_plot(aov_model2, x = "mhealth_rc", trace = "pet_rc")

Run Posthoc Tests

Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
emmeans(aov_model, specs="gender_rc", adjust="tukey")

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

 gender_rc emmean     SE  df lower.CL upper.CL
 female      3.40 0.0897 115     3.18     3.62
 male        2.82 0.2140 115     2.30     3.34
 other       3.94 0.4420 115     2.87     5.01

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 3 estimates

pairs(emmeans(aov_model, specs="gender_rc", adjust="tukey"))

 contrast       estimate    SE  df t.ratio p.value
 female - male     0.579 0.232 115   2.491  0.0374
 female - other   -0.535 0.451 115  -1.188  0.4626
 male - other     -1.114 0.491 115  -2.270  0.0642

P value adjustment: tukey method for comparing a family of 3 estimates

# for a two-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# emmeans(aov_model2, specs="pet_rc", adjust="tukey")
# pairs(emmeans(aov_model2, specs="pet_rc", adjust="tukey"))
  
# emmeans(aov_model2, specs="mhealth_rc", adjust="tukey")
# pairs(emmeans(aov_model2, specs="mhealth_rc", adjust="tukey"))
  
# emmeans(aov_model2, specs="pet_rc", by="mhealth_rc", adjust="sidak")
# pairs(emmeans(aov_model2, specs="pet_rc", by="mhealth_rc", adjust="sidak"))
 
# emmeans(aov_model2, specs="mhealth_rc", by="pet_rc", adjust="sidak")
# pairs(emmeans(aov_model2, specs="mhealth_rc", by="pet_rc", adjust="sidak"))

Write Up Results

One-Way ANOVA

The above hypothesis showed that females will have significantly higher stress levels (measured by the PSS) when compared to males, those who use another term, and those who prefer not to say.

Based on the data report, minimal issues were found.

There was independence of observations with 8 different variables and 118 observations. Two gender categories were combined, “Prefer not to say” and “I use another term” merged into “Other.”

F(2,115) = 4.06, p = .020, np > .01

Based on Figure 1, there is a significant effect of gender on percieved stress score, F (2, 115) = 4.06, p = .020, np > .01. There was a significant effect between female and male with a p = 0.0374 and female score being higher than males.

Effect size is medium with pes = 0.066 and medium cutoff at pes = 0.06.

References

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.