library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests
ANOVA HW
Loading Libraries
Importing Data
<- read.csv(file="Data/mydata.csv", header=T)
d #
# new code! this adds a column with a number for each row. it acts as an identifier for our participations
$row_id <- 1:nrow(d) d
State Your Hypothesis - PART OF YOUR WRITEUP
One-way ANOVA: Nonbinary participants will report significantly lower levels of perceived social support (measured by the Multidimensional Scale of Perceived Social Support) when compared to male and female participants.
Check Your Assumptions
ANOVA Assumptions
- Independence of observations - how we collected one variable isn’t reliant on how we collected another variable (confirmed by data report)
- All levels of the IVs should have equal number of cases (ideally; in the real world, this varies) and there should be no empty cells. Cells with low numbers increase chance of Type II error. (we will check this below)
- Homogeneity of variance should be assured (we will check this below)
- Outliers should be identified and removed (we will check this below)
- DV should be normally distributed for each level of the IV (we will check this below)
Check levels of IVs
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
table(d$gender, useNA = "always")
f m nb <NA>
1585 547 31 0
Check homogeneity of variance
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
leveneTest(support~gender, data = d)
Warning in leveneTest.default(y = y, group = group, ...): group coerced to
factor.
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 0.8495 0.4278
2160
Check for outliers using Cook’s distance and Residuals vs Leverage plot
Run a Regression
# # use this commented out section only if you need to remove outliers
# # to drop a single outlier, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108))
#
# # to drop multiple outliers, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108) & row_id!=c(602))
#
# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
<- lm(support ~ gender, data = d) #for one-way reg_model
Check for outliers
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT ; .5 CUTOFF
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage - RED LINE CLOSE TO DASH LINE IS GOOD
plot(reg_model, 5)
Check Your Variables
# we'll use the describeBy() command to view skew and kurtosis across our IVs and make sure the DV is normally distributed across all of the levels
describeBy(d$support, group = d$gender)
Descriptive statistics by group
group: f
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1585 5.57 1.13 5.83 5.7 0.99 0 7 7 -1.15 1.56 0.03
------------------------------------------------------------
group: m
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 547 5.44 1.16 5.67 5.55 1.11 1 7 6 -0.99 0.95 0.05
------------------------------------------------------------
group: nb
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 31 5.15 1.16 5.42 5.19 1.36 2.5 7 4.5 -0.31 -0.94 0.21
Issues with My Data - PART OF YOUR WRITEUP
Regarding potential issues with our data, we confirmed independence of observations by checking the data report. We checked our cell sizes, and in this case, we did not have to drop or combine any cells. Our Levene’s test was non-significant (p = 0.428) and our dependent variable is normally distrubuted (skew and kurtosis between -2 & +2).
Run an ANOVA
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
<- aov_ez(data = d,
aov_model id = "row_id",
between = c("gender"),
dv = "support",
anova_table = list(es = "pes"))
Converting to factor: gender
Contrasts set to contr.sum for the following variables: gender
View Output
Effect size cutoffs from Cohen (1988):
- ηp2 = 0.01 indicates a small effect
- ηp2 = 0.06 indicates a medium effect
- ηp2 = 0.14 indicates a large effect
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
nice(aov_model)
Anova Table (Type 3 tests)
Response: support
Effect df MSE F pes p.value
1 gender 2, 2160 1.29 4.49 * .004 .011
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
Visualize Results
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
afex_plot(aov_model, x = "gender")
Run Posthoc Tests
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
emmeans(aov_model, specs="gender", adjust="tukey")
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
gender emmean SE df lower.CL upper.CL
f 5.57 0.0285 2160 5.50 5.64
m 5.44 0.0485 2160 5.32 5.56
nb 5.15 0.2040 2160 4.66 5.63
Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
pairs(emmeans(aov_model, specs="gender", adjust="tukey"))
contrast estimate SE df t.ratio p.value
f - m 0.130 0.0563 2160 2.314 0.0541
f - nb 0.424 0.2060 2160 2.061 0.0984
m - nb 0.294 0.2100 2160 1.403 0.3397
P value adjustment: tukey method for comparing a family of 3 estimates
Write Up Results
One-Way ANOVA
We hypothesized that nonbinary participants will report significantly lower levels of perceived social support compared to male and female participants.
Regarding potential issues with our data, we confirmed independence of observations by checking the data report. We checked our cell sizes, and in this case, we did not have to drop or combine any cells. Our Levene’s test was non-significant (p = 0.428) and our dependent variable came back normally distributed (skew and kurtosis between -2 & +2).
Our ANOVA test indicated significant differences, F(2, 2160) = 4.49, p = .011. Our effect size was small, np2 = .004 (Cohen 1988). However, our post-hoc results indicated no significant differences between any of the groups (See Figure 1).
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.