library(psych) # for the describe() command
library(ggplot2) # to visualize our results
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To select rows from data: rows(mtcars, am==0)
##
## Attaching package: 'expss'
## The following object is masked from 'package:ggplot2':
##
## vars
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:expss':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(afex) # to run the ANOVA and plot results
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
##
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
##
## lmer
library(emmeans) # for posthoc tests
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use ARC data
d <- read.csv(file="data/EAMMi2_final.csv", header=T)
# new code! this adds a column with a number for each row. it makes it easier when we drop outliers later
d$row_id <- 1:nrow(d)
Note: You can choose to run either a one-way ANOVA (a single IV with at least 3 levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes.
I predict that there will be significant effects of gender and race/ethnicity on efficacy. I also predict that race/ethnicity and gender will interact and that women of color will report significantly less efficacy than men of color or white men and women.
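For reference, here is a minimal sketch of what the one-way option could look like: a single between-subjects IV (the three-level gender variable) predicting efficacy, using the same aov_ez() setup as the two-way model and assuming the factor conversions done below have been run. The worked example that follows focuses on the two-way version.
# hedged sketch of the one-way option: one between-subjects IV (gender, 3 levels)
# assumes d$gender and d$row_id have already been converted to factors (see below)
oneway_model <- aov_ez(data = d,
                       id = "row_id",
                       between = "gender",
                       dv = "efficacy",
                       anova_table = list(es = "pes"))
nice(oneway_model)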
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3182 obs. of 7 variables:
## $ race_rc : chr "white" "white" "white" "other" ...
## $ gender : chr "f" "m" "m" "f" ...
## $ stress : num 3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ efficacy: num 3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...
## $ mindful : num 6.6 7.2 6.8 6.8 5.8 ...
## $ row_id : int 1 2 3 4 5 6 7 8 9 10 ...
# make our categorical variables factors
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)
d$row_id <- as.factor(d$row_id)
# we're going to recode our race/ethnicity variable into two groups: poc and white
table(d$race_rc)
##
## asian black hispanic multiracial nativeamer other
## 210 249 286 293 12 97
## white
## 2026
d$poc[d$race_rc == "asian"] <- "poc"
d$poc[d$race_rc == "black"] <- "poc"
d$poc[d$race_rc == "mideast"] <- "poc"
d$poc[d$race_rc == "multiracial"] <- "poc"
d$poc[d$race_rc == "other"] <- "poc"
d$poc[d$race_rc == "prefer_not"] <- NA
d$poc[d$race_rc == "white"] <- "white"
table(d$poc)
##
## poc white
## 849 2026
d$poc <- as.factor(d$poc)
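Because any race_rc level that is not explicitly assigned above is left as NA in poc, it's worth cross-tabulating the old and new variables to confirm the recode came out the way you intended. A quick optional check (not part of the original output):
# optional check: cross-tabulate the original and recoded variables
# any race_rc levels that were not assigned above will show up under NA
table(d$race_rc, d$poc, useNA = "ifany")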
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d$efficacy)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3176 3.13 0.45 3.1 3.13 0.44 1 4 3 -0.29 0.63 0.01
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d$efficacy, group = d$gender)
##
## Descriptive statistics by group
## group: f
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2327 3.1 0.45 3.1 3.11 0.44 1.2 4 2.8 -0.27 0.63 0.01
## ------------------------------------------------------------
## group: m
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 792 3.21 0.44 3.2 3.21 0.44 1.5 4 2.5 -0.16 -0.1 0.02
## ------------------------------------------------------------
## group: nb
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 53 3 0.54 3 3.03 0.44 1.1 4 2.9 -0.64 1.24 0.07
describeBy(d$efficacy, group = d$poc)
##
## Descriptive statistics by group
## group: poc
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 847 3.12 0.46 3.1 3.12 0.44 1.4 4 2.6 -0.18 0.02 0.02
## ------------------------------------------------------------
## group: white
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2024 3.12 0.44 3.1 3.13 0.44 1.1 4 2.9 -0.26 0.66 0.01
# also use histograms to examine your continuous variable
hist(d$efficacy)
# and cross_cases() to examine your categorical variables
cross_cases(d, gender, poc)
| gender | poc | white |
|---|---|---|
| f | 630 | 1480 |
| m | 205 | 508 |
| nb | 14 | 38 |
| #Total cases | 849 | 2026 |
table(d$gender)
##
## f m nb
## 2332 792 54
# our small number of nb participants is going to hurt us for the two-way anova, but it should be okay for the one-way anova
# so we'll drop the nb rows from our dataframe before running the two-way analysis
d <- subset(d, gender != "nb")
d$gender <- droplevels(d$gender)
# to double-check any changes we made
cross_cases(d, gender, poc)
| gender | poc | white |
|---|---|---|
| f | 630 | 1480 |
| m | 205 | 508 |
| #Total cases | 835 | 1988 |
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
leveneTest(efficacy~gender*poc, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    3  1.5532 0.1987
##       2816
# use this commented out section only if you need to remove outliers
# to drop a single outlier, remove the # at the beginning of the line and use this code:
# d <- subset(d, row_id!=c(1108))
# to drop multiple outliers, remove the # at the beginning of the line and use this code:
# d <- subset(d, row_id!=c(1108) & row_id!=c(602))
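An equivalent, slightly more compact way to drop several outliers at once (optional; an alternative to the line above, shown commented out as well):
# d <- subset(d, !(row_id %in% c(1108, 602)))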
# use the lm() command to fit a regression model so we can check for outliers
# formula is y~x1*x2, where y is our DV, x1 is our first IV, and x2 is our second IV
reg_model <- lm(efficacy ~ gender*poc, data = d) #for two-way
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
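If the Cook's distance plot flags cases you want a closer look at, one optional helper (not part of the original code) is to list the largest distances by row name, which should line up with row_id here because subset() keeps the original row names:
# optional: view the five largest Cook's distances, labeled by row name
cooks <- cooks.distance(reg_model)
sort(cooks, decreasing = TRUE)[1:5]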
The cell sizes in my data are unbalanced, and the small number of Non-binary/Other participants increases the probability of a Type II error, so I dropped that gender level.
Levene's test was not significant (p = .199), so the homogeneity of variance assumption is met, and all of the other assumptions are met as well.
aov_model <- aov_ez(data = d,
                    id = "row_id",
                    between = c("gender", "poc"),
                    dv = "efficacy",
                    anova_table = list(es = "pes"))
## Warning: Missing values for 304 ID(s), which were removed before analysis:
## 65, 74, 76, 93, 94, 98, 101, 105, 113, 114, ... [showing first 10 only]
## Below the first few rows (in wide format) of the removed cases with missing data.
## row_id gender poc .
## # 65 65 f <NA> 3.0
## # 74 74 f <NA> 2.8
## # 76 76 f <NA> 3.8
## # 93 93 f <NA> 3.4
## # 94 94 f <NA> 3.4
## # 98 98 f <NA> 2.8
## Contrasts set to contr.sum for the following variables: gender, poc
Effect size cutoffs from Cohen (1988): ηp² = .01 is a small effect, .06 is a medium effect, and .14 is a large effect.
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: efficacy
## Effect df MSE F pes p.value
## 1 gender 1, 2816 0.19 19.05 *** .007 <.001
## 2 poc 1, 2816 0.19 1.55 <.001 .214
## 3 gender:poc 1, 2816 0.19 1.90 <.001 .168
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
afex_plot(aov_model, x = "gender", trace = "poc")
afex_plot(aov_model, x = "poc", trace = "gender")
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
emmeans(aov_model, specs="gender", adjust="tukey")
## NOTE: Results may be misleading due to involvement in interactions
## Note: adjust = "tukey" was changed to "sidak"
## because "tukey" is only appropriate for one set of pairwise comparisons
## gender emmean SE df lower.CL upper.CL
## f 3.10 0.0105 2816 3.08 3.12
## m 3.19 0.0182 2816 3.15 3.23
##
## Results are averaged over the levels of: poc
## Confidence level used: 0.95
## Conf-level adjustment: sidak method for 2 estimates
pairs(emmeans(aov_model, specs="gender", adjust="tukey"))
## NOTE: Results may be misleading due to involvement in interactions
## contrast estimate SE df t.ratio p.value
## f - m -0.0918 0.021 2816 -4.364 <.0001
##
## Results are averaged over the levels of: poc
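The interaction was not significant here, so no interaction follow-ups are needed. If it had been, simple-effects comparisons of gender within each level of poc could be requested with emmeans(); a hypothetical sketch only, left commented out:
# hypothetical: only run these if the gender:poc interaction is significant
# emmeans(aov_model, specs = "gender", by = "poc")
# pairs(emmeans(aov_model, specs = "gender", by = "poc"), adjust = "tukey")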
To test my hypothesis that gender and race/ethnicity would each affect efficacy and would also interact, I ran a two-way/factorial ANOVA. My data met most of the assumptions; however, the cell size for the Non-binary/Other gender participants was small, so I dropped that level from my analysis. Additionally, the gender cells are unbalanced: more women (n = 2110) than men (n = 713) completed the survey. Visual inspection of the Residuals vs Leverage plot did not reveal any outliers, so none were removed.
As predicted, I found a significant main effect of gender, F(1, 2816) = 19.05, p < .001, ηp² = .007 (a trivial effect; Cohen, 1988), with men (M = 3.19, SE = .018) reporting higher levels of efficacy than women (M = 3.10, SE = .011). Contrary to my hypothesis, I did not find a significant main effect of race/ethnicity (p = .214) or a significant gender × race/ethnicity interaction (p = .168).
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Routledge Academic.