library(psych) # for the describe() command
library(ggplot2) # to visualize our results
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To aggregate data: take(mtcars, mean_mpg = mean(mpg), by = am)
##
## Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
## To return to the console output, use 'expss_output_default()'.
##
## Attaching package: 'expss'
## The following object is masked from 'package:ggplot2':
##
## vars
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:expss':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(afex) # to run the ANOVA and plot results
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
##
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
##
## lmer
library(emmeans) # for posthoc tests
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use ARC data
d <- read.csv(file="eammi2_final.csv", header=T)
# new code! this adds a column with a number for each row. it makes it easier when we drop outliers later
d$row_id <- 1:nrow(d)
One-Way: We predict that there will be a significant effect of gender on stress, as measured by the perceived stress scale (PSS).
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3182 obs. of 8 variables:
## $ ResponseId: chr "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
## $ gender : chr "f" "m" "m" "f" ...
## $ race_rc : chr "white" "white" "white" "other" ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ stress : num 3.3 3.6 3.3 3.2 3.5 2.9 3.2 3 2.9 3.2 ...
## $ belong : num 3.4 3.4 3.6 3.6 3.2 3.4 3.5 3.2 3.5 2.7 ...
## $ SocMedia : num 4.27 2.09 3.09 3.18 3.36 ...
## $ row_id : int 1 2 3 4 5 6 7 8 9 10 ...
# make our categorical variables factors
d$ResponseId <- as.factor(d$ResponseId) #we'll actually use our ID variable for this analysis, so make sure it's coded as a factor
d$gender <- as.factor(d$gender)
d$race_rc <- as.factor(d$race_rc)
d$row_id <- as.factor(d$row_id)
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d$stress)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3175 3.27 0.41 3.3 3.26 0.44 1 5 4 -0.16 2.67 0.01
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d$stress, group = d$gender)
##
## Descriptive statistics by group
## group: f
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2326 3.29 0.4 3.3 3.29 0.44 1 5 4 -0.06 2.16 0.01
## ------------------------------------------------------------
## group: m
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 791 3.18 0.41 3.2 3.18 0.44 1 5 4 -0.2 3.41 0.01
## ------------------------------------------------------------
## group: nb
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 54 3.36 0.37 3.4 3.36 0.3 2 4.4 2.4 -0.37 2.26 0.05
# also use histograms to examine your continuous variable
hist(d$stress)
table(d$gender)
##
## f m nb
## 2332 792 54
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
leveneTest(stress~gender, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.4129 0.6617
## 3168
# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate
reg_model <- lm(stress ~ gender, data = d) #for one-way
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
Our cell sizes are very unbalanced especially because we’ve kept our low amount of non-binary participants (because we need 3+ cells for one-way ANOVA). A small sample size for one of the levels of our variable limits our power and increases our Type II error rate.
Levene’s test is not significant for our three-level gender variable.
We identified our highest outliers but did not remove any because there was a straight red line already when displaying residuals vs leverage.
aov_model <- aov_ez(data = d,
id = "ResponseId",
between = c("gender"),
dv = "stress",
anova_table = list(es = "pes"))
## Warning: Missing values for 11 ID(s), which were removed before analysis:
## R_1EihKePhUenEL9L, R_1oBJ33zLPm5Lusa, R_2tbf8FyKp1Dg0wV, R_3PZvgQSzbXD8PG5, R_3QXfRHzrQr2VUE1, R_6sO4a1z8iDHY1kV, R_77J1rfMXuRJR5vP, R_9EPStPMnugteJXn, R_9n2p79F9uUrbMpH, R_Y69XOURBCDXZd0R, ... [showing first 10 only]
## Below the first few rows (in wide format) of the removed cases with missing data.
## ResponseId gender .
## # 215 R_1EihKePhUenEL9L <NA> 2.8
## # 587 R_1oBJ33zLPm5Lusa f NA
## # 1313 R_2tbf8FyKp1Dg0wV f NA
## # 2211 R_3PZvgQSzbXD8PG5 f NA
## # 2256 R_3QXfRHzrQr2VUE1 m NA
## # 2414 R_6sO4a1z8iDHY1kV <NA> 3.5
## Contrasts set to contr.sum for the following variables: gender
Effect size cutoffs from Cohen (1988):
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: stress
## Effect df MSE F pes p.value
## 1 gender 2, 3168 0.16 25.09 *** .016 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
afex_plot(aov_model, x = "gender")
emmeans(aov_model, specs="gender", adjust="tukey")
## Note: adjust = "tukey" was changed to "sidak"
## because "tukey" is only appropriate for one set of pairwise comparisons
## gender emmean SE df lower.CL upper.CL
## f 3.294 0.008359 3168 3.274 3.314
## m 3.180 0.014334 3168 3.146 3.215
## nb 3.359 0.054859 3168 3.228 3.490
##
## Confidence level used: 0.95
## Conf-level adjustment: sidak method for 3 estimates
pairs(emmeans(aov_model, specs="gender", adjust="tukey"))
## contrast estimate SE df t.ratio p.value
## f - m 0.1141 0.0166 3168 6.878 <.0001
## f - nb -0.0648 0.0555 3168 -1.169 0.4721
## m - nb -0.1790 0.0567 3168 -3.157 0.0046
##
## P value adjustment: tukey method for comparing a family of 3 estimates
To test our hypothesis that there would be a significant effect of gender on stress, we used a one-way ANOVA. Our data was unbalanced, with many more women participating in our survey (n = 2326) than men (n = 791) or non-binary participants (n = 54). This significantly reduces the power of our test and increases the chances of a Type II error. We also identified our highest outliers but did no removing since they looked like they were giving us no issues. An insignificant Levene’s test (p = .66) also indicates that our data does not violate the assumption of homogeneity of variance.
We found a significant effect of gender, F(2,3168) = 25.09, p < .001, ηp2 = .016 (large effect size; Cohen, 1988). Posthoc tests using Tukey’s HSD revealed that women reported more stress than men but less stress than non-binary participants, while non-binary participants reported the highest amount of stress overall (see Figure 1 for a comparison).
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.
Cohen, S, Kamarck, T and Mermelstein, R 1983 A global measure of perceived stress. Journal of Health and Social Behavior, 24: 385–396. DOI: https://doi.org/10.2307/2136404