library(psych) # for the describe() command
library(ggplot2) # to visualize our results
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To get total summary skip 'by' argument: take_all(mtcars, mean)
##
## Attaching package: 'expss'
## The following object is masked from 'package:ggplot2':
##
## vars
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:expss':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(afex) # to run the ANOVA and plot results
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
##
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
##
## lmer
# options(repos = "https://cloud.r-project.org")
# devtools::install_version("estimability","1.4")
# devtools::install_version("emmeans","1.7.0")
library(emmeans) # for posthoc tests
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use ARC data
d <- read.csv(file="Data/eammi2_new_final.csv", header=T)
# for the HW, you may or may not need to use the code below this comment
# check to see if you have a variable called 'X' or 'ResponseId' in your data
names(d)
## [1] "gender" "age" "idea" "swb" "mindful" "socmeduse"
# if you do have 'X' or 'ResponseId' (aka your ID variable) then DELETE THE CODE BELOW
# if you don't have those variables, keep the code and run it
d$row_id <- 1:nrow(d)
Note: You can choose to run either a one-way ANOVA (a single IV with at least three levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes; a brief sketch of the factorial setup appears just below this note.
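The factorial version of the aov_ez() call used later in this document simply lists both IVs in the between argument. This is only a commented-out sketch: 'age' is a placeholder second IV, and your own second IV (and its data prep) will differ.
# sketch only -- a factorial (two-way) design adds a second IV to 'between'
# 'age' is used purely as a placeholder; substitute your own second IV
# aov_2way <- aov_ez(data = d,
#                    id = "row_id",
#                    between = c("gender", "age"),
#                    dv = "swb",
#                    anova_table = list(es = "pes"))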
One-Way: We predict that there will be a significant effect of gender on subjective well-being, as measured by the Satisfaction with Life Scale.
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 2156 obs. of 7 variables:
## $ gender : chr "f" "m" "m" "f" ...
## $ age : chr "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" ...
## $ idea : num 3.75 3.88 3.75 3.75 3.5 ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ mindful : num 2.4 1.8 2.2 2.2 3.2 ...
## $ socmeduse: int 47 23 34 35 37 13 37 43 37 29 ...
## $ row_id : int 1 2 3 4 5 6 7 8 9 10 ...
# make our categorical variables factors
d$row_id <- as.factor(d$row_id) #we'll actually use our ID variable for this analysis, so make sure it's coded as a factor
d$gender <- as.factor(d$gender)
# check your categorical variables
table(d$gender)
##
## f m nb
## 1583 542 31
# also use histograms to examine your continuous variable
hist(d$swb)
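If you prefer ggplot2 (loaded above to visualize our results), an equivalent histogram is sketched below; the bin count of 20 is an arbitrary choice.
# optional ggplot2 version of the same histogram check
ggplot(d, aes(x = swb)) +
  geom_histogram(bins = 20) +
  labs(x = "subjective well-being (swb)", y = "count")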
Assumptions checked below:
Once you have confirmed all of the assumptions below, you can move on to running the ANOVA itself.
# check your categorical variables and make sure they have decent cell sizes
# they should have at least 5 participants in each cell
# but larger numbers are always better
table(d$gender)
##
## f m nb
## 1583 542 31
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d$swb)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2156 4.43 1.33 4.5 4.49 1.48 1 7 6 -0.35 -0.49 0.03
# we'll use the describeBy() command to view skew and kurtosis across the levels of our IV
describeBy(d$swb, group = d$gender)
##
## Descriptive statistics by group
## group: f
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1583 4.43 1.31 4.5 4.49 1.48 1 7 6 -0.38 -0.5 0.03
## ------------------------------------------------------------
## group: m
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 542 4.48 1.37 4.67 4.52 1.48 1 7 6 -0.31 -0.49 0.06
## ------------------------------------------------------------
## group: nb
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 31 3.87 1.26 4 3.93 1.48 1 5.67 4.67 -0.37 -0.93 0.23
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: y~x1*x2, where y is our DV, x1 is our first IV, and x2 is our second IV; a one-way design only needs y~x1
leveneTest(swb~gender, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.9872 0.3728
## 2153
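Levene's test is non-significant here (p = .37), so the homogeneity of variance assumption holds. If it had been violated, one common fallback (not part of the original assignment) is Welch's one-way ANOVA from base R, which does not assume equal variances.
# only needed if Levene's test were significant
oneway.test(swb ~ gender, data = d, var.equal = FALSE) # Welch's ANOVA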
# use the lm() command to run the regression
# formula is y~x1*x2, where y is our DV, x1 is our first IV and x2 is our second IV
reg_model <- lm(swb ~ gender, data = d) #for one-way
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
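The two plots above flag influential cases visually. A numeric follow-up is sketched below; the 4/n cutoff is a common rule of thumb, not a requirement of the assignment.
# numeric check of influence: Cook's distance for every observation
cooks <- cooks.distance(reg_model)
sum(cooks > 4 / length(cooks))        # how many cases exceed the 4/n rule of thumb
head(sort(cooks, decreasing = TRUE))  # the most influential cases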
Our cell sizes are very unbalanced. A small sample size for one of the levels of our variable limits our power and increases our Type II error rate.
aov_model <- aov_ez(data = d,
                    id = "row_id",
                    between = c("gender"),
                    dv = "swb",
                    anova_table = list(es = "pes"))
## Contrasts set to contr.sum for the following variables: gender
Effect size cutoffs for partial eta-squared (ηp²) from Cohen (1988): roughly .01 is small, .06 is medium, and .14 is large; anything below .01 is trivial.
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: swb
## Effect df MSE F pes p.value
## 1 gender 2, 2153 1.76 3.18 * .003 .042
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
afex_plot(aov_model, x = "gender")
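afex_plot() returns a ggplot2 object, so you can add ordinary ggplot2 layers if you want cleaner labels for your write-up; the labels below are just examples.
# optional: customize the afex plot with ggplot2 layers
afex_plot(aov_model, x = "gender") +
  labs(x = "gender", y = "subjective well-being (swb)") +
  theme_minimal()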
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
emmeans(aov_model, specs="gender", adjust="tukey")
## Note: adjust = "tukey" was changed to "sidak"
## because "tukey" is only appropriate for one set of pairwise comparisons
## gender emmean SE df lower.CL upper.CL
## f 4.43 0.0333 2153 4.35 4.51
## m 4.48 0.0569 2153 4.34 4.62
## nb 3.87 0.2379 2153 3.30 4.43
##
## Confidence level used: 0.95
## Conf-level adjustment: sidak method for 3 estimates
pairs(emmeans(aov_model, specs="gender", adjust="tukey"))
## contrast estimate SE df t.ratio p.value
## f - m -0.0494 0.0659 2153 -0.749 0.7341
## f - nb 0.5647 0.2403 2153 2.350 0.0494
## m - nb 0.6141 0.2447 2153 2.510 0.0325
##
## P value adjustment: tukey method for comparing a family of 3 estimates
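Not required for the homework, but if you want standardized effect sizes (Cohen's d) for the pairwise comparisons, emmeans::eff_size() can compute them; the sketch below reuses the residual SD and df from reg_model, the lm fit earlier on the same model.
# optional: Cohen's d for the pairwise comparisons
eff_size(emmeans(aov_model, specs = "gender"),
         sigma = sigma(reg_model), edf = df.residual(reg_model))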
To test our hypothesis that there would be a significant effect of gender on subjective well-being, we used a one-way ANOVA. Our data were unbalanced, with many more women participating in our survey (n = 1583) than men (n = 542) or non-binary and other gender participants (n = 31). This substantially reduces the power of our test and increases the chances of a Type II error.
We found a significant effect of gender, F(2, 2153) = 3.18, p = .042, ηp² = .003 (a trivial effect size; Cohen, 1988). Posthoc tests using Tukey's HSD revealed that women and men did not differ significantly from one another, but both groups reported significantly higher subjective well-being than non-binary and other gender participants, who reported the lowest subjective well-being overall (see Figure 1 for a comparison).
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.