library(psych) # for the describe() command
library(ggplot2) # to visualize our results
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To get total summary skip 'by' argument: take_all(mtcars, mean)
##
## Attaching package: 'maditr'
## The following object is masked from 'package:base':
##
## sort_by
##
## Use 'expss_output_viewer()' to display tables in the RStudio Viewer.
## To return to the console output, use 'expss_output_default()'.
##
## Attaching package: 'expss'
## The following object is masked from 'package:ggplot2':
##
## vars
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:expss':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(afex) # to run the ANOVA and plot results
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
##
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
##
## lmer
library(emmeans) # for posthoc tests
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use ARC data
d <- read.csv(file="Data/mydata.csv", header=T)
# new code! this adds a column with a number for each row. it makes it easier when we drop outliers later
d$row_id <- 1:nrow(d)
Note: You can chose to run either a one-way ANOVA (a single IV with more than 3 levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes.
One-Way: We predict that there will be a significant effect of gender on the need to belong.
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 2104 obs. of 7 variables:
## $ gender : chr "f" "m" "m" "f" ...
## $ age : chr "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" ...
## $ moa_independence: num 3.67 3.67 3.5 3 3.83 ...
## $ swb : num 4.33 4.17 1.83 5.17 3.67 ...
## $ belong : num 2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
## $ socmeduse : int 47 23 34 35 37 13 37 43 37 29 ...
## $ row_id : int 1 2 3 4 5 6 7 8 9 10 ...
# make our categorical variables factors
d$row_id <- as.factor(d$row_id) #we'll actually use our ID variable for this analysis, so make sure it's coded as a factor
d$gender <- as.factor(d$gender)
d$row_id <- as.factor(d$row_id)
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d$belong)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2104 3.21 0.61 3.2 3.23 0.59 1.3 5 3.7 -0.27 -0.12 0.01
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d$belong, group = d$gender)
##
## Descriptive statistics by group
## group: f
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1546 3.26 0.59 3.3 3.28 0.59 1.3 5 3.7 -0.24 -0.24 0.01
## ------------------------------------------------------------
## group: m
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 527 3.06 0.65 3.1 3.08 0.59 1.3 4.9 3.6 -0.22 -0.1 0.03
## ------------------------------------------------------------
## group: nb
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 31 3.34 0.5 3.3 3.33 0.44 2.3 4.6 2.3 0.29 -0.09 0.09
# also use histograms to examine your continuous variable
hist(d$belong)
# and cross_cases() to examine your categorical variables
cross_cases(d, gender)
| #Total | |
|---|---|
| gender | |
| f | 1546 |
| m | 527 |
| nb | 31 |
| #Total cases | 2104 |
table(d$gender)
##
## f m nb
## 1546 527 31
cross_cases(d, gender)
| #Total | |
|---|---|
| gender | |
| f | 1546 |
| m | 527 |
| nb | 31 |
| #Total cases | 2104 |
# our number of small nb participants is going to hurt us for the two-way anova, but it should be okay for the one-way anova
# so we'll create a new dataframe for the two-way analysis and call it d2
d2 <- subset(d, gender != "nb")
d2$gender <- droplevels(d2$gender)
# to double-check any changes we made
cross_cases(d2, gender)
| #Total | |
|---|---|
| gender | |
| f | 1546 |
| m | 527 |
| #Total cases | 2073 |
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
leveneTest(belong~gender, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 4.2068 0.01502 *
## 2101
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# use this commented out section only if you need to remove outliers
# to drop a single outlier, remove the # at the beginning of the line and use this code:
# d <- subset(d, row_id!=c(1108))
# to drop multiple outliers, remove the # at the beginning of the line and use this code:
# d <- subset(d, row_id!=c(1108) & row_id!=c(602))
# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate
reg_model <- lm(belong ~ gender, data = d) #for one-way
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
Our cell sizes are fairly balanced. A small sample size for one of the levels of our variable limits our power and increases our Type II error rate.
Levene’s test is significant for our three-level gender variable. We are ignoring this and continuing with the analysis anyway, but in the real world this is something we would have to correct for.
aov_model <- aov_ez(data = d,
id = "row_id",
between = c("gender"),
dv = "belong",
anova_table = list(es = "pes"))
## Contrasts set to contr.sum for the following variables: gender
Effect size cutoffs from Cohen (1988):
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: belong
## Effect df MSE F pes p.value
## 1 gender 2, 2101 0.36 23.46 *** .022 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
afex_plot(aov_model, x = "gender")
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.
emmeans(aov_model, specs="gender", adjust="tukey")
## Note: adjust = "tukey" was changed to "sidak"
## because "tukey" is only appropriate for one set of pairwise comparisons
## gender emmean SE df lower.CL upper.CL
## f 3.26 0.0153 2101 3.23 3.30
## m 3.06 0.0263 2101 2.99 3.12
## nb 3.34 0.1083 2101 3.08 3.60
##
## Confidence level used: 0.95
## Conf-level adjustment: sidak method for 3 estimates
pairs(emmeans(aov_model, specs="gender", adjust="tukey"))
## contrast estimate SE df t.ratio p.value
## f - m 0.2052 0.0304 2101 6.744 <.0001
## f - nb -0.0793 0.1094 2101 -0.724 0.7491
## m - nb -0.2844 0.1115 2101 -2.552 0.0291
##
## P value adjustment: tukey method for comparing a family of 3 estimates
To test our hypothesis that there would be a significant effect of gender on the need to belong, we used a one-way ANOVA. Our data was unbalanced, with many more women participating in our survey (n = 1546) than men (n = 527) or non-binary and other gender participants (n = 31). This significantly reduces the power of our test and increases the chances of a Type II error. We also identified and removed a single outlier following visual analysis of a Residuals vs Leverage plot. A significant Levene’s test (p = .002) also indicates that our data violates the assumption of homogeneity of variance. This suggests that there is an increased chance of Type I error. We continued with our analysis for the purpose of this class.
We found a significant effect of gender, F(2,2102) = 23.46, p < .001, ηp2 = 0.022 (large effect size; Cohen, 1988). Posthoc tests using Tukey’s HSD revealed that women reported more of a need to belong than men but less of a need to belong than non-binary and other gender participants, while non-binary and other gender participants reported the highest amount of a need to belong overall (see Figure 1 for a comparison).
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.