library(psych) # for the describe() command
library(ggplot2) # to visualize our results
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## To get total summary skip 'by' argument: take_all(mtcars, mean)
##
## Attaching package: 'maditr'
## The following object is masked from 'package:base':
##
## sort_by
##
## Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
## To return to the console output, use 'expss_output_default()'.
##
## Attaching package: 'expss'
## The following object is masked from 'package:ggplot2':
##
## vars
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:expss':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(afex) # to run the ANOVA and plot results
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
##
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
##
## lmer
library(emmeans) # for posthoc tests
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use ARC data
d <- read.csv(file="Data/mydata.csv", header=T)
# new code! this adds a column with a number for each row. it makes it easier when we drop outliers later
d$row_id <- 1:nrow(d)
Note: You can chose to run either a one-way ANOVA (a single IV with more than 3 levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes.
I predict that there will be a significant effect of education level on stress. Furthermore, I predict that individuals with a higher level of education will experience higher stress.
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3069 obs. of 7 variables:
## $ edu : chr "2 Currently in college" "5 Completed Bachelors Degree" "2 Currently in college" "2 Currently in college" ...
## $ marriage5 : chr "are currently divorced from one another" "are currently married to one another" "are currently married to one another" "are currently married to one another" ...
## $ moa_independence: num 3.67 3.67 3.5 3 3.83 ...
## $ moa_maturity : num 3.67 3.33 3.67 3 3.67 ...
## $ mindful : num 2.4 1.8 2.2 2.2 3.2 ...
## $ stress : num 3.3 3.3 4 3.2 3.1 3.5 3.3 2.4 2.9 2.7 ...
## $ row_id : int 1 2 3 4 5 6 7 8 9 10 ...
# make our categorical variables factors
d$row_id <- as.factor(d$row_id) #we'll actually use our ID variable for this analysis, so make sure it's coded as a factor
d$edu <- as.factor(d$edu)
d$row_id <- as.factor(d$row_id)
# we're going to recode our race/ethnicity variable into two groups: poc and white
table(d$edu)
##
## 1 High school diploma or less, and NO COLLEGE
## 53
## 2 Currently in college
## 2482
## 3 Completed some college, but no longer in college
## 34
## 4 Complete 2 year College degree
## 175
## 5 Completed Bachelors Degree
## 136
## 6 Currently in graduate education
## 133
## 7 Completed some graduate degree
## 56
d$high_ed[d$edu == "2 Currently in college"] <- "high_ed"
d$high_ed[d$edu == "3 Completed some college, but no longer in college"] <- "standard_ed"
d$high_ed[d$edu == "4 Complete 2 year College degree"] <- "standard_ed"
d$high_ed[d$edu == "5 Completed Bachelors Degree"] <- "high_ed"
d$high_ed[d$edu == "6 Currently in graduate education"] <- "high_ed"
d$high_ed[d$edu == "7 Completed some graduate degree"] <- "high_ed"
d$high_ed[d$edu == "1 High school diploma or less, and NO COLLEGE"] <- "standard_ed"
table(d$high_ed)
##
## high_ed standard_ed
## 2807 262
d$high_ed <- as.factor(d$high_ed)
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d$stress)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3069 3.05 0.6 3 3.05 0.59 1.3 4.7 3.4 0.04 -0.16 0.01
# we'll use the describeBy() command to view skew and kurtosis across our IVs
describeBy(d$stress, group = d$edu)
##
## Descriptive statistics by group
## group: 1 High school diploma or less, and NO COLLEGE
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 53 3.27 0.7 3.2 3.27 0.74 1.6 4.6 3 0.01 -0.44 0.1
## ------------------------------------------------------------
## group: 2 Currently in college
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2482 3.07 0.59 3.1 3.06 0.59 1.3 4.7 3.4 0.04 -0.15 0.01
## ------------------------------------------------------------
## group: 3 Completed some college, but no longer in college
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 34 3.01 0.62 3.05 3.01 0.59 1.4 4.3 2.9 -0.12 -0.05 0.11
## ------------------------------------------------------------
## group: 4 Complete 2 year College degree
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 175 2.96 0.65 2.9 2.95 0.59 1.4 4.6 3.2 0.16 -0.07 0.05
## ------------------------------------------------------------
## group: 5 Completed Bachelors Degree
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 136 2.88 0.64 2.8 2.88 0.74 1.3 4.4 3.1 0.07 -0.65 0.06
## ------------------------------------------------------------
## group: 6 Currently in graduate education
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 133 2.97 0.6 3 2.97 0.59 1.5 4.3 2.8 0 -0.54 0.05
## ------------------------------------------------------------
## group: 7 Completed some graduate degree
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 56 2.92 0.66 2.95 2.92 0.59 1.5 4.4 2.9 0.02 -0.19 0.09
describeBy(d$stress, group = d$high_ed)
##
## Descriptive statistics by group
## group: high_ed
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2807 3.05 0.6 3 3.05 0.59 1.3 4.7 3.4 0.03 -0.18 0.01
## ------------------------------------------------------------
## group: standard_ed
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 262 3.03 0.66 3 3.02 0.59 1.4 4.6 3.2 0.13 -0.12 0.04
# also use histograms to examine your continuous variable
hist(d$stress)
# and cross_cases() to examine your categorical variables
cross_cases(d, edu, high_ed)
| high_ed | ||
|---|---|---|
| high_ed | standard_ed | |
| edu | ||
| 1 High school diploma or less, and NO COLLEGE | 53 | |
| 2 Currently in college | 2482 | |
| 3 Completed some college, but no longer in college | 34 | |
| 4 Complete 2 year College degree | 175 | |
| 5 Completed Bachelors Degree | 136 | |
| 6 Currently in graduate education | 133 | |
| 7 Completed some graduate degree | 56 | |
| #Total cases | 2807 | 262 |
table(d$edu)
##
## 1 High school diploma or less, and NO COLLEGE
## 53
## 2 Currently in college
## 2482
## 3 Completed some college, but no longer in college
## 34
## 4 Complete 2 year College degree
## 175
## 5 Completed Bachelors Degree
## 136
## 6 Currently in graduate education
## 133
## 7 Completed some graduate degree
## 56
cross_cases(d, edu, high_ed)
| high_ed | ||
|---|---|---|
| high_ed | standard_ed | |
| edu | ||
| 1 High school diploma or less, and NO COLLEGE | 53 | |
| 2 Currently in college | 2482 | |
| 3 Completed some college, but no longer in college | 34 | |
| 4 Complete 2 year College degree | 175 | |
| 5 Completed Bachelors Degree | 136 | |
| 6 Currently in graduate education | 133 | |
| 7 Completed some graduate degree | 56 | |
| #Total cases | 2807 | 262 |
# our number of small nb participants is going to hurt us for the two-way anova, but it should be okay for the one-way anova
# so we'll create a new dataframe for the two-way analysis and call it d2
d2 <- subset(d, edu != "nb")
d2$edu <- droplevels(d2$edu)
# to double-check any changes we made
cross_cases(d2, edu, high_ed)
| high_ed | ||
|---|---|---|
| high_ed | standard_ed | |
| edu | ||
| 1 High school diploma or less, and NO COLLEGE | 53 | |
| 2 Currently in college | 2482 | |
| 3 Completed some college, but no longer in college | 34 | |
| 4 Complete 2 year College degree | 175 | |
| 5 Completed Bachelors Degree | 136 | |
| 6 Currently in graduate education | 133 | |
| 7 Completed some graduate degree | 56 | |
| #Total cases | 2807 | 262 |
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
leveneTest(stress~edu, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 6 1.2895 0.2584
## 3062
leveneTest(stress~edu*high_ed, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 6 1.2895 0.2584
## 3062
# use this commented out section only if you need to remove outliers
# to drop a single outlier, remove the # at the beginning of the line and use this code:
d <- subset(d, row_id!=c(1246))
# to drop multiple outliers, remove the # at the beginning of the line and use this code:
d <- subset(d, row_id!=c(1108) & row_id!=c(602))
# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate
reg_model <- lm(stress ~ edu, data = d) #for one-way
reg_model2 <- lm(stress ~ edu*high_ed, data = d) #for two-way
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
# Cook's distance
plot(reg_model2, 4)
# Residuals vs Leverage
plot(reg_model2, 5)
Our cell sizes are somewhat unbalanced. A small sample size for one of the levels of our variable limits our power and increases our Type II error rate.
Levene’s test is significant for my two-level education variable. I am ignoring this and continuing with the analysis anyway, but in the real world this is something I would have to correct for.
I identified and removed a single outlier.
aov_model <- aov_ez(data = d,
id = "row_id",
between = c("edu"),
dv = "stress",
anova_table = list(es = "pes"))
## Contrasts set to contr.sum for the following variables: edu
Effect size cutoffs from Cohen (1988):
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: stress
## Effect df MSE F pes p.value
## 1 edu 6, 3059 0.36 4.84 *** .009 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
# Visualize Results
afex_plot(aov_model, x = "edu")
Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for education.
To test my hypothesis that there would be a significant effect of education level on stress, I used a one-way ANOVA. I identified and removed a single outlier following visual analysis of a Residuals vs Leverage plot. A significant Levene’s test also indicates that our data violates the assumption of homogeneity of variance. This suggests that there is an increased chance of Type I error. I continued with my analysis for the purpose of this class.
I found an insignificant effect of gender, F(6,3061) = 4.85, p < .001, ηp2 = .009 (insignificant effect size; Cohen, 1988). Posthoc tests using Tukey’s HSD were not used for this analysis, due to insignificant effect size.
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.