#install.packages("afex")
#install.packages("emmeans")
#install.packages("ggbeeswarm")
#install.packages("expss")
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(afex) # to run the ANOVA
library(ggbeeswarm) # to run plot results
library(emmeans) # for posthoc tests
d <- read.csv(file="Data/projectdata.csv", header=T)
# new code! this adds a column with a number for each row. It will make it easier if we need to drop outliers later
d$row_id <- 1:nrow(d)
We predict that there will be a significant difference in people’s level of depression based on their mental health diagnosis (anxiety disorder, none or NA, PTSD), such that those with an anxiety disorder and PTSD will report higher depression than those with no diagnosis.
# you only need to check the variables you're using in the current analysis
str(d$mhealth)
## chr [1:256] "none or NA" "none or NA" "none or NA" "none or NA" ...
str(d$phq)
## num [1:256] 2 1.78 1.11 1.33 1.89 ...
# make our categorical variable of interest a factor
# also make the row ID variable a factor
d$mhealth <- as.factor(d$mhealth)
d$row_id <- as.factor(d$row_id)
# check that our categorical variable is now a factor
str(d$mhealth)
## Factor w/ 8 levels "anxiety disorder",..: 5 5 5 5 5 5 5 5 5 5 ...
# check our DV skew and kurtosis
describe(d$phq)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 256 2.69 0.85 2.78 2.7 0.99 1 4 3 -0.09 -1.06 0.05
# view DV skew and kurtosis across IV levels
describeBy(d$phq, group = d$mhealth)
##
## Descriptive statistics by group
## group: anxiety disorder
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 42 3.01 0.76 3.11 3.07 0.91 1.44 4 2.56 -0.36 -0.96 0.12
## ------------------------------------------------------------
## group: bipolar
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3 3.59 0.17 3.56 3.59 0.16 3.44 3.78 0.33 0.21 -2.33 0.1
## ------------------------------------------------------------
## group: depression
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 4 2.81 0.74 2.89 2.81 0.74 1.89 3.56 1.67 -0.18 -2.11 0.37
## ------------------------------------------------------------
## group: eating disorders
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 14 2.84 0.72 2.78 2.83 0.99 1.78 4 2.22 0.01 -1.37 0.19
## ------------------------------------------------------------
## group: none or NA
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 162 2.48 0.86 2.44 2.46 0.99 1 4 3 0.18 -1.03 0.07
## ------------------------------------------------------------
## group: obsessive compulsive disorder
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 11 3.06 0.61 3.11 3.1 0.66 2 3.78 1.78 -0.32 -1.34 0.18
## ------------------------------------------------------------
## group: other
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 11 3.34 0.54 3.22 3.35 0.66 2.67 4 1.33 0.07 -1.89 0.16
## ------------------------------------------------------------
## group: ptsd
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 9 2.99 0.8 2.89 2.99 1.15 1.78 4 2.22 -0.01 -1.61 0.27
# also use a histogram to examine the continuous variable
hist(d$phq,
main = "Histogram of Patient Health Questionnaire-9",
xlab = "Patient Health Questionnaire-9")
# We are only comparing three levels: anxiety disorder, none or NA, and PTSD
# Drop all other levels first
d <- subset(d, mhealth %in% c("anxiety disorder", "none or NA", "ptsd"))
d$mhealth <- droplevels(d$mhealth)
table(d$mhealth)
##
## anxiety disorder none or NA ptsd
## 42 162 9
# REMEMBER your test's level of POWER is determined by your SMALLEST subsample
# use this commented out section ONLY IF you need to remove outliers after inspecting the plots
# to drop a single outlier, use this code:
# d <- subset(d, row_id!=c(XX))
# to drop multiple outliers, use this code:
# d <- subset(d, row_id!=c(XX) & row_id!=c(YY))
# use the lm() command to run the regression
# formula is y~x, where y is our DV and x is our IV
reg_model <- lm(phq ~ mhealth, data = d)
# Cook's distance
plot(reg_model, 4)
# Residuals VS Leverage
plot(reg_model, 5)
# IF you find outliers, go back up and remove them using subset(), then re-run the reg_model code
# use the leveneTest() command from the car package
# formula is y~x, where y is our DV and x is our IV
leveneTest(phq ~ mhealth, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.8176 0.4429
## 210
Our cell sizes are unbalanced between the mental health diagnosis group levels. The none or NA group is substantially larger (n = 162) than the anxiety disorder group (n = 42) or the PTSD group (n = 9). This significantly limits the power of our test and increases the chance of a Type II error, particularly for the PTSD group.
Levene’s test was non-significant (p = .443) for our three-level mental health diagnosis variable with the One-Way ANOVA. Our data met the homogeneity of variance assumption.
We did not identify any outliers for the One-Way ANOVA.
[UPDATE the bracketed sections above once you have run the code.]
aov_model <- aov_ez(data = d,
id = "row_id",
between = c("mhealth"),
dv = "phq",
anova_table = list(es = "pes"))
## Contrasts set to contr.sum for the following variables: mhealth
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: phq
## Effect df MSE F pes p.value
## 1 mhealth 2, 210 0.70 7.59 *** .067 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
ANOVA Effect Size [partial eta-squared] cutoffs from Cohen (1988):
afex_plot(aov_model, x = "mhealth")
Remember: We ONLY run posthoc tests IF the ANOVA is SIGNIFICANT!
emmeans(aov_model, specs = "mhealth")
## mhealth emmean SE df lower.CL upper.CL
## anxiety disorder 3.01 0.1290 210 2.76 3.27
## none or NA 2.48 0.0658 210 2.36 2.61
## ptsd 2.99 0.2790 210 2.44 3.54
##
## Confidence level used: 0.95
pairs(emmeans(aov_model, specs = "mhealth", adjust = "tukey"))
## contrast estimate SE df t.ratio p.value
## anxiety disorder - none or NA 0.5283 0.145 210 3.643 0.0010
## anxiety disorder - ptsd 0.0256 0.308 210 0.083 0.9962
## none or NA - ptsd -0.5027 0.287 210 -1.753 0.1883
##
## P value adjustment: tukey method for comparing a family of 3 estimates
To test our hypothesis that there will be a significant difference in people’s level of depression based on their mental health diagnosis (anxiety disorder, none or NA, PTSD), we used a one-way ANOVA. Our data was unbalanced, with many more participants with no diagnosis participating in our survey (n = 162) than those with an anxiety disorder (n = 42) or PTSD (n = 9). This significantly reduces the power of our test and increases the chances of a Type II error, particularly for the PTSD group. We did not identify any outliers following visual analysis of Cook’s Distance and Residuals VS Leverage plots. Levene’s test was non-significant (p = .443), indicating our data met the assumption of homogeneity of variance.
We found a significant effect of mental health diagnosis, F(2, 210) = 7.59, p < .001, ηp2 = .067 (medium effect; Cohen, 1988). Posthoc tests using Tukey’s HSD adjustment revealed that participants with an anxiety disorder (M = 3.01, SE = 0.13) reported significantly higher depression than those with no diagnosis (M = 2.48, SE = 0.07; p = .001). Participants with PTSD (M = 2.99, SE = 0.28) did not differ significantly from those with an anxiety disorder (pn= .996) or those with no diagnosis (p = .188) (see Figure 1 for a comparison).
References
Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.
Singmann, H., Bolker, B., Westfall, J., Aust, F., & Ben-Shachar, M. (2025). afex: Analysis of factorial experiments. R package version 1.5-1. https://github.com/singmann/afex