Import the data.
dat <- read.csv("~/Downloads/Caleb Sleep Data JMP my analysis.csv", comment.char="#")
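Before digging in, it can help to glance at the data’s structure (a quick optional sanity check, not part of the original code):
str(dat) # column names, types, and a preview of each variable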
First, let’s look at the independent variable, ACEs (i.e., “ACE.T”).
# X = ACE
hist(dat$ACE.T,
     xlab="ACEs",
     main="")
I see this certainly isn’t normally distributed. However, the assumption in regression analysis is not that the raw data are normally distributed but that the residuals (errors) from the model are normally distributed. In other words, the distances between each observed value and the value the model predicts for it, \((y - \hat{y})\), have to be normally distributed.
mod.0=lm(SQS.T~ACE.T,dat) # Create a model where X predicts Y
hist(mod.0$residuals,
     xlab="Residuals",
     main="") # Plot the residuals of that model
The residuals appear normal at first glance. I’ll run a few more
“rigorous” plots.
plot(mod.0)
These plots might not make much sense to a newcomer. If you’re
interested in learning more, check out this
website.
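For what it’s worth, plot() on an lm object cycles through four diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage. If you only want one of them, you can ask for it directly; for example, just the normal Q-Q plot:
plot(mod.0, which = 2) # 2 = the Normal Q-Q plot of the standardized residuals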
My takeaway from them is that there aren’t any issues with the normality-of-residuals assumption, or with any of the other regression assumptions for that matter. But let’s also check how the residuals look in the other models.
mod.01=lm(Inflexibilty.total~ACE.T,dat) # X predicts M
hist(mod.01$residuals) # Plot the model residuals
Everything looks pretty much okay here. Now for the other plots…
plot(mod.01)
I don’t see any issues with the model where X is predicting M
either.
mod.02=lm(SQS.T~ACE.T+Inflexibilty.total,dat) # X and M together predict Y
hist(mod.02$residuals)
For the full model (where X and M predict Y), there does appear to
be some slight issue with non-normal residuals.
plot(mod.02)
Yeah, the Q-Q plot confirms that the residuals aren’t quite
normal for the full model.
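If you want a formal test to back up the eyeball judgment, a Shapiro-Wilk test on the residuals is one option (just a supplementary check; with 242 observations even small departures from normality can come out “significant”, so I’d weigh it alongside the Q-Q plot rather than instead of it):
shapiro.test(mod.02$residuals) # null hypothesis: the residuals are normally distributed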
I’m not sure what the implications are of the final model having non-normal residuals. I do know that, in general, Sobel’s test is “frowned upon” because it rests on an assumption that often turns out to be implausible: it treats the indirect effect (the product of the coefficient for ACEs predicting PI and the coefficient for PI predicting SQS while controlling for ACEs) as normally distributed, and that product often isn’t. That’s why people generally recommend bootstrapping the estimate of that statistic (i.e., the product of those two coefficients) instead.
# re-naming some variables for more convenient/readable code
X=dat$ACE.T
M=dat$Inflexibilty.total
Y=dat$SQS.T
# Creating the two models you need to calculate the total effect
# plus the direct and indirect effects
model.M <- lm(M ~ X)
model.Y <- lm(Y ~ X + M)
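Since Sobel’s test came up above, here is what it looks like computed by hand from these two models, purely for comparison with the bootstrap below (a minimal sketch using the standard first-order Sobel formula; the bootstrap is still the better approach):
a  <- coef(summary(model.M))["X", "Estimate"]   # path a: X predicting M
sa <- coef(summary(model.M))["X", "Std. Error"] # standard error of a
b  <- coef(summary(model.Y))["M", "Estimate"]   # path b: M predicting Y, controlling for X
sb <- coef(summary(model.Y))["M", "Std. Error"] # standard error of b
sobel.z <- (a * b) / sqrt(b^2 * sa^2 + a^2 * sb^2) # Sobel z for the indirect effect a*b
sobel.p <- 2 * pnorm(-abs(sobel.z)) # two-tailed p-value, assuming a*b is normal (the questionable assumption)
c(indirect = a * b, z = sobel.z, p = sobel.p)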
# load the library that conducts bootstrapped mediation analyses
library(mediation)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: mvtnorm
## Loading required package: sandwich
## mediation: Causal Mediation Analysis
## Version: 4.5.0
# run the bootstrap, save results as "results"
results <- mediate(model.M, model.Y, treat='X', mediator='M', boot=TRUE, sims=500)
## Running nonparametric bootstrap
# summarize "results
summary(results)
##
## Causal Mediation Analysis
##
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 1.417 1.060 1.82 <2e-16 ***
## ADE 0.605 0.142 1.04 0.008 **
## Total Effect 2.023 1.510 2.50 <2e-16 ***
## Prop. Mediated 0.701 0.550 0.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 242
##
##
## Simulations: 500
Total Effect = the effect of X on Y, both direct and indirect (through the mediator)
ADE = Average Direct Effect: the effect of X on Y that remains once the mediated pathway is subtracted out
ACME = Average Causal Mediation Effect: the indirect effect of X on Y that runs through the mediator, i.e., the strength of the mediation
Prop. Mediated = the proportion of the total effect of X on Y that runs through the mediator
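These pieces fit together arithmetically: ACME + ADE = Total Effect, and Prop. Mediated = ACME / Total Effect. A quick check against the point estimates printed above (small rounding differences are expected):
1.417 + 0.605 # ACME + ADE, should be close to the Total Effect of 2.023
1.417 / 2.023 # ACME / Total Effect, should be close to the Prop. Mediated of 0.701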
Putting all of this together, it looks like you have a pretty robust mediation effect. I’m not too worried about one of the constituent regression models having non-normal residuals.
plot(results,xlim=c(0,3))
Above we see the point estimates for the total effect, the direct effect (ADE), and the indirect effect (ACME), along with their bootstrapped 95% confidence intervals. Even if you account for some potential assumption-breaking with the indirect effect, it would still sit well above the direct effect and make up a larger share of the total effect than the direct effect does, IMO.