Initial set up

Import the data.

dat <- read.csv("~/Downloads/Caleb Sleep Data JMP my analysis.csv", comment.char="#")

Examine the distributions of the X, M, and Y variables

First, let’s look at the independent variable, ACEs (i.e., “ACE.T”).

# X = ACE
hist(dat$ACE.T,
     xlab="ACEs",
     main="")


I see this certainly isn’t normally distributed. However, the assumption of regression analysis is not that the data are normally distributed but that the residuals (errors) from the model are normally distributed. In other words, all of the distances between \((\hat{y} - y)\) have to be normally distributed.

mod.0=lm(SQS.T~ACE.T,dat) # Create a model where X predicts Y
hist(mod.0$residuals,
     xlab="Residuals",
     main="") # Plot the residuals of that model


The residuals appear normal at first glance. I’ll run a few more “rigorous” plots.

plot(mod.0)


These plots might not make much sense to a newcomer. If you’re interested in learning more, check out this website.

My takeaway from them is that there aren’t any issues with violating the normality of residuals assumption. Or any regression assumptions for that matter. But let’s also check how residuals look in all the other models.

mod.01=lm(dat$Inflexibilty.total~ACE.T,dat) # X predicts M
hist(mod.01$residuals) # Plot the model residuals


Everything looks pretty much okay here. Now for the other plots…

plot(mod.01)


I don’t see any issues with the model where X is predicting M either.

mod.02=lm(SQS.T~ACE.T+dat$Inflexibilty.total,dat)
hist(mod.02$residuals)


For the full model (where X and M predict Y), there does appear to be some slight issue with non-normal residuals.

plot(mod.02)


Yeah, the Q-Q plot confirms that the residuals aren’t quite normal for the full model.

Non-parametric (bootstrap) mediation analysis

I’m not sure what the implications are for the final model having non-normal residuals. I know that, in general Sobel’s test is “frowned upon” because it makes assumptions that turn out to be implausible. Specifically, the distribution of the product of the coefficient for ACEs by itself predicting SQS and the coefficient ACEs+PI are not always normally distributed, as Sobel’s test assumes. That’s why, in general, people say to do a bootstrapped estimateion of that statistic (i.e., the product of those two coefficients).

# re-naming some variables for more convenient/readable code
X=dat$ACE.T
M=dat$Inflexibilty.total
Y=dat$SQS.T

# Creating the two models you need to calculate the total effect
# plus the direct and indirect effects
model.M <- lm(M ~ X)
model.Y <- lm(Y ~ X + M)

# loading libarary that conducts bootstrapped mediations
library(mediation)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: mvtnorm
## Loading required package: sandwich
## mediation: Causal Mediation Analysis
## Version: 4.5.0
# run the bootstrap, save results as "results"
results <- mediate(model.M, model.Y, treat='X', mediator='M', boot=TRUE, sims=500)
## Running nonparametric bootstrap
# summarize "results
summary(results)
## 
## Causal Mediation Analysis 
## 
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
## 
##                Estimate 95% CI Lower 95% CI Upper p-value    
## ACME              1.417        1.060         1.82  <2e-16 ***
## ADE               0.605        0.142         1.04   0.008 ** 
## Total Effect      2.023        1.510         2.50  <2e-16 ***
## Prop. Mediated    0.701        0.550         0.91  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 242 
## 
## 
## Simulations: 500

Total effect = The effect of X on Y (both directly and indirectly)

ADE = Average Direct Effect, net effect of X on Y, after subtracting mediator

ACME = Average Causal Mediation Effect, mediation effect, strength of mediation effect

Prop. Mediated = the proportion of the total effect of X on Y that runs through the mediator.

In other words, it looks like you have a pretty robust mediation effect. I’m not too worried about one of the constituent regression models having non-normal residuals.

plot(results,xlim=c(0,3))

Above we see the point esitmates for the total effect, direct effect (ADE), and indirect effect (ACME) along with their bootstrapped 95% confidence intervals. Even if you account for some potential assumption breaking with the indirect effect, it would still be well above the direct effect and take up a larger proportion of the total effect compared to direct effect, IMO.