this is a tutorial about how to do basic mediation analysis

For any recommendation of advices plz feel free to contact me : vet.m.mohamed@gmail.com

Need at first to load our libraries

library(tidyverse)
library(mediation)
library(semPlot)
library(knitr)
library(caret)
library(lavaan)

Now lets import our data and inspect it

download.file(url = 'http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv',destfile = "./data sets/medanalysis.csv")
data<-read.csv("./data sets/medanalysis.csv")

head(data)
##   X M Y
## 1 6 5 6
## 2 7 5 5
## 3 7 7 4
## 4 8 4 8
## 5 4 3 5
## 6 4 4 7

here we have three variables x and y and the mediator M between x and y

first lets look at their features graphically befor starting analysis

plot(data)# seeing the corelation and linearity between each pairs

par(mfrow=c(1,3))
for(i in names(data)){
        density(data[,i])%>%plot(main=paste("density plot of variable",i,sep = " "))
} # seeing the normality of each variable

par(mfrow=c(1,3))
for(i in names(data)){
        d<-jitter(data[,paste(i)],factor = 10)
        qqnorm(d,main = paste("Normal QQ plot of",i,sep = " "))
} #seeing the normality and linearity using QQ plot 

Now lets dive directly to mediation

1st model x~y

fit1<-lm(Y~X,data=data)
summary(fit1)
## 
## Call:
## lm(formula = Y ~ X, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0262 -1.2340 -0.3282  1.5583  5.1622 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.8572     0.6932   4.122 7.88e-05 ***
## X             0.3961     0.1112   3.564 0.000567 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.929 on 98 degrees of freedom
## Multiple R-squared:  0.1147, Adjusted R-squared:  0.1057 
## F-statistic:  12.7 on 1 and 98 DF,  p-value: 0.0005671

the first model is significant

now lets run the 2nd model m~x

fit2<-lm(M~X,data=data)
summary(fit2)
## 
## Call:
## lm(formula = M ~ X, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3046 -0.8656  0.1344  1.1344  4.6954 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.49952    0.58920   2.545   0.0125 *  
## X            0.56102    0.09448   5.938 4.39e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.639 on 98 degrees of freedom
## Multiple R-squared:  0.2646, Adjusted R-squared:  0.2571 
## F-statistic: 35.26 on 1 and 98 DF,  p-value: 4.391e-08

great this is also significant

Lets run the model 3 y~X+M –> adjusting for X

fit3<-lm(Y~X+M,data)

summary(fit3)
## 
## Call:
## lm(formula = Y ~ X + M, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7631 -1.2393  0.0308  1.0832  4.0055 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.9043     0.6055   3.145   0.0022 ** 
## X             0.0396     0.1096   0.361   0.7187    
## M             0.6355     0.1005   6.321 7.92e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.631 on 97 degrees of freedom
## Multiple R-squared:  0.373,  Adjusted R-squared:  0.3601 
## F-statistic: 28.85 on 2 and 97 DF,  p-value: 1.471e-10

here , M is significant amd X reduced to be completely non significant

So we here in front of case of complete mediation

but we still need to test if this mediation effect is significant or not

So we will use bootstrabing method to test that

medeffect<-mediate(model.m = fit2,model.y = fit3,sims = 1000,boot = T,boot.ci.type = "perc",treat = "X",mediator = "M")

summary(medeffect)
## 
## Causal Mediation Analysis 
## 
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
## 
##                Estimate 95% CI Lower 95% CI Upper p-value    
## ACME             0.3565       0.2131         0.52  <2e-16 ***
## ADE              0.0396      -0.2006         0.30   0.734    
## Total Effect     0.3961       0.1571         0.65   0.002 ** 
## Prop. Mediated   0.9000       0.4831         2.04   0.002 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 100 
## 
## 
## Simulations: 1000

here we see ACME : Average causal mediation Effect is significant , and see that ADE : average direct effect is not significant

This indicate that the mediation effect is significant !!

Now i want to test the nediation effect using Lavaan and structural equation modeling

model<-"

Y~a*X+b*M
M~c*X
#indirect effect
ind:=b*c
#total effect
total:=a+(b*c)
"

now run the model through cfa() function

medsem<-cfa(model = model,data = data,se="bootstrap")

summary(medsem,standardized=T,fit.measures=T,rsquare=T)
## lavaan 0.6-3 ended normally after 14 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          5
## 
##   Number of observations                           100
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
##   Minimum Function Value               0.0000000000000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic               77.413
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -379.612
##   Loglikelihood unrestricted model (H1)       -379.612
## 
##   Number of free parameters                          5
##   Akaike (AIC)                                 769.225
##   Bayesian (BIC)                               782.250
##   Sample-size adjusted Bayesian (BIC)          766.459
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent Confidence Interval          0.000  0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Y ~                                                                   
##     X          (a)    0.040    0.124    0.319    0.750    0.040    0.034
##     M          (b)    0.635    0.099    6.448    0.000    0.635    0.593
##   M ~                                                                   
##     X          (c)    0.561    0.100    5.588    0.000    0.561    0.514
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Y                 2.581    0.331    7.794    0.000    2.581    0.627
##    .M                 2.633    0.360    7.308    0.000    2.633    0.735
## 
## R-Square:
##                    Estimate
##     Y                 0.373
##     M                 0.265
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ind               0.357    0.080    4.463    0.000    0.357    0.305
##     total             0.396    0.122    3.246    0.001    0.396    0.339
semPaths(object = medsem,,whatLabels = "std")

we see here in the summary that the p value of X incase of presence of M is not significant and it is the same of model fit3

also we see that the indirect Effect b*c is significant and the mediation effect is also significant

Finaly we can see that the total effect is significant and it is so close to the total effect estimated in the bootstraping p.value in mediation effect model

REGARDS