this is a tutorial about how to do basic mediation analysis
For any recommendation of advices plz feel free to contact me : vet.m.mohamed@gmail.com
Need at first to load our libraries
library(tidyverse)
library(mediation)
library(semPlot)
library(knitr)
library(caret)
library(lavaan)
Now lets import our data and inspect it
download.file(url = 'http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv',destfile = "./data sets/medanalysis.csv")
data<-read.csv("./data sets/medanalysis.csv")
head(data)
## X M Y
## 1 6 5 6
## 2 7 5 5
## 3 7 7 4
## 4 8 4 8
## 5 4 3 5
## 6 4 4 7
here we have three variables x and y and the mediator M between x and y
first lets look at their features graphically befor starting analysis
plot(data)# seeing the corelation and linearity between each pairs
par(mfrow=c(1,3))
for(i in names(data)){
density(data[,i])%>%plot(main=paste("density plot of variable",i,sep = " "))
} # seeing the normality of each variable
par(mfrow=c(1,3))
for(i in names(data)){
d<-jitter(data[,paste(i)],factor = 10)
qqnorm(d,main = paste("Normal QQ plot of",i,sep = " "))
} #seeing the normality and linearity using QQ plot
Now lets dive directly to mediation
1st model x~y
fit1<-lm(Y~X,data=data)
summary(fit1)
##
## Call:
## lm(formula = Y ~ X, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0262 -1.2340 -0.3282 1.5583 5.1622
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8572 0.6932 4.122 7.88e-05 ***
## X 0.3961 0.1112 3.564 0.000567 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.929 on 98 degrees of freedom
## Multiple R-squared: 0.1147, Adjusted R-squared: 0.1057
## F-statistic: 12.7 on 1 and 98 DF, p-value: 0.0005671
the first model is significant
now lets run the 2nd model m~x
fit2<-lm(M~X,data=data)
summary(fit2)
##
## Call:
## lm(formula = M ~ X, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3046 -0.8656 0.1344 1.1344 4.6954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.49952 0.58920 2.545 0.0125 *
## X 0.56102 0.09448 5.938 4.39e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.639 on 98 degrees of freedom
## Multiple R-squared: 0.2646, Adjusted R-squared: 0.2571
## F-statistic: 35.26 on 1 and 98 DF, p-value: 4.391e-08
great this is also significant
Lets run the model 3 y~X+M –> adjusting for X
fit3<-lm(Y~X+M,data)
summary(fit3)
##
## Call:
## lm(formula = Y ~ X + M, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7631 -1.2393 0.0308 1.0832 4.0055
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9043 0.6055 3.145 0.0022 **
## X 0.0396 0.1096 0.361 0.7187
## M 0.6355 0.1005 6.321 7.92e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.631 on 97 degrees of freedom
## Multiple R-squared: 0.373, Adjusted R-squared: 0.3601
## F-statistic: 28.85 on 2 and 97 DF, p-value: 1.471e-10
here , M is significant amd X reduced to be completely non significant
So we here in front of case of complete mediation
but we still need to test if this mediation effect is significant or not
So we will use bootstrabing method to test that
medeffect<-mediate(model.m = fit2,model.y = fit3,sims = 1000,boot = T,boot.ci.type = "perc",treat = "X",mediator = "M")
summary(medeffect)
##
## Causal Mediation Analysis
##
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 0.3565 0.2131 0.52 <2e-16 ***
## ADE 0.0396 -0.2006 0.30 0.734
## Total Effect 0.3961 0.1571 0.65 0.002 **
## Prop. Mediated 0.9000 0.4831 2.04 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 100
##
##
## Simulations: 1000
here we see ACME : Average causal mediation Effect is significant , and see that ADE : average direct effect is not significant
This indicate that the mediation effect is significant !!
Now i want to test the nediation effect using Lavaan and structural equation modeling
model<-"
Y~a*X+b*M
M~c*X
#indirect effect
ind:=b*c
#total effect
total:=a+(b*c)
"
now run the model through cfa() function
medsem<-cfa(model = model,data = data,se="bootstrap")
summary(medsem,standardized=T,fit.measures=T,rsquare=T)
## lavaan 0.6-3 ended normally after 14 iterations
##
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 100
##
## Estimator ML
## Model Fit Test Statistic 0.000
## Degrees of freedom 0
## Minimum Function Value 0.0000000000000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 77.413
## Degrees of freedom 3
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -379.612
## Loglikelihood unrestricted model (H1) -379.612
##
## Number of free parameters 5
## Akaike (AIC) 769.225
## Bayesian (BIC) 782.250
## Sample-size adjusted Bayesian (BIC) 766.459
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent Confidence Interval 0.000 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard Errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Y ~
## X (a) 0.040 0.124 0.319 0.750 0.040 0.034
## M (b) 0.635 0.099 6.448 0.000 0.635 0.593
## M ~
## X (c) 0.561 0.100 5.588 0.000 0.561 0.514
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Y 2.581 0.331 7.794 0.000 2.581 0.627
## .M 2.633 0.360 7.308 0.000 2.633 0.735
##
## R-Square:
## Estimate
## Y 0.373
## M 0.265
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ind 0.357 0.080 4.463 0.000 0.357 0.305
## total 0.396 0.122 3.246 0.001 0.396 0.339
semPaths(object = medsem,,whatLabels = "std")
we see here in the summary that the p value of X incase of presence of M is not significant and it is the same of model fit3
also we see that the indirect Effect b*c is significant and the mediation effect is also significant
Finaly we can see that the total effect is significant and it is so close to the total effect estimated in the bootstraping p.value in mediation effect model
REGARDS