Conceptual Overview
Mediation analysis is a statistical tool for exploring and quantifying the indirect effect of an independent variable (IV) on a dependent variable (DV) through one or more intermediary variables, known as mediators. It sheds light on the processes or mechanisms underlying an observed relationship. Several techniques for conducting mediation analysis have been developed, each with its own merits. Among the most prominent are the causal steps approach proposed by Baron and Kenny, the Sobel test of the significance of the indirect effect, and bootstrapping, which provides a nonparametric way to estimate confidence intervals for the indirect effect.
Consider a scenario that examines whether communication frequency mediates the relationship between criticism and relationship satisfaction. We hypothesize that more criticism leads to less frequent communication, which in turn lowers relationship satisfaction. In this pathway, criticism influences relationship satisfaction indirectly, through its effect on communication frequency.
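The tutorial's `data` object is not included in this section, so the hypothesised chain can be made concrete with simulated data. The sketch below is purely hypothetical: the data frame name `sim_data`, the sample size, and every coefficient are assumptions chosen to mimic the hypothesis (more criticism, less communication; more communication, higher satisfaction), not values from the tutorial's data.

```r
# Hypothetical simulated data for the criticism -> comm_freq -> rel_sat chain.
# All numbers below are illustrative assumptions, not estimates from the tutorial.
set.seed(2024)
n <- 200
criticism <- rnorm(n, mean = 4, sd = 1.5)
# Assumed path a: more criticism, less frequent communication
comm_freq <- 6 - 0.5 * criticism + rnorm(n, sd = 1.5)
# Assumed path b: more communication, higher satisfaction; no direct effect of criticism
rel_sat <- 2 + 0.6 * comm_freq + rnorm(n, sd = 1)
sim_data <- data.frame(criticism, comm_freq, rel_sat)
str(sim_data)
```

Substituting `sim_data` for `data` in the code that follows should run the same analyses, although the estimates will differ from those printed in this tutorial.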
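The causal steps approach of Baron and Kenny tests mediation with a sequence of regressions:
Step 1: The IV significantly predicts the DV (the total effect, path c).
Step 2: The IV significantly predicts the mediator (path a).
Step 3: The mediator significantly predicts the DV while controlling for the IV (path b); the same model gives the direct effect of the IV (path c').
Mediation is established if steps 1-3 are satisfied.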
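# Step 1: total effect of the IV on the DV (path c)
model1 <- lm(rel_sat ~ criticism, data = data)
# Step 2: effect of the IV on the mediator (path a)
model2 <- lm(comm_freq ~ criticism, data = data)
# Step 3: effect of the mediator on the DV controlling for the IV (path b);
# this model also estimates the direct effect of the IV (path c')
model3 <- lm(rel_sat ~ criticism + comm_freq, data = data)
summary(model1)
##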
## Call:
## lm(formula = rel_sat ~ criticism, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2271 -1.4224 0.0121 1.1771 4.2809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.50937 0.47425 11.617 <0.0000000000000002 ***
## criticism 0.02489 0.11404 0.218 0.827
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.678 on 198 degrees of freedom
## Multiple R-squared: 0.0002405, Adjusted R-squared: -0.004809
## F-statistic: 0.04763 on 1 and 198 DF, p-value: 0.8275
summary(model2)
##
## Call:
## lm(formula = comm_freq ~ criticism, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4878 -1.6386 -0.1101 1.7865 3.1391
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7509 0.5794 6.474 0.000000000735 ***
## criticism 0.1190 0.1393 0.854 0.394
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.05 on 198 degrees of freedom
## Multiple R-squared: 0.003671, Adjusted R-squared: -0.001361
## F-statistic: 0.7295 on 1 and 198 DF, p-value: 0.3941
summary(model3)
##
## Call:
## lm(formula = rel_sat ~ criticism + comm_freq, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9942 -0.7790 -0.0983 0.8457 3.1609
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.21751 0.34828 9.238 <0.0000000000000002 ***
## criticism -0.04782 0.07622 -0.627 0.531
## comm_freq 0.61102 0.03881 15.744 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.12 on 197 degrees of freedom
## Multiple R-squared: 0.5573, Adjusted R-squared: 0.5528
## F-statistic: 124 on 2 and 197 DF, p-value: < 0.00000000000000022
The first model estimated the total effect of criticism on relationship satisfaction. This effect was not significant, \(b = 0.02489\), \(SE = 0.11404\), \(t(198) = 0.218\), \(p = .827\), suggesting that criticism alone did not predict relationship satisfaction.
Under the causal steps logic, a non-significant total effect means there is no effect to be mediated, so it is customary not to proceed with the subsequent steps of the analysis.
The second model examined the effect of criticism on the proposed mediator, communication frequency. The effect of criticism on communication frequency was also not significant, with \(b = 0.1190\), \(SE = 0.1393\), \(t(198) = 0.854\), \(p = .394\).
The final model included both criticism and communication frequency predicting relationship satisfaction. In this model, communication frequency significantly predicted relationship satisfaction, with \(b = 0.61102\), \(SE = 0.03881\), \(t(197) = 15.744\), \(p < .001\), while the effect of criticism remained non-significant, with \(b = -0.04782\), \(SE = 0.07622\), \(t(197) = -0.627\), \(p = .531\).
Because the total effect of criticism on relationship satisfaction (step 1) was not established, and criticism also did not predict communication frequency (step 2), the conditions for mediation outlined by Baron and Kenny were not met. Thus, under this approach, the data do not support a mediating role of communication frequency in the relationship between criticism and relationship satisfaction.
Complete mediation occurs when controlling for the mediator eliminates the effect of the independent variable on the dependent variable, provided all specified conditions are satisfied. Partial mediation occurs when the influence of the independent variable on the dependent variable diminishes, but does not disappear entirely, upon inclusion of the mediator in the analysis.
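The decision rules of the causal steps approach can be collected into one function. This is a hypothetical helper (`causal_steps` is not from any package); it assumes a data frame with columns named `x`, `m`, and `y` and a significance level of .05, and it applies the criteria described above: no mediation unless steps 1-3 are all significant, complete mediation if c' is then non-significant, and partial mediation if c' remains significant but is smaller in magnitude than c.

```r
# Hypothetical helper implementing the causal steps decision rules.
# Assumes a data frame with numeric columns x (IV), m (mediator), y (DV).
causal_steps <- function(df, alpha = 0.05) {
  p_of <- function(fit, term) coef(summary(fit))[term, "Pr(>|t|)"]
  fit_c <- lm(y ~ x, data = df)      # step 1: total effect c
  fit_a <- lm(m ~ x, data = df)      # step 2: path a
  fit_b <- lm(y ~ x + m, data = df)  # step 3: path b, plus direct effect c'
  steps_met <- p_of(fit_c, "x") < alpha &&
    p_of(fit_a, "x") < alpha &&
    p_of(fit_b, "m") < alpha
  c_total <- coef(fit_c)[["x"]]
  c_prime <- coef(fit_b)[["x"]]
  verdict <- if (!steps_met) {
    "no mediation"
  } else if (p_of(fit_b, "x") >= alpha) {
    "complete mediation"
  } else if (abs(c_prime) < abs(c_total)) {
    "partial mediation"
  } else {
    "steps 1-3 met, but c' is not reduced"
  }
  list(c = c_total, c_prime = c_prime, verdict = verdict)
}

# Usage with simulated data (assumed values): x affects y only through m
set.seed(1)
x <- rnorm(500)
m <- 0.6 * x + rnorm(500)
y <- 0.6 * m + rnorm(500)
mediated <- causal_steps(data.frame(x, m, y))
mediated$verdict

# Unrelated variables: steps 1-3 should almost never all be significant
null_case <- causal_steps(data.frame(x = rnorm(500), m = rnorm(500), y = rnorm(500)))
null_case$verdict
```

With the tutorial's variables, the call would be `causal_steps(data.frame(x = data$criticism, m = data$comm_freq, y = data$rel_sat))`, which should report no mediation given the non-significant total effect.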
The causal steps approach has several limitations (Hayes, 2009):
Low Statistical Power: Simulation studies suggest that the causal steps approach has lower power than other methods, making it less likely to detect an indirect effect that exists.
No Direct Test of the Indirect Effect: The method infers an indirect effect from separate significance tests of path a (IV to mediator) and path b (mediator to DV) instead of estimating and testing the product ab itself.
Multiple Hypothesis Tests Increase Error Risk: The approach requires several null hypotheses to be rejected before an indirect effect can be claimed, which increases the risk of a Type II error (failing to detect an effect that is present).
Potential for Indirect Effects Even With Non-Significant Paths: An indirect effect can be significant even when one or more of its constituent paths is not, a possibility the causal steps approach does not accommodate.
Reference: Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76(4), 408-420.
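The Sobel test evaluates whether the indirect effect of the IV on the DV through the mediator differs significantly from zero. It is a product-of-coefficients approach: the indirect effect is estimated as the product \(ab\) and divided by its estimated standard error.
The Sobel test statistic is: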
\[ z = \frac{ab}{\sqrt{a^2 \cdot s_b^2 + b^2 \cdot s_a^2}} \]
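where \(a\) is the effect of the independent variable on the mediator, \(b\) is the effect of the mediator on the dependent variable controlling for the independent variable, and \(s_a\) and \(s_b\) are the standard errors of \(a\) and \(b\). The statistic is compared against the standard normal distribution.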
# Coefficients and standard errors for path a (model2) and path b (model3)
a <- 0.1190
s_a <- 0.1393
b <- 0.61102
s_b <- 0.03881
# Calculate the Sobel test statistic (z-value)
z <- (a * b) / sqrt((b^2 * s_a^2) + (a^2 * s_b^2))
# Calculate the two-tailed p-value
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
# Display the results
cat("Sobel test statistic (z-value):", z, "\n")
## Sobel test statistic (z-value): 0.8530166
cat("p-value for the Sobel test:", p_value, "\n")
## p-value for the Sobel test: 0.3936501
# Online calculator for the Sobel test: https://quantpsy.org/sobel/sobel.htm
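Typing coefficients in by hand invites rounding and transcription errors. A small hypothetical helper (`sobel_from_fits` is not a package function) can pull a, b, and their standard errors straight from fitted `lm` objects and apply the same formula. It assumes `fit_a` is the regression of the mediator on the IV and `fit_b` the regression of the DV on both; the simulated data in the usage lines are assumptions for illustration.

```r
# Hypothetical helper: Sobel test computed from fitted lm objects.
# fit_a: lm(M ~ X); fit_b: lm(Y ~ X + M); x_name and m_name are the term names.
sobel_from_fits <- function(fit_a, fit_b, x_name, m_name) {
  ca <- coef(summary(fit_a))
  cb <- coef(summary(fit_b))
  a <- ca[x_name, "Estimate"]
  s_a <- ca[x_name, "Std. Error"]
  b <- cb[m_name, "Estimate"]
  s_b <- cb[m_name, "Std. Error"]
  se_ab <- sqrt(b^2 * s_a^2 + a^2 * s_b^2)  # first-order (Sobel) standard error
  z <- (a * b) / se_ab
  c(indirect = a * b, se = se_ab, z = z,
    p = 2 * pnorm(abs(z), lower.tail = FALSE))
}

# Usage with simulated data (assumed values)
set.seed(3)
x <- rnorm(100)
m <- 0.4 * x + rnorm(100)
y <- 0.3 * m + rnorm(100)
d <- data.frame(x, m, y)
fit_a <- lm(m ~ x, data = d)
fit_b <- lm(y ~ x + m, data = d)
res <- sobel_from_fits(fit_a, fit_b, "x", "m")
res
```

Applied to the tutorial's models, `sobel_from_fits(model2, model3, "criticism", "comm_freq")` uses unrounded estimates, so it should agree with the multilevel output below (indirect effect about 0.0727, z about 0.853) up to rounding, rather than exactly with the hand calculation above.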
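Alternatively, the sobel() function in the multilevel package fits the three regressions and computes the Sobel test in a single call:
library(multilevel)
sobel(pred = data$criticism, med = data$comm_freq, out = data$rel_sat)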
## $`Mod1: Y~X`
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.50937298 0.4742516 11.6169846 0.000000000000000000000003979234
## pred 0.02488966 0.1140445 0.2182451 0.827462910364342496549738825706
##
## $`Mod2: Y~X+M`
## Estimate Std. Error t value
## (Intercept) 3.21750814 0.34827801 9.2383327
## pred -0.04781855 0.07622433 -0.6273397
## med 0.61101664 0.03881049 15.7435940
## Pr(>|t|)
## (Intercept) 0.000000000000000040913347831108658691177948
## pred 0.531163002810534412567733397736446931958199
## med 0.000000000000000000000000000000000001082172
##
## $`Mod3: M~X`
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7509042 0.5793587 6.4742342 0.0000000007348964
## pred 0.1189955 0.1393199 0.8541166 0.3940725828067388
##
## $Indirect.Effect
## [1] 0.07270821
##
## $SE
## [1] 0.08525198
##
## $z.value
## [1] 0.8528624
##
## $N
## [1] 200
However, the Sobel test assumes that the sampling distribution of the indirect effect is normal, whereas the sampling distribution of a product of two coefficients such as \(ab\) tends to be asymmetric, particularly in small samples.
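A quick Monte Carlo sketch shows the asymmetry. It assumes, purely for illustration, that the estimates of a and b are independent and normally distributed with mean 0.3 and standard deviation 0.15; the product of the two draws is then right-skewed even though each factor is symmetric.

```r
# Monte Carlo illustration with assumed sampling distributions for a and b.
set.seed(42)
draws <- 1e5
a_draws <- rnorm(draws, mean = 0.3, sd = 0.15)
b_draws <- rnorm(draws, mean = 0.3, sd = 0.15)
ab <- a_draws * b_draws

# Sample skewness: 0 for a symmetric distribution, > 0 for a longer right tail
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3
ab_skew <- skewness(ab)
ab_skew

# Distances from the median to the 2.5th and 97.5th percentiles
q <- quantile(ab, c(0.025, 0.5, 0.975))
c(lower_gap = q[[2]] - q[[1]], upper_gap = q[[3]] - q[[2]])
```

For these assumed values the theoretical skewness of the product is 8/9, about 0.89, and the upper gap is wider than the lower one; a symmetric interval of the form \(ab \pm 1.96 \times SE\), as implied by the Sobel test, cannot reflect that.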
Bootstrapping does not rely on the normality assumption for the indirect effect. It approximates the sampling distribution of the indirect effect empirically by repeatedly resampling cases with replacement and re-estimating \(ab\) in each resample.
The main advantage of bootstrapping here is that it yields confidence intervals for the indirect effect that need not be symmetric around the estimate. Whereas a p-value gives the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, a confidence interval gives a range of plausible values for the parameter, conveying both its likely magnitude and the precision of the estimate. This makes bootstrapping particularly useful when parametric assumptions, such as the normality of \(ab\), are doubtful, as they often are in smaller samples.
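The resampling idea can be written out in a few lines of base R. This is a minimal sketch of a nonparametric percentile bootstrap for \(ab\); the simulated data, the helper name `indirect_ab`, and the choice of 2,000 resamples are assumptions for illustration, not the tutorial's analysis.

```r
# Percentile bootstrap of the indirect effect a*b (sketch with simulated data).
set.seed(7)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # assumed path a = 0.5
y <- 0.5 * m + 0.1 * x + rnorm(n)  # assumed path b = 0.5, direct effect = 0.1
d <- data.frame(x, m, y)

# Hypothetical helper: estimate a*b from one data set
indirect_ab <- function(df) {
  a <- coef(lm(m ~ x, data = df))[["x"]]      # path a
  b <- coef(lm(y ~ x + m, data = df))[["m"]]  # path b, controlling for x
  a * b
}

B <- 2000
# Resample rows (cases) with replacement and re-estimate a*b each time
boot_ab <- replicate(B, indirect_ab(d[sample.int(n, n, replace = TRUE), ]))
point <- indirect_ab(d)
ci <- quantile(boot_ab, c(0.025, 0.975))  # 95% percentile interval
c(estimate = point, ci)

# The indirect effect is deemed significant if the interval excludes zero
excludes_zero <- ci[[1]] > 0 || ci[[2]] < 0
excludes_zero
```

Because the interval is read off the empirical distribution of the resampled estimates, it can be asymmetric around the point estimate. Several thousand resamples are commonly used; more resamples make the interval limits less sensitive to the random seed.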
The relationship between the independent variable \(X\) and the dependent variable \(Y\), mediated by \(M\), is captured by the following equations:
The equation for \(Y\) as a function of the mediator \(M\) and the independent variable \(X\) is: \[ Y = b_0 + b_1M + c'X \]
The mediator \(M\) as a function of the independent variable \(X\) is given by: \[ M = a_0 + a_1X \]
By substituting the equation for \(M\) into the first equation, we arrive at a consolidated equation for \(Y\):
\[ Y = b_0 + b_1(a_0 + a_1X) + c'X \] \[ Y = (b_0 + b_1a_0) + (b_1a_1 + c')X \]
From these equations, we can identify the effects:
The indirect effect of \(X\) on \(Y\) through the mediator \(M\) is quantified by the product of the coefficients \(a_1\) and \(b_1\): \[ \text{Indirect effect} = a_1b_1 \]
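The direct effect of \(X\) on \(Y\), not through the mediator, is represented by \(c'\): \[ \text{Direct effect} = c' \]
The total effect \(c\), the slope obtained by regressing \(Y\) on \(X\) alone, is the sum of the direct and indirect effects, as the consolidated equation shows: \[ c = c' + a_1b_1 \]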
These effects shed light on the pathways through which \(X\) influences \(Y\) and the role of \(M\) as a mediator in this relationship.
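For OLS estimates from the same sample, the slope of the consolidated equation, \(c' + a_1b_1\), equals the slope \(c\) from regressing \(Y\) on \(X\) alone exactly, which can be checked numerically. The sketch below uses simulated data with assumed coefficients; nothing in it comes from the tutorial's data set.

```r
# Numerical check (simulated data, assumed coefficients) that the total effect c
# from lm(y ~ x) equals c' + a1 * b1 from the two mediation regressions.
set.seed(11)
n <- 150
x <- rnorm(n)
m <- 1 + 0.5 * x + rnorm(n)
y <- 2 + 0.7 * m - 0.2 * x + rnorm(n)

c_total <- coef(lm(y ~ x))[["x"]]   # total effect c
a1 <- coef(lm(m ~ x))[["x"]]        # path a1
fit_y <- lm(y ~ m + x)
b1 <- coef(fit_y)[["m"]]            # path b1
c_prime <- coef(fit_y)[["x"]]       # direct effect c'

c(total = c_total, direct_plus_indirect = c_prime + a1 * b1)
all.equal(c_total, c_prime + a1 * b1)  # TRUE: the decomposition is exact for OLS
```

The tutorial's own estimates satisfy the same identity up to rounding: model3 gives \(c' = -0.04782\) and \(a_1b_1 = 0.1190 \times 0.61102 \approx 0.07271\), so \(c' + a_1b_1 \approx 0.02489\), which is the total effect estimated by model1.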
To conduct a bootstrap mediation analysis within a path analysis framework, we use the lavaan package in R, which is designed for structural equation modeling (SEM).
The model is specified in lavaan's model syntax. In our hypothetical example, we are interested in whether communication frequency (comm_freq) mediates the relationship between criticism (criticism) and relationship satisfaction (rel_sat).
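mediation <- '
  # path b (comm_freq -> rel_sat) and direct effect c (criticism -> rel_sat)
  rel_sat ~ b*comm_freq + c*criticism
  # path a (criticism -> comm_freq)
  comm_freq ~ a*criticism
  # defined parameters: indirect effect, total effect, proportion mediated
  indirect := a*b
  total := c + (a*b)
  proportion := indirect/total
'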
library(lavaan)
mediationout <- sem(mediation, data = data)
summary(mediationout)
## lavaan 0.6.16 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## rel_sat ~
## comm_freq (a) 0.611 0.039 15.863 0.000
## criticism (b) -0.048 0.076 -0.632 0.527
## comm_freq ~
## criticism (c) 0.119 0.139 0.858 0.391
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .rel_sat 1.235 0.123 10.000 0.000
## .comm_freq 4.162 0.416 10.000 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## indirect -0.029 0.046 -0.630 0.529
## total 0.090 0.146 0.614 0.539
## proportion -0.325 0.849 -0.383 0.702
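set.seed(100)
# se = "bootstrap" bootstraps the parameter estimates (test = "bootstrap" only
# bootstraps the model test statistic); 5000 resamples stabilise the CI limits
mediationbootout <- sem(mediation, data = data, se = "bootstrap", bootstrap = 5000)
summary(mediationbootout)
## lavaan 0.6.16 ended normally after 1 iteration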
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## rel_sat ~
## comm_freq (a) 0.611 0.039 15.863 0.000
## criticism (b) -0.048 0.076 -0.632 0.527
## comm_freq ~
## criticism (c) 0.119 0.139 0.858 0.391
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .rel_sat 1.235 0.123 10.000 0.000
## .comm_freq 4.162 0.416 10.000 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## indirect -0.029 0.046 -0.630 0.529
## total 0.090 0.146 0.614 0.539
## proportion -0.325 0.849 -0.383 0.702
summary(mediationbootout, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE, ci = TRUE)
## lavaan 0.6.16 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 163.695
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -731.271
## Loglikelihood unrestricted model (H1) -731.271
##
## Akaike (AIC) 1472.542
## Bayesian (BIC) 1489.033
## Sample-size adjusted Bayesian (SABIC) 1473.193
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: RMSEA <= 0.050 NA
## P-value H_0: RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## rel_sat ~
## comm_freq (a) 0.611 0.039 15.863 0.000 0.536 0.687
## criticism (b) -0.048 0.076 -0.632 0.527 -0.196 0.100
## comm_freq ~
## criticism (c) 0.119 0.139 0.858 0.391 -0.153 0.391
## Std.lv Std.all
##
## 0.611 0.748
## -0.048 -0.030
##
## 0.119 0.061
##
## Variances:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## .rel_sat 1.235 0.123 10.000 0.000 0.993 1.477
## .comm_freq 4.162 0.416 10.000 0.000 3.346 4.977
## Std.lv Std.all
## 1.235 0.443
## 4.162 0.996
##
## R-Square:
## Estimate
## rel_sat 0.557
## comm_freq 0.004
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## indirect -0.029 0.046 -0.630 0.529 -0.120 0.062
## total 0.090 0.146 0.614 0.539 -0.197 0.376
## proportion -0.325 0.849 -0.383 0.702 -1.990 1.339
## Std.lv Std.all
## -0.029 -0.022
## 0.090 0.038
## -0.325 -0.582
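In the output, pay close attention to the estimates for path a (the effect of criticism on communication frequency), path b (the effect of communication frequency on relationship satisfaction, controlling for criticism), and especially the indirect effect (the product of a and b), along with its bootstrap confidence interval. An indirect effect is considered statistically significant if its confidence interval does not include zero.
Two points matter when reading the output printed above. First, there the label (a) is attached to comm_freq → rel_sat, (b) to criticism → rel_sat, and (c) to criticism → comm_freq, so the printed indirect estimate (-0.029 ≈ 0.611 × -0.048) is the product of path b and the direct effect, not the mediated effect, and the total and proportion rows inherit the same error. From the regression estimates, the indirect effect is 0.119 × 0.611 ≈ 0.073, matching the Sobel results, and the total effect is -0.048 + 0.073 ≈ 0.025, matching model1; with a total effect this close to zero, the proportion mediated is not meaningful. Second, the printed standard errors are labelled Standard and the intervals are symmetric normal-theory intervals (estimate ± 1.96 × SE), because test = "bootstrap" bootstraps only the model test statistic; bootstrap standard errors and confidence intervals require se = "bootstrap".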