Mediation

Jeong Eun Cheon

cheonje@yonsei.ac.kr

Preparing for the analysis

Read in data

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.2

data <- read_csv("Practice.csv")


data$criticism <- rowMeans(data[,c("criticism1","criticism2", "criticism3")], na.rm=TRUE)
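A minimal sketch of what the composite score does when items are missing, using a small made-up data frame (`toy` and its values are hypothetical, not taken from Practice.csv). With `na.rm=TRUE`, a respondent who skipped one item still receives the mean of the remaining items, and a respondent who skipped every item receives `NaN`.

```r
# Hypothetical responses from three people to the three criticism items
toy <- data.frame(
  criticism1 = c(4, 3, NA),
  criticism2 = c(5, NA, NA),
  criticism3 = c(3, 4, NA)
)
items <- c("criticism1", "criticism2", "criticism3")

# Same call as above: average the available items per row
toy$criticism <- rowMeans(toy[, items], na.rm = TRUE)

# Number of items each person actually answered
toy$n_answered <- rowSums(!is.na(toy[, items]))

toy
```

Counting the answered items alongside the composite makes it easy to decide whether scores based on very few items should be kept.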

Mediation

Conceptual Overview

Mediation analysis serves as a crucial statistical tool to explore and quantify the indirect effects of an independent variable (IV) on a dependent variable (DV) via one or more intermediary variables, known as mediators. This methodological approach sheds light on the underlying processes or mechanisms driving an observed relationship. Several techniques for executing mediation analysis have been developed, each with its own merits. Among the most prominent are the traditional approach proposed by Baron and Kenny, the Sobel test for assessing the significance of indirect effects, and bootstrapping methods, which provide a non-parametric alternative for estimating indirect effects.

Consider a scenario examining the potential mediation effect of communication frequency on the relationship between criticism and relationship satisfaction. In this context, we hypothesize that increased criticism may lead to decreased communication frequency, which, in turn, diminishes relationship satisfaction. This example illustrates the potential pathway through which criticism indirectly influences relationship satisfaction, highlighting the mediating role of communication frequency.

[Figure: mediation model]

Method 1: Baron and Kenny’s Steps

1. X → Y: Show that the independent variable (IV, criticism) significantly affects the dependent variable (DV, relationship satisfaction) without the mediator in the model.
2. X → M: Show that the IV significantly affects the mediator (communication frequency).
3. X + M → Y: Include both the IV and the mediator in the same model predicting the DV. The mediator must significantly predict the DV, and the effect of the IV on the DV should be smaller than in step 1 (partial mediation) or become non-significant (full mediation).

Mediation is established if steps 1-3 are satisfied.

# Total effect of IV on DV
model1 <- lm(rel_sat ~ criticism, data=data)

# Effect of IV on Mediator
model2 <- lm(comm_freq ~ criticism, data=data)

# Direct effect of both IV and Mediator on DV
model3 <- lm(rel_sat ~ criticism + comm_freq, data=data)

summary(model1)

## 
## Call:
## lm(formula = rel_sat ~ criticism, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2271 -1.4224  0.0121  1.1771  4.2809 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)  5.50937    0.47425  11.617 <0.0000000000000002 ***
## criticism    0.02489    0.11404   0.218               0.827    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.678 on 198 degrees of freedom
## Multiple R-squared:  0.0002405,  Adjusted R-squared:  -0.004809 
## F-statistic: 0.04763 on 1 and 198 DF,  p-value: 0.8275

summary(model2)

## 
## Call:
## lm(formula = comm_freq ~ criticism, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4878 -1.6386 -0.1101  1.7865  3.1391 
## 
## Coefficients:
##             Estimate Std. Error t value       Pr(>|t|)    
## (Intercept)   3.7509     0.5794   6.474 0.000000000735 ***
## criticism     0.1190     0.1393   0.854          0.394    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.05 on 198 degrees of freedom
## Multiple R-squared:  0.003671,   Adjusted R-squared:  -0.001361 
## F-statistic: 0.7295 on 1 and 198 DF,  p-value: 0.3941

summary(model3)

## 
## Call:
## lm(formula = rel_sat ~ criticism + comm_freq, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9942 -0.7790 -0.0983  0.8457  3.1609 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)  3.21751    0.34828   9.238 <0.0000000000000002 ***
## criticism   -0.04782    0.07622  -0.627               0.531    
## comm_freq    0.61102    0.03881  15.744 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.12 on 197 degrees of freedom
## Multiple R-squared:  0.5573, Adjusted R-squared:  0.5528 
## F-statistic:   124 on 2 and 197 DF,  p-value: < 0.00000000000000022

Results

The first model assessing the total effect of criticism on relationship satisfaction was not significant, with \(b = 0.02489\), \(SE = 0.11404\), \(t(198) = 0.218\), \(p = .827\), suggesting that criticism alone did not predict relationship satisfaction.

Under the Baron and Kenny approach, a non-significant total effect is taken to mean that there is no effect to be mediated, so it is not customary to proceed with the subsequent steps. The remaining models are nevertheless reported here to illustrate the full procedure.

The second model examined the effect of criticism on the proposed mediator, communication frequency. The effect of criticism on communication frequency was also not significant, with \(b = 0.1190\), \(SE = 0.1393\), \(t(198) = 0.854\), \(p = .394\).

The final model included both criticism and communication frequency predicting relationship satisfaction. In this model, communication frequency significantly predicted relationship satisfaction, with \(b = 0.61102\), \(SE = 0.03881\), \(t(197) = 15.744\), \(p < .001\), while the effect of criticism remained non-significant, with \(b = -0.04782\), \(SE = 0.07622\), \(t(197) = -0.627\), \(p = .531\).

Given that the initial relationship between the independent variable and the dependent variable was not established, the conditions for mediation as outlined by Baron and Kenny were not met. Thus, the data do not support a mediating role of communication frequency in the relationship between criticism and relationship satisfaction.

Complete mediation occurs when controlling for the mediator eliminates the effect of the independent variable on the dependent variable, provided all specified conditions are satisfied. Partial mediation occurs when the influence of the independent variable on the dependent variable diminishes, but does not disappear entirely, upon inclusion of the mediator in the analysis.

Limitations

Low Statistical Power: Simulation studies suggest the causal steps approach has lower power compared to other methods, making it less likely to detect an indirect effect when it exists.
Dependence on Significance of Paths: The method does not estimate or test the indirect effect itself. Instead, it infers the indirect effect from separate significance tests of path a (from IV to mediator) and path b (from mediator to DV).
Multiple Hypothesis Tests Increase Error Risk: The causal steps approach requires multiple null hypotheses to be rejected to claim an indirect effect, increasing the risk of type II errors (failing to detect an effect that is there).
Potential for Indirect Effects Even With Non-Significant Paths: It is possible for an indirect effect to be significant even if one or more of the constituent paths are not, which the causal steps approach does not accommodate.

Reference: Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76(4), 408-420.

Method 2: Sobel Test

The Sobel test assesses whether the indirect effect of the IV on the DV through the mediator, estimated as the product of the two path coefficients (the product-of-coefficients approach), is significantly different from zero.

Sobel test equation:

\[ z = \frac{ab}{\sqrt{a^2 \cdot s_b^2 + b^2 \cdot s_a^2}} \]

Here \(a\) is the effect of the independent variable on the mediator, \(b\) is the effect of the mediator on the dependent variable controlling for the independent variable, and \(s_a\) and \(s_b\) are the standard errors of \(a\) and \(b\), respectively.

# Coefficients and standard errors for paths a and b
a <- 0.1190
s_a <- 0.1393
b <- 0.61102
s_b <- 0.03881

# Calculate the Sobel test statistic (z-value)
z <- (a * b) / sqrt((b^2 * s_a^2) + (a^2 * s_b^2))

# Calculate the two-tailed p-value
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Display the results
cat("Sobel test statistic (z-value):", z, "\n")

## Sobel test statistic (z-value): 0.8530166

cat("p-value for the Sobel test:", p_value, "\n")

## p-value for the Sobel test: 0.3936501

# Online calculation tool for the Sobel test: https://quantpsy.org/sobel/sobel.htm

We can also use the sobel() function from the multilevel package to conduct the Sobel test.

library(multilevel)

## Warning: package 'multilevel' was built under R version 4.3.3

sobel(data$criticism,data$comm_freq,data$rel_sat)

## $`Mod1: Y~X`
##               Estimate Std. Error    t value                         Pr(>|t|)
## (Intercept) 5.50937298  0.4742516 11.6169846 0.000000000000000000000003979234
## pred        0.02488966  0.1140445  0.2182451 0.827462910364342496549738825706
## 
## $`Mod2: Y~X+M`
##                Estimate Std. Error    t value
## (Intercept)  3.21750814 0.34827801  9.2383327
## pred        -0.04781855 0.07622433 -0.6273397
## med          0.61101664 0.03881049 15.7435940
##                                                 Pr(>|t|)
## (Intercept) 0.000000000000000040913347831108658691177948
## pred        0.531163002810534412567733397736446931958199
## med         0.000000000000000000000000000000000001082172
## 
## $`Mod3: M~X`
##              Estimate Std. Error   t value           Pr(>|t|)
## (Intercept) 3.7509042  0.5793587 6.4742342 0.0000000007348964
## pred        0.1189955  0.1393199 0.8541166 0.3940725828067388
## 
## $Indirect.Effect
## [1] 0.07270821
## 
## $SE
## [1] 0.08525198
## 
## $z.value
## [1] 0.8528624
## 
## $N
## [1] 200

Limitations

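The Sobel test assumes that the sampling distribution of the indirect effect is normal. However, the indirect effect is a product of two coefficients, and the sampling distribution of such a product tends to be asymmetric, so the normal-theory z-test and its p-value can be inaccurate.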
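The asymmetry can be illustrated with a short simulation. This is a sketch under simplifying assumptions: the path estimates are drawn as independent normal variables, and the means and standard deviations (0.3 and 0.15) are arbitrary illustrative values rather than estimates from the practice data.

```r
# Hypothetical sampling distributions for the two paths (illustrative values only)
set.seed(42)
n_draws <- 100000
a_hat <- rnorm(n_draws, mean = 0.3, sd = 0.15)  # draws of path a
b_hat <- rnorm(n_draws, mean = 0.3, sd = 0.15)  # draws of path b
ab_hat <- a_hat * b_hat                          # implied indirect effects

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_a  <- skewness(a_hat)   # near zero, as expected for a normal variable
skew_ab <- skewness(ab_hat)  # clearly positive for the product

round(c(skew_a = skew_a, skew_ab = skew_ab), 2)
```

Each path is symmetric on its own, yet their product is right-skewed, which is why a symmetric normal-theory interval around the indirect effect can be misleading.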

Method 3: Bootstrapping

Bootstrapping does not rely on the normality assumption for the indirect effect. It estimates the distribution of the indirect effect by repeatedly resampling the data, making it robust to non-normal distributions.

What is bootstrapping? Bootstrapping is a versatile statistical resampling method used to estimate the sampling distribution of a parameter estimate by drawing random samples with replacement from the original dataset. This approach treats the observed sample as a stand-in for the population: each bootstrap sample has the same size as the original data, and the statistic of interest is recomputed in every one of them to mimic the variation that would arise across samples from the actual population. Typically, the number of bootstrap samples ranges from 1,000 to 10,000, so that the resulting estimates are stable.

The primary advantage of bootstrapping lies in its ability to generate confidence intervals without assuming a particular shape for the sampling distribution. Unlike a p-value, which quantifies the probability of observing data at least as extreme as those obtained if the null hypothesis were true, a confidence interval provides a range of plausible values for a parameter, conveying both its likely magnitude and the precision of the estimate. This makes bootstrapping particularly valuable when traditional parametric assumptions are not met or when the sample is too small to rely on large-sample normal approximations.
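The resampling idea can be sketched by hand in base R before turning to a package. The example below is only an illustration on simulated data: the data frame `sim`, the helper `indirect_effect()`, and the true path values (0.5 and 0.4) are assumptions made for this sketch and are not part of the practice dataset.

```r
# Simulated data with a true indirect effect of 0.5 * 0.4 = 0.2 (hypothetical values)
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.1 * x + rnorm(n)
sim <- data.frame(x = x, m = m, y = y)

# Indirect effect: (x -> m) times (m -> y, controlling for x)
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ m + x, data = d))["m"]
  unname(a * b)
}

# Resample rows with replacement and recompute the indirect effect each time
n_boot <- 2000
boot_ab <- replicate(n_boot, {
  idx <- sample(nrow(sim), replace = TRUE)
  indirect_effect(sim[idx, ])
})

# Point estimate and 95% percentile bootstrap confidence interval
estimate <- indirect_effect(sim)
ci <- quantile(boot_ab, probs = c(0.025, 0.975))
estimate
ci
```

The percentile interval is read directly from the bootstrap distribution, so it can be asymmetric around the point estimate when the distribution of the indirect effect is skewed.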

Understanding mediation under the path analysis framework.

Model Equations:

The relationship between the independent variable \(X\) and the dependent variable \(Y\), mediated by \(M\), is captured by the following equations:

The equation for \(Y\) as a function of the mediator \(M\) and the independent variable \(X\) is: \[ Y = b_0 + b_1M + c'X \]
The mediator \(M\) as a function of the independent variable \(X\) is given by: \[ M = a_0 + a_1X \]

By substituting the equation for \(M\) into the first equation, we arrive at a consolidated equation for \(Y\):

\[ Y = b_0 + b_1(a_0 + a_1X) + c'X \] \[ Y = (b_0 + b_1a_0) + (b_1a_1 + c')X \]

From these equations, we can identify the effects:

The indirect effect of \(X\) on \(Y\) through the mediator \(M\) is quantified by the product of the coefficients \(a_1\) and \(b_1\): \[ \text{Indirect effect} = a_1b_1 \]
The direct effect of \(X\) on \(Y\), not through the mediator, is represented by \(c'\): \[ \text{Direct effect} = c' \]

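Their sum, \(c' + a_1b_1\), is the coefficient of \(X\) in the consolidated equation and represents the total effect of \(X\) on \(Y\). Together, these effects describe the pathways through which \(X\) influences \(Y\) and the role of \(M\) as a mediator in this relationship.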
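As a quick numerical check that the total effect equals the direct effect plus the indirect effect, the coefficients printed earlier by `sobel()` can be plugged into the decomposition. The variable names below are chosen for this sketch, and the values are copied from that output rather than recomputed from the data.

```r
# Coefficients copied from the sobel() output shown earlier
a1      <- 0.1189955    # criticism -> comm_freq
b1      <- 0.61101664   # comm_freq -> rel_sat, controlling for criticism
c_prime <- -0.04781855  # criticism -> rel_sat, controlling for comm_freq
c_total <- 0.02488966   # criticism -> rel_sat without the mediator

# Indirect effect and the total effect implied by the two paths
indirect <- a1 * b1
total_from_paths <- c_prime + indirect

round(c(indirect = indirect,
        total_from_paths = total_from_paths,
        c_total = c_total), 5)
```

The indirect effect comes out at about 0.0727, and adding the direct effect reproduces the total effect from the first regression (about 0.0249), as the consolidated equation implies.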

To conduct a bootstrapping mediation analysis within a path analysis framework, we will utilize the lavaan package in R. This package is specifically designed for structural equation modeling (SEM).

library(lavaan)

You’ll need to define your model in lavaan’s syntax. In our hypothetical example, we are interested in the mediation effect of communication frequency (comm_freq) on the relationship between criticism (criticism) and relationship satisfaction (rel_sat).

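mediation <- '
  # a: comm_freq -> rel_sat (mediator path); b: direct effect of criticism
  rel_sat ~ a*comm_freq + b*criticism
  # c: criticism -> comm_freq
  comm_freq ~ c*criticism

  # indirect effect = (criticism -> comm_freq) * (comm_freq -> rel_sat)
  indirect := a*c
  total := b + (a*c)
  proportion := indirect/total
'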

mediationout <- sem(mediation, data=data)
summary(mediationout)

## lavaan 0.6.16 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   rel_sat ~                                           
##     comm_freq  (a)    0.611    0.039   15.863    0.000
##     criticism  (b)   -0.048    0.076   -0.632    0.527
##   comm_freq ~                                         
##     criticism  (c)    0.119    0.139    0.858    0.391
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .rel_sat           1.235    0.123   10.000    0.000
##    .comm_freq         4.162    0.416   10.000    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     indirect         -0.029    0.046   -0.630    0.529
##     total             0.090    0.146    0.614    0.539
##     proportion       -0.325    0.849   -0.383    0.702

set.seed(100)
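# se = "bootstrap" requests bootstrap standard errors; test = "bootstrap" only bootstraps the model test statistic
mediationbootout <- sem(mediation, data=data, se="bootstrap", bootstrap=100)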
summary(mediationbootout)

## lavaan 0.6.16 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   rel_sat ~                                           
##     comm_freq  (a)    0.611    0.039   15.863    0.000
##     criticism  (b)   -0.048    0.076   -0.632    0.527
##   comm_freq ~                                         
##     criticism  (c)    0.119    0.139    0.858    0.391
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .rel_sat           1.235    0.123   10.000    0.000
##    .comm_freq         4.162    0.416   10.000    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     indirect         -0.029    0.046   -0.630    0.529
##     total             0.090    0.146    0.614    0.539
##     proportion       -0.325    0.849   -0.383    0.702

summary(mediationbootout, fit.measures = TRUE, standardize=TRUE, rsquare=TRUE, ci = TRUE)

## lavaan 0.6.16 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                               163.695
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -731.271
##   Loglikelihood unrestricted model (H1)       -731.271
##                                                       
##   Akaike (AIC)                                1472.542
##   Bayesian (BIC)                              1489.033
##   Sample-size adjusted Bayesian (SABIC)       1473.193
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   rel_sat ~                                                             
##     comm_freq  (a)    0.611    0.039   15.863    0.000    0.536    0.687
##     criticism  (b)   -0.048    0.076   -0.632    0.527   -0.196    0.100
##   comm_freq ~                                                           
##     criticism  (c)    0.119    0.139    0.858    0.391   -0.153    0.391
##    Std.lv  Std.all
##                   
##     0.611    0.748
##    -0.048   -0.030
##                   
##     0.119    0.061
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .rel_sat           1.235    0.123   10.000    0.000    0.993    1.477
##    .comm_freq         4.162    0.416   10.000    0.000    3.346    4.977
##    Std.lv  Std.all
##     1.235    0.443
##     4.162    0.996
## 
## R-Square:
##                    Estimate
##     rel_sat           0.557
##     comm_freq         0.004
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##     indirect         -0.029    0.046   -0.630    0.529   -0.120    0.062
##     total             0.090    0.146    0.614    0.539   -0.197    0.376
##     proportion       -0.325    0.849   -0.383    0.702   -1.990    1.339
##    Std.lv  Std.all
##    -0.029   -0.022
##     0.090    0.038
##    -0.325   -0.582

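In the output, pay close attention to the path from criticism to communication frequency (0.119), the path from communication frequency to relationship satisfaction controlling for criticism (0.611), and especially the indirect effect, which is the product of these two paths (0.119 × 0.611 ≈ 0.073, the value reported as $Indirect.Effect in the Sobel output above). The product of the two coefficients in the rel_sat equation (0.611 × -0.048 ≈ -0.029) is not an indirect effect and should not be interpreted as one. A mediation effect is considered statistically significant if its bootstrapped confidence interval does not include zero. The "Standard errors" line of the lavaan output indicates whether the reported intervals are bootstrap-based or rest on standard normal-theory errors.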