Moderation

Jeong Eun Cheon

cheonje@yonsei.ac.kr

Preparing for the analysis
- Read in data
Moderation1 (Continuous * Categorical -> Continuous)
Moderation2 (Continuous * Continuous -> Continuous)

Preparing for the analysis

Read in data

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.2

data <- read_csv("Practice.csv")


# Composite criticism score: mean of the three items, ignoring missing values
data$criticism <- rowMeans(data[, c("criticism1", "criticism2", "criticism3")], na.rm = TRUE)

Moderation1 (Continuous * Categorical -> Continuous)

Conceptual Overview

Moderation analysis is used to explore whether the relationship between two variables changes across different levels of a third variable, known as the moderator. Let’s say we are interested in understanding how the association between communication frequency and relationship satisfaction differs for men and women. Specifically, we hypothesize that the association is stronger for women than for men.

moderation1 <- lm(rel_sat ~ comm_freq*gender, data)

summary(moderation1)

## 
## Call:
## lm(formula = rel_sat ~ comm_freq * gender, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.43130 -0.71326 -0.00504  0.56640  3.13379 
## 
## Coefficients:
##                       Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)            3.07578    0.19728  15.591 < 0.0000000000000002 ***
## comm_freq              0.49635    0.04200  11.817 < 0.0000000000000002 ***
## genderWomen           -0.21241    0.32402  -0.656                0.513    
## comm_freq:genderWomen  0.28740    0.06896   4.167            0.0000462 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9625 on 196 degrees of freedom
## Multiple R-squared:  0.6745, Adjusted R-squared:  0.6695 
## F-statistic: 135.4 on 3 and 196 DF,  p-value: < 0.00000000000000022

The interaction between communication frequency and gender is significant (\(B = 0.287\), \(SE = 0.069\), \(t = 4.167\), \(p < .001\)), indicating that the relationship between communication frequency and relationship satisfaction differs by gender. Because men are the reference group, the positive coefficient means the slope is 0.287 steeper for women than for men.

Now, how can we decompose a significant moderation effect and conduct a simple slopes test?

Method 1: use a package

There are many packages available to estimate simple slopes at different levels of the moderator.

Let’s use the interactions package.

library(interactions)

sim_slopes(model = moderation1,
                  pred = comm_freq,
                  modx = gender)

## Warning: Johnson-Neyman intervals are not available for factor moderators.

## SIMPLE SLOPES ANALYSIS 
## 
## Slope of comm_freq when gender = Men: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.50   0.04    11.82   0.00
## 
## Slope of comm_freq when gender = Women: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.78   0.05    14.33   0.00

Results

A moderation analysis was conducted to examine whether gender moderates the relationship between communication frequency and relationship satisfaction, followed by a simple slopes analysis. Results indicated a significant moderation effect of gender, \(B = 0.287\), \(SE = 0.069\), \(t(196) = 4.167\), \(p < .001\).

For women, the slope of the relationship between communication frequency and relationship satisfaction was significantly positive, \(b = 0.78\), \(SE = 0.05\), \(t(196) = 14.33\), \(p < .001\). This suggests that for women, as communication frequency increases, relationship satisfaction also increases.

For men, the analysis also revealed a significantly positive slope, \(b = 0.50\), \(SE = 0.04\), \(t(196) = 11.82\), \(p < .001\). However, the slope is smaller for men than for women. This indicates that while an increase in communication frequency is associated with an increase in relationship satisfaction for men, the association is stronger for women.

The reghelper package offers a similar function, simple_slopes(). In its output, the entry sstest marks the variable whose effect is being tested: rows 1 to 3 give the effect of gender at low (mean - 1 SD), mean, and high (mean + 1 SD) communication frequency, and rows 4 and 5 give the slope of comm_freq for men and for women.

library(reghelper)

## 
## Attaching package: 'reghelper'

## The following object is masked from 'package:base':
## 
##     beta

simple_slopes(moderation1)

##   comm_freq gender Test Estimate Std. Error t value  df              Pr(>|t|) Sig.
## 1  2.181076 sstest        0.4144     0.1984  2.0884 196               0.03806    *
## 2      4.23 sstest        1.0033     0.1377  7.2840 196   0.00000000000771175  ***
## 3  6.278924 sstest        1.5921     0.1962  8.1149 196   0.00000000000005254  ***
## 4    sstest    Men        0.4964     0.0420 11.8175 196 < 0.00000000000000022  ***
## 5    sstest  Women        0.7838     0.0547 14.3284 196 < 0.00000000000000022  ***

Method 2: Move the data point

Flip the reference group for the categorical variable gender from men to women. In the original coding, men are the reference category (coded as 0), so the coefficient for comm_freq is the slope for men: it gives the change in relationship satisfaction for each one-unit increase in communication frequency when gender is at its reference level. When women become the reference group (coded as 0), the same coefficient now gives the slope of communication frequency for women, together with its standard error and test. The model fit itself is unchanged; only the meaning of the coefficients shifts.

data$gender2 <- factor(data$gender,levels=c("Women","Men"))

# Fit the model using the recoded gender variable
moderation_reversed <- lm(rel_sat ~ comm_freq * gender2, data = data)

# Summarize the model
summary(moderation_reversed)

## 
## Call:
## lm(formula = rel_sat ~ comm_freq * gender2, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.43130 -0.71326 -0.00504  0.56640  3.13379 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           2.86338    0.25704  11.140 < 0.0000000000000002 ***
## comm_freq             0.78375    0.05470  14.328 < 0.0000000000000002 ***
## gender2Men            0.21241    0.32402   0.656                0.513    
## comm_freq:gender2Men -0.28740    0.06896  -4.167            0.0000462 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9625 on 196 degrees of freedom
## Multiple R-squared:  0.6745, Adjusted R-squared:  0.6695 
## F-statistic: 135.4 on 3 and 196 DF,  p-value: < 0.00000000000000022
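
The equivalence between the two codings can be checked directly. The sketch below uses simulated data (the toy data frame and its effect sizes are made up for illustration and are not Practice.csv) and base R's relevel() as an alternative way to switch the reference level. It assumes gender is stored as a factor with Men as the first level.

```r
# Simulated data for illustration only (not Practice.csv)
set.seed(1)
n <- 200
toy <- data.frame(
  comm_freq = runif(n, 1, 7),
  gender = factor(sample(c("Men", "Women"), n, replace = TRUE),
                  levels = c("Men", "Women"))
)
toy$rel_sat <- 3 + 0.5 * toy$comm_freq +
  0.3 * toy$comm_freq * (toy$gender == "Women") + rnorm(n)

# Men as the reference group
fit_men_ref <- lm(rel_sat ~ comm_freq * gender, data = toy)

# relevel() changes only the reference level; the data are untouched
toy$gender_w <- relevel(toy$gender, ref = "Women")
fit_women_ref <- lm(rel_sat ~ comm_freq * gender_w, data = toy)

# Slope for women: beta1 + beta3 in the first model, beta1 in the second
slope_women_sum <- unname(coef(fit_men_ref)["comm_freq"] +
                            coef(fit_men_ref)["comm_freq:genderWomen"])
slope_women_ref <- unname(coef(fit_women_ref)["comm_freq"])

all.equal(slope_women_sum, slope_women_ref)
```

The advantage of refitting over adding the coefficients by hand is that summary() also reports the standard error and t test for the women's slope without any further calculation.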

Method 3: Do calculation

Our moderation model is:

\[ \text{relationship_satisfaction} = \beta_0 + \beta_1(\text{communication_frequency}) + \beta_2(\text{gender}) + \beta_3(\text{communication_frequency} \times \text{gender}) + \epsilon \]

Given this model, the simple slope of \(\text{communication_frequency}\) for each level of the moderator (gender) can be described as follows:

For Men (gender = 0):

\[ \text{Simple Slope for Men} = \beta_1 \]

For Women (gender = 1):

\[ \text{Simple Slope for Women} = \beta_1 + \beta_3 \]

# Extract the variance-covariance matrix of the coefficients
vcov_mat <- vcov(moderation1)

# Calculate simple slopes
slope_men <- coef(moderation1)["comm_freq"]
slope_women <- coef(moderation1)["comm_freq"] + coef(moderation1)["comm_freq:genderWomen"]

# Calculate SEs (for women, the SE of beta1 + beta3 needs both variances and their covariance)
se_men <- sqrt(vcov_mat["comm_freq", "comm_freq"])
se_women <- sqrt(vcov_mat["comm_freq", "comm_freq"] + 
                  vcov_mat["comm_freq:genderWomen", "comm_freq:genderWomen"] + 
                  2 * vcov_mat["comm_freq", "comm_freq:genderWomen"])

# Calculate t-values
t_men <- slope_men / se_men
t_women <- slope_women / se_women

# Calculate p-values
p_men <- 2 * (1 - pt(abs(t_men), df = moderation1$df.residual))
p_women <- 2 * (1 - pt(abs(t_women), df = moderation1$df.residual))

# Display results
cat("Men: Slope =", slope_men, "SE =", se_men, "t =", t_men, "p =", p_men, "\n")

## Men: Slope = 0.4963526 SE = 0.0420015 t = 11.8175 p = 0

cat("Women: Slope =", slope_women, "SE =", se_women, "t =", t_women, "p =", p_women, "\n")

## Women: Slope = 0.7837505 SE = 0.05469899 t = 14.32843 p = 0

Note: the variance of a sum \(A + B\) of two random variables is:

\[ \text{Var}(A + B) = \text{Var}(A) + \text{Var}(B) + 2\text{Cov}(A, B) \]

where:
- \(\text{Var}(A)\) is the variance of \(A\) (in this case, the variance of the coefficient for comm_freq).
- \(\text{Var}(B)\) is the variance of \(B\) (the variance of the coefficient for the interaction term comm_freq:genderWomen).
- \(\text{Cov}(A, B)\) is the covariance between \(A\) and \(B\) (the covariance between the coefficients of comm_freq and comm_freq:genderWomen).

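The pt(x, df) function in R returns the probability that a t-distributed random variable with df degrees of freedom is less than or equal to x. Subtracting this from 1 gives the upper-tail probability, and doubling it gives the two-tailed p-value. The printed p = 0 reflects the limits of floating-point precision, not a p-value of exactly zero; report it as \(p < .001\).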
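
For very large t-values, 1 - pt(...) can round to exactly 0, which is why the output above prints p = 0. A minimal sketch of an alternative, using the lower.tail argument of pt() so that the upper tail is computed directly (the t-value and degrees of freedom are copied from the men's slope above; the variable names are chosen here for illustration):

```r
t_val <- 11.8175   # t statistic for the men's slope, from the output above
df    <- 196       # residual degrees of freedom

# Subtracting from 1 loses precision when pt() is extremely close to 1
p_subtract <- 2 * (1 - pt(abs(t_val), df = df))

# Asking for the upper tail directly keeps the tiny probability
p_tail <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE)

p_subtract
p_tail

# For reporting, format.pval() can cap small values at a threshold
format.pval(p_tail, eps = .001)
```

Both versions lead to the same conclusion here; the second simply avoids reporting a p-value of exactly zero.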

Visualization

data %>% 
  select(rel_sat, gender, comm_freq) %>% 
  ggplot(aes(x = comm_freq, y = rel_sat, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm")

## `geom_smooth()` using formula = 'y ~ x'

probe_interaction(model = moderation1,
                  pred = comm_freq,
                  modx = gender,
                  x.label = "Communication Frequency",
                  y.label = "Relationship Satisfaction")

## Warning: Johnson-Neyman intervals are not available for factor moderators.

## SIMPLE SLOPES ANALYSIS 
## 
## Slope of comm_freq when gender = Men: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.50   0.04    11.82   0.00
## 
## Slope of comm_freq when gender = Women: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.78   0.05    14.33   0.00

interact_plot(moderation1, pred = comm_freq, modx = gender, interval = TRUE, int.width = 0.8, plot.points = TRUE)

Moderation2 (Continuous * Continuous -> Continuous)

Now let’s explore a continuous-by-continuous interaction.

moderation2 <- lm(rel_sat ~ comm_freq*criticism, data)

summary(moderation2)

## 
## Call:
## lm(formula = rel_sat ~ comm_freq * criticism, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.02477 -0.78870 -0.08513  0.87375  3.13684 
## 
## Coefficients:
##                     Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)          3.42183    0.73570   4.651 0.00000606 ***
## comm_freq            0.56389    0.15434   3.654   0.000332 ***
## criticism           -0.10000    0.18218  -0.549   0.583705    
## comm_freq:criticism  0.01195    0.03788   0.315   0.752721    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.122 on 196 degrees of freedom
## Multiple R-squared:  0.5575, Adjusted R-squared:  0.5507 
## F-statistic: 82.31 on 3 and 196 DF,  p-value: < 0.00000000000000022

The interaction between communication frequency and criticism is nonsignificant (\(B = 0.012\), \(SE = 0.038\), \(t = 0.315\), \(p = .753\)), so there is no evidence that the relationship between communication frequency and relationship satisfaction depends on the level of criticism. Although the interaction is nonsignificant, let’s decompose it anyway to practice the procedure.

Method 1: use a package

Let’s use the interactions package once again.

library(interactions)

sim_slopes(model = moderation2,
                  pred = comm_freq,
                  modx = criticism)

## JOHNSON-NEYMAN INTERVAL 
## 
## When criticism is INSIDE the interval [-3.04, 13.61], the slope of
## comm_freq is p < .05.
## 
## Note: The range of observed values of criticism is [0.92, 6.59]
## 
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of comm_freq when criticism = 2.982936 (- 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.60   0.05    11.26   0.00
## 
## Slope of comm_freq when criticism = 4.026169 (Mean): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.61   0.04    15.68   0.00
## 
## Slope of comm_freq when criticism = 5.069402 (+ 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.62   0.06    10.81   0.00

Results

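The analysis revealed that the effect of communication frequency on relationship satisfaction did not differ significantly across levels of criticism (\(B = 0.012\), \(p = .753\)). The simple slope of communication frequency was positive and significant at low (-1 SD; \(b = 0.60\)), mean (\(b = 0.61\)), and high (+1 SD; \(b = 0.62\)) levels of criticism.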

Method 2: Move the data point

mean_criticism <- mean(data$criticism, na.rm = TRUE)
sd_criticism <- sd(data$criticism, na.rm = TRUE)

library(ggplot2)
# Create shifted versions of criticism so that 0 marks the level of interest
data$criticism_centered <- data$criticism - mean_criticism      # 0 = mean
data$criticism_high <- data$criticism_centered - sd_criticism   # 0 = mean + 1 SD
data$criticism_low <- data$criticism_centered + sd_criticism    # 0 = mean - 1 SD

# Reshape the three versions to long format
library(dplyr)

data_long <- data %>% 
  dplyr::select(criticism_centered, criticism_high, criticism_low) %>%
  pivot_longer(cols = everything(), names_to = "Condition", values_to = "Criticism_Level")

# Plot histograms of the three versions
ggplot(data_long, aes(x=Criticism_Level, fill=Condition)) +
  geom_histogram(binwidth = 0.5, alpha=0.7, position = "identity") +
  scale_fill_manual(values = c("blue", "red", "green")) +
  labs(title="Distributions of Criticism: Centered, High, and Low", x="Criticism Level", y="Frequency") +
  theme_minimal() +
  facet_wrap(~Condition)

\[ \text{relationship_satisfaction} = \beta_0 + \beta_1 \times \text{communication_frequency} + \beta_2 \times \text{criticism} + \beta_3 \times (\text{communication_frequency} \times \text{criticism}) + \epsilon \]

When criticism equals 0, the \(\beta_2\) and \(\beta_3\) terms drop out, so the equation reduces to the intercept (\(\beta_0\)) and the effect of communication frequency (\(\beta_1 \times \text{comm_freq}\)). \(\beta_1\) is therefore the slope of comm_freq at whatever level of criticism is coded as 0. After mean-centering, that level is the mean. Subtracting a further 1 SD (criticism_high) moves the zero point to 1 SD above the mean, and adding 1 SD (criticism_low) moves it to 1 SD below the mean, so refitting the model with each shifted variable gives the simple slope at that level.
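
The effect of shifting the zero point can be verified on a small example. The sketch below uses simulated data (the toy data frame and its coefficients are invented for illustration) and checks two things: the comm_freq coefficient in the shifted model equals \(\beta_1 + \beta_3 \times c\) from the unshifted model, where c is the amount subtracted, and the interaction coefficient does not change.

```r
# Simulated data for illustration only (not Practice.csv)
set.seed(2)
n <- 200
toy <- data.frame(comm_freq = runif(n, 1, 7),
                  criticism = rnorm(n, mean = 4, sd = 1))
toy$rel_sat <- 3 + 0.6 * toy$comm_freq - 0.1 * toy$criticism +
  0.05 * toy$comm_freq * toy$criticism + rnorm(n)

fit_raw <- lm(rel_sat ~ comm_freq * criticism, data = toy)

# Shift the moderator so that 0 corresponds to 1 SD above its mean
c_high <- mean(toy$criticism) + sd(toy$criticism)
toy$criticism_high <- toy$criticism - c_high
fit_high <- lm(rel_sat ~ comm_freq * criticism_high, data = toy)

# Slope at +1 SD: by formula from the raw model vs. read off the shifted model
b <- coef(fit_raw)
slope_by_formula <- unname(b["comm_freq"] + b["comm_freq:criticism"] * c_high)
slope_by_shift   <- unname(coef(fit_high)["comm_freq"])
all.equal(slope_by_formula, slope_by_shift)

# The interaction coefficient is unaffected by the shift
all.equal(unname(b["comm_freq:criticism"]),
          unname(coef(fit_high)["comm_freq:criticism_high"]))
```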

# For low level of criticism (-1 SD)
moderation2_low <- lm(rel_sat ~ comm_freq * criticism_low, data = data)

# For high level of criticism (+1 SD)
moderation2_high <- lm(rel_sat ~ comm_freq * criticism_high, data = data)

summary(moderation2_low)

## 
## Call:
## lm(formula = rel_sat ~ comm_freq * criticism_low, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.02477 -0.78870 -0.08513  0.87375  3.13684 
## 
## Coefficients:
##                         Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)              3.12354    0.24887  12.551 <0.0000000000000002 ***
## comm_freq                0.59955    0.05324  11.260 <0.0000000000000002 ***
## criticism_low           -0.10000    0.18218  -0.549               0.584    
## comm_freq:criticism_low  0.01195    0.03788   0.315               0.753    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.122 on 196 degrees of freedom
## Multiple R-squared:  0.5575, Adjusted R-squared:  0.5507 
## F-statistic: 82.31 on 3 and 196 DF,  p-value: < 0.00000000000000022

summary(moderation2_high)

## 
## Call:
## lm(formula = rel_sat ~ comm_freq * criticism_high, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.02477 -0.78870 -0.08513  0.87375  3.13684 
## 
## Coefficients:
##                          Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)               2.91490    0.27879  10.456 <0.0000000000000002 ***
## comm_freq                 0.62448    0.05775  10.813 <0.0000000000000002 ***
## criticism_high           -0.10000    0.18218  -0.549               0.584    
## comm_freq:criticism_high  0.01195    0.03788   0.315               0.753    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.122 on 196 degrees of freedom
## Multiple R-squared:  0.5575, Adjusted R-squared:  0.5507 
## F-statistic: 82.31 on 3 and 196 DF,  p-value: < 0.00000000000000022

Method 3: Do calculation

At the Mean: The slope of \(\text{comm_freq}\) at the mean level of criticism is represented as \(\beta_{\text{comm_freq@mean}} = \beta_1 + \beta_3 \cdot \bar{x}_{\text{criticism}}\).

At the High Level: The slope of \(\text{comm_freq}\) at 1 standard deviation above the mean of criticism is \(\beta_{\text{comm_freq@high}} = \beta_1 + \beta_3 \cdot (\bar{x}_{\text{criticism}} + \text{SD})\).

At the Low Level: The slope of \(\text{comm_freq}\) at 1 standard deviation below the mean of criticism is \(\beta_{\text{comm_freq@low}} = \beta_1 + \beta_3 \cdot (\bar{x}_{\text{criticism}} - \text{SD})\).
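
The standard error of each of these slopes follows from the variance-of-a-sum rule, with a constant weight on \(\beta_3\). As a sketch, writing \(z\) for the chosen value of criticism (a symbol introduced here for illustration):

\[ \text{Var}(\beta_1 + z\beta_3) = \text{Var}(\beta_1) + z^2\,\text{Var}(\beta_3) + 2z\,\text{Cov}(\beta_1, \beta_3) \]

\[ SE_{\text{slope at } z} = \sqrt{\text{Var}(\beta_1) + z^2\,\text{Var}(\beta_3) + 2z\,\text{Cov}(\beta_1, \beta_3)} \]

Setting \(z\) to the mean, the mean plus 1 SD, or the mean minus 1 SD gives the three standard errors. The variances and the covariance are the entries of vcov(moderation2) for comm_freq and comm_freq:criticism.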

mean_criticism <- mean(data$criticism, na.rm = TRUE)
sd_criticism <- sd(data$criticism, na.rm = TRUE)

# Extract coefficients from the model
coeffs <- coef(moderation2)
beta_1 <- coeffs["comm_freq"]
beta_3 <- coeffs["comm_freq:criticism"]

# Calculate the slopes
slope_at_mean <- beta_1 + beta_3 * mean_criticism
slope_at_high <- beta_1 + beta_3 * (mean_criticism + sd_criticism)
slope_at_low <- beta_1 + beta_3 * (mean_criticism - sd_criticism)

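# Variance-covariance matrix of the coefficients
vcov_mat <- vcov(moderation2)
# SE for the slope at mean criticism: sqrt(Var(b1) + z^2 * Var(b3) + 2 * z * Cov(b1, b3)), with z = mean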
se_slope_at_mean <- sqrt(vcov_mat["comm_freq", "comm_freq"] + 
                         mean_criticism^2 * vcov_mat["comm_freq:criticism", "comm_freq:criticism"] +
                         2 * mean_criticism * vcov_mat["comm_freq", "comm_freq:criticism"])

# Calculating t-value for the slope at mean criticism
t_value_slope_at_mean <- slope_at_mean / se_slope_at_mean

# Calculating p-value for the t-value
p_value_slope_at_mean <- 2 * (1 - pt(abs(t_value_slope_at_mean), df = moderation2$df.residual))


# High level of criticism (mean + 1 SD)
value_high <- mean_criticism + sd_criticism
se_slope_at_high <- sqrt(vcov_mat["comm_freq", "comm_freq"] + 
                         value_high^2 * vcov_mat["comm_freq:criticism", "comm_freq:criticism"] +
                         2 * value_high * vcov_mat["comm_freq", "comm_freq:criticism"])
slope_at_high <- beta_1 + beta_3 * value_high
t_value_slope_at_high <- slope_at_high / se_slope_at_high
p_value_slope_at_high <- 2 * (1 - pt(abs(t_value_slope_at_high), df = moderation2$df.residual))

# Low level of criticism (mean - 1 SD)
value_low <- mean_criticism - sd_criticism
se_slope_at_low <- sqrt(vcov_mat["comm_freq", "comm_freq"] + 
                        value_low^2 * vcov_mat["comm_freq:criticism", "comm_freq:criticism"] +
                        2 * value_low * vcov_mat["comm_freq", "comm_freq:criticism"])
slope_at_low <- beta_1 + beta_3 * value_low
t_value_slope_at_low <- slope_at_low / se_slope_at_low
p_value_slope_at_low <- 2 * (1 - pt(abs(t_value_slope_at_low), df = moderation2$df.residual))


cat("Low: Slope =", slope_at_low, "SE =", se_slope_at_low, "t =", t_value_slope_at_low, "p =", p_value_slope_at_low, "\n")

## Low: Slope = 0.5995464 SE = 0.05324423 t = 11.26031 p = 0

cat("Mean: Slope =", slope_at_mean, "SE =", se_slope_at_mean, "t =", t_value_slope_at_mean, "p =", p_value_slope_at_mean, "\n")

## Mean: Slope = 0.6120152 SE = 0.03902805 t = 15.68142 p = 0

cat("High: Slope =", slope_at_high, "SE =", se_slope_at_high, "t =", t_value_slope_at_high, "p =", p_value_slope_at_high, "\n")

## High: Slope = 0.6244841 SE = 0.05775215 t = 10.81317 p = 0
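
The three calculations above repeat the same steps with a different moderator value, so they can be wrapped in a small helper. This is a sketch only: simple_slope_at() is a hypothetical function written for this illustration, not part of any package, and it assumes the interaction term is named pred:modx in the fitted model. The toy data are simulated and are not Practice.csv.

```r
# Hypothetical helper: simple slope of `pred` at a given value of `modx`
simple_slope_at <- function(model, pred, modx, value) {
  int_name <- paste(pred, modx, sep = ":")  # assumes the term is named pred:modx
  b <- coef(model)
  V <- vcov(model)
  slope <- b[[pred]] + b[[int_name]] * value
  se <- sqrt(V[pred, pred] + value^2 * V[int_name, int_name] +
               2 * value * V[pred, int_name])
  t_val <- slope / se
  p_val <- 2 * pt(abs(t_val), df = df.residual(model), lower.tail = FALSE)
  c(value = value, slope = slope, se = se, t = t_val, p = p_val)
}

# Simulated data for illustration only
set.seed(3)
n <- 200
toy <- data.frame(comm_freq = runif(n, 1, 7),
                  criticism = rnorm(n, mean = 4, sd = 1))
toy$rel_sat <- 3 + 0.6 * toy$comm_freq - 0.1 * toy$criticism +
  0.05 * toy$comm_freq * toy$criticism + rnorm(n)
fit <- lm(rel_sat ~ comm_freq * criticism, data = toy)

# Slopes at -1 SD, mean, and +1 SD of the moderator, one row per level
m <- mean(toy$criticism)
s <- sd(toy$criticism)
res <- t(sapply(c(low = m - s, mean = m, high = m + s),
                function(v) simple_slope_at(fit, "comm_freq", "criticism", v)))
res
```

With the objects from this tutorial, the equivalent call would be simple_slope_at(moderation2, "comm_freq", "criticism", mean_criticism + sd_criticism), and the result should agree with the comm_freq row of summary(moderation2_high).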

Visualization

interact_plot(moderation2, pred = comm_freq, modx = criticism, interval = TRUE, int.width = 0.8, plot.points = TRUE)