The processr package aims to be a user-friendly way to perform moderation, mediation, and moderated mediation in R. Andrew Hayes created the famous PROCESS macro for SPSS and SAS users. As more and more people switch over to R, a number of packages have been written to run the types of analyses that PROCESS performs. I found myself writing my own scripts for these analyses instead of relying on the packages already out there, so I wrote them up as functions and put them into a package.
processr depends on two other packages: broom, which is used to tidy some results, and lavaan, which is what actually runs the analyses. All this package does is make it quicker (on the user input end) to run these models by automating the way in which the model code is written. If you are well-versed in lavaan, you might not have any trouble writing this code yourself; I just found myself writing the same code over and over again, so I put it into functions.
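To make the idea of automating the model code concrete, here is a minimal sketch of how a lavaan model string for simple mediation could be assembled from variable names. The helper name build_mediation_syntax is hypothetical and is not taken from processr's source; the labels a, b, cp, ind, and c simply mirror the ones used later in this document.

```r
# Hypothetical helper (NOT part of processr): builds lavaan model syntax
# for simple mediation from three variable names.
build_mediation_syntax <- function(iv, dv, med) {
  paste(
    sprintf("%s ~ a * %s", med, iv),                # a-path: iv -> mediator
    sprintf("%s ~ b * %s + cp * %s", dv, med, iv),  # b-path and direct effect
    "ind := a * b",                                 # indirect effect
    "c := cp + ind",                                # total effect
    sep = "\n"
  )
}

syntax <- build_mediation_syntax("treatment", "outcome", "mechanism")
cat(syntax, "\n")
```

A string like this could then be passed to lavaan::sem() together with a data frame, which is the repetitive step the package is meant to spare you.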
You can download this package by installing the devtools package, if you don’t already have it, with the code install.packages("devtools"). Then, you run devtools::install_github("markhwhiteii/processr") to actually download processr.
I named all functions after their PROCESS model number. Right now, it allows for people to run Model 1 (moderation and simple slope analyses), Model 4 (simple mediation), Model 7 (first-stage moderated mediation), and Model 14 (second-stage moderated mediation). Model 1 does a simple two-way interaction; all that I have added is the automated output of simple slopes analyses. Models 4, 7, and 14 are all written based on the equations provided in Hayes’s (2015) paper introducing his index of moderated mediation. Models 7 and 14 also report conditional indirect effects.
lavaan requires you to use numeric inputs. Any dichotomous variable of class factor must be converted to a numeric variable whose two levels are coded 0 and 1 (note that, as of right now, processr does not support categorical variables with more than two levels). processr::make_numeric will convert a specified variable into this 0 and 1 format for you. Consider the following data frame (all examples provide code for you to generate the data in the example on your own):
df1 <- data.frame(var1 = 1:5, var2 = 51:55, gender = factor(c("M", "F", "M", "F", "F")))
kable(df1)

| var1 | var2 | gender |
|---|---|---|
| 1 | 51 | M |
| 2 | 52 | F |
| 3 | 53 | M |
| 4 | 54 | F |
| 5 | 55 | F |
Note that gender is a factor variable. To convert gender to a numeric variable where female is the reference category, we would specify:
df2 <- make_numeric(var = "gender", ref = "F", newvar = "gender_num", data = df1)
kable(df2)

| var1 | var2 | gender | gender_num |
|---|---|---|---|
| 1 | 51 | M | 1 |
| 2 | 52 | F | 0 |
| 3 | 53 | M | 1 |
| 4 | 54 | F | 0 |
| 5 | 55 | F | 0 |
We would then use df2 as the data in all of our analyses.
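For readers who want to see what this recoding amounts to, the following base R sketch produces the same 0/1 column without processr. It is only an illustration of the coding scheme, not a description of how make_numeric is implemented.

```r
df1 <- data.frame(var1 = 1:5, var2 = 51:55,
                  gender = factor(c("M", "F", "M", "F", "F")))

# 0 for the reference level ("F"), 1 for the other level ("M")
df1$gender_num <- as.numeric(df1$gender != "F")

df1$gender_num
# 1 0 1 0 0
```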
Model 1 is a simple interaction, such as specifying lm(dv ~ iv * mod, data). processr::model1 will run this model, return the coefficients table, and run the simple slopes analyses. If you enter a moderator variable that is just 0s and 1s, it will return the simple slopes when the moderator equals 0 and when it equals 1. If you specify a continuous variable, it will return the simple slopes at the conventional values of one standard deviation below the mean, the mean, and one standard deviation above the mean of the moderator. If you want a simple slope at another value, read the documentation for processr::simple_slope, which allows you to specify a custom value of the moderator.
Consider a study where you think there is a relationship between var1 and var2, but only in one condition, cond. If your data support your hypothesis, the data might look something like:
set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)
kable(head(df3))

| var1 | var2 | cond |
|---|---|---|
| 1.0127014 | 0.9658409 | 0 |
| -0.6845605 | 0.3072628 | 1 |
| 0.3492607 | -0.1172101 | 1 |
| -1.6245010 | 1.0012646 | 0 |
| -0.5162476 | -0.3680959 | 0 |
| -0.7025836 | -1.2563921 | 1 |
To test your hypothesis, you would run:
mod1result <- model1(iv = "var1", dv = "var2", mod = "cond", data = df3)
kable(mod1result)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| intercept | 0.1333021 | 0.1455496 | 0.9158533 | 0.3620387 |
| var1 | 0.0695562 | 0.1562653 | 0.4451162 | 0.6572379 |
| cond | -0.1730380 | 0.2000502 | -0.8649730 | 0.3892099 |
| interaction | 0.8543593 | 0.2127935 | 4.0149685 | 0.0001180 |
| when cond = 0 | 0.0695562 | 0.1562653 | 0.4451162 | 0.6572379 |
| when cond = 1 | 0.9239155 | 0.1444377 | 6.3966382 | 0.0000000 |
The first four rows give you the typical regression output: the intercept (intercept), the main effects (var1 and cond), and the interaction you are testing between the iv and the mod (in this case, interaction is the interaction between var1 and cond). Below those rows, you can also see the simple slope of var1 when cond = 0 and when cond = 1. This output shows us that the relationship between var1 and var2 is significant in condition 1, b = .92, SE = 0.14, t(96) = 6.40, p < .001; however, the relationship between the two in condition 0 is not, b = .07, SE = 0.16, t(96) = 0.45, p = .657.
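The simple slopes for a 0/1 moderator can be reproduced by hand, which may help when reading the table. The sketch below assumes an ordinary least-squares fit with lm, as in the lm(dv ~ iv * mod, data) form mentioned earlier; the column name cond_flip is made up for this example, and nothing here is taken from processr's internals.

```r
set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)

fit0 <- lm(var2 ~ var1 * cond, data = df3)

# Slope of var1 when cond = 0 is the var1 coefficient itself
slope0 <- coef(fit0)[["var1"]]

# Slope of var1 when cond = 1 adds the interaction coefficient
slope1 <- coef(fit0)[["var1"]] + coef(fit0)[["var1:cond"]]

# Flipping the moderator's coding makes cond = 1 the reference group, so the
# var1 row of the refitted model is the cond = 1 simple slope with its own
# standard error, t statistic, and p-value
df3$cond_flip <- 1 - df3$cond
fit1 <- lm(var2 ~ var1 * cond_flip, data = df3)
slope1_refit <- summary(fit1)$coefficients["var1", ]

slope0
slope1
slope1_refit
```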
If we make iv = cond and mod = var1, we can see the effect of condition at three levels of var1:
mod1result2 <- model1(iv = "cond", dv = "var2", mod = "var1", data = df3)
kable(mod1result2)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| intercept | 0.1333021 | 0.1455496 | 0.9158533 | 0.3620387 |
| cond | -0.1730380 | 0.2000502 | -0.8649730 | 0.3892099 |
| var1 | 0.0695562 | 0.1562653 | 0.4451162 | 0.6572379 |
| interaction | 0.8543593 | 0.2127935 | 4.0149685 | 0.0001180 |
| when var1 = -1.102 | -1.1146022 | 0.2769619 | -4.0243884 | 0.0001140 |
| when var1 = -0.173 | -0.3204243 | 0.1962185 | -1.6329971 | 0.1057454 |
| when var1 = 0.757 | 0.4737536 | 0.2802652 | 1.6903762 | 0.0942007 |
Since the moderator isn’t only 0s and 1s, it now returns the effect of the iv on the dv at -1SD, M, and +1SD of mod.
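With a continuous moderator, each simple slope is the iv coefficient plus the interaction coefficient times the chosen moderator value. The following sketch shows that arithmetic with lm, again assuming an ordinary least-squares fit; probe_fit and var1_c are names invented for the example.

```r
set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)

fit <- lm(var2 ~ cond * var1, data = df3)

# Probe values: -1 SD, mean, and +1 SD of the moderator
probe <- mean(df3$var1) + c(-1, 0, 1) * sd(df3$var1)

# Effect of cond at each probe value: b_cond + b_interaction * value
slopes <- coef(fit)[["cond"]] + coef(fit)[["cond:var1"]] * probe

# Centering the moderator at a probe value gives the same slope along with
# its standard error, t statistic, and p-value
probe_fit <- function(value) {
  d <- transform(df3, var1_c = var1 - value)
  summary(lm(var2 ~ cond * var1_c, data = d))$coefficients["cond", ]
}
centered <- t(sapply(probe, probe_fit))

round(probe, 3)
centered
```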
Model 4 tests simple mediation. Consider a study where you think a treatment affects an outcome through some mechanism. Your data frame might look like this:
set.seed(1839)
treatment <- rbinom(200, 1, .5)
mechanism <- treatment + rnorm(200, 0, 2)
outcome <- treatment + mechanism + rnorm(200, 0, 2)
df4 <- data.frame(treatment, mechanism, outcome)
kable(head(df4))

| treatment | mechanism | outcome |
|---|---|---|
| 1 | -0.9639840 | -0.7595236 |
| 1 | 1.7490348 | 3.6130118 |
| 0 | -0.1637910 | 1.1407326 |
| 1 | 0.8352562 | 2.5142448 |
| 1 | 3.2953317 | 4.3602544 |
| 1 | -0.3560048 | 0.2026814 |
To test for mediation, one would use processr::model4:
set.seed(1839)
mod4result <- model4(iv = "treatment", dv = "outcome", med = "mechanism", data = df4)

The samples argument tells the function how many bias-corrected bootstrap resamples to draw for the confidence intervals; it defaults to 5000. Note also that the estimation method for model4, model7, and model14 is maximum likelihood. Because bootstrapping relies on random number generation, you should set a seed with set.seed() to make your results reproducible. Your result looks like:
kable(mod4result)

| label | est | se | z | pvalue | ci.lower | ci.upper |
|---|---|---|---|---|---|---|
| a | 1.758791 | 0.2831751 | 6.210965 | 0.0000000 | 1.2153246 | 2.326147 |
| b | 1.040059 | 0.0754065 | 13.792693 | 0.0000000 | 0.8863211 | 1.187440 |
| cp | 1.201219 | 0.3251238 | 3.694651 | 0.0002202 | 0.5719374 | 1.862893 |
| ind | 1.829246 | NA | NA | NA | 1.2557620 | 2.511538 |
| c | 3.030464 | 0.4179270 | 7.251181 | 0.0000000 | 2.2265496 | 3.855065 |
The labels a, b, cp, and c refer to the paths in a mediation model. The a-path is from the independent variable to the mediator; the b-path is from the mediator to the dependent variable. The c-path is broken into two pieces: cp stands for “c-prime,” often written as \(c'\) and referred to as the direct effect. It is the effect of the independent variable on the dependent variable, after controlling for the mediator. c refers to the effect of the independent variable on the dependent variable alone, not considering the mediator. This is often referred to as the total effect. Lastly, ind refers to the indirect effect. Note that it does not have an associated standard error, test statistic, or p-value. This notation is the same as in the aforementioned Hayes (2015) paper, as well as in the PROCESS templates. In the present example, we can see that each path is significant and so is the indirect effect (i.e., the confidence interval does not include zero).
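The relationships among these paths can be checked with plain regressions. This sketch uses ordinary least squares rather than lavaan, which for a model like this should give matching point estimates, and it uses a percentile bootstrap, which is simpler than the bias-corrected bootstrap described above. The function name mediation_paths is made up for the example.

```r
set.seed(1839)
treatment <- rbinom(200, 1, .5)
mechanism <- treatment + rnorm(200, 0, 2)
outcome <- treatment + mechanism + rnorm(200, 0, 2)
df4 <- data.frame(treatment, mechanism, outcome)

# Returns the a, b, cp, ind, and c estimates for one data set
mediation_paths <- function(d) {
  a  <- coef(lm(mechanism ~ treatment, data = d))[["treatment"]]
  fy <- coef(lm(outcome ~ treatment + mechanism, data = d))
  b  <- fy[["mechanism"]]
  cp <- fy[["treatment"]]
  c(a = a, b = b, cp = cp, ind = a * b, c = cp + a * b)
}

est <- mediation_paths(df4)

# Total effect from regressing the outcome on the treatment alone;
# it equals cp + a * b
c_total <- coef(lm(outcome ~ treatment, data = df4))[["treatment"]]

# Percentile bootstrap of the indirect effect (NOT bias-corrected)
set.seed(1839)
boot_ind <- replicate(1000, {
  rows <- sample(nrow(df4), replace = TRUE)
  mediation_paths(df4[rows, ])[["ind"]]
})
ci <- quantile(boot_ind, c(.025, .975))

est
ci
```

Because the bootstrap type and the number of resamples differ from model4's defaults, the interval should be close to, but not identical with, the one in the table.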
Models 7 and 14 refer to first-stage and second-stage moderated mediation, respectively. With Model 7, the interaction is on the a-path (between the independent variable and the mediator); with Model 14, it is on the b-path (between the mediator and the dependent variable). Neither of these models includes an interaction on the c-path; those are Models 8 and 15, respectively, and I hope to add them to the package soon.
Consider the hypothesis: watching a sad movie (variable named sadmovie, coded 1) as compared to a comedy (coded 0) increases how much people see the movie as realistic (variable named realistic), which in turn increases how much they liked it (variable named likedmovie). However, this only occurs for people who highly identify with the characters (variable named identify). One’s data might look like:
set.seed(1839)
sadmovie <- rbinom(200, 1, .5)
identify <- rnorm(200)
realistic <- sadmovie + identify + sadmovie * identify + rnorm(200, 0, 2)
likedmovie <- sadmovie + realistic + rnorm(200, 0, 2)
df5 <- data.frame(sadmovie, identify, realistic, likedmovie)
kable(head(df5))

| sadmovie | identify | realistic | likedmovie |
|---|---|---|---|
| 1 | -0.9819920 | -1.7595236 | -1.9850710 |
| 1 | 0.3745174 | 2.6130118 | 0.1823599 |
| 0 | -0.0818955 | 1.2226281 | 2.5019982 |
| 1 | -0.0823719 | 1.5142448 | 3.2935188 |
| 1 | 1.1476659 | 3.3602544 | 4.4885258 |
| 1 | -0.6780024 | -0.7973186 | 4.4237486 |
Model 7 and Model 14 would have the same inputs:
set.seed(1839)
mod7result <- model7(iv = "sadmovie", dv = "likedmovie", med = "realistic", mod = "identify", df5)
set.seed(1839)
mod14result <- model14(iv = "sadmovie", dv = "likedmovie", med = "realistic", mod = "identify", df5)

Let’s look at the outputs one at a time: model7 and then model14.
kable(mod7result)

| label | est | se | z | pvalue | ci.lower | ci.upper |
|---|---|---|---|---|---|---|
| a1 | 1.2139498 | 0.2944865 | 4.122260 | 3.75e-05 | 0.6448535 | 1.8034826 |
| a2 | 0.9309337 | 0.1963193 | 4.741937 | 2.10e-06 | 0.5380265 | 1.3133503 |
| a3 | 1.3606263 | 0.3038171 | 4.478439 | 7.50e-06 | 0.7680835 | 1.9533038 |
| b | 0.8852709 | 0.0581143 | 15.233283 | 0.00e+00 | 0.7732910 | 1.0038464 |
| cp | 1.9627073 | 0.3217393 | 6.100304 | 0.00e+00 | 1.3076183 | 2.5654760 |
| imm | 1.2045229 | NA | NA | NA | 0.6823188 | 1.7412299 |
| ind_lo | -0.1081077 | NA | NA | NA | -0.8570683 | 0.6699658 |
| ind_mn | 1.1036861 | NA | NA | NA | 0.5539136 | 1.7023036 |
| ind_hi | 2.3154799 | NA | NA | NA | 1.5293412 | 3.1520620 |
Since this is first-stage moderated mediation, we have three a-paths: a1 is often called \(a_1\), which is the coefficient of regressing the mediator on the independent variable; a2 is often called \(a_2\), which is the coefficient of regressing the mediator on the moderator; lastly, a3 (or \(a_3\)) is the interaction between the moderator and the independent variable on the mediator. b and cp retain the same meaning as before. imm is the index of moderated mediation, which tests if moderated mediation is present. ind_lo, ind_mn, and ind_hi refer to the indirect effect (mediation model) at -1 SD, M, and +1 SD of the moderator. As we can see, moderated mediation is present (the confidence interval for imm does not include zero), and the indirect effect is present for people at the mean (mn) and one standard deviation above the mean (hi) on the moderator, but not for people one standard deviation below the mean (lo). Again, we see that there are no standard errors, test statistics, or p-values for the indirect effects analyses; refer to the bootstrapped confidence intervals instead. I often report a3, b, the imm, and then the conditional indirect effects (ind_lo, ind_mn, ind_hi).
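The point estimates in this table are tied together by simple arithmetic: the index of moderated mediation is \(a_3 b\), and the conditional indirect effect at a moderator value \(w\) is \((a_1 + a_3 w) b\). The sketch below reproduces that arithmetic with lm rather than lavaan, so it illustrates the formulas and is not a copy of what model7 does internally; confidence intervals would still need bootstrapping.

```r
set.seed(1839)
sadmovie <- rbinom(200, 1, .5)
identify <- rnorm(200)
realistic <- sadmovie + identify + sadmovie * identify + rnorm(200, 0, 2)
likedmovie <- sadmovie + realistic + rnorm(200, 0, 2)
df5 <- data.frame(sadmovie, identify, realistic, likedmovie)

# Mediator model carries the first-stage interaction; outcome model does not
fit_m <- coef(lm(realistic ~ sadmovie * identify, data = df5))
fit_y <- coef(lm(likedmovie ~ realistic + sadmovie, data = df5))

a1 <- fit_m[["sadmovie"]]
a3 <- fit_m[["sadmovie:identify"]]
b  <- fit_y[["realistic"]]

# Index of moderated mediation
imm <- a3 * b

# Conditional indirect effects at -1 SD, mean, and +1 SD of the moderator
w <- mean(df5$identify) + c(lo = -1, mn = 0, hi = 1) * sd(df5$identify)
ind <- (a1 + a3 * w) * b

imm
ind
```

One consequence of the formula is that the conditional indirect effect changes by imm for every one-unit increase in the moderator.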
kable(mod14result)

| label | est | se | z | pvalue | ci.lower | ci.upper |
|---|---|---|---|---|---|---|
| a | 1.8838220 | 0.3831332 | 4.9168851 | 0.0000009 | 1.1554294 | 2.6394754 |
| b1 | 0.8784593 | 0.0711883 | 12.3399391 | 0.0000000 | 0.7454491 | 1.0237648 |
| b2 | -0.0697860 | 0.2250198 | -0.3101329 | 0.7564599 | -0.4754310 | 0.4098422 |
| b3 | 0.1593535 | 0.3782722 | 0.4212669 | 0.6735602 | -0.5696516 | 0.9110628 |
| cp | 1.9649305 | 0.3220118 | 6.1020444 | 0.0000000 | 1.3060376 | 2.5732479 |
| imm | 0.3001937 | NA | NA | NA | -1.0730669 | 1.7904738 |
| ind_lo | 1.3600858 | NA | NA | NA | -0.0734827 | 3.3154591 |
| ind_mn | 1.6620913 | NA | NA | NA | 0.9829477 | 2.5084015 |
| ind_hi | 1.9640968 | NA | NA | NA | 0.7249264 | 3.9160984 |
Since this is second-stage moderated mediation, there are now three b-paths: b1 (or \(b_1\)) refers to the effect of the mediator on the dependent variable, b2 (or \(b_2\)) refers to the effect of the moderator on the dependent variable, and b3 (or \(b_3\)) is the interaction between the two on the dependent variable. The remaining labels are interpreted as before. If the moderator is dichotomous, the function will return conditional indirect effects when the moderator is 0 (ind_0) and 1 (ind_1).
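For the second-stage case, the index of moderated mediation is \(a b_3\) and the conditional indirect effect at a moderator value \(w\) is \(a (b_1 + b_3 w)\). The sketch below shows this with lm on the same simulated data; ind_at is a name invented for the example, and the code is an illustration of the formulas rather than of model14's internals.

```r
set.seed(1839)
sadmovie <- rbinom(200, 1, .5)
identify <- rnorm(200)
realistic <- sadmovie + identify + sadmovie * identify + rnorm(200, 0, 2)
likedmovie <- sadmovie + realistic + rnorm(200, 0, 2)
df5 <- data.frame(sadmovie, identify, realistic, likedmovie)

# Mediator model has no interaction; outcome model carries the
# second-stage (mediator by moderator) interaction
fit_m <- coef(lm(realistic ~ sadmovie, data = df5))
fit_y <- coef(lm(likedmovie ~ sadmovie + realistic * identify, data = df5))

a  <- fit_m[["sadmovie"]]
b1 <- fit_y[["realistic"]]
b3 <- fit_y[["realistic:identify"]]

# Index of moderated mediation
imm <- a * b3

# Conditional indirect effect at any moderator value
ind_at <- function(value) a * (b1 + b3 * value)

# Continuous moderator: probe at -1 SD, mean, and +1 SD
w <- mean(df5$identify) + c(lo = -1, mn = 0, hi = 1) * sd(df5$identify)
ind <- ind_at(w)

imm
ind
```

For a 0/1 moderator the same function would be evaluated at 0 and 1, giving \(a b_1\) and \(a (b_1 + b_3)\).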