Using processr

Mark White | markhwhiteii@gmail.com

2018-01-12

Introduction

The processr package aims to be a user-friendly way to perform moderation, mediation, and moderated mediation in R. Andrew Hayes created the famous PROCESS macro for SPSS and SAS users. As more and more people switch over to R, a number of packages have been written to run the kinds of analyses that PROCESS performs. I found myself writing my own scripts for these analyses instead of relying on the packages already out there, so I thought I would write them up as functions and put them into a package.

processr depends on two other packages: broom, which is used to tidy some results, and lavaan, which is what actually runs the analyses. All this package does is make it quicker (on the user input end) to run these models by automating the way in which the model code is written. If you are well-versed in lavaan, you might not have any trouble writing this code yourself; I just found myself writing the same code over and over again, so I put it into functions.
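To give a sense of the kind of model code being automated, here is a minimal sketch of how lavaan syntax for a simple mediation model could be assembled from variable names. The helper build_mediation_syntax is hypothetical and is not a processr function; the labels a, b, cp, ind, and c are chosen to match the output labels used later in this document, and the syntax processr actually generates may differ. The data frame name my_data in the comments is a placeholder.

```r
# Hypothetical helper (NOT part of processr): builds lavaan model syntax
# for a simple mediation model from three variable names.
build_mediation_syntax <- function(iv, dv, med) {
  paste(
    sprintf("%s ~ a * %s", med, iv),               # a-path: iv -> mediator
    sprintf("%s ~ b * %s + cp * %s", dv, med, iv), # b-path and direct effect
    "ind := a * b",                                # indirect effect
    "c := cp + a * b",                             # total effect
    sep = "\n"
  )
}

syntax <- build_mediation_syntax("treatment", "outcome", "mechanism")
cat(syntax, "\n")

# With lavaan installed, a string like this could be fit with bootstrapped
# standard errors, for example (my_data is a placeholder data frame):
# fit <- lavaan::sem(syntax, data = my_data, se = "bootstrap", bootstrap = 5000)
# lavaan::parameterEstimates(fit, boot.ci.type = "bca.simple")
```

Writing the four lines by hand is not hard, but every new set of variable names means retyping them, which is the repetition the package is meant to remove.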

To install processr, first install the devtools package, if you don’t already have it, with install.packages("devtools"). Then run devtools::install_github("markhwhiteii/processr") to download and install processr.
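The two installation steps can be collected into one guarded snippet. This is only a convenience sketch: the wrapper name install_processr is made up for illustration, and it is written as a function so that nothing is installed until you call it.

```r
# Hypothetical convenience wrapper (not part of any package).
# Nothing is downloaded until install_processr() is actually called.
install_processr <- function() {
  # Install devtools only if it is not already available
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Install processr from GitHub
  devtools::install_github("markhwhiteii/processr")
}

# install_processr()   # uncomment to install
# library(processr)    # then load the package
```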

What does this package cover?

I named all functions after their PROCESS model number. Right now, the package can run Model 1 (moderation and simple slope analyses), Model 4 (simple mediation), Model 7 (first-stage moderated mediation), and Model 14 (second-stage moderated mediation). Model 1 fits a simple two-way interaction; all that I have added is the automated output of simple slopes analyses. Models 4, 7, and 14 are all written based on the equations provided in Hayes’s (2015) paper introducing his index of moderated mediation. Models 7 and 14 also report conditional indirect effects.

Preparation

lavaan requires you to use numeric inputs. Any dichotomous variable of class factor has to be converted to a numeric variable whose two levels are coded as 0 and 1 (note that, as of right now, processr does not support categorical variables with more than two levels). processr::make_numeric will convert a specified variable into this 0 and 1 format for you. Consider the following data frame (all examples provide code so that you can generate the example data on your own):

library(processr)
library(knitr)  # provides kable(), used below to print tables
df1 <- data.frame(var1 = 1:5, var2 = 51:55, gender = factor(c("M", "F", "M", "F", "F")))
kable(df1)
var1 var2 gender
1 51 M
2 52 F
3 53 M
4 54 F
5 55 F

Note that gender is a factor variable. To convert gender to a numeric variable where female is the reference category, we would specify:

df2 <- make_numeric(var = "gender", ref = "F", newvar = "gender_num", data = df1)
kable(df2)
var1 var2 gender gender_num
1 51 M 1
2 52 F 0
3 53 M 1
4 54 F 0
5 55 F 0

We would then use df2 as the data in all of our analyses.
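For readers who want to see what this recoding amounts to, here is a base-R sketch that produces the same gender_num column by hand. It illustrates the 0/1 coding only; it is not how make_numeric is implemented, and the variable name ref is chosen here to mirror the ref argument.

```r
# Rebuild the example data frame
df1 <- data.frame(var1 = 1:5, var2 = 51:55,
                  gender = factor(c("M", "F", "M", "F", "F")))

# The reference category is coded 0; the other level is coded 1
ref <- "F"
df1$gender_num <- as.numeric(df1$gender != ref)

df1
```

Choosing the reference category matters for interpretation: coefficients involving gender_num describe the level coded 1 relative to the level coded 0.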

Model 1

Model 1 is a simple interaction, such as specifying lm(dv ~ iv * mod, data). processr::model1 will run this model, return the coefficients table, and run the simple slopes analyses. If you enter a moderator variable that contains only 0s and 1s, it will return the simple slopes when the moderator equals 0 and when it equals 1. If you specify a continuous moderator, it will return the simple slopes at the conventional values: one standard deviation below the mean, the mean, and one standard deviation above the mean of the moderator. If you want the slope at another value, read the documentation for processr::simple_slope, which allows you to specify a custom value of the moderator.

Consider a study where you think there is a relationship between var1 and var2, but only in one condition, cond. If your data support your hypothesis, the data might look something like:

set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)
kable(head(df3))
var1 var2 cond
1.0127014 0.9658409 0
-0.6845605 0.3072628 1
0.3492607 -0.1172101 1
-1.6245010 1.0012646 0
-0.5162476 -0.3680959 0
-0.7025836 -1.2563921 1

To test your hypothesis, you would run:

mod1result <- model1(iv = "var1", dv = "var2", mod = "cond", data = df3)
kable(mod1result)
term            estimate    std.error   statistic   p.value
intercept       0.1333021   0.1455496   0.9158533   0.3620387
var1            0.0695562   0.1562653   0.4451162   0.6572379
cond           -0.1730380   0.2000502  -0.8649730   0.3892099
interaction     0.8543593   0.2127935   4.0149685   0.0001180
when cond = 0   0.0695562   0.1562653   0.4451162   0.6572379
when cond = 1   0.9239155   0.1444377   6.3966382   0.0000000

The first four rows give you the typical regression output: the intercept (intercept), the main effects (var1 and cond), and the interaction you are testing between the iv and the mod (in this case, interaction is the interaction between var1 and cond). Below those rows, you can also see the simple slope of var1 when cond = 0 and when cond = 1. This output shows us that the relationship between var1 and var2 is significant in condition 1, b = .92, SE = 0.14, t(96) = 6.40, p < .001; however, the relationship between the two in condition 0 is not, b = .07, SE = 0.16, t(96) = 0.45, p = .657.
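The simple slope rows can be reasoned about with plain lm. The sketch below regenerates the example data and obtains the slope of var1 at cond = 1 in two ways: by adding the interaction coefficient to the var1 coefficient, and by refitting the model with the moderator shifted so that zero means cond = 1. This is a base-R illustration of the idea, not the code model1 runs internally; the column name cond_shifted is introduced here for illustration.

```r
# Regenerate the example data
set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)

# Fit the interaction model
fit <- lm(var2 ~ var1 * cond, data = df3)
b <- coef(fit)

# Way 1: slope of var1 at a given moderator value w is b_var1 + b_interaction * w
slope_at_0 <- unname(b["var1"])
slope_at_1 <- unname(b["var1"] + b["var1:cond"])

# Way 2: shift the moderator so that 0 corresponds to cond == 1, then refit.
# The var1 row of this fit is the simple slope at cond == 1, with its own
# standard error, t statistic, and p-value.
df3$cond_shifted <- df3$cond - 1
fit_at_1 <- lm(var2 ~ var1 * cond_shifted, data = df3)
summary(fit_at_1)$coefficients["var1", ]
```

The refitting approach is handy because the standard error and test for the simple slope come for free from the usual regression summary.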

If we make iv = cond and mod = var1, we can see the effect of condition at three levels of var1:

mod1result2 <- model1(iv = "cond", dv = "var2", mod = "var1", data = df3)
kable(mod1result2)
term                 estimate    std.error   statistic   p.value
intercept            0.1333021   0.1455496   0.9158533   0.3620387
cond                -0.1730380   0.2000502  -0.8649730   0.3892099
var1                 0.0695562   0.1562653   0.4451162   0.6572379
interaction          0.8543593   0.2127935   4.0149685   0.0001180
when var1 = -1.102  -1.1146022   0.2769619  -4.0243884   0.0001140
when var1 = -0.173  -0.3204243   0.1962185  -1.6329971   0.1057454
when var1 = 0.757    0.4737536   0.2802652   1.6903762   0.0942007

Since the moderator isn’t only 0s and 1s, it now returns the effect of the iv on the dv at -1SD, M, and +1SD of mod.
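For a continuous moderator, the same logic gives a closed form for both the simple slope and its standard error. The sketch below is a base-R illustration under the assumption of an ordinary lm fit; it is not the internals of model1 or simple_slope. For a moderator value \(w\), the slope is \(b_{iv} + b_{int} w\), and its variance is \(\mathrm{Var}(b_{iv}) + 2w\,\mathrm{Cov}(b_{iv}, b_{int}) + w^2\,\mathrm{Var}(b_{int})\).

```r
# Regenerate the example data
set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)

# iv = cond, mod = var1
fit <- lm(var2 ~ cond * var1, data = df3)
b <- coef(fit)
V <- vcov(fit)

# Moderator values: mean - 1 SD, mean, mean + 1 SD
w <- mean(df3$var1) + c(-1, 0, 1) * sd(df3$var1)

# Simple slope of cond at each moderator value, and its standard error
est <- unname(b["cond"] + b["cond:var1"] * w)
se <- sqrt(V["cond", "cond"] +
           2 * w * V["cond", "cond:var1"] +
           w^2 * V["cond:var1", "cond:var1"])

slopes <- data.frame(mod_value = w, estimate = est,
                     std.error = se, statistic = est / se)
slopes
```

Passing a different vector for w gives slopes at any custom moderator values.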

Model 4

Model 4 tests simple mediation. Consider a study where you think a treatment affects an outcome through some mechanism. Your data frame might look like this:

set.seed(1839)
treatment <- rbinom(200, 1, .5)
mechanism <- treatment + rnorm(200, 0, 2)
outcome <- treatment + mechanism + rnorm(200, 0, 2)
df4 <- data.frame(treatment, mechanism, outcome)
kable(head(df4))
treatment mechanism outcome
1 -0.9639840 -0.7595236
1 1.7490348 3.6130118
0 -0.1637910 1.1407326
1 0.8352562 2.5142448
1 3.2953317 4.3602544
1 -0.3560048 0.2026814

To test for mediation, one would use processr::model4:

set.seed(1839)
mod4result <- model4(iv = "treatment", dv = "outcome", med = "mechanism", data = df4)

The samples argument tells the function how many bias-corrected bootstrap resamples to draw for the confidence intervals. It defaults to 5000. Note also that the estimation method for model4, model7, and model14 is maximum likelihood; additionally, bootstrapping relies on random number generation, so to make your results reproducible, you should specify a seed with set.seed(). Your result looks like:

kable(mod4result)
label est se z pvalue ci.lower ci.upper
a 1.758791 0.2831751 6.210965 0.0000000 1.2153246 2.326147
b 1.040059 0.0754065 13.792693 0.0000000 0.8863211 1.187440
cp 1.201219 0.3251238 3.694651 0.0002202 0.5719374 1.862893
ind 1.829246 NA NA NA 1.2557620 2.511538
c 3.030464 0.4179270 7.251181 0.0000000 2.2265496 3.855065

The labels a, b, cp, and c refer to the paths in a mediation model. The a-path is from the independent variable to the mediator; the b-path is from the mediator to the dependent variable. The c-path is broken into two pieces: cp stands for “c-prime,” often written as \(c'\) and referred to as the direct effect. It is the effect of the independent variable on the dependent variable, after controlling for the mediator. c refers to the effect of the independent variable on the dependent variable alone, not considering the mediator. This is often referred to as the total effect. Lastly, ind refers to the indirect effect. Note that it does not have an associated standard error, test statistic, or p-value. This notation is the same as in the aforementioned Hayes (2015) paper, as well as in the PROCESS templates. In the present example, we can see that each path is significant and so is the indirect effect (i.e., the confidence interval does not include zero).
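To make the path labels concrete, here is a base-R sketch that estimates the same quantities with two ordinary regressions and then bootstraps the indirect effect. Two caveats: it uses OLS via lm and not lavaan, and it uses a simple percentile bootstrap, not the bias-corrected bootstrap described above, so the intervals are not expected to match the table exactly. The function name mediation_paths is made up for this illustration.

```r
# Regenerate the example data
set.seed(1839)
treatment <- rbinom(200, 1, .5)
mechanism <- treatment + rnorm(200, 0, 2)
outcome <- treatment + mechanism + rnorm(200, 0, 2)
df4 <- data.frame(treatment, mechanism, outcome)

# Hypothetical helper: path estimates from two OLS regressions
mediation_paths <- function(d) {
  a  <- coef(lm(mechanism ~ treatment, data = d))[["treatment"]]
  m2 <- coef(lm(outcome ~ treatment + mechanism, data = d))
  b  <- m2[["mechanism"]]   # mediator -> outcome, controlling for iv
  cp <- m2[["treatment"]]   # direct effect (c-prime)
  c(a = a, b = b, cp = cp, ind = a * b, c = cp + a * b)
}

paths <- mediation_paths(df4)
paths

# The total effect also comes from regressing the outcome on the iv alone;
# with OLS it equals cp + a * b
c_total <- coef(lm(outcome ~ treatment, data = df4))[["treatment"]]
all.equal(unname(paths["c"]), c_total)

# Percentile bootstrap of the indirect effect (NOT bias-corrected)
set.seed(1839)
n_boot <- 1000
boot_ind <- replicate(n_boot, {
  rows <- sample(nrow(df4), replace = TRUE)  # resample rows with replacement
  mediation_paths(df4[rows, ])[["ind"]]
})
ci <- quantile(boot_ind, c(.025, .975))
ci
```

The indirect effect is a product of two coefficients, so its sampling distribution is generally not normal; that is why it is judged by a bootstrapped interval and has no standard error or p-value in the output.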

Model 7 and Model 14

Models 7 and 14 refer to first-stage and second-stage moderated mediation, respectively. With Model 7, the interaction is on the a-path (between the independent variable and the mediator); with Model 14, it is on the b-path (between the mediator and the dependent variable). Neither of these models includes an interaction on the c-path; those are Models 8 and 15, respectively, and I hope to add them to the package soon.

Consider the hypothesis: watching a sad movie (variable named sadmovie, coded 1) as compared to a comedy (coded 0) increases how much people see the movie as realistic (variable named realistic), which in turn increases how much they liked it (variable named likedmovie). However, this only occurs for people who highly identify with the characters (variable named identify). One’s data might look like:

set.seed(1839)
sadmovie <- rbinom(200, 1, .5)
identify <- rnorm(200)
realistic <- sadmovie + identify + sadmovie * identify + rnorm(200, 0, 2)
likedmovie <- sadmovie + realistic + rnorm(200, 0, 2)
df5 <- data.frame(sadmovie, identify, realistic, likedmovie)
kable(head(df5))
sadmovie identify realistic likedmovie
1 -0.9819920 -1.7595236 -1.9850710
1 0.3745174 2.6130118 0.1823599
0 -0.0818955 1.2226281 2.5019982
1 -0.0823719 1.5142448 3.2935188
1 1.1476659 3.3602544 4.4885258
1 -0.6780024 -0.7973186 4.4237486

Model 7 and Model 14 would have the same inputs:

set.seed(1839)
mod7result <- model7(iv = "sadmovie", dv = "likedmovie", med = "realistic", mod = "identify", data = df5)

set.seed(1839)
mod14result <- model14(iv = "sadmovie", dv = "likedmovie", med = "realistic", mod = "identify", data = df5)

Let’s look at the outputs one at a time: model7 and then model14.

kable(mod7result)
label est se z pvalue ci.lower ci.upper
a1 1.2139498 0.2944865 4.122260 3.75e-05 0.6448535 1.8034826
a2 0.9309337 0.1963193 4.741937 2.10e-06 0.5380265 1.3133503
a3 1.3606263 0.3038171 4.478439 7.50e-06 0.7680835 1.9533038
b 0.8852709 0.0581143 15.233283 0.00e+00 0.7732910 1.0038464
cp 1.9627073 0.3217393 6.100304 0.00e+00 1.3076183 2.5654760
imm 1.2045229 NA NA NA 0.6823188 1.7412299
ind_lo -0.1081077 NA NA NA -0.8570683 0.6699658
ind_mn 1.1036861 NA NA NA 0.5539136 1.7023036
ind_hi 2.3154799 NA NA NA 1.5293412 3.1520620

Since this is first-stage moderated mediation, we have three a-paths: a1 is often called \(a_1\), which is the coefficient of regressing the mediator on the independent variable; a2 is often called \(a_2\), which is the coefficient of regressing the mediator on the moderator; lastly, a3 (or \(a_3\)) is the interaction between the moderator and the independent variable on the mediator. b and cp retain the same meaning as before. imm is the index of moderated mediation, which tests if moderated mediation is present. ind_lo, ind_mn, and ind_hi refer to the indirect effect (mediation model) at -1 SD, M, and +1 SD of the moderator. As we can see, moderated mediation is present (the confidence interval for imm does not include zero), and the indirect effect is present for people at the mean (mn) and one standard deviation above the mean (hi) on the moderator, but not for people one standard deviation below the mean (lo), whose confidence interval includes zero. Again, we see that there are no standard errors, test statistics, or p-values for the indirect effects analyses; refer to the bootstrapped confidence intervals instead. I often report a3, b, the imm, and then the conditional indirect effects (ind_lo, ind_mn, ind_hi).
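The relationship between these rows can be checked with a little arithmetic. In first-stage moderated mediation (following Hayes, 2015), the indirect effect at moderator value \(w\) is \((a_1 + a_3 w)\,b\), so the index of moderated mediation is \(a_3 b\) and the conditional indirect effects are evenly spaced. The sketch below copies the point estimates from the table above; the function name cond_indirect_7 is made up for illustration and is not a processr function.

```r
# Point estimates copied from the model7 output table above
a1 <- 1.2139498
a3 <- 1.3606263
b  <- 0.8852709

# Index of moderated mediation: change in the indirect effect
# per one-unit change in the moderator
imm <- a3 * b
imm

# Hypothetical helper: conditional indirect effect at moderator value w
cond_indirect_7 <- function(w) (a1 + a3 * w) * b
cond_indirect_7(0)  # indirect effect when the moderator equals 0

# The reported conditional indirect effects are one moderator SD apart,
# so both gaps divided by imm should give the same implied SD
ind <- c(lo = -0.1081077, mn = 1.1036861, hi = 2.3154799)
implied_sd <- unname(diff(ind) / imm)
implied_sd
```

Because the conditional indirect effect is linear in the moderator, a single number (imm) summarizes how much the mediation changes across moderator values.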

kable(mod14result)
label est se z pvalue ci.lower ci.upper
a 1.8838220 0.3831332 4.9168851 0.0000009 1.1554294 2.6394754
b1 0.8784593 0.0711883 12.3399391 0.0000000 0.7454491 1.0237648
b2 -0.0697860 0.2250198 -0.3101329 0.7564599 -0.4754310 0.4098422
b3 0.1593535 0.3782722 0.4212669 0.6735602 -0.5696516 0.9110628
cp 1.9649305 0.3220118 6.1020444 0.0000000 1.3060376 2.5732479
imm 0.3001937 NA NA NA -1.0730669 1.7904738
ind_lo 1.3600858 NA NA NA -0.0734827 3.3154591
ind_mn 1.6620913 NA NA NA 0.9829477 2.5084015
ind_hi 1.9640968 NA NA NA 0.7249264 3.9160984

Since this is second-stage moderated mediation, now there are three b-paths: b1 (or \(b_1\)) refers to the mediator on the dependent variable, b2 (or \(b_2\)) refers to the moderator on the dependent variable, and b3 (or \(b_3\)) is the interaction between the two on the dependent variable. The rest of the labels are interpreted as before. In this example, the confidence interval for imm includes zero, which is what we would expect: the simulated data place the interaction on the a-path, not the b-path. If the moderator is dichotomous, the function will return conditional indirect effects when the moderator is 0 (ind_0) and 1 (ind_1).
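As a check on how these rows fit together: in second-stage moderated mediation (following Hayes, 2015), the indirect effect at moderator value \(w\) is \(a\,(b_1 + b_3 w)\), so the index of moderated mediation is \(a\,b_3\). The sketch below copies the point estimates from the table above; cond_indirect_14 is a made-up name for illustration and is not a processr function.

```r
# Point estimates copied from the model14 output table above
a  <- 1.8838220
b1 <- 0.8784593
b3 <- 0.1593535

# Index of moderated mediation for a second-stage model
imm <- a * b3
imm

# Hypothetical helper: conditional indirect effect at moderator value w
cond_indirect_14 <- function(w) a * (b1 + b3 * w)

# The reported conditional indirect effects are one moderator SD apart,
# so both gaps divided by imm should give the same implied SD
ind <- c(lo = 1.3600858, mn = 1.6620913, hi = 1.9640968)
implied_sd <- unname(diff(ind) / imm)
implied_sd

# Moderator mean implied by ind_mn (sensitive to rounding in the table)
implied_mean <- (ind[["mn"]] / a - b1) / b3
implied_mean
```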