Mediation & Path Analysis Using Lavaan

You will need the following packages to succesfully complete this assignment:

library(lavaan)

Data

Download the attached data file which consists of a categorical grouping variable, and variables grades, SES, self-esteem, and happiness

data<- read.table("school.dat", quote="\"", comment.char="")

Now that we have imported our data, let’s change the names in the dataset so it’s easier to construct our model:

The ‘names’ command makes it easier when we have only a few variables in our dataset that we would like to re-name. Pay attention that the names are listed in the correct order though!

names(data)<- c("Group","Grades","SES","SelfEsteem","Happiness")
as.factor(data$Group) #tell R that 'Group' is categorical

Q1) Run a Mediation Analysis with bootstrapping and standard errors. Does self-esteem mediate the association between grades and happiness?

Some Lavaan Basics to Know for Mediation:

The ‘:=’ allows you to define a parameter by saying ‘param x’ has to equal the sum of product of other estimated parameters in your model.
- This is usual for when we want to see what our indirect and total effects are.
- You’ll also notice that the paths of interest (a,b,& c) are labeled. These are arbitrary names, and can realistically be anything you want (though for this purpose, it makes sense to label them based on the mediation parameters).

mediation.model <- ' 
             # mediator
             SelfEsteem ~ a*Grades
             Happiness ~ b*SelfEsteem 
           
             # direct effect
             Happiness ~  c*Grades

           # indirect effect (a*b)
             ab := a*b

           # total effect
             total := c + (a*b)
         '

To extract bootstrapped SE’s, we simply add the command se = “boot”, and can then also specify the number of boostrap’s we want with bootstrap = x (I believe mPlus defaults to 5000 or so, but for this demo, I set it to ‘500’).

Notice, we also are using the ‘sem’ command in Lavaan instead of ‘cfa.’ The commands are similar, but have different default values/functions. For path analysis/mediation purposes, we want to use ‘sem.’

fit <- sem(mediation.model, data = data, meanstructure = TRUE,
           se = "boot", bootstrap = 500)

Extract the results:

We also want to obtain the confidence intervals for our different effects, so in the summary statement, we can include ‘ci=TRUE.’
If zero is not in the confidence interval for the indirect effect, then the researcher can be confident that the indirect effect is different from zero.

summary(fit, fit.measures=T, standardized=T, ci=TRUE)

## lavaan 0.6-3 ended normally after 21 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          7
## 
##   Number of observations                         12080
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic            10721.558
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -63045.523
##   Loglikelihood unrestricted model (H1)     -63045.523
## 
##   Number of free parameters                          7
##   Akaike (AIC)                              126105.047
##   Bayesian (BIC)                            126156.842
##   Sample-size adjusted Bayesian (BIC)       126134.597
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent Confidence Interval          0.000  0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws              500
##   Number of successful bootstrap draws             497
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   SelfEsteem ~                                                          
##     Grades     (a)    0.550    0.008   72.039    0.000    0.534    0.566
##   Happiness ~                                                           
##     SelfEsteem (b)    0.262    0.004   63.372    0.000    0.254    0.270
##     Grades     (c)    0.024    0.004    6.098    0.000    0.016    0.032
##    Std.lv  Std.all
##                   
##     0.550    0.561
##                   
##     0.262    0.599
##     0.024    0.056
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       19.911    0.220   90.652    0.000   19.459   20.359
##    .Happiness         7.022    0.132   53.330    0.000    6.749    7.261
##    Std.lv  Std.all
##    19.911    3.206
##     7.022    2.586
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       26.406    0.377   70.066    0.000   25.675   27.199
##    .Happiness         4.431    0.092   48.210    0.000    4.254    4.631
##    Std.lv  Std.all
##    26.406    0.685
##     4.431    0.601
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##     ab                0.144    0.003   45.527    0.000    0.138    0.151
##     total             0.168    0.004   41.827    0.000    0.159    0.176
##    Std.lv  Std.all
##     0.144    0.336
##     0.168    0.392

Results from Q1

Our indirect effect confidence interval does not include 0, meaning it does appear we have evidence for self-esteem mediating the relationship between grades and happiness.

Q2) Is the mediation moderated by group? If so, which path shows moderation by group? Use path analysis (PA) with bootstrapping to report the moderation effect.

Steps to see if a Grouping Variable Moderates Mediation

1.) Run the PA model with bootstrapping to test for indirect (mediation) effects.

We have already done this from model 1 and found evidence for mediation, so we can proceed to step 2.

2.) Run a Multi-Group PA model and simultaneously compare all paths across groups. If the overall Wald Test is significant, then we conclude our grouping vbl is a moderator.

3.) If overall Wald Test is significant (from step 2), separately test individual paths to see which paths [relationship(s)] the grouping variable moderates (1 df Wald tests).

Modifications to Lavaan Syntax

We will use similar model syntax from above, but add constraints that correspond to either group 1 or group 2 so we can directly test which paths (if any) differ by group.

To do this, we simply create arbitrary names for both groups and tell R that we are doing this through the ‘c’ function in R.
For example, we label the ‘a’ parameter from our initial model as ‘c(ag1, ag2)’ – I simply added the ‘g1’ and ‘g2’ after ‘a’ to differentiate group 1 versus group 2. Lavaan knows which group corresponds to what because later in the code, we will define our ‘Grouping’ variable. Again, these labels are completely arbitrary - you can make them whatever you want, I would just recommend doing something that is consistent and easy for you to look back on so you know what parameter corresponds to what group.
- The point of these labels will be used later for our series of Wald Tests. As a reminder, Wald Tests allow us to test if a path coefficient differs by group.

Model Syntax with Grouping Specific Constraints:

model.group <- ' 
             # mediator
             SelfEsteem ~ c(ag1,ag2)*Grades
             Happiness ~ c(bg1,bg2)*SelfEsteem 
           
             # direct effect
             Happiness ~  c(cg1, cg2)*Grades

           # indirect effect (a*b)
             abg1 := ag1*bg1 #group 1
             abg2 := ag2*bg2 #group 2

           # total effect
             totalg1 := cg1 + (ag1*bg1)
            totalg2 := cg2 + (ag2*bg2)
         '

Run the model:

Here to run the path model, the syntax is exactly the same, except now we are specifying what our ‘grouping’ variable is with the code ‘group = x.’ This should correspond to the specific variable name in your dataset that represents the grouping variable of interest, in this case ‘Group.’

fit.group <- sem(model.group, data = data, group="Group", 
                 se="boot", bootstrap=500, meanstructure = TRUE)

Extract Model Results:

summary(fit.group, fit.measures=T, standardized=T, ci=TRUE)

## lavaan 0.6-3 ended normally after 37 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         14
## 
##   Number of observations per group         
##   2                                               6181
##   1                                               5899
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
##   Minimum Function Value               0.0000000000000
## 
## Chi-square for each group:
## 
##   2                                              0.000
##   1                                              0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic            10766.693
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -62970.994
##   Loglikelihood unrestricted model (H1)     -62970.994
## 
##   Number of free parameters                         14
##   Akaike (AIC)                              125969.989
##   Bayesian (BIC)                            126073.579
##   Sample-size adjusted Bayesian (BIC)       126029.089
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent Confidence Interval          0.000  0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws              500
##   Number of successful bootstrap draws             500
## 
## 
## Group 1 [2]:
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   SelfEsteem ~                                                          
##     Grades   (ag1)    0.550    0.010   52.941    0.000    0.529    0.570
##   Happiness ~                                                           
##     SelfEstm (bg1)    0.245    0.005   47.213    0.000    0.235    0.255
##     Grades   (cg1)    0.026    0.005    5.081    0.000    0.016    0.037
##    Std.lv  Std.all
##                   
##     0.550    0.553
##                   
##     0.245    0.590
##     0.026    0.062
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       19.966    0.300   66.565    0.000   19.403   20.569
##    .Happiness         7.724    0.172   44.787    0.000    7.377    8.029
##    Std.lv  Std.all
##    19.966    3.214
##     7.724    2.998
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       26.803    0.494   54.220    0.000   25.786   27.785
##    .Happiness         4.029    0.122   33.146    0.000    3.816    4.298
##    Std.lv  Std.all
##    26.803    0.694
##     4.029    0.607
## 
## 
## Group 2 [1]:
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   SelfEsteem ~                                                          
##     Grades   (ag2)    0.551    0.011   51.168    0.000    0.529    0.572
##   Happiness ~                                                           
##     SelfEstm (bg2)    0.280    0.007   42.842    0.000    0.266    0.292
##     Grades   (cg2)    0.022    0.006    3.927    0.000    0.011    0.033
##    Std.lv  Std.all
##                   
##     0.551    0.570
##                   
##     0.280    0.610
##     0.022    0.050
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       19.848    0.310   63.973    0.000   19.246   20.437
##    .Happiness         6.286    0.207   30.319    0.000    5.909    6.699
##    Std.lv  Std.all
##    19.848    3.198
##     6.286    2.210
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       25.984    0.478   54.362    0.000   25.016   26.866
##    .Happiness         4.776    0.146   32.798    0.000    4.483    5.076
##    Std.lv  Std.all
##    25.984    0.675
##     4.776    0.591
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##     abg1              0.135    0.004   34.687    0.000    0.127    0.142
##     abg2              0.154    0.005   32.971    0.000    0.145    0.163
##     totalg1           0.160    0.005   31.118    0.000    0.150    0.171
##     totalg2           0.176    0.006   31.479    0.000    0.165    0.187
##    Std.lv  Std.all
##     0.135    0.326
##     0.154    0.348
##     0.160    0.389
##     0.176    0.398

Overall (Omnibus) Wald-Test

Now that our model with the grouping component is ran, we will use the ‘lavTestWald’ command to test for overall group differences in our mediation analysis.

In Lavaan, there are a few ways to do this. However, the ‘lavTestWald’ command by default will only test the difference of two paths at a time unless we specify all of our constraints of interest beforehand as seperate syntax.
Below, I specify each path of interest (‘a’, ‘b’, & ‘c’) from both groups by telling Lavaan I want to test if the ‘a’ path of group 1 is equivalent to the ‘a’ path of group 2. Remember, these path names need to correspond to whatever you labeled these constraints in your group PA model, and should parallel one another (i.e., you would not test if path b of group 1 is equivalent to path c of group 2).
- the ‘==’ tells Lavaan that we are testing if these two paths are equal.

Defining constraints

#Define object similarly to a normal lavaan model, but just telling Lavaan these are the constraints we are interested in testing simultaneously. 
all.constraints<- 'ag1 == ag2
                  bg1 == bg2
                  cg1 == cg2'

Conducting the Omnibus Wald Test:

Now that all our constraints are defined above as the object ‘all.constraints,’ we use the ‘lavTestWald’ command.
To do so, we simply feed in our defined constraint object as well as the fitted lavaan object (in this case, ‘fit.group’).

lavTestWald(fit.group, #the name of the Lavaan 'fitted' object
            constraints = all.constraints) #the name of our previously specified paths that we would like to test

## $stat
## [1] 20.93889
## 
## $df
## [1] 3
## 
## $p.value
## [1] 0.0001083976
## 
## $se
## [1] "bootstrap"

Results suggest that ‘Group’ is an overall significant moderator (P=.00017), so we now proceed to step 3 to determine which paths in particular differ between group 1 & 2.

Testing for Specific Group Differences Between Paths - Individual Wald Tests:

We use similar logic from the omnibus Wald test, except now we don’t need to pre-define an object with our constraints. Instead, we can just specify which two paths we would like to test again with the ‘==’ syntax plus whatever the names of our constraint we are testing is. (Again, these paths names must correspond to our labels from the ‘fit.group’ PA model).

Testing if ‘a’ path differs:

No significant difference based on Wald Test (P = 0.959) for ‘a’ path.

lavTestWald(fit.group, constraints = "ag1==ag2")

## $stat
## [1] 0.002671969
## 
## $df
## [1] 1
## 
## $p.value
## [1] 0.9587748
## 
## $se
## [1] "bootstrap"

Testing if ‘b’ path differs:

Significant difference based on Wald Test (P < .001) for ‘b’ path.

lavTestWald(fit.group, constraints = "bg1==bg2")

## $stat
## [1] 17.37025
## 
## $df
## [1] 1
## 
## $p.value
## [1] 3.076034e-05
## 
## $se
## [1] "bootstrap"

Testing if ‘c’ path differs:

No difference based on Wald Test (P = .626) for ‘c’ path.

lavTestWald(fit.group, constraints = "cg1==cg2")

## $stat
## [1] 0.2695051
## 
## $df
## [1] 1
## 
## $p.value
## [1] 0.603664
## 
## $se
## [1] "bootstrap"

Results from Q2

Overall, it appears that our ‘Group’ variable is a significant moderator on self-esteem mediating the relationship for grades and happiness.

Results suggest that ‘group’ moderated this mediation specifically on the ‘b’ path (self-esteem predicting happiness - i.e., our mediator predicting the DV).

Now if we wanted, we could re-run the mediation PA model again without groups being seperated out, and just add our grouping variable as a moderator specifically on the ‘b’ path of our mediation. To do so, we make some minor changes to the model syntax below:

fullmod.mediation <- ' 
          # mediator 
             SelfEsteem ~ a*Grades
             Happiness ~ b*SelfEsteem + w*Group #adding group variable here (called w)
           
          #define moderator (Z - group with b path)
              Z := w*b
              
          # direct effect
             Happiness ~ c*Grades

          # indirect effect (a*b)
             ab := a*b
      
          # total effect
             total := c + (a*b)
         '

Run the model: (remember, increase # of bootstraps in practice)

Code here is the same as previous models, just now with no grouping vbl specified.

full.medmod<-sem(fullmod.mediation, data=data, se="boot", 
                 bootstrap=100, meanstructure = TRUE)

Extract the results:

summary(full.medmod, fit.measures=T, stand=T, ci=T)

## lavaan 0.6-3 ended normally after 25 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          8
## 
##   Number of observations                         12080
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       1.071
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.301
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic            10796.939
##   Degrees of freedom                                 5
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -63008.369
##   Loglikelihood unrestricted model (H1)     -63007.833
## 
##   Number of free parameters                          8
##   Akaike (AIC)                              126032.738
##   Bayesian (BIC)                            126091.932
##   Sample-size adjusted Bayesian (BIC)       126066.509
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.002
##   90 Percent Confidence Interval          0.000  0.024
##   P-value RMSEA <= 0.05                          1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.002
## 
## Parameter Estimates:
## 
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws              100
##   Number of successful bootstrap draws             100
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   SelfEsteem ~                                                          
##     Grades     (a)    0.550    0.007   73.400    0.000    0.535    0.567
##   Happiness ~                                                           
##     SelfEsteem (b)    0.261    0.004   59.200    0.000    0.252    0.268
##     Group      (w)    0.330    0.037    8.876    0.000    0.268    0.398
##     Grades     (c)    0.024    0.004    6.660    0.000    0.016    0.031
##    Std.lv  Std.all
##                   
##     0.550    0.561
##                   
##     0.261    0.598
##     0.330    0.061
##     0.024    0.057
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       19.911    0.218   91.293    0.000   19.444   20.359
##    .Happiness         6.522    0.143   45.511    0.000    6.230    6.841
##    Std.lv  Std.all
##    19.911    3.206
##     6.522    2.403
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    .SelfEsteem       26.406    0.339   77.952    0.000   25.690   27.082
##    .Happiness         4.404    0.086   51.102    0.000    4.216    4.583
##    Std.lv  Std.all
##    26.406    0.685
##     4.404    0.598
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##     Z                 0.086    0.010    8.671    0.000    0.069    0.106
##     ab                0.144    0.003   47.786    0.000    0.137    0.149
##     total             0.168    0.004   45.035    0.000    0.160    0.176
##    Std.lv  Std.all
##     0.086    0.036
##     0.144    0.336
##     0.168    0.392

Psychometrics - HW #9

Allie Choate