SEM Example using Lavaan

Lavaan

lvaan is a package in R that is designed for structural equation modeling.
You can use this even if you aren’t an R expert
It offers a lot of modeling options that are comparable to Mplus
Lots of great resources to learn lavaan
- http://lavaan.ugent.be/tutorial/index.html
Down sides
- Can’t do multilevel models
- Can’t do mixture models
- Can’t do Bayesian (use blavaan with lavaan)

Data

We will be using the PoliticalDemocracy data from the lavaan package, so we start by loading the package.

#install.packages("lavaan")
library(lavaan)

## This is lavaan 0.5-23.1097

## lavaan is BETA software! Please report any bugs.

Specify the model (figure)

Below is any a model from the lavaan tutorial website. You can see that there are three latent variables, one with three indicators and two with four indicators. Additionally, we’ve specified the hypothesized causal order.

CFA

Specify CFA for latent variables

The first step for fitting and SEM model is to fit each individual CFA model for each latent variable. We can easily do this in lavaan.

The first step is to specify each of the individual CFA models.

cfa1 <- '
    ind60 =~ x1 + x2 + x3'

cfa2 <- '
    dem60 =~ y1 + y2 + y3 + y4'

cfa3 <- '
    dem65 =~ y5 + y6 + y7 + y8'

cfa_all <- '
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8'

Fit CFA models

The next step in lavaan is to fit the models we specified above. We do this by using the cfa() function from lavaan.

fit_cfa1 <- cfa(model = cfa1, data = PoliticalDemocracy)

fit_cfa2 <- cfa(model = cfa2, data = PoliticalDemocracy)

fit_cfa3 <- cfa(model = cfa3, data = PoliticalDemocracy)

fit_cfa_all <- cfa(model = cfa_all, data = PoliticalDemocracy)

Above we are using two arguments from the cfa(). The first is model which we set equal to the model we specified above. The second argument is the data.

Output CFA results

Lastly, we get the output by using the summary() function. Because we have the lavaan package loaded, R knows to use the lavaan version summary(). If you didn’t have lavaan loaded, the summary() function wouldn’t work properly.

summary(fit_cfa1, fit.measures = TRUE)

## lavaan (0.5-23.1097) converged normally after  22 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                0.000
##   Degrees of freedom                                 0
##   Minimum Function Value               0.0000000000000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              219.165
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -241.345
##   Loglikelihood unrestricted model (H1)       -241.345
## 
##   Number of free parameters                          6
##   Akaike (AIC)                                 494.690
##   Bayesian (BIC)                               508.595
##   Sample-size adjusted Bayesian (BIC)          489.684
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent Confidence Interval          0.000  0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   ind60 =~                                            
##     x1                1.000                           
##     x2                2.193    0.142   15.403    0.000
##     x3                1.824    0.153   11.883    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.084    0.020    4.140    0.000
##    .x2                0.108    0.074    1.455    0.146
##    .x3                0.468    0.091    5.124    0.000
##     ind60             0.446    0.087    5.135    0.000

summary(fit_cfa2, fit.measures = TRUE)

## lavaan (0.5-23.1097) converged normally after  26 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               10.006
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.007
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              159.183
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.948
##   Tucker-Lewis Index (TLI)                       0.843
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -704.138
##   Loglikelihood unrestricted model (H1)       -699.135
## 
##   Number of free parameters                          8
##   Akaike (AIC)                                1424.275
##   Bayesian (BIC)                              1442.815
##   Sample-size adjusted Bayesian (BIC)         1417.601
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.231
##   90 Percent Confidence Interval          0.103  0.382
##   P-value RMSEA <= 0.05                          0.014
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.046
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   dem60 =~                                            
##     y1                1.000                           
##     y2                1.404    0.197    7.119    0.000
##     y3                1.089    0.167    6.529    0.000
##     y4                1.370    0.167    8.228    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y1                2.239    0.512    4.371    0.000
##    .y2                6.412    1.293    4.960    0.000
##    .y3                5.229    0.990    5.281    0.000
##    .y4                2.530    0.765    3.306    0.001
##     dem60             4.548    1.106    4.112    0.000

summary(fit_cfa3, fit.measures = TRUE)

## lavaan (0.5-23.1097) converged normally after  23 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                6.329
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.042
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              171.430
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.974
##   Tucker-Lewis Index (TLI)                       0.921
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -681.851
##   Loglikelihood unrestricted model (H1)       -678.686
## 
##   Number of free parameters                          8
##   Akaike (AIC)                                1379.702
##   Bayesian (BIC)                              1398.242
##   Sample-size adjusted Bayesian (BIC)         1373.028
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.170
##   90 Percent Confidence Interval          0.028  0.327
##   P-value RMSEA <= 0.05                          0.069
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.034
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   dem65 =~                                            
##     y5                1.000                           
##     y6                1.418    0.208    6.806    0.000
##     y7                1.378    0.203    6.789    0.000
##     y8                1.516    0.204    7.431    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y5                3.099    0.587    5.281    0.000
##    .y6                3.919    0.828    4.732    0.000
##    .y7                3.754    0.790    4.753    0.000
##    .y8                2.035    0.646    3.150    0.002
##     dem65             3.635    1.021    3.562    0.000

summary(fit_cfa_all, fit.measures = TRUE)

## lavaan (0.5-23.1097) converged normally after  47 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               72.462
##   Degrees of freedom                                41
##   P-value (Chi-square)                           0.002
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              730.654
##   Degrees of freedom                                55
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.953
##   Tucker-Lewis Index (TLI)                       0.938
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1564.959
##   Loglikelihood unrestricted model (H1)      -1528.728
## 
##   Number of free parameters                         25
##   Akaike (AIC)                                3179.918
##   Bayesian (BIC)                              3237.855
##   Sample-size adjusted Bayesian (BIC)         3159.062
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.101
##   90 Percent Confidence Interval          0.061  0.139
##   P-value RMSEA <= 0.05                          0.021
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.055
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   ind60 =~                                            
##     x1                1.000                           
##     x2                2.182    0.139   15.714    0.000
##     x3                1.819    0.152   11.956    0.000
##   dem60 =~                                            
##     y1                1.000                           
##     y2                1.354    0.175    7.755    0.000
##     y3                1.044    0.150    6.961    0.000
##     y4                1.300    0.138    9.412    0.000
##   dem65 =~                                            
##     y5                1.000                           
##     y6                1.258    0.164    7.651    0.000
##     y7                1.282    0.158    8.137    0.000
##     y8                1.310    0.154    8.529    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   ind60 ~~                                            
##     dem60             0.660    0.206    3.202    0.001
##     dem65             0.774    0.208    3.715    0.000
##   dem60 ~~                                            
##     dem65             4.487    0.911    4.924    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.082    0.020    4.180    0.000
##    .x2                0.118    0.070    1.689    0.091
##    .x3                0.467    0.090    5.174    0.000
##    .y1                1.942    0.395    4.910    0.000
##    .y2                6.490    1.185    5.479    0.000
##    .y3                5.340    0.943    5.662    0.000
##    .y4                2.887    0.610    4.731    0.000
##    .y5                2.390    0.447    5.351    0.000
##    .y6                4.343    0.796    5.456    0.000
##    .y7                3.510    0.668    5.252    0.000
##    .y8                2.940    0.586    5.019    0.000
##     ind60             0.448    0.087    5.169    0.000
##     dem60             4.845    1.088    4.453    0.000
##     dem65             4.345    1.051    4.134    0.000

SEM Model

Specify the model (code)

In this model we include the final measurement model from above, but now we add in the causal paths from our conceptual model.

Additionally, this example includes some correlated residuals we we are also specifing in this mode.

The last section does the formal test of mediation looking at how demo60 mediates the relationship between ind60 and demo65

model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ a*ind60
    dem65 ~ c*ind60 + b*dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
  # indirect/mediating effect
    ab := a*b
  # Total effct
    total := c + (a*b)
'

Fit the model

We fit the model using sem() function from lavaan which works almost exactly like the cfa() function.

We could also get bootstrapped standard errors using the se = "bootstrap" argument in the sem() function.

fit <- sem(model, data = PoliticalDemocracy)

Get output

Finally, we get the output from the model we just fit using the summary(). Here we are also using the argument standardized = TRUE to get the standardized latent variable estimates and the completely standardized solution.

summary(fit, fit.measures = TRUE, standardized=TRUE)

## lavaan (0.5-23.1097) converged normally after  68 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               38.125
##   Degrees of freedom                                35
##   P-value (Chi-square)                           0.329
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              730.654
##   Degrees of freedom                                55
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.995
##   Tucker-Lewis Index (TLI)                       0.993
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1547.791
##   Loglikelihood unrestricted model (H1)      -1528.728
## 
##   Number of free parameters                         31
##   Akaike (AIC)                                3157.582
##   Bayesian (BIC)                              3229.424
##   Sample-size adjusted Bayesian (BIC)         3131.720
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.035
##   90 Percent Confidence Interval          0.000  0.092
##   P-value RMSEA <= 0.05                          0.611
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.044
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   ind60 =~                                                              
##     x1                1.000                               0.670    0.920
##     x2                2.180    0.139   15.742    0.000    1.460    0.973
##     x3                1.819    0.152   11.967    0.000    1.218    0.872
##   dem60 =~                                                              
##     y1                1.000                               2.223    0.850
##     y2                1.257    0.182    6.889    0.000    2.794    0.717
##     y3                1.058    0.151    6.987    0.000    2.351    0.722
##     y4                1.265    0.145    8.722    0.000    2.812    0.846
##   dem65 =~                                                              
##     y5                1.000                               2.103    0.808
##     y6                1.186    0.169    7.024    0.000    2.493    0.746
##     y7                1.280    0.160    8.002    0.000    2.691    0.824
##     y8                1.266    0.158    8.007    0.000    2.662    0.828
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   dem60 ~                                                               
##     ind60      (a)    1.483    0.399    3.715    0.000    0.447    0.447
##   dem65 ~                                                               
##     ind60      (c)    0.572    0.221    2.586    0.010    0.182    0.182
##     dem60      (b)    0.837    0.098    8.514    0.000    0.885    0.885
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .y1 ~~                                                                 
##    .y5                0.624    0.358    1.741    0.082    0.624    0.296
##  .y2 ~~                                                                 
##    .y4                1.313    0.702    1.871    0.061    1.313    0.273
##    .y6                2.153    0.734    2.934    0.003    2.153    0.356
##  .y3 ~~                                                                 
##    .y7                0.795    0.608    1.308    0.191    0.795    0.191
##  .y4 ~~                                                                 
##    .y8                0.348    0.442    0.787    0.431    0.348    0.109
##  .y6 ~~                                                                 
##    .y8                1.356    0.568    2.386    0.017    1.356    0.338
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .x1                0.082    0.019    4.184    0.000    0.082    0.154
##    .x2                0.120    0.070    1.718    0.086    0.120    0.053
##    .x3                0.467    0.090    5.177    0.000    0.467    0.239
##    .y1                1.891    0.444    4.256    0.000    1.891    0.277
##    .y2                7.373    1.374    5.366    0.000    7.373    0.486
##    .y3                5.067    0.952    5.324    0.000    5.067    0.478
##    .y4                3.148    0.739    4.261    0.000    3.148    0.285
##    .y5                2.351    0.480    4.895    0.000    2.351    0.347
##    .y6                4.954    0.914    5.419    0.000    4.954    0.443
##    .y7                3.431    0.713    4.814    0.000    3.431    0.322
##    .y8                3.254    0.695    4.685    0.000    3.254    0.315
##     ind60             0.448    0.087    5.173    0.000    1.000    1.000
##    .dem60             3.956    0.921    4.295    0.000    0.800    0.800
##    .dem65             0.172    0.215    0.803    0.422    0.039    0.039
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ab                1.242    0.355    3.494    0.000    0.395    0.395
##     total             1.814    0.374    4.856    0.000    0.578    0.578

Plot output

Results from SEM models are typically displayed graphically.

Due to update issues this code doesn’t work right now, but if it did, you would get a display with the beta coefficients along the covariances, loadings, and regression paths.

semPlot::semPaths(fit, what = c("paths", "est"), whatLabels = "est", layout = "tree2")

Fixing loadings and variances

You can also change the scaling of the latent variables in the model specification portion of the code.

Use the * symbol to indicate you want to specify a loading
UseNA to freely estimate a pramater
Or specify the value you want the loading to be fixed to

cfa3_2 <- '
    dem65 =~ NA*y5 + y6 + y7 + y8
    dem65 ~~ 1*dem65
'

fit2 <- cfa(cfa3_2, data = PoliticalDemocracy)

summary(fit2, fit.measures = TRUE, standardized=TRUE)

## lavaan (0.5-23.1097) converged normally after  20 iterations
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                6.329
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.042
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              171.430
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.974
##   Tucker-Lewis Index (TLI)                       0.921
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -681.851
##   Loglikelihood unrestricted model (H1)       -678.686
## 
##   Number of free parameters                          8
##   Akaike (AIC)                                1379.702
##   Bayesian (BIC)                              1398.242
##   Sample-size adjusted Bayesian (BIC)         1373.028
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.170
##   90 Percent Confidence Interval          0.028  0.327
##   P-value RMSEA <= 0.05                          0.069
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.034
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   dem65 =~                                                              
##     y5                1.907    0.268    7.124    0.000    1.907    0.735
##     y6                2.703    0.332    8.134    0.000    2.703    0.807
##     y7                2.627    0.324    8.105    0.000    2.627    0.805
##     y8                2.891    0.303    9.530    0.000    2.891    0.897
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     dem65             1.000                               1.000    1.000
##    .y5                3.099    0.587    5.281    0.000    3.099    0.460
##    .y6                3.919    0.828    4.732    0.000    3.919    0.349
##    .y7                3.754    0.790    4.753    0.000    3.754    0.352
##    .y8                2.035    0.646    3.150    0.002    2.035    0.196