Required Packages
knitr::opts_chunk$set(echo = TRUE)
if (!require("pacman")) install.packages("pacman")
library(pacman)
# Load installed packages, install and load the package if not installed
pacman::p_load(here, rio, conflicted, tidyverse,
janitor, summarytools, skimr, DataExplorer, inspectdf, Hmisc,
lavaan, semTools, semoutput, semPlot, sjPlot, psych, readr, dplyr,
GPArotation, psychTools, semptools,tinytex)
In the Beginning there was LISREL, and it was good… Structural equation modeling (SEM) is a family of statistical methods for analyzing the structure of a variance/covariance matrix among a set of variables. SEM has its origins in the work of Karl Joreskog who developed the LISREL model and software during the 1970s. LISREL is the acronym for LInear Structural RELations.
# creating an example w 5 latents w 3 indicators and mediation
# Lambda matrices:
lambdax <- matrix(0,9,3)
lambdax[1:3,1] <-1
lambdax[4:6,2] <-1
lambdax[7:9,3] <-1
lambday <- matrix(0,6,2)
lambday[1:3,1] <- 1
lambday[4:6,2] <- 1
# Phi and Psi matrices:
LatX <- matrix(1,3,3) #phi
LatY <- diag(1,2,2) #psy
# Gamma matrix:
Gamma <- matrix(0,2,3)
Gamma[1,1] <- 1
Gamma[1,2] <- 1
Gamma[1,3] <- 1
Gamma[2,2] <- 1
# Beta matrix:
Beta <- matrix(0,2,2)
Beta[2,1] <- 1
# Theta matrices:
thd <- diag(1,nrow(lambdax))
the <- diag(1,nrow(lambday))
# Combine matrices into a model:
mod3 <- lisrelModel(LX=lambdax, LY=lambday, TD=thd, TE=the,
PH=LatX, GA=Gamma, BE=Beta, PS=LatY)
# Plot path diagram using semPlot::semPaths sizelat=4, sizeman=3
semPaths(mod3, as.expression=c("nodes","edges"), sizeMan = 4,
sizeLat = 5, style="lisrel", rotation = 2,
residScale=8, edge.color="black", mar=c(2,2,2,2),
edge.label.cex = 0.9, label.cex=1.5)
Figure 1. The LISREL model for a hypothetical example.
The LISREL model consists of two parts, a measurement model and a structural model. This is the structural portion of the model showing how the variances of the endogenous latent variables (constructs) are related: \[ \mathbf{\eta = \beta \eta + \Gamma \xi + \zeta} \tag{1} \]
where \(\eta\) (eta) is a m x 1 vector of endogenous latent constructs, \(\xi\) (xi) is a n x 1 vector of exogenous latent constructs, and \(\zeta\) (zeta) is a m x 1 vector of residual variances for the endogenous latent constructs. \(\beta\) (beta) specifies the causal relationships among latent endogenous constructs (\(\eta\)’s); size = m x m. \(\Gamma\) (gamma) specifies the causal influences of latent exogenous constructs (\(\xi\)’s) on the latent endogenous constructs (\(\eta\)’s); size = m x n.
This is the measurement model for the variances of the endogenous observed y variables (indicators): \[ \mathbf{y = \Lambda_y \eta + \Theta_\epsilon}\tag{2} \]
where y is a p x 1 vector of observed endogenous indicators, (p = the number of observed endogenous indicators) and m = the number of latent endogenous constructs. \(\Lambda_y\) (lambda y) specifies which y variables are indicators of which endogenous constructs (\(\eta\)’s); size = p x m. \(\Theta_\epsilon\) (theta epsilon) specifies the residual variances/covariances among the y variables. The diagonal elements are error variances and the off-diagonal elements are covariances between residuals. Size = p x p, usually specify as diagonal.
This is the measurement model for the variances of the exogenous observed x variables: \[ \mathbf{x = \Lambda_x \xi + \Theta_\delta}\tag{3} \] where x is a q x 1 vector of observed exogenous indicators, (q = the number of observed exogenous indicators) and n = the number of latent exogenous constructs. \(\Lambda_x\) (lambda x) specifies which x variables are indicators of which exogenous constructs (\(\xi\)’s); size = q x n. \(\Theta_\delta\) (theta delta) specifies the residual variances/covariances among the x variables. The diagonal elements are error variances and the off-diagonal elements are covariances between residuals. Size = q x q, usually specify as diagonal.
Other Matrices in the LISREL Model: \(\Phi\) (phi) – specifies the variances and covariances among latent exogenous constructs (ξ’s); size = n x n. \(\Psi\) (psi) matrix – specifies the variances and covariances among the latent residual terms (\(\zeta\)’s); size = m x m.
Listing the p observed \(y\) variables first, followed by the q observed \(x\) variables, the covariances of these (p + q = 15) variables in Figure 1 are contained in the \(\mathbf\Sigma\) matrix which may be partitioned as:
\[\mathbf\Sigma = \left[ \begin{array}{c|c} covariances_y & covariances_{y,x} \\ \hline covariances_{x,y} & covariances_x \end{array} \right]\]
In the full LISREL model the partitions of \(\mathbf\Sigma\) are estimated as:
\[\mathbf\Sigma^* = \left[ \begin{array}{c|c} \Lambda_y [(I-\beta)^{-1} (\Gamma\Phi\Gamma' + \Psi)(I-\beta)^{-1'}]\Lambda'_y + \Theta_\epsilon & \Lambda_y(I-\beta)^{-1} \Gamma \Phi \Lambda'_x \\ \hline \Lambda_x \Phi\Gamma'(I-\beta)^{-1'}\Lambda'_y & \Lambda_x\Phi\Lambda'_x + \Theta_\delta \end{array} \right] \tag{4}\]
How well \(\mathbf\Sigma^*\) reproduces \(\mathbf\Sigma\) is the basis for assessing the fit of the model to the data. The differences between observed and model implied covariances form the basis for various fit indices such as \(\chi^2\), GFI, CFI, TLI, RMSEA, and SRSR.
Other SEM programs have been developed since LISREL. These include
EQS, Amos, Mplus, and the lavaan package in R. Most of
these reparameterize the LISREL model treating all variables as
endogenous. This means that the structural model (Eq. 1) simplifies to:
\[
\mathbf{\eta = \beta \eta + \Psi}\tag{5}
\]
and Eq. 2 and Eq. 3 are combined into a single equation linking observed and latent variables: \[ \mathbf{y = \Lambda \eta + \Theta} \tag{6} \]
Let’s look at the parameter matrices in Eq. 5 and Eq.6 in more detail using the model in Figure 1 as an example. The \(\mathbf\Lambda\) matrix contains 15 rows, one for each indicator variable (9 x’s and 6 y’s) and 5 columns, one for each latent variable, \(\eta_1\) to \(\eta_5\). The elements (\(\lambda_{ij}\)) are factor loadings, quantifying the relation between each indicator variable and the latent variable (\(\eta_j\)) hypothesized to underlie it. Estimating some \(\lambda_{ij}\) coefficients while fixing others to 0.0, allows us to specify the measurement model.
\[\mathbf\Lambda = \begin{bmatrix} \lambda_{1,1} & \lambda_{1,2} & \cdots & \lambda_{1,5} \\ \lambda_{2,1} & \lambda_{2,2} & \cdots & \lambda_{2,5} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{15,1} & \lambda_{15,2} & \cdots & \lambda_{15,5} \end{bmatrix}\]
The \(\mathbf\beta\) matrix is a 5 x 5 indicating the influence of the column \(\eta\) variable on the row \(\eta\) variable. The diagonal elements are set to 0.0.
\[\mathbf\beta = \begin{bmatrix} 0 & \beta_{1,2} & \cdots & \beta_{1,5} \\ \beta_{2,1} & 0 & \cdots & \beta_{2,5} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{5,1} & \beta_{5,2} & \cdots & 0 \end{bmatrix}\]
In the reparameterization, the \(\mathbf\psi\) matrix is a 5 x 5 and contains the variances and covariances among exogenous latent variables (originally in \(\mathbf\Phi\)), and the residual variances (\(\zeta\)s) for the endogenous latent variables. All elements are denoted as \(\psi\) coefficients. \[\mathbf\Psi = \begin{bmatrix} \psi_{1,1} & \psi_{1,2} & \cdots & \psi_{1,5} \\ \psi_{2,1} & \psi_{2,2} & \cdots & \psi_{2,5} \\ \vdots & \vdots & \ddots & \vdots \\ \psi_{5,1} & \psi_{5,2} & \cdots & \psi_{5,5} \end{bmatrix}\]
The residual variances of the 15 indicator variables are in the \(\Theta\) matrix which is typically specified as a diagonal matrix: \[\mathbf\Theta = \begin{bmatrix} \theta_{1,1} & 0 & \cdots & 0 \\ 0 & \theta_{2,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \theta_{15,15} \end{bmatrix}\]
Specifying models in lavaan is pretty straightforward.
There are different operators for specifying various relations among
variables. Regressions use ~ with the outcome variable on
the left and the predictors on the right. Latent variables are defined
using the =~ operator with the latent variable on the left
and the indicator variables on the right. The := operator
is used to define parameters constructed from other parameters;
specifying indirect effects for example. Correlations (covariances)
between variables are indicated with the ~~ operator. In
the examples below I show the how to specify and run various models in
lavaan. One nice feature is that we don’t always need the
raw data file; we can work from the covariance matrix, or correlation
matrix plus a vector of standard deviations and the sample size. I’m
going to work through a few examples to demonstrate bits of SEM theory
and features of lavaan and its support packages. I created
this document in R Markdown.
This is an example of path analysis. Illness (ill) is regressed onto
Stress (str) and Fitness (fit), which are regressed onto Hardiness
(hardy) and Exercise (exer), respectively. Here we are providing a
correlation matrix and vector of standard deviations.
lavaan will convert these into a covariance matrix for
analysis. We must also provide N = 373 when specifying the model
below.
Reading in data:
# reading in corr matrix
corr <- '
1.00
-.03 1.00
.39 .07 1.00
-.05 -.23 -.13 1.00
-.08 -.16 -.29 .34 1.00'
# add variable names and convert to full correlation matrix
corrfull <- lavaan::getCov(corr, names=c("exer","hardy","fit","str","ill"))
# add SDs and convert to full covariance matrix
covfull <- lavaan::cor2cov(corrfull, sds=c(66.50, 38.00, 36.80, 67.00, 62.48))
# observed covariance matrix
#covfull
Specifying model in lavaan. Note we can define the
hypothesized indirect effects. A convenient way to do this is to name
the parameters and then use these names to create products. Here “g1”
and “g2” may be considered as \(\gamma\) coefficients and “b1” and “b2” may
be considered as \(\beta\) coefficients
in the original LISREL model. Indirect effects are defined using the
:= operator.
Specify the model:
# specify the path model
roth <- '
fit ~ g1*exer
str ~ g2*hardy
ill ~ b1*fit + b2*str
# indirect effects
exer_ill := g1*b1
hardy_ill := g2*b2
# allowing resids of two parallel mediators to correlate
fit ~~ str '
Running the model:
# fit the path model
roth.fit <- lavaan::sem(roth, sample.cov=covfull, sample.nobs=373, optim.method=list("BFGS"))
The fit of the model to the data is reasonable according to the following fit indices: CFI = .980, TLI = .955, RMSEA = .046, and SRMR = .038. The \(\chi^2\) = 7.135 with df = 4, and p = .129. The results are summarized in Figure 2 which presents standardized coefficients. On the left we see that the correlation between hardiness and exercise is -.03 (the arrow is dashed to indicate that this value was not estimated but was given in the data. The model explains 5% of the variance in stress (note the residual arrow is 1 - .05) and 15% of the variance in fittness (the residual arrow is 1 - .15). The unexplained variance in each of these variables is allowed to correlate (-.10) meaning that they share the influence of omitted variables. The model explains about 17% of the variance in illness (1 - .83). The indirect effect of hardiness on illness is obtained by multiplying the paths from hardy to str and from str to ill (-.22 x .31 = -.068, p < .05). The indirect effect of exercise on illness is similarly obtained (.39 x -.25 = -.096, p < .05).
Plotting results as path diagram:
#--making path diagram obs only
fig2 <- semPlot::semPaths(roth.fit, whatLabels = "std", layout = "tree2",
rotation = 2, style = "lisrel", optimizeLatRes = TRUE,
structural = FALSE, layoutSplit = FALSE,
intercepts = FALSE, residuals = T,
curve = 3, curvature = 3, nCharNodes = 8,
sizeLat = 5, sizeMan = 8, sizeMan2 = 6,
edge.label.cex = 0.9, label.cex=1.1, residScale=10,
edge.color = "black", edge.label.position = .40, DoNotPlot=T)
#--working with semptools to modify path diagrams
my_position_list <- c("hardy ~~ exer" = .50, "str ~~ fit"=.50)
fig2_1 <- fig2 |> set_edge_label_position(my_position_list) |>
mark_sig(roth.fit, alpha = c("(n.s.)" = 1.00, "*" = .05)) #adding *, ns
plot(fig2_1)
Figure 2. Path analysis using observed variables.
Full output:
summary() function to
write lavaan results to the console. There is a lot of
output to review. The first bit is model fit statistics. Next we see the
regression models involving the observed variables. The column labeled
Estimate gives the unstandardized coefficients, and the column
Std.all gives the completely standardized coefficients. The
covariance/correlation between the residuals of the two regressions is
shown next, followed by estimated variances. Variables prefixed with a
“.” indicate residual terms. Then we see the \(R^2\) values indicating the proportion of
variance explained in each endogenous variable. Finally, under the
heading “Defined Parameters” we see the indirect effects as we specified
in our model statement above.summary(roth.fit, fit.measures=T, standardized=T, rsquare=T)
## lavaan 0.6.15 ended normally after 15 iterations
##
## Estimator ML
## Optimization method BFGS
## Number of model parameters 8
##
## Number of observations 373
##
## Model Test User Model:
##
## Test statistic 7.135
## Degrees of freedom 4
## P-value (Chi-square) 0.129
##
## Model Test Baseline Model:
##
## Test statistic 165.608
## Degrees of freedom 9
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.980
## Tucker-Lewis Index (TLI) 0.955
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -5962.553
## Loglikelihood unrestricted model (H1) -5958.985
##
## Akaike (AIC) 11941.105
## Bayesian (BIC) 11972.478
## Sample-size adjusted Bayesian (SABIC) 11947.096
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.046
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.099
## P-value H_0: RMSEA <= 0.050 0.474
## P-value H_0: RMSEA >= 0.080 0.171
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.038
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## fit ~
## exer (g1) 0.213 0.026 8.107 0.000 0.213 0.385
## str ~
## hardy (g2) -0.390 0.088 -4.411 0.000 -0.390 -0.222
## ill ~
## fit (b1) -0.424 0.080 -5.290 0.000 -0.424 -0.250
## str (b2) 0.287 0.044 6.506 0.000 0.287 0.308
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fit ~~
## .str -228.048 114.706 -1.988 0.047 -228.048 -0.103
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fit 1145.182 83.856 13.657 0.000 1145.182 0.852
## .str 4240.134 310.485 13.657 0.000 4240.134 0.951
## .ill 3203.954 234.610 13.657 0.000 3203.954 0.829
##
## R-Square:
## Estimate
## fit 0.148
## str 0.049
## ill 0.171
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## exer_ill -0.090 0.020 -4.430 0.000 -0.090 -0.096
## hardy_ill -0.112 0.031 -3.651 0.000 -0.112 -0.068
Here we can specify the amount of measurement error variance in our observed variables. This involves 3 steps: first we have to define a latent variable for each of the observed variables. Second, we re-specify the regressions using the latent variables. Finally, we fix theta values equal to 10% of the observed variances in the indicators. This is equivalent to setting reliability to .90 for all observed variables.
Specify the model:
roth2 <- '
exerf =~ exer
hardyf =~ hardy
fitf =~ fit
strf =~ str
illf =~ ill
# factor regs
fitf ~ g1*exerf
strf ~ g2*hardyf
illf ~ b1*fitf + b2*strf
#indirect effects
exer_ill := g1*b1
hardy_ill := g2*b2
# setting measurement error variances to 10% obs variance
exer ~~ 422.225*exer
hardy ~~ 144.400*hardy
fit ~~ 135.424*fit
str ~~ 448.900*str
ill ~~ 390.375*ill
# allowing resids of two parallel mediators to correlate
fitf ~~ strf '
Running the model:
# fit a path model w m.e.
#--had to change optimization method to "BFGS" from default "MLMINB" for single indicators to work
roth2.fit <- lavaan::sem(roth2, sample.cov=covfull, sample.nobs=373, optim.method=list("BFGS"))
The fit of the model to the data is reasonable according to the following fit indices: CFI = .980, TLI = .949, RMSEA = .046, and SRMR = .036. The \(\chi^2\) = 7.163 with df = 4, and p = .128. The results are summarized in Figure 3 which presents standardized coefficients. Note that the structural parameters are a bit larger than their counterparts in Figure 2 because they have been corrected for attenuation by taking the reliability of the observed variables into account. The standard errors for these parameters are also increased due to measurement error (compare full output of the two models). The indirect effect of hardiness on illness is obtained by multiplying the paths from hardyf to strf and from strf to illf (-.25 x .34 = -.085, p < .05). The indirect effect of exercise on illness is similarly obtained (.43 x -.27 = -.116, p < .05).
Plotting results as path diagram with latent and observed variables:
#--making path diagram w m.e.---------------------mar(bot,lft,top,rgt)
fig3 <-semPlot::semPaths(roth2.fit, whatLabels = "std", layout = "tree2",
rotation = 2, style = "lisrel", optimizeLatRes = TRUE,
structural = FALSE, layoutSplit = FALSE,
intercepts = FALSE, residuals = T,
curve = 3, curvature = 3, nCharNodes = 8,
sizeLat = 8, sizeMan = 8, sizeMan2 = 6,
edge.label.cex = 0.9, label.cex=1.1, residScale=10, mar=c(2,3,2,3),
edge.color = "black", edge.label.position = .40, DoNotPlot=T)
#--working with semptools to modify path diagrams
#--my specifications--
my_rotate_resid_list <- c(strf = -120,
fitf = -45,
illf = 45)
my_curve_list <- c("strf ~~ fitf" = 2)
#--making multiple adjustments w piping |>
fig3_1 <- fig3 |> set_curve(my_curve_list) |>
rotate_resid(my_rotate_resid_list) |>
mark_sig(roth2.fit, alpha = c("(n.s.)" = 1.00, "*" = .05))
plot(fig3_1)
Figure 3. Path analysis using single indicator variables with specified measurement error variance
Full output:
=~. The regression
equations and indirect effects are now specified using the latent
variables. The \(R^2\) values for the
observed variables are approximately .90 reflecting fact that we fixed
their measurement error variances to 10% of their original
variances.summary(roth2.fit, fit.measures=T, standardized=T, rsquare=T)
## lavaan 0.6.15 ended normally after 848 iterations
##
## Estimator ML
## Optimization method BFGS
## Number of model parameters 11
##
## Number of observations 373
##
## Model Test User Model:
##
## Test statistic 7.163
## Degrees of freedom 4
## P-value (Chi-square) 0.128
##
## Model Test Baseline Model:
##
## Test statistic 165.944
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.980
## Tucker-Lewis Index (TLI) 0.949
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -9942.302
## Loglikelihood unrestricted model (H1) -9938.720
##
## Akaike (AIC) 19906.603
## Bayesian (BIC) 19949.740
## Sample-size adjusted Bayesian (SABIC) 19914.841
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.046
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.100
## P-value H_0: RMSEA <= 0.050 0.472
## P-value H_0: RMSEA >= 0.080 0.172
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.036
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## exerf =~
## exer 1.000 63.152 0.951
## hardyf =~
## hardy 1.000 35.996 0.949
## fitf =~
## fit 1.000 34.780 0.948
## strf =~
## str 1.000 63.149 0.948
## illf =~
## ill 1.000 58.922 0.948
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## fitf ~
## exerf (g1) 0.234 0.029 8.056 0.000 0.426 0.426
## strf ~
## hardyf (g2) -0.438 0.098 -4.473 0.000 -0.250 -0.250
## illf ~
## fitf (b1) -0.460 0.089 -5.151 0.000 -0.272 -0.272
## strf (b2) 0.319 0.049 6.464 0.000 0.342 0.342
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fitf ~~
## .strf -227.060 114.409 -1.985 0.047 -0.118 -0.118
## exerf ~~
## hardyf -61.709 130.363 -0.473 0.636 -0.027 -0.027
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .exer 422.225 422.225 0.096
## .hardy 144.400 144.400 0.100
## .fit 135.424 135.424 0.101
## .str 448.900 448.900 0.101
## .ill 390.375 390.375 0.101
## exerf 3988.169 322.952 12.349 0.000 1.000 1.000
## hardyf 1295.729 105.454 12.287 0.000 1.000 1.000
## .fitf 990.594 84.130 11.774 0.000 0.819 0.819
## .strf 3739.301 308.673 12.114 0.000 0.938 0.938
## .illf 2745.361 235.005 11.682 0.000 0.791 0.791
##
## R-Square:
## Estimate
## exer 0.904
## hardy 0.900
## fit 0.899
## str 0.899
## ill 0.899
## fitf 0.181
## strf 0.062
## illf 0.209
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## exer_ill -0.108 0.025 -4.364 0.000 -0.116 -0.116
## hardy_ill -0.140 0.038 -3.694 0.000 -0.085 -0.085
Residuals Analysis:
lavResiduals(roth2.fit)
## $type
## [1] "cor.bentler"
##
## $cov
## exer hardy fit str ill
## exer 0.000
## hardy -0.006 0.000
## fit 0.007 0.080 0.004
## str -0.056 -0.007 -0.040 0.009
## ill 0.022 -0.086 -0.016 0.011 0.008
##
## $cov.z
## exer hardy fit str ill
## exer 0.000
## hardy -2.090 0.000
## fit 1.371 1.713 1.078
## str -1.119 -1.201 -1.844 3.852
## ill 0.485 -1.828 -2.038 1.801 2.011
##
## $summary
## cov
## srmr 0.036
## srmr.se 0.010
## srmr.exactfit.z 1.343
## srmr.exactfit.pvalue 0.090
## usrmr 0.029
## usrmr.se 0.017
## usrmr.ci.lower 0.000
## usrmr.ci.upper 0.057
## usrmr.closefit.h0.value 0.050
## usrmr.closefit.z -1.230
## usrmr.closefit.pvalue 0.891
The SF-36 questionnaire measures self-reported functional status. Its 36 items are combined to form 8 subscales yielding two summary measures: physical and mental health. The physical health measure includes four subscales: physical functioning (10 items), role-physical (4 items), bodily pain (2 items), and general health (5 items). The mental health measure includes the other four subscales: vitality (4 items), social functioning (2 items), role-emotional (3 items), and mental health (5 items).
lavaan
package in which each subscale loads on only one hypothesized factor.
I’m working with a raw data file this time so I can display the
covariance/correlation matrix graphically. Figure 4 was produced Using
the psych package. It combines correlation values,
scatter-plots, and histograms.In this example I am using the rio package to import an
SPSS data file. One inconvenience when importing SPSS data files is that
the variable labels are recognized by some packages but not others. In
this example the variable labels actually mess up the tables and path
diagrams so I deleted them from the data file.
## Import Data
hrtsrg <- rio::import(here("data", "WOMEN_HRTSRG_noLabels.SAV"))
attach(hrtsrg)
data <- data.frame(MENHLTH, SOCFUNC, ROLEEMO, VITAL, PHSFUNC, ROLPHYS, BODPAIN, GENHLTH)
Using the psych package to get this rich figure.
psych::pairs.panels(data, lm=T, cex.cor=.9, jiggle=T, factor=3,
show.points = T, ellipses=T, pch=".")
Figure 4. Bivariate summary of SF-36 subscales. Histograms are on the diagonal, scatter-plots showing bivariate regression line and centroid are below the diagnonal, correlations are above the diagonal.
# specify the 2-factor CFA model
model <- '
# latent factors
MH =~ MENHLTH + SOCFUNC + ROLEEMO + VITAL
PH =~ PHSFUNC + ROLPHYS + BODPAIN + GENHLTH'
# fit the model
fit <- lavaan::cfa(model = model, data = data, mimic = "lavaan",
estimator = "ML", missing = "listwise", check.post=T,
std.lv = TRUE, std.ov = FALSE, test = "standard",
se = "standard", bootstrap = 1000)
The fit of the hypothesized model to the data is not great according to the following fit indices: CFI = .871, TLI = .810, RMSEA = .150, and SRMR = .070. The \(\chi^2\) = 125.111 with df = 19, and p < .001. Despite the questionable fit, we can comment on the model parameter estimates. The results are summarized in Figure 5 which presents standardized coefficients. First, all the factor loadings (\(\lambda\)’s) are significant. Second, the \(\theta\) values show the proportion of measurement error variance in each subscale when considered as an indicator of the underlying latent factor (PH or MH). Note that because all parameters are standardized \(\lambda^2_{ij}\) + \(\theta_{ii}\) = 1.0 for any given subscale. In general, the \(\lambda^2_{ij}\) values may be interpreted as reliability, or the proportion of variance that an indicator variable shares with the underlying latent factor it is hypothesized to measure. Finally, notice the large correlation between the factors \(\phi\) = .85, suggesting that the PH and MH constructs may not be distinct from one another. Modification indices below offer some insight into why the hypothesized CFA model fit is not great.
Plotting path diagram:
factors <- fit@pta$vnames$lv[[1]]
size <- .75
fig5 <- semPaths(fit, latents = factors, whatLabels = "std", layout = "tree2",
rotation = 2, style = "lisrel", optimizeLatRes = TRUE,
structural = FALSE, layoutSplit = FALSE,
intercepts = FALSE, residuals = T,
curve = 1, curvature = 3, nCharNodes = 8,
sizeLat = 11 * size, sizeMan = 11 * size, sizeMan2 = 4 * size,
edge.label.cex = 1.2 * size, residScale=12 *size,
edge.color = "black", edge.label.position = .40, DoNotPlot=T)
fig5_1 <- fig5 |> mark_sig(fit, alpha = c("(n.s.)" = 1.00, "*" = .05)) #adding *, ns
plot(fig5_1)
Figure 5. CFA showing eight subscales of the SF-36 loading onto the mental health factor (MH) and the physical health factor (PH)
Full output:
summary(fit, fit.measures=T, standardized=T, rsquare=T)
## lavaan 0.6.15 ended normally after 22 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 17
##
## Number of observations 249
##
## Model Test User Model:
##
## Test statistic 125.511
## Degrees of freedom 19
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 853.850
## Degrees of freedom 28
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.871
## Tucker-Lewis Index (TLI) 0.810
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4928.803
## Loglikelihood unrestricted model (H1) -4866.048
##
## Akaike (AIC) 9891.606
## Bayesian (BIC) 9951.403
## Sample-size adjusted Bayesian (SABIC) 9897.512
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.150
## 90 Percent confidence interval - lower 0.126
## 90 Percent confidence interval - upper 0.176
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.070
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## MH =~
## MENHLTH 3.345 0.253 13.224 0.000 3.345 0.756
## SOCFUNC 1.730 0.121 14.252 0.000 1.730 0.798
## ROLEEMO 2.438 0.201 12.153 0.000 2.438 0.711
## VITAL 2.436 0.193 12.596 0.000 2.436 0.730
## PH =~
## PHSFUNC 3.160 0.307 10.308 0.000 3.160 0.643
## ROLPHYS 2.818 0.234 12.046 0.000 2.818 0.728
## BODPAIN 1.470 0.150 9.822 0.000 1.470 0.618
## GENHLTH 2.442 0.254 9.617 0.000 2.442 0.607
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## MH ~~
## PH 0.846 0.038 22.119 0.000 0.846 0.846
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MENHLTH 8.387 0.957 8.762 0.000 8.387 0.428
## .SOCFUNC 1.713 0.214 8.013 0.000 1.713 0.364
## .ROLEEMO 5.826 0.624 9.339 0.000 5.826 0.495
## .VITAL 5.210 0.571 9.121 0.000 5.210 0.468
## .PHSFUNC 14.202 1.515 9.375 0.000 14.202 0.587
## .ROLPHYS 7.064 0.855 8.258 0.000 7.064 0.471
## .BODPAIN 3.500 0.365 9.599 0.000 3.500 0.618
## .GENHLTH 10.210 1.054 9.686 0.000 10.210 0.631
## MH 1.000 1.000 1.000
## PH 1.000 1.000 1.000
##
## R-Square:
## Estimate
## MENHLTH 0.572
## SOCFUNC 0.636
## ROLEEMO 0.505
## VITAL 0.532
## PHSFUNC 0.413
## ROLPHYS 0.529
## BODPAIN 0.382
## GENHLTH 0.369
The modification indices indicate where additional parameters could be added to improve the model fit. The values are estimates of how much the \(\chi^2\) statistic would decrease if the parameter were to be estimated rather than fixed to 0.0. For example, allowing GENHLTH to cross-load onto the MH factor would result in the largest improvement in model fit (32.527). Allowing the residual variances between MENHLTH and PHSFUNC to correlate would also result in improvement in fit (26.178). The assumptions of CFA theory are that indicators load on only one factor and that measurement error variances are uncorrelated, so adding these parameters would violate these assumptions. We are left to conclude that the nice clean two-factor model is not supported; the correlational structure of the SF-36 subscales is more complex than hypothesized.
#summary(fit, fit.measures=T, standardized=T, rsquare=T)
modificationIndices(fit, sort. = TRUE, minimum.value = 3)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 23 MH =~ GENHLTH 32.527 4.270 4.270 1.062 1.062
## 31 MENHLTH ~~ PHSFUNC 26.178 -4.306 -4.306 -0.395 -0.395
## 24 PH =~ MENHLTH 25.898 -3.608 -3.608 -0.816 -0.816
## 20 MH =~ PHSFUNC 21.195 -4.291 -4.291 -0.873 -0.873
## 29 MENHLTH ~~ ROLEEMO 20.032 2.690 2.690 0.385 0.385
## 41 ROLEEMO ~~ VITAL 17.170 -1.890 -1.890 -0.343 -0.343
## 54 ROLPHYS ~~ GENHLTH 16.933 -3.053 -3.053 -0.360 -0.360
## 50 PHSFUNC ~~ ROLPHYS 14.352 3.468 3.468 0.346 0.346
## 32 MENHLTH ~~ ROLPHYS 13.047 -2.280 -2.280 -0.296 -0.296
## 34 MENHLTH ~~ GENHLTH 12.376 2.468 2.468 0.267 0.267
## 43 ROLEEMO ~~ ROLPHYS 11.784 1.741 1.741 0.271 0.271
## 49 VITAL ~~ GENHLTH 9.706 1.689 1.689 0.232 0.232
## 27 PH =~ VITAL 8.440 1.558 1.558 0.467 0.467
## 51 PHSFUNC ~~ BODPAIN 4.492 1.180 1.180 0.167 0.167
## 46 VITAL ~~ PHSFUNC 3.711 1.252 1.252 0.146 0.146
This example uses the lavaan package to conduct
structural equation modeling and some support packages
(semPlot, semoutput, semTools,
& sjPlot) for displaying results. The model below deals
with mediation among latent variables. This example
uses data from Kelloway’s book (1998, p.122). The model predicts
perceived workplace risk and willingness to participate in
occupational safety programs. The two predictor variables are
organizational climate regarding safety and accident
history. Perceived risk is hypothesized as a mediating variable
between climate and participation, and between accident history and
participation. The results do not support the mediation
hypothesis.
Variables in the model:
Here is my code for specifying the model and running it in lavaan. This example code also shows how to specify parameter names {a1,a2,b,c1,c2} and use them to define the indirect and total effects. The indirect and total effects are printed at the bottom of the Summary Output under Path Regressions. To see the indirect and total effects as displayed in the console, click on the Full Output tab and scroll to the bottom section: ‘Defined Parameters’.
Reading in data:
# Reading correlation matrix
corr <- '
1.00
.76 1.00
.78 .68 1.00
-.05 .01 -.05 1.00
.10 .09 .08 .52 1.00
.01 .08 .14 -.28 -.22 1.00
.20 .18 .29 -.44 -.29 .54 1.00
.22 .25 .29 -.42 -.31 .57 .65 1.00
.002 .002 -.05 .42 .22 -.13 -.26 -.20 1.00
.12 .03 .11 .26 .30 -.12 -.20 -.17 .47 1.00 '
# add variable names and convert to full correlation matrix
corrfull <- getCov(corr, names=c("p1","p2","p3","risk","riskc","plant","super","cow","dir","vic"))
# add SDs and convert to full covariance matrix
covfull <- lavaan::cor2cov(corrfull, sds=c(2.7, 3.1, 2.9, 7.3, 1.5, 1.4, 1.2, 1.2, 1.7, 0.8))
Using sjPlot to create this nice table.
# Uses sjPlot to print a nice looking correlation table
sjPlot::tab_corr(corrfull, na.deletion = "listwise", digits = 2, triangle = "lower",
title = "Correlations among observed variables (N = 115)",
string.diag=c('1','1','1','1','1','1','1','1','1','1'))
| p1 | p2 | p3 | risk | riskc | plant | super | cow | dir | vic | |
|---|---|---|---|---|---|---|---|---|---|---|
| p1 | 1 | |||||||||
| p2 | 0.76 | 1 | ||||||||
| p3 | 0.78 | 0.68 | 1 | |||||||
| risk | -0.05 | 0.01 | -0.05 | 1 | ||||||
| riskc | 0.10 | 0.09 | 0.08 | 0.52 | 1 | |||||
| plant | 0.01 | 0.08 | 0.14 | -0.28 | -0.22 | 1 | ||||
| super | 0.20 | 0.18 | 0.29 | -0.44 | -0.29 | 0.54 | 1 | |||
| cow | 0.22 | 0.25 | 0.29 | -0.42 | -0.31 | 0.57 | 0.65 | 1 | ||
| dir | 0.00 | 0.00 | -0.05 | 0.42 | 0.22 | -0.13 | -0.26 | -0.20 | 1 | |
| vic | 0.12 | 0.03 | 0.11 | 0.26 | 0.30 | -0.12 | -0.20 | -0.17 | 0.47 | 1 |
| Computed correlation used pearson-method with listwise-deletion. | ||||||||||
Specify & run the model:
# specify latent path model w direct & indirect effects (partially mediated model)
kell_p1 <- '
# defining latent variables
CLIMATE =~ plant + super + cow
ACCHIST =~ dir + vic
PERCRISK =~ risk + riskc
WILLPART =~ p1 + p2 + p3
# defining structual relations and assigning parameter names {a1,a2,b,c1,c2}
PERCRISK ~ a1*CLIMATE + a2*ACCHIST
WILLPART ~ b*PERCRISK + c1*CLIMATE + c2*ACCHIST
# defining indirect effects
CLIMATE_a1b := a1*b
ACCHIST_a2b := a2*b
# defining total effects
tot1_CLIMATE := c1 + (a1*b)
tot2_ACCHIST := c2 + (a2*b) '
# fit latent path model w direct & indirect effects
kell_p1.fit <- lavaan::sem(kell_p1, sample.cov=covfull, std.lv=F, sample.nobs=115)
Multiple parts of lavaan output are created as tabs
using semoutput package.
The “Regression Paths” section lists the indirect and total effects as well as their significance tests. Note that both the indirect effects of CLIMATE and ACCHIST on WILLPART are not significant and that only the total effect of CLIMATE on WILLPART is significant.
sem_tables(kell_p1.fit)
| Sample.Size | Chi.Square | df | p.value |
|---|---|---|---|
| 115 | 31.159 | 29 | 0.358 |
| CFI | RMSEA | RMSEA.Lower | RMSEA.Upper | SRMR | AIC | BIC |
|---|---|---|---|---|---|---|
| 0.995 | 0.025 | 0 | 0.077 | 0.049 | 4364.955 | 4436.323 |
| Latent Factor | Indicator | Loadings | sig | p | Lower.CI | Upper.CI | SE | z |
|---|---|---|---|---|---|---|---|---|
| CLIMATE | plant | 0.657 | *** | 0 | 0.532 | 0.782 | 0.064 | 10.326 |
| CLIMATE | super | 0.802 | *** | 0 | 0.704 | 0.901 | 0.050 | 16.023 |
| CLIMATE | cow | 0.832 | *** | 0 | 0.738 | 0.926 | 0.048 | 17.351 |
| ACCHIST | dir | 0.777 | *** | 0 | 0.571 | 0.983 | 0.105 | 7.400 |
| ACCHIST | vic | 0.605 | *** | 0 | 0.415 | 0.795 | 0.097 | 6.257 |
| PERCRISK | risk | 0.844 | *** | 0 | 0.694 | 0.993 | 0.076 | 11.083 |
| PERCRISK | riskc | 0.616 | *** | 0 | 0.464 | 0.769 | 0.078 | 7.926 |
| WILLPART | p1 | 0.921 | *** | 0 | 0.865 | 0.977 | 0.029 | 32.216 |
| WILLPART | p2 | 0.819 | *** | 0 | 0.745 | 0.894 | 0.038 | 21.595 |
| WILLPART | p3 | 0.846 | *** | 0 | 0.777 | 0.914 | 0.035 | 24.068 |
| Predictor | DV | Path Values | SE | z | sig | p | Lower.CI | Upper.CI |
|---|---|---|---|---|---|---|---|---|
| CLIMATE | PERCRISK | -0.448 | 0.108 | -4.158 | *** | 0.000 | -0.659 | -0.237 |
| ACCHIST | PERCRISK | 0.440 | 0.119 | 3.682 | *** | 0.000 | 0.206 | 0.674 |
| PERCRISK | WILLPART | 0.226 | 0.198 | 1.142 | 0.253 | -0.162 | 0.614 | |
| CLIMATE | WILLPART | 0.444 | 0.142 | 3.129 | ** | 0.002 | 0.166 | 0.722 |
| ACCHIST | WILLPART | 0.058 | 0.165 | 0.352 | 0.725 | -0.265 | 0.381 | |
| a1*b | CLIMATE_a1b | -0.101 | 0.096 | -1.055 | 0.292 | -0.289 | 0.087 | |
| a2*b | ACCHIST_a2b | 0.099 | 0.092 | 1.085 | 0.278 | -0.080 | 0.279 | |
| c1+(a1*b) | tot1_CLIMATE | 0.343 | 0.110 | 3.108 | ** | 0.002 | 0.127 | 0.559 |
| c2+(a2*b) | tot2_ACCHIST | 0.157 | 0.126 | 1.248 | 0.212 | -0.090 | 0.404 |
| Factor 1 | Factor 2 | r | sig | p | Lower.CI | Upper.CI | SE |
|---|---|---|---|---|---|---|---|
| CLIMATE | ACCHIST | -0.346 | ** | 0.003 | -0.574 | -0.119 | 0.116 |
| Factor 1 | Factor 2 | var | var.std | sig | p |
|---|---|---|---|---|---|
| CLIMATE | CLIMATE | 0.838 | 1.000 | *** | 0.000 |
| ACCHIST | ACCHIST | 1.729 | 1.000 | ** | 0.002 |
| PERCRISK | PERCRISK | 17.657 | 0.470 |
|
0.011 |
| WILLPART | WILLPART | 5.343 | 0.871 | *** | 0.000 |
| Variable | R-Squared |
|---|---|
| PERCRISK | 0.530229 |
| WILLPART | 0.128877 |
The fit of the model to the data is reasonable according to the following fit indices: CFI = .995, TLI = .992, RMSEA = .025, and SRMR = .049. The \(\chi^2\) = 31.159 with df = 29, and p = .358. The results are summarized in Figure 6 which presents standardized coefficients.
The model explains 53% of the variance in PERCRISK, and 13% of the variance in WILLPART. ACCHIST and CLIMATE are negatively correlated and they have significant and similar effects on PERCRISK. CLIMATE has a significant effect on WILLPART. The effect of ACCHIST on WILLPART is not significant. There is no support for PERCRISK as a mediator as its effect on WILLPART is not significant.
#--making path diagram w m.e.---------------------mar(bot,lft,top,rgt)
fig6 <-semPlot::semPaths(kell_p1.fit, whatLabels = "std", layout = "tree2",
rotation = 2, style = "lisrel", optimizeLatRes = TRUE,
structural = FALSE, layoutSplit = FALSE,
intercepts = FALSE, residuals = T,
curve = 3, curvature = 3, nCharNodes = 8,
sizeLat = 9, sizeMan = 8, sizeMan2 = 6,
edge.label.cex = 0.9, label.cex=1, residScale=10, mar=c(2,3,2,3),
edge.color = "black", edge.label.position = .40, DoNotPlot=T)
#--working with semptools to modify path diagrams
#--making adjustments w piping |>
fig6_1 <- fig6 |> mark_sig(kell_p1.fit, alpha = c("(n.s.)" = 1.00, "*" = .05))
plot(fig6_1)
Figure 6. Path analysis with multiple-indicator latent variables.
summary(kell_p1.fit, fit.measures = TRUE, standardized = TRUE, ci=F, rsquare=T)
## lavaan 0.6.15 ended normally after 87 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 26
##
## Number of observations 115
##
## Model Test User Model:
##
## Test statistic 31.159
## Degrees of freedom 29
## P-value (Chi-square) 0.358
##
## Model Test Baseline Model:
##
## Test statistic 487.719
## Degrees of freedom 45
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.995
## Tucker-Lewis Index (TLI) 0.992
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2156.477
## Loglikelihood unrestricted model (H1) -2140.898
##
## Akaike (AIC) 4364.955
## Bayesian (BIC) 4436.323
## Sample-size adjusted Bayesian (SABIC) 4354.142
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.025
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.077
## P-value H_0: RMSEA <= 0.050 0.726
## P-value H_0: RMSEA >= 0.080 0.039
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.049
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CLIMATE =~
## plant 1.000 0.916 0.657
## super 1.047 0.155 6.767 0.000 0.959 0.802
## cow 1.086 0.159 6.839 0.000 0.994 0.832
## ACCHIST =~
## dir 1.000 1.315 0.777
## vic 0.366 0.100 3.655 0.000 0.482 0.605
## PERCRISK =~
## risk 1.000 6.131 0.844
## riskc 0.150 0.030 4.972 0.000 0.921 0.616
## WILLPART =~
## p1 1.000 2.477 0.921
## p2 1.021 0.091 11.174 0.000 2.529 0.819
## p3 0.986 0.084 11.672 0.000 2.442 0.846
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PERCRISK ~
## CLIMATE (a1) -2.999 0.827 -3.628 0.000 -0.448 -0.448
## ACCHIST (a2) 2.050 0.712 2.878 0.004 0.440 0.440
## WILLPART ~
## PERCRISK (b) 0.091 0.083 1.107 0.268 0.226 0.226
## CLIMATE (c1) 1.201 0.426 2.818 0.005 0.444 0.444
## ACCHIST (c2) 0.109 0.312 0.350 0.726 0.058 0.058
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CLIMATE ~~
## ACCHIST -0.417 0.165 -2.530 0.011 -0.346 -0.346
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .plant 1.105 0.171 6.457 0.000 1.105 0.569
## .super 0.508 0.109 4.666 0.000 0.508 0.356
## .cow 0.439 0.108 4.066 0.000 0.439 0.308
## .dir 1.136 0.462 2.457 0.014 1.136 0.396
## .vic 0.402 0.079 5.082 0.000 0.402 0.634
## .risk 15.240 6.656 2.290 0.022 15.240 0.288
## .riskc 1.383 0.232 5.965 0.000 1.383 0.620
## .p1 1.093 0.361 3.026 0.002 1.093 0.151
## .p2 3.131 0.541 5.789 0.000 3.131 0.329
## .p3 2.375 0.451 5.269 0.000 2.375 0.285
## CLIMATE 0.838 0.229 3.659 0.000 1.000 1.000
## ACCHIST 1.729 0.558 3.097 0.002 1.000 1.000
## .PERCRISK 17.657 6.964 2.535 0.011 0.470 0.470
## .WILLPART 5.343 0.926 5.767 0.000 0.871 0.871
##
## R-Square:
## Estimate
## plant 0.431
## super 0.644
## cow 0.692
## dir 0.604
## vic 0.366
## risk 0.712
## riskc 0.380
## p1 0.849
## p2 0.671
## p3 0.715
## PERCRISK 0.530
## WILLPART 0.129
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CLIMATE_a1b -0.274 0.263 -1.042 0.298 -0.101 -0.101
## ACCHIST_a2b 0.187 0.177 1.058 0.290 0.099 0.099
## tot1_CLIMATE 0.927 0.331 2.800 0.005 0.343 0.343
## tot2_ACCHIST 0.296 0.247 1.200 0.230 0.157 0.157
The values below are the residual correlations (i.e., observed correlation - model estimated correlation). They are summarized by the SRMR which is .049.
sem_residuals(kell_p1.fit)
| plant | super | cow | dir | vic | risk | riskc | p1 | p2 | p3 | |
|---|---|---|---|---|---|---|---|---|---|---|
| plant | ||||||||||
| super | 0.01 | |||||||||
| cow | 0.02 | -0.02 | ||||||||
| dir | 0.05 | -0.04 | 0.02 | |||||||
| vic | 0.02 | -0.03 | 0.00 | 0.00 | ||||||
| risk | 0.05 | -0.03 | 0.00 | 0.03 | -0.04 | |||||
| riskc | 0.02 | 0.01 | 0.00 | -0.06 | 0.08 | 0.00 | ||||
| p1 | -0.16 | -0.01 | 0.00 | -0.03 | 0.10 | -0.05 | 0.10 | |||
| p2 | -0.08 | -0.01 | 0.05 | -0.02 | 0.01 | 0.01 | 0.09 | 0.01 | ||
| p3 | -0.02 | 0.09 | 0.09 | -0.08 | 0.09 | -0.05 | 0.08 | 0.00 | -0.01 |
Displays residual correlations and z-scores indicating the magnitude of the residuals. Typically, z values > |2.54| are interpreted as significant and indicate where the model is doing a poor job of reproducing the observed correlations.
lavResiduals(kell_p1.fit)
## $type
## [1] "cor.bentler"
##
## $cov
## plant super cow dir vic risk riskc p1 p2 p3
## plant 0.000
## super 0.013 0.000
## cow 0.024 -0.018 0.000
## dir 0.047 -0.044 0.024 0.000
## vic 0.018 -0.032 0.004 0.000 0.000
## risk 0.053 -0.034 0.001 0.030 -0.044 0.000
## riskc 0.023 0.007 -0.002 -0.065 0.078 0.000 0.000
## p1 -0.164 -0.013 -0.001 -0.026 0.098 -0.045 0.103 0.000
## p2 -0.075 -0.010 0.053 -0.023 0.011 0.014 0.093 0.005 0.000
## p3 -0.020 0.094 0.087 -0.075 0.090 -0.046 0.083 0.001 -0.013 0.000
##
## $cov.z
## plant super cow dir vic risk riskc p1 p2 p3
## plant 0.000
## super 0.542 0.000
## cow 1.156 -2.262 0.000
## dir 0.749 -0.926 0.566 0.000
## vic 0.244 -0.516 0.073 0.000 0.000
## risk 1.025 -0.947 0.038 1.668 -1.942 0.000
## riskc 0.362 0.137 -0.045 -1.362 1.409 0.000 0.000
## p1 -2.584 -0.276 -0.022 -0.669 1.494 -1.404 1.507 0.000
## p2 -1.083 -0.161 0.974 -0.414 0.150 0.273 1.254 1.894 0.000
## p3 -0.300 1.670 1.644 -1.454 1.264 -0.953 1.136 0.470 -2.188 0.000
##
## $summary
## cov
## srmr 0.049
## srmr.se 0.008
## srmr.exactfit.z 0.509
## srmr.exactfit.pvalue 0.305
## usrmr 0.019
## usrmr.se 0.016
## usrmr.ci.lower -0.007
## usrmr.ci.upper 0.045
## usrmr.closefit.h0.value 0.050
## usrmr.closefit.z -1.933
## usrmr.closefit.pvalue 0.973
modificationIndices(kell_p1.fit, sort. = TRUE, minimum.value = 3)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 71 plant ~~ p1 5.186 -0.353 -0.353 -0.321 -0.321
## 90 dir ~~ risk 5.066 2.518 2.518 0.605 0.605
## 95 vic ~~ risk 4.563 -1.018 -1.018 -0.411 -0.411
## 74 super ~~ cow 4.558 -0.353 -0.353 -0.746 -0.746
## 109 p2 ~~ p3 4.417 -2.488 -2.488 -0.912 -0.912
## 91 dir ~~ riskc 4.029 -0.383 -0.383 -0.306 -0.306
## 107 p1 ~~ p2 3.851 2.689 2.689 1.453 1.453
## 96 vic ~~ riskc 3.561 0.162 0.162 0.217 0.217
## 58 WILLPART =~ plant 3.549 -0.090 -0.224 -0.160 -0.160
## 101 risk ~~ p1 3.503 -1.485 -1.485 -0.364 -0.364
## 41 CLIMATE =~ p3 3.422 0.388 0.355 0.123 0.123
The semTools package has a function
compRelSEM that reads the lavaan output and
returns the reliability estimates for each latent variable. Setting the
options ‘tau.eq=T’ and ‘obs.var=T’ produces equivalent to Cronbach’s
reliability coefficient alpha. Setting ‘tau.eq=F’ produces the omega
coefficient of reliability.
relests<-semTools::compRelSEM(kell_p1.fit, tau.eq=T, obs.var=T, return.total=T)
So, these examples are some of the cool analyses you can do in
lavaan. Other examples include multiple-group models and
latent growth-curve models. There are various R packages for converting
lavaan output into tables and path diagrams in Rmarkdown:
semTools, semoutput, semPlot,
sjPlot, psych, GPArotation,
psychTools, and semptools. Some of these
packages produce html documents, others produce Word and PDF documents.
When rendering Rmarkdown files to Word documents, there does not seem to
be a easy way to include equation numbers. Also, adding partition lines
in a matrix seems to work for html documents but not for Word
documents.