Behind the Curtain: How Structural Equation Modelling actually works

Philip D Parker
07 Nov 2013

Outline

Quick history of SEM.
What SEM does.
How to do it by hand.
Advanced topics.

Many thanks to Jeremy Miles whose work is the basis of this presentation.

Quick History of SEM

Factor analysis
Simultaneous equations
LISREL
Extension to categorical, multilevel, and Bayesian contexts

Why use R:

Free!
Hundreds of packages
Used by statisticians
Programming language with easy links to other languages (C++, Python, etc.)
Extensive SEM options in R: OpenMx, sem, lavaan
R allows for a complete workflow (LaTeX/Sweave, Markdown, HTML, HTML5 slides)

What SEM actually does

The observed data:

          Var1  Var2  Var3  Var4  Var5  Var6
Person1  -1.91  0.42 -0.82 -0.32 -1.13 -0.96
Person2   0.09  0.21  0.26 -0.08 -0.23  0.09
Person3   1.19  0.49 -0.69  0.38  0.58  0.17
Person4   0.73  1.42  1.66  0.06  2.04  1.11
Person5  -0.49 -0.47 -0.73 -0.72  0.16 -1.54
Person6  -0.01 -0.09  0.20 -0.53  0.78  0.67
Person7   0.02 -0.57 -0.51  1.50  0.24  0.24
Person8   0.66  0.31  1.17  0.05  0.24  1.62
Person9   0.89  0.04  0.46 -0.73  0.23  0.31
Person10  1.18  1.31  2.05  0.82 -0.16  0.60

What SEM actually does

The Covariance Matrix:

      Var1  Var2  Var3  Var4  Var5  Var6
Var1 0.804 0.399 0.500 0.367 0.451 0.510
Var2 0.399 0.833 0.433 0.283 0.372 0.377
Var3 0.500 0.433 0.805 0.339 0.551 0.543
Var4 0.367 0.283 0.339 0.733 0.332 0.341
Var5 0.451 0.372 0.551 0.332 0.780 0.556
Var6 0.510 0.377 0.543 0.341 0.556 0.825
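
As a rough sketch of how raw scores become a covariance matrix, the plain-Python function below computes sample covariances with an N - 1 denominator. The name `covariance_matrix` is illustrative only. It uses just Var1 and Var2 for the ten people listed above, so it will not reproduce the full-sample matrix.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (N - 1 denominator) of equal-length rows."""
    n = len(rows)
    n_vars = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_vars)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(n_vars)
        ]
        for i in range(n_vars)
    ]


# Var1 and Var2 for the ten people shown on the slide (not the full sample).
scores = [
    [-1.91, 0.42], [0.09, 0.21], [1.19, 0.49], [0.73, 1.42], [-0.49, -0.47],
    [-0.01, -0.09], [0.02, -0.57], [0.66, 0.31], [0.89, 0.04], [1.18, 1.31],
]

for row in covariance_matrix(scores):
    print(" ".join(f"{value:6.3f}" for value in row))
```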

What SEM actually does

We also have a model we want to fit to our data:
SEM looks for the parameter values at which this model's expected covariance matrix is as close as possible to the observed covariance matrix.

What SEM actually does

The expected covariance matrix

The expected covariance matrix formula is:
\[ \Sigma = \Lambda \Phi {\Lambda }' + \Theta \]
But this model is NOT identified. As the latent variable is unobserved it has no scale of its own, so there is an infinite set of solutions.
Typically one of the loadings is fixed to 1, but in this instance I fixed the latent variance to 1 (\( \Phi = 1 \)) to make things a little simpler:
\[ \Sigma = \Lambda {\Lambda }' + \Theta \]
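
To see where the infinite set of solutions comes from, take the single-factor case where \( \Phi \) is a scalar. For any constant \( c \neq 0 \) (a symbol introduced here only for illustration), rescaling the loadings and the latent variance in opposite directions leaves the expected covariance matrix unchanged:
\[ (c\Lambda)\left(\frac{\Phi}{c^{2}}\right){(c\Lambda)}' + \Theta = \Lambda \Phi {\Lambda }' + \Theta = \Sigma \]
Fixing \( \Phi = 1 \), or fixing one loading to 1, pins \( c \) down (up to the sign of the loadings when \( \Phi \) is fixed).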

What SEM actually does

The expected covariance matrix

The model formula:
The model formula (with \( \Phi \) fixed to 1):
\[ \Sigma = \Lambda {\Lambda }' + \Theta \]
is a compact representation of:
\[ \begin{Bmatrix} \lambda_{11} \\ \lambda_{21} \\ \lambda_{31} \\ \lambda_{41} \\ \lambda_{51} \\ \lambda_{61} \end{Bmatrix} \times \begin{Bmatrix} \lambda_{11} & \lambda_{21} & \lambda_{31} & \lambda_{41} & \lambda_{51} &\lambda_{61} \end{Bmatrix} + \begin{Bmatrix} \delta _{1} & & & & & \\ & \delta _{2} & & & & \\ & & \delta _{3} & & & \\ & & &\delta _{4} & & \\ & & & &\delta _{5} & \\ & & & & &\delta _{6} \end{Bmatrix} \]

What SEM actually does

The expected covariance matrix

Let's guess some values, say .7 for the factor loadings. A good guess for the errors is \( 1-\lambda^2 \) (assuming unit variances), so .51 for the errors.
\[ \begin{Bmatrix}.7 \\ .7 \\ .7 \\ .7 \\ .7 \\ .7 \end{Bmatrix} \times \begin{Bmatrix} .7 & .7 & .7 & .7 & .7 &.7 \end{Bmatrix} + \begin{Bmatrix} .51 & & & & & \\ & .51 & & & & \\ & & .51 & & & \\ & & &.51 & & \\ & & & &.51 & \\ & & & & &.51 \end{Bmatrix} \]
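
The matrix product for this guess can be sketched in a few lines of plain Python. The function name `implied_covariance` is an assumption made for illustration, and it covers only the one-factor case with the latent variance fixed to 1.

```python
def implied_covariance(loadings, residuals):
    """Sigma = Lambda Lambda' + Theta for one factor with variance fixed to 1."""
    k = len(loadings)
    return [
        [
            # Outer product of the loadings, residual variance added on the diagonal only.
            loadings[i] * loadings[j] + (residuals[i] if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]


guess_loadings = [0.7] * 6
guess_residuals = [1 - loading ** 2 for loading in guess_loadings]

sigma = implied_covariance(guess_loadings, guess_residuals)
for row in sigma:
    print(" ".join(f"{value:.2f}" for value in row))
```

Note that the residuals enter only on the diagonal; adding them to every element of a row would inflate all the covariances.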

What SEM actually does

This gives us:

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 1.00 0.49 0.49 0.49 0.49 0.49
Var2 0.49 1.00 0.49 0.49 0.49 0.49
Var3 0.49 0.49 1.00 0.49 0.49 0.49
Var4 0.49 0.49 0.49 1.00 0.49 0.49
Var5 0.49 0.49 0.49 0.49 1.00 0.49
Var6 0.49 0.49 0.49 0.49 0.49 1.00

Which is close-ish to:

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 0.80 0.40 0.50 0.37 0.45 0.51
Var2 0.40 0.83 0.43 0.28 0.37 0.38
Var3 0.50 0.43 0.80 0.34 0.55 0.54
Var4 0.37 0.28 0.34 0.73 0.33 0.34
Var5 0.45 0.37 0.55 0.33 0.78 0.56
Var6 0.51 0.38 0.54 0.34 0.56 0.82

Not Bad! We could do better though.

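Maximum likelihood

ML estimation uses an iterative (typically gradient-based) search to find the parameter values that minimize the discrepancy function:
\[ D_{ML} = \log\left | \Sigma \right | + tr(S\Sigma ^{-1}) - \log\left | S \right | - k \]
where \( S \) is the observed covariance matrix, \( \Sigma \) the expected one, and \( k \) the number of observed variables. The goal of ML is to minimize the difference between the expected covariance matrix and the observed one.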
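
A minimal sketch of the discrepancy function in plain Python follows. It is not the R code behind these slides. The helper `det_and_inverse` is a hypothetical name for a small Gauss-Jordan routine, and `S` is the observed covariance matrix from the earlier slide.

```python
import math


def det_and_inverse(matrix):
    """Determinant and inverse of a square matrix via Gauss-Jordan elimination."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        pivot_value = a[col][col]
        a[col] = [x / pivot_value for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def d_ml(s, sigma):
    """log|Sigma| + tr(S Sigma^-1) - log|S| - k."""
    k = len(s)
    det_sigma, inv_sigma = det_and_inverse(sigma)
    det_s, _ = det_and_inverse(s)
    trace = sum(s[i][j] * inv_sigma[j][i] for i in range(k) for j in range(k))
    return math.log(det_sigma) + trace - math.log(det_s) - k


S = [
    [0.804, 0.399, 0.500, 0.367, 0.451, 0.510],
    [0.399, 0.833, 0.433, 0.283, 0.372, 0.377],
    [0.500, 0.433, 0.805, 0.339, 0.551, 0.543],
    [0.367, 0.283, 0.339, 0.733, 0.332, 0.341],
    [0.451, 0.372, 0.551, 0.332, 0.780, 0.556],
    [0.510, 0.377, 0.543, 0.341, 0.556, 0.825],
]

# Expected matrix from the .7 / .51 guess: 1.00 on the diagonal, .49 elsewhere.
guess = [[1.0 if i == j else 0.49 for j in range(6)] for i in range(6)]

print("discrepancy of the guess:", d_ml(S, guess))
print("discrepancy of a perfect fit:", d_ml(S, S))
```

The discrepancy is zero only when the expected matrix equals the observed one, and positive otherwise.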

Quick Warning

ML can give false solutions if it gets 'stuck' at a local minimum!

In practice, however, trying multiple start values and allowing the computer to pick start values makes this unlikely.

Re-fitting with ML

I coded up my own \( D_{ML} \) function and minimized it with an optimizer in R:

   loadings    SE Residual    SE
L1    0.673 0.042    0.351 0.031
L2    0.549 0.046    0.532 0.043
L3    0.748 0.040    0.245 0.025
L4    0.476 0.044    0.507 0.040
L5    0.720 0.040    0.262 0.026
L6    0.742 0.041    0.274 0.027

Some fairly complicated math gives the standard errors, but all code is available with these slides.
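
To give a feel for what the optimizer does, here is a deliberately simple sketch in plain Python: gradient descent with numerical derivatives and step halving, started from the .7 / .51 guess. It is an assumption-laden stand-in, not the R optimizer used for the table above, and it does not compute standard errors. All function names are illustrative.

```python
import math

S = [
    [0.804, 0.399, 0.500, 0.367, 0.451, 0.510],
    [0.399, 0.833, 0.433, 0.283, 0.372, 0.377],
    [0.500, 0.433, 0.805, 0.339, 0.551, 0.543],
    [0.367, 0.283, 0.339, 0.733, 0.332, 0.341],
    [0.451, 0.372, 0.551, 0.332, 0.780, 0.556],
    [0.510, 0.377, 0.543, 0.341, 0.556, 0.825],
]
K = len(S)


def det_and_inverse(matrix):
    """Gauss-Jordan elimination; returns (0.0, None) for a singular matrix."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0, None
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        pivot_value = a[col][col]
        a[col] = [x / pivot_value for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


LOG_DET_S = math.log(det_and_inverse(S)[0])


def d_ml(params):
    """Discrepancy for a one-factor model; params = 6 loadings then 6 residuals."""
    lam, theta = params[:K], params[K:]
    if min(theta) <= 0:
        return math.inf  # negative residual variances are not admissible
    sigma = [[lam[i] * lam[j] + (theta[i] if i == j else 0.0) for j in range(K)]
             for i in range(K)]
    det_sigma, inv_sigma = det_and_inverse(sigma)
    if not det_sigma > 0:
        return math.inf
    trace = sum(S[i][j] * inv_sigma[j][i] for i in range(K) for j in range(K))
    return math.log(det_sigma) + trace - LOG_DET_S - K


def gradient(params, eps=1e-6):
    """Central-difference approximation to the gradient of d_ml."""
    grad = []
    for i in range(len(params)):
        up, down = params[:], params[:]
        up[i] += eps
        down[i] -= eps
        grad.append((d_ml(up) - d_ml(down)) / (2 * eps))
    return grad


def fit(start, iterations=300):
    """Gradient descent; each step is halved until the discrepancy decreases."""
    params, current = start[:], d_ml(start)
    for _ in range(iterations):
        grad = gradient(params)
        step = 1.0
        while step > 1e-10:
            trial = [p - step * g for p, g in zip(params, grad)]
            value = d_ml(trial)
            if value < current:
                params, current = trial, value
                break
            step /= 2
        else:
            break  # no step improved the fit, so stop
    return params, current


start = [0.7] * K + [0.51] * K
estimates, discrepancy = fit(start)
print("loadings: ", [round(x, 3) for x in estimates[:K]])
print("residuals:", [round(x, 3) for x in estimates[K:]])
print("D_ML:", discrepancy)
```

Plain gradient descent is slow compared with the quasi-Newton methods SEM software typically uses, but it makes the idea visible: change the parameters a little, keep the change only if the discrepancy drops.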

What SEM actually does

This gives us (\( \Lambda {\Lambda }' + \Theta \), computed from the rounded estimates above):

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 0.80 0.37 0.50 0.32 0.48 0.50
Var2 0.37 0.83 0.41 0.26 0.40 0.41
Var3 0.50 0.41 0.80 0.36 0.54 0.56
Var4 0.32 0.26 0.36 0.73 0.34 0.35
Var5 0.48 0.40 0.54 0.34 0.78 0.53
Var6 0.50 0.41 0.56 0.35 0.53 0.82

Which is much closer to:

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 0.80 0.40 0.50 0.37 0.45 0.51
Var2 0.40 0.83 0.43 0.28 0.37 0.38
Var3 0.50 0.43 0.80 0.34 0.55 0.54
Var4 0.37 0.28 0.34 0.73 0.33 0.34
Var5 0.45 0.37 0.55 0.33 0.78 0.56
Var6 0.51 0.38 0.54 0.34 0.56 0.82

Great! We have found an optimal fit given the model we hypothesised... But did we hypothesise the right model?! In other words, is our expected covariance matrix so close to the observed one that we can say the difference is due to chance?

Testing Fit

The \( \chi^2 \) statistic is the minimized ML discrepancy scaled by the sample size:

\[ \chi^2 = D_{ML} \times (N-1) \]

 chi-square =  21.73

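For the p-value we first need the \( df \): the number of unique elements in the covariance matrix, \( p \), minus the number of free parameters, \( k \) (here 6 loadings and 6 residual variances; note that this \( k \) is not the \( k \) in \( D_{ML} \)):
\[ df = p - k = (\frac{6\times(6+1)}{2}) - (6+6) = 9 \]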

We can then use the \( \chi^2 \) and \( df \) to give the p-value:

 p value =  0.01

Note the role of N in this equation: for the same discrepancy, a bigger sample gives a bigger \( \chi^{2} \).
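
The degrees of freedom and p-value can be checked with a short plain-Python sketch. The function `chi_square_sf` is a hypothetical helper that evaluates the upper tail of the \( \chi^2 \) distribution through the series expansion of the regularized incomplete gamma function. The \( \chi^2 \) value is taken from the slide, since N is not reported here.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable (x > 0)."""
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


n_vars = 6
unique_elements = n_vars * (n_vars + 1) // 2  # variances plus covariances
free_parameters = 6 + 6                       # loadings plus residual variances
df = unique_elements - free_parameters

chi_square = 21.73  # value reported on the slide
p_value = chi_square_sf(chi_square, df)
print("df =", df, " p value =", round(p_value, 2))
```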

Absolute fit

David Kenny provides a great reference for fit measures.
RMSEA: \[ \frac{\sqrt{\chi^2-df}}{\sqrt{df(N-1)}} \]

RMSEA = 0.063
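
As a sketch, the RMSEA formula is a one-liner in plain Python. The slides do not state N, so the sample size of 350 below is a placeholder assumption chosen only to make the example run. It will not exactly reproduce the value above. Truncating at zero when \( \chi^2 < df \) is the usual convention.

```python
import math


def rmsea(chi_square, df, n):
    """Root mean square error of approximation, truncated at zero."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


placeholder_n = 350  # hypothetical: the actual sample size is not given in the slides
print(round(rmsea(21.73, 9, placeholder_n), 3))
```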

Incremental fit

Incremental fit requires the estimation of another model: a variance-only (null) model.

     Var1 Var2 Var3 Var4 Var5 Var6
Var1  0.8 0.00 0.00 0.00 0.00 0.00
Var2  0.0 0.83 0.00 0.00 0.00 0.00
Var3  0.0 0.00 0.81 0.00 0.00 0.00
Var4  0.0 0.00 0.00 0.73 0.00 0.00
Var5  0.0 0.00 0.00 0.00 0.78 0.00
Var6  0.0 0.00 0.00 0.00 0.00 0.83

We can go further from here only if the null model fits badly enough, i.e. its RMSEA is > .158

Here it is 0.4312

Incremental fit

CFI: \( \frac{(\chi^2_n - df_n) - (\chi^2_h - df_h)}{\chi^2_n - df_n} \)

TLI: \( \frac{(\chi^2_n /df_n) - (\chi^2_h/df_h)}{(\chi^2_n /df_n) - 1} \)

In our case:

CFI =  0.987

TLI =  0.979
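
Both indices are simple functions of the two \( \chi^2 \) values, sketched here in plain Python. The TLI uses the usual definition, with \( \chi^2_n/df_n - 1 \) in the denominator. The hypothesised model's \( \chi^2 = 21.73 \) and \( df = 9 \) come from the slides. The null model has \( df = 21 - 6 = 15 \) because it estimates only the six variances. The null \( \chi^2 \) is not reported, so the value of 1000 below is a made-up placeholder.

```python
def cfi(chi2_null, df_null, chi2_model, df_model):
    """Comparative fit index from the null and hypothesised models."""
    null_excess = chi2_null - df_null
    model_excess = chi2_model - df_model
    return (null_excess - model_excess) / null_excess


def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis index, based on chi-square to df ratios."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


hypothetical_null_chi2 = 1000.0  # placeholder: not reported in the slides
print("CFI =", round(cfi(hypothetical_null_chi2, 15, 21.73, 9), 3))
print("TLI =", round(tli(hypothetical_null_chi2, 15, 21.73, 9), 3))
```

Neither function truncates its result, which is why values above 1 are possible when the hypothesised model's \( \chi^2 \) falls below its \( df \).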

Fit Overview

CFI and TLI have NO mathematical upper bound. Most programs set values \( > \) 1 to 1.
The null model used by LISREL and Mplus is different.
The default null model is arbitrary and could be anything!
If the null model fits too well (i.e. RMSEA < .158) CFI and TLI could be inappropriate.
As with all analyses, START simple and build up (Gelman and Hill, 2008)!!

Advanced Topics

Identification
Internal consistency
Fitting more complex models
Bayes SEM and ESEM
Useful links

Fitting more complex models

Mean Structure (ANOVA, MANOVA, Growth Curve)
Structure/regression (linear, probit, logistic)
Multilevel and Sandwich estimators
Estimators (MLR, ML-IRT, WLSMV)
Mixture models
Reflective, formative, and bifactor models

Alternative Measurement FIT

ESEM:

Integrates EFA with full SEM possibilities
More realistic representation of true population structure
Fit is typically better

BSEM

Flexible and easy to fit
Anything can be done if you can code it (I use JAGS)
Currently no agreed-upon fit measure (I use the Bayes factor)
Difficult models? Save plausible values (PVs) and use those.

Identification and non-arbitrary metric

Identification via item/variance and intercept/latent mean
Identification via non-arbitrary metric

Empirical Identification

Why?
- Parameter values can make the solution mathematically undefined.
- An item is a linear combination of another item.
- Strange constraints are included.
- Simpler models can be underidentified when more complex models are not.
- Poor starting values can result in underidentification.
How to check?
- Run the model to see if it works & check for strange results.
- Check mathematically

Reliability

Cronbach's alpha has many problems:

It requires \( \tau \) equivalence, otherwise it reports only a lower bound of reliability.
As a lower bound it is typically too low.
Better estimates are easily available in the psych package in R.
It is not a measure of unidimensionality (does not give information on internal consistency).

Model-based approaches from CFA are also possible that do NOT require \( \tau \) equivalence:
\[ \Omega = \frac{\left(\sum \lambda\right)^2}{\left(\sum \lambda\right)^2 + \sum \delta } \]
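
As a worked illustration, the plain-Python sketch below plugs the ML loadings and residuals from the one-factor model fitted earlier into this formula. The loadings are summed first and then squared. The function name `omega` is illustrative.

```python
def omega(loadings, residuals):
    """Model-based reliability: (sum of loadings)^2 over itself plus summed residuals."""
    true_variance = sum(loadings) ** 2
    return true_variance / (true_variance + sum(residuals))


ml_loadings = [0.673, 0.549, 0.748, 0.476, 0.720, 0.742]
ml_residuals = [0.351, 0.532, 0.245, 0.507, 0.262, 0.274]

print(round(omega(ml_loadings, ml_residuals), 3))
```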

Resources

David Kenny has a wealth of easy to digest information.
Indiana University has a great simple introduction to SEM.
OpenMx has a good set of advanced lecture slides.
John Fox has excellent resources on SEM in R.
For ongoing advice SEMNET
For a quick intro to BSEM