Behind the Curtain: How Structural Equation Modelling actually works

Philip D Parker
07 Nov 2013

Outline


  • Quick history of SEM.
  • What SEM does.
  • How to do it by hand.
  • Advanced topics.

    Many thanks to Jeremy Miles whose work is the basis of this presentation.

Quick History of SEM


  • Factor analysis
  • Simultaneous equations
  • LISREL
  • Extension to categorical, multilevel, and Bayesian contexts

Why use R:


  • Free!
  • Hundreds of packages
  • Used by statisticians
  • Programming language with easy links to other languages (C++, Python, etc.)
  • Extensive SEM options in R: OpenMx, sem, lavaan
  • R allows for a complete workflow (LaTeX/Sweave, Markdown, HTML, HTML5 slides)

What SEM actually does


The observed data:

          Var1  Var2  Var3  Var4  Var5  Var6
Person1  -1.91  0.42 -0.82 -0.32 -1.13 -0.96
Person2   0.09  0.21  0.26 -0.08 -0.23  0.09
Person3   1.19  0.49 -0.69  0.38  0.58  0.17
Person4   0.73  1.42  1.66  0.06  2.04  1.11
Person5  -0.49 -0.47 -0.73 -0.72  0.16 -1.54
Person6  -0.01 -0.09  0.20 -0.53  0.78  0.67
Person7   0.02 -0.57 -0.51  1.50  0.24  0.24
Person8   0.66  0.31  1.17  0.05  0.24  1.62
Person9   0.89  0.04  0.46 -0.73  0.23  0.31
Person10  1.18  1.31  2.05  0.82 -0.16  0.60

What SEM actually does


The Covariance Matrix:

      Var1  Var2  Var3  Var4  Var5  Var6
Var1 0.804 0.399 0.500 0.367 0.451 0.510
Var2 0.399 0.833 0.433 0.283 0.372 0.377
Var3 0.500 0.433 0.805 0.339 0.551 0.543
Var4 0.367 0.283 0.339 0.733 0.332 0.341
Var5 0.451 0.372 0.551 0.332 0.780 0.556
Var6 0.510 0.377 0.543 0.341 0.556 0.825
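
Each entry of the covariance matrix is computed from the raw scores. The following is a minimal sketch in Python (an assumption; the talk itself uses R) that applies the usual sample covariance formula to the first ten persons shown for Var1 and Var2. Because the full sample is larger than those ten rows, it will not reproduce the values in the matrix above.

```python
# First ten rows of Var1 and Var2 from the data slide (the full sample is larger).
var1 = [-1.91, 0.09, 1.19, 0.73, -0.49, -0.01, 0.02, 0.66, 0.89, 1.18]
var2 = [0.42, 0.21, 0.49, 1.42, -0.47, -0.09, -0.57, 0.31, 0.04, 1.31]


def covariance(x, y):
    """Unbiased sample covariance: summed cross-products of deviations / (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


variables = {"Var1": var1, "Var2": var2}
# The covariance of a variable with itself is its variance (the diagonal).
cov_matrix = {
    row: {col: covariance(variables[row], variables[col]) for col in variables}
    for row in variables
}
for name, row in cov_matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```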

What SEM actually does


We also have a model we want to fit to our data:
SEM estimates the model's parameters so that the expected covariance matrix it produces is as close as possible to the observed covariance matrix, and then tests how close the two are.

What SEM actually does


The expected covariance matrix


The expected covariance matrix formula is:
\[ \Sigma = \Lambda \Phi {\Lambda }' + \Theta \]
But this model is NOT identified. Because the latent variable is unobserved it has no scale of its own, so the loadings and the latent variance can trade off against each other and there is an infinite set of solutions.
Typically one of the loadings is fixed to 1, but in this instance I fixed the latent variance to 1 (\( \Phi = 1 \)) to make things a little simpler:
\[ \Sigma = \Lambda {\Lambda }' + \Theta \]

What SEM actually does


The expected covariance matrix

The model formula:
\[ \Sigma = \Lambda {\Lambda }' + \Theta \] (with the latent variance \( \Phi \) fixed to 1) is a compact representation of: \[ \begin{Bmatrix} \lambda_{11} \\ \lambda_{21} \\ \lambda_{31} \\ \lambda_{41} \\ \lambda_{51} \\ \lambda_{61} \end{Bmatrix} \times \begin{Bmatrix} \lambda_{11} & \lambda_{21} & \lambda_{31} & \lambda_{41} & \lambda_{51} &\lambda_{61} \end{Bmatrix} + \begin{Bmatrix} \delta _{1} & & & & & \\ & \delta _{2} & & & & \\ & & \delta _{3} & & & \\ & & &\delta _{4} & & \\ & & & &\delta _{5} & \\ & & & & &\delta _{6} \end{Bmatrix} \]

What SEM actually does


The expected covariance matrix


Let's guess some values, say .7 for the factor loadings. For a standardized variable a good guess for the error variance is \( 1-\lambda^2 \), so .51 for errors. \[ \begin{Bmatrix}.7 \\ .7 \\ .7 \\ .7 \\ .7 \\ .7 \end{Bmatrix} \times \begin{Bmatrix} .7 & .7 & .7 & .7 & .7 &.7 \end{Bmatrix} + \begin{Bmatrix} .51 & & & & & \\ & .51 & & & & \\ & & .51 & & & \\ & & &.51 & & \\ & & & &.51 & \\ & & & & &.51 \end{Bmatrix} \]
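
The matrix arithmetic for this guess can be sketched in a few lines. This is an illustration in plain Python (an assumption; the talk uses R), and `implied_covariance` is a hypothetical helper name. It builds the outer product of the loadings and adds the error variances to the diagonal only.

```python
loadings = [0.7] * 6                             # guessed factor loadings (Lambda)
residuals = [1 - lam ** 2 for lam in loadings]   # guessed error variances (diagonal of Theta)


def implied_covariance(loadings, residuals):
    """Sigma = Lambda Lambda' + Theta for a one-factor model with factor variance 1."""
    k = len(loadings)
    sigma = [[loadings[i] * loadings[j] for j in range(k)] for i in range(k)]
    for i in range(k):
        sigma[i][i] += residuals[i]  # Theta is diagonal, so only the diagonal changes
    return sigma


sigma = implied_covariance(loadings, residuals)
for row in sigma:
    print([round(value, 2) for value in row])
```

Every off-diagonal element is .7 × .7 = .49, and every diagonal element is .49 + .51 = 1.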

What SEM actually does


This gives us:

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 1.00 0.49 0.49 0.49 0.49 0.49
Var2 0.49 1.00 0.49 0.49 0.49 0.49
Var3 0.49 0.49 1.00 0.49 0.49 0.49
Var4 0.49 0.49 0.49 1.00 0.49 0.49
Var5 0.49 0.49 0.49 0.49 1.00 0.49
Var6 0.49 0.49 0.49 0.49 0.49 1.00


Which is close-ish to the observed covariance matrix:

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 0.80 0.40 0.50 0.37 0.45 0.51
Var2 0.40 0.83 0.43 0.28 0.37 0.38
Var3 0.50 0.43 0.80 0.34 0.55 0.54
Var4 0.37 0.28 0.34 0.73 0.33 0.34
Var5 0.45 0.37 0.55 0.33 0.78 0.56
Var6 0.51 0.38 0.54 0.34 0.56 0.82




Not Bad! We could do better though.

Maximum likelihood

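ML estimation uses an iterative numerical optimizer to find the parameter values that minimize the discrepancy function:
\[ D_{ML} = \log\left | \Sigma \right | + \mathrm{tr}(S\Sigma ^{-1}) - \log\left | S \right | - k \] where \( S \) is the observed covariance matrix, \( \Sigma \) is the expected covariance matrix, and \( k \) is the number of observed variables. Minimizing \( D_{ML} \) minimizes the difference between the expected covariance matrix and the observed one; it is zero when the two are identical.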
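
To make the discrepancy function concrete, here is a minimal sketch in plain Python (an assumption; it is not the author's R code). The helper names `inverse_and_logdet` and `d_ml` are hypothetical. The sketch evaluates the discrepancy for the observed covariance matrix from the earlier slide against the .7/.51 guess, and against itself as a sanity check.

```python
import math


def inverse_and_logdet(matrix):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (inverse, log|det|); intended for positive-definite matrices."""
    n = len(matrix)
    # Augment a float copy of the matrix with the identity: [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        logdet += math.log(abs(pivot))  # |det| is the product of the pivots
        aug[col] = [value / pivot for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], logdet


def d_ml(S, Sigma):
    """ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - k."""
    k = len(S)
    sigma_inv, logdet_sigma = inverse_and_logdet(Sigma)
    _, logdet_s = inverse_and_logdet(S)
    trace = sum(S[i][j] * sigma_inv[j][i] for i in range(k) for j in range(k))
    return logdet_sigma + trace - logdet_s - k


# Observed covariance matrix from the slides (three decimals).
S = [
    [0.804, 0.399, 0.500, 0.367, 0.451, 0.510],
    [0.399, 0.833, 0.433, 0.283, 0.372, 0.377],
    [0.500, 0.433, 0.805, 0.339, 0.551, 0.543],
    [0.367, 0.283, 0.339, 0.733, 0.332, 0.341],
    [0.451, 0.372, 0.551, 0.332, 0.780, 0.556],
    [0.510, 0.377, 0.543, 0.341, 0.556, 0.825],
]
# Expected covariance matrix implied by loadings of .7 and errors of .51.
guess = [[1.0 if i == j else 0.49 for j in range(6)] for i in range(6)]

print("D_ML for the guess:", d_ml(S, guess))
print("D_ML when Sigma equals S:", d_ml(S, S))
```

An optimizer would repeatedly rebuild the expected matrix from trial loadings and error variances and call a function like `d_ml` until the value stops decreasing.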

Quick Warning



ML can give false solutions if it gets 'stuck' at a local minimum!

In practice, however, using multiple start values and allowing the computer to pick start values makes this unlikely.
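
The idea behind multiple start values can be illustrated on a toy problem. The sketch below is in plain Python and is not an SEM fit: the function `f` is a made-up one-dimensional curve with a shallow local minimum and a deeper global one, and `descend` is a hypothetical, deliberately simple gradient descent routine.

```python
def f(x):
    """Toy discrepancy with a local minimum near x = 1.13 and a global one near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x


def gradient(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(start, rate=0.01, steps=5000):
    """Plain gradient descent from a single start value."""
    x = start
    for _ in range(steps):
        x -= rate * gradient(x)
    return x


starts = [-2.0, -0.5, 0.5, 2.0]
solutions = [descend(start) for start in starts]
best = min(solutions, key=f)  # keep the solution with the lowest discrepancy

for start, solution in zip(starts, solutions):
    print(f"start {start:5.1f} -> x = {solution:.3f}, f(x) = {f(solution):.3f}")
print("best solution:", round(best, 3))
```

Starts on the right-hand side settle in the shallow minimum, while starts on the left find the deeper one; comparing the final values across starts is what exposes the false solution.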

Re-fitting with ML

I coded up my own \( D_{ML} \) function and minimized it with an optimizer in R:

   loadings    SE Residual    SE
L1    0.673 0.042    0.351 0.031
L2    0.549 0.046    0.532 0.043
L3    0.748 0.040    0.245 0.025
L4    0.476 0.044    0.507 0.040
L5    0.720 0.040    0.262 0.026
L6    0.742 0.041    0.274 0.027

Some fairly complicated math gives the standard errors, but all code is available with these slides.

What SEM actually does


This gives us (computed as \( \Lambda {\Lambda }' + \Theta \) from the rounded estimates above):

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 0.80 0.37 0.50 0.32 0.48 0.50
Var2 0.37 0.83 0.41 0.26 0.40 0.41
Var3 0.50 0.41 0.80 0.36 0.54 0.56
Var4 0.32 0.26 0.36 0.73 0.34 0.35
Var5 0.48 0.40 0.54 0.34 0.78 0.53
Var6 0.50 0.41 0.56 0.35 0.53 0.82


Which is much closer to:

     Var1 Var2 Var3 Var4 Var5 Var6
Var1 0.80 0.40 0.50 0.37 0.45 0.51
Var2 0.40 0.83 0.43 0.28 0.37 0.38
Var3 0.50 0.43 0.80 0.34 0.55 0.54
Var4 0.37 0.28 0.34 0.73 0.33 0.34
Var5 0.45 0.37 0.55 0.33 0.78 0.56
Var6 0.51 0.38 0.54 0.34 0.56 0.82
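
How much closer the ML solution is can be checked numerically. The sketch below is plain Python (an assumption; the talk uses R) with hypothetical helper names. It rebuilds the expected covariance matrix from the rounded ML loadings and residual variances reported earlier, does the same for the .7/.51 guess, and reports the largest absolute difference from the observed matrix in each case.

```python
observed = [
    [0.80, 0.40, 0.50, 0.37, 0.45, 0.51],
    [0.40, 0.83, 0.43, 0.28, 0.37, 0.38],
    [0.50, 0.43, 0.80, 0.34, 0.55, 0.54],
    [0.37, 0.28, 0.34, 0.73, 0.33, 0.34],
    [0.45, 0.37, 0.55, 0.33, 0.78, 0.56],
    [0.51, 0.38, 0.54, 0.34, 0.56, 0.82],
]
ml_loadings = [0.673, 0.549, 0.748, 0.476, 0.720, 0.742]
ml_residuals = [0.351, 0.532, 0.245, 0.507, 0.262, 0.274]


def implied_covariance(loadings, residuals):
    """Sigma = Lambda Lambda' + Theta, with Theta added to the diagonal only."""
    k = len(loadings)
    sigma = [[loadings[i] * loadings[j] for j in range(k)] for i in range(k)]
    for i in range(k):
        sigma[i][i] += residuals[i]
    return sigma


def max_abs_difference(a, b):
    """Largest absolute element-wise difference between two square matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


guess_sigma = implied_covariance([0.7] * 6, [0.51] * 6)
ml_sigma = implied_covariance(ml_loadings, ml_residuals)

guess_gap = max_abs_difference(guess_sigma, observed)
ml_gap = max_abs_difference(ml_sigma, observed)
print("largest gap, guess:", round(guess_gap, 3))
print("largest gap, ML:   ", round(ml_gap, 3))
```

With these rounded inputs the largest gap drops from roughly .27 for the guess to roughly .05 for the ML estimates.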



Great! We have found the optimal fit given the model we hypothesised... But did we hypothesise the right model? In other words, is our expected covariance matrix so close to the observed one that we can say the difference is due to chance?

Testing Fit


The \( \chi^2 \) statistic is the minimized ML discrepancy scaled by the sample size:

\[ \chi^2 = D_{ML} \times (N-1) \]

 chi-square =  21.73

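For the p-value we first need the \( df \), which is given by: \[ df = p - k = \frac{6\times(6+1)}{2} - (6+6) = 9 \] where \( p \) is the number of unique variances and covariances (the knowns) and \( k \) here is the number of estimated parameters (six loadings and six residual variances).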

We can then use the \( \chi^2 \) and \( df \) to give the p-value:

 p value =  0.01
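
The degrees of freedom and p-value can be reproduced from the numbers on this slide. The sketch below is plain Python with only the standard library (an assumption; in R this is a one-line call). The hypothetical helper `chi_square_p_value` computes the upper-tail probability with the standard series expansion of the regularized lower incomplete gamma function.

```python
import math


def chi_square_p_value(chi2, df):
    """Upper-tail probability P(X >= chi2) for a chi-square variable with df degrees of freedom."""
    a, x = df / 2.0, chi2 / 2.0
    if x <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    for n in range(1, 500):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower_tail = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower_tail


n_variables = 6
knowns = n_variables * (n_variables + 1) // 2  # unique variances and covariances
free_parameters = 6 + 6                        # six loadings + six residual variances
df = knowns - free_parameters

p_value = chi_square_p_value(21.73, df)
print("df =", df)
print("p value =", round(p_value, 2))
```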

Note the role of N in this equation: for the same discrepancy, a bigger sample gives a bigger \( \chi^{2} \).

Absolute fit


David Kenny provides a great reference for fit measures.
RMSEA: \[ \frac{\sqrt{\chi^2-df}}{\sqrt{df(N-1)}} \]

RMSEA = 0.063
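
The RMSEA formula is easy to sketch as a function. The sample size is not reported on these slides, so the value `hypothetical_n = 357` below is purely an assumption, chosen only because it roughly reproduces the reported RMSEA. Truncating negative values of \( \chi^2 - df \) at zero is a common convention added here, not something stated on the slide.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation: sqrt((chi2 - df) / (df * (n - 1)))."""
    excess = max(chi2 - df, 0.0)  # convention: RMSEA is 0 when chi2 is below df
    return math.sqrt(excess / (df * (n - 1)))


hypothetical_n = 357  # ASSUMPTION: the real sample size is not given in the slides
value = rmsea(21.73, 9, hypothetical_n)
print("RMSEA =", round(value, 3))
```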

Incremental fit

Incremental fit requires the estimation of another model: a variance-only (null) model.

     Var1 Var2 Var3 Var4 Var5 Var6
Var1  0.8 0.00 0.00 0.00 0.00 0.00
Var2  0.0 0.83 0.00 0.00 0.00 0.00
Var3  0.0 0.00 0.81 0.00 0.00 0.00
Var4  0.0 0.00 0.00 0.73 0.00 0.00
Var5  0.0 0.00 0.00 0.00 0.78 0.00
Var6  0.0 0.00 0.00 0.00 0.00 0.83

Incremental fit indices are only informative if this null model fits badly, that is, if the null RMSEA is above .158.

Here it is 0.4312, so we can go further.

Incremental fit


CFI: \( \frac{(\chi^2_n - df_n) - (\chi^2_h - df_h)}{\chi^2_n - df_n} \)

TLI: \( \frac{(\chi^2_n /df_n) - (\chi^2_h/df_h)}{(\chi^2_n /df_n) - 1} \)

In our case:

CFI =  0.987
TLI =  0.979
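
Both indices are simple functions of the two \( \chi^2 \) values and their degrees of freedom. In the sketch below (plain Python, hypothetical function names), the null model has 21 − 6 = 15 degrees of freedom because it estimates only the six variances. The null \( \chi^2 \) is not reported on these slides, so `hypothetical_null_chi2 = 1000.0` is an assumption used only to show the arithmetic. The TLI uses the standard Tucker-Lewis definition, with \( \chi^2_n/df_n - 1 \) in the denominator, and neither index is truncated at 1.

```python
def cfi(chi2_null, df_null, chi2_model, df_model):
    """Comparative fit index: relative reduction in (chi2 - df) from null to hypothesised model."""
    null_excess = chi2_null - df_null
    model_excess = chi2_model - df_model
    return (null_excess - model_excess) / null_excess


def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis index, based on chi2/df ratios."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


hypothetical_null_chi2 = 1000.0  # ASSUMPTION: the null chi-square is not given in the slides
df_null = 21 - 6                 # 21 knowns minus 6 estimated variances

cfi_value = cfi(hypothetical_null_chi2, df_null, 21.73, 9)
tli_value = tli(hypothetical_null_chi2, df_null, 21.73, 9)
print("CFI =", round(cfi_value, 3))
print("TLI =", round(tli_value, 3))
```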

Fit Overview


  • CFI and TLI have NO mathematical upper bound. Most programs set values \( > \) 1 to 1.
  • The null model used by LISREL and Mplus is different.
  • The default null model is arbitrary and could be anything!
  • If the null model fits too well (i.e. RMSEA < .158) CFI and TLI could be inappropriate.
  • As with all analyses, START simple and build up (Gelman and Hill, 2008)!!

Advanced Topics


  • Identification
  • Internal consistency
  • Fitting more complex models
  • Bayes SEM and ESEM
  • Useful links

Fitting more complex models


  • Mean Structure (ANOVA, MANOVA, Growth Curve)
  • Structure/regression (linear, probit, logistic)
  • Multilevel and Sandwich estimators
  • Estimators (MLR, ML-IRT, WLSMV)
  • Mixture models
  • Reflective, formative, and bifactor models

Alternative Measurement FIT


ESEM:

  • Integrates EFA with full SEM possibilities
  • More realistic representation of true population structure
  • Fit is typically better

BSEM:

  • Flexible and easy to fit
  • Anything can be done if you can code it (I use JAGS)
  • Currently no agreed-upon fit index (I use the Bayes factor)
  • Difficult models? Save plausible values (PVs) and use those.

Identification and non-arbitrary metric


  • Identification via item/variance and intercept/latent mean
  • Identification via non-arbitrary metric

Empirical Identification


  • Why?
    • Parameter values can make the solution mathematically undefined.
    • An item is a linear combination of another item.
    • When strange constraints are included.
    • Simpler models can be underidentified when more complex models are not.
    • Poor starting values can result in underidentification.
  • How to check?
    • Run the model to see if it works & check for strange results.
    • Check mathematically

Reliability


Cronbach's alpha has many problems:

  • It requires \( \tau \) equivalence; otherwise it reports only a lower bound of reliability.
  • As a lower bound it is typically too low.
  • Better estimates are easily available in the psych package in R.
  • It is not a measure of unidimensionality (does not give information on internal consistency).

Model-based approaches from CFA are also possible that do NOT require \( \tau \) equivalence: \[ \Omega = \frac{\left (\sum \lambda \right )^2}{\left (\sum \lambda \right )^2 + \sum \delta } \]
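
As an illustration, the formula can be applied to the rounded ML loadings and residual variances reported earlier in these slides. This is a plain Python sketch (an assumption; the talk uses R), it takes the numerator to be the squared sum of the loadings, and `omega` is a hypothetical function name.

```python
loadings = [0.673, 0.549, 0.748, 0.476, 0.720, 0.742]   # lambda, from the ML fit
residuals = [0.351, 0.532, 0.245, 0.507, 0.262, 0.274]  # delta, from the ML fit


def omega(loadings, residuals):
    """Model-based reliability: (sum of loadings)^2 / ((sum of loadings)^2 + sum of residuals)."""
    true_variance = sum(loadings) ** 2
    return true_variance / (true_variance + sum(residuals))


omega_value = omega(loadings, residuals)
print("Omega =", round(omega_value, 3))
```

With these inputs the estimate comes out at roughly .88.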

Resources