Here we have a simple example of a confirmatory factor analysis (CFA) with four items that compose a single construct, visual. The first box to check is the model’s identification. We cannot assess how well the model we are describing is doing unless we know that the model is identified. If the model is underidentified, then there are more parameters being estimated than there are unique pieces of information in the observed covariance matrix. We can calculate the number of unique variances and covariances using the formula q(q+1)/2, where q = the number of items. Therefore we have 4*(4+1)/2 = 10 pieces of information and are estimating a total of 8 parameters (described below), leaving 10 - 8 = 2 degrees of freedom, so the model is overidentified.
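As a quick sanity check, this arithmetic can be done directly in R (a minimal sketch; the count of 8 free parameters comes from the model fit below):

q = 4                     # number of items
pieces = q * (q + 1) / 2  # 10 unique variances and covariances
free = 3 + 4 + 1          # 3 free loadings, 4 error variances, 1 latent variance
pieces - free             # 2 degrees of freedom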

library(lavaan)
data(HolzingerSwineford1939)
dat = HolzingerSwineford1939
head(dat)
##   id sex ageyr agemo  school grade       x1   x2    x3       x4   x5
## 1  1   1    13     1 Pasteur     7 3.333333 7.75 0.375 2.333333 5.75
## 2  2   2    13     7 Pasteur     7 5.333333 5.25 2.125 1.666667 3.00
## 3  3   2    13     1 Pasteur     7 4.500000 5.25 1.875 1.000000 1.75
## 4  4   1    13     2 Pasteur     7 5.333333 7.75 3.000 2.666667 4.50
## 5  5   2    12     2 Pasteur     7 4.833333 4.75 0.875 2.666667 4.00
## 6  6   2    14     1 Pasteur     7 5.333333 5.00 2.250 1.000000 3.00
##          x6       x7   x8       x9
## 1 1.2857143 3.391304 5.75 6.361111
## 2 1.2857143 3.782609 6.25 7.916667
## 3 0.4285714 3.260870 3.90 4.416667
## 4 2.4285714 3.000000 5.30 4.861111
## 5 2.5714286 3.695652 6.30 5.916667
## 6 0.8571429 4.347826 6.65 7.500000

\[{x_{1} = \lambda_{11}(\xi_{1})+ \delta_{1}}~~~ (1.1)\]
\[{x_{2} = \lambda_{21}(\xi_{1})+ \delta_{2}}~~~ (1.2)\]
\[{x_{3} = \lambda_{31}(\xi_{1})+ \delta_{3}}~~~ (1.3)\]
\[{x_{4} = \lambda_{41}(\xi_{1})+ \delta_{4}}~~~ (1.4)\]
\[ \begin{bmatrix} \theta_{11} & 0 & 0 & 0\\ 0 & \theta_{22} & 0 & 0 \\ 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & \theta_{44}\\ \end{bmatrix} {(2)} \]
\[ \begin{bmatrix} \phi_{11}\\ \end{bmatrix} {(3)} \]
\[ \begin{bmatrix} 1 \\ \lambda_{21} \\ \lambda_{31}\\ \lambda_{41}\\ \end{bmatrix} {(4)} \]

Equations 1.1 through 1.4 are the equations for the items. In this example, each item is modeled as a regression weight lambda multiplied by the latent variable xi, plus the unique error term delta for that item.

Equation 2 is the variance-covariance matrix for the deltas (i.e., the error terms). In this model, I am assuming the error terms are uncorrelated, so the off-diagonal elements are zero, and the model estimates the four variances on the diagonal.
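By default, lavaan leaves these off-diagonal elements at zero. If theory suggested that two error terms should correlate, the ~~ operator would free that element; for example (a hypothetical modification, not fit anywhere below):

model1b = 'visual =~ x1 + x2 + x3 + x4
           x1 ~~ x2'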

Equation 3 is the variance-covariance matrix for the phis. Here it contains a single element, the variance of the latent construct visual, because there are no other latent constructs in this example and therefore no other variances or covariances to estimate.

Equation 4 contains the four factor loadings from the latent variable to the observed x variables (i.e., the items). The first lambda is fixed at one to set the scale of the latent variable, which allows the other lambda values to be estimated.
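Taken together, matrices 2 through 4 produce the model-implied covariance matrix through the standard CFA identity (not written out above, but useful for seeing how the pieces fit together):

\[ \Sigma = \Lambda \Phi \Lambda^{\top} + \Theta_{\delta} \]

Here the lambda matrix is equation 4, phi is equation 3, and theta is equation 2.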

Now we can conduct the CFA, with the single construct measured by the four items, using the code below.

model1 = 'visual =~ x1 + x2 + x3 + x4'
fit1 <- cfa(model1, data = dat)
summary(fit1, fit.measures = TRUE)
## lavaan (0.5-23.1097) converged normally after  22 iterations
## 
##   Number of observations                           301
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               11.798
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.003
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              157.076
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.935
##   Tucker-Lewis Index (TLI)                       0.805
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1812.314
##   Loglikelihood unrestricted model (H1)      -1806.415
## 
##   Number of free parameters                          8
##   Akaike (AIC)                                3640.629
##   Bayesian (BIC)                              3670.286
##   Sample-size adjusted Bayesian (BIC)         3644.914
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.128
##   90 Percent Confidence Interval          0.064  0.202
##   P-value RMSEA <= 0.05                          0.025
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.045
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual =~                                           
##     x1                1.000                           
##     x2                0.547    0.105    5.208    0.000
##     x3                0.704    0.119    5.930    0.000
##     x4                0.540    0.104    5.206    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.527    0.132    3.993    0.000
##    .x2                1.133    0.103   11.018    0.000
##    .x3                0.863    0.096    9.028    0.000
##    .x4                1.108    0.101   11.020    0.000
##     visual            0.831    0.161    5.156    0.000

Number of observations = Total number of data points, which is 301.

Estimator = This says that I used the maximum likelihood (ML) estimator to find the parameter estimates.

Minimum Function Test Statistic = A chi-square test assessing whether the model-implied covariance matrix and the observed data are statistically significantly different. With large samples this statistic is almost always significant, meaning we need to look to other fit indices, such as CFI, TLI, RMSEA, and SRMR, to assess the model’s fit.

Model test baseline model = Tests whether the model is a better fit for the data than a null or empty model.

CFI/TLI = A CFI of .935 means that our model improves on the fit of the null (baseline) model by roughly 93.5%.

RMSEA = The average amount of residual error, per degree of freedom, between the observed covariance matrix and the model-implied covariance matrix.

SRMR = The standardized root mean square residual, i.e., the average standardized difference between the observed and model-implied covariances (a standardized residual measure, distinct from RMSEA).
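These indices can also be pulled directly from the fitted object with lavaan’s fitMeasures() function:

fitMeasures(fit1, c("cfi", "tli", "rmsea", "srmr"))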

Latent variables = The factor loadings are interpreted just like coefficients in a regression model. The first loading is fixed at 1, because we need one item to set the scale of the latent variable and serve as the reference for the other items. Therefore, we are only estimating 8 parameters, not 9, because the loading for item one is fixed rather than estimated.
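If you would rather estimate all four loadings, lavaan can identify the model by fixing the latent variance to 1 instead of the first loading; the std.lv argument does this (an alternative specification to the one interpreted above):

fit1b <- cfa(model1, data = dat, std.lv = TRUE)
summary(fit1b)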

Finally, we can plot the model to help us visualize it, using the semPlot package.

library(semPlot)
semPaths(fit1, title = FALSE, curvePivot = TRUE)
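If you also want the parameter estimates displayed on the paths, semPaths() takes a whatLabels argument:

semPaths(fit1, whatLabels = "est", title = FALSE, curvePivot = TRUE)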

Now we can move to an example using SEM. Here we will expand the model to include an additional latent variable, textual, and then add the structural (SEM) part, which evaluates the direct effect that textual has on visual.

The measurement equations are the same, except that the four textual items are written as y’s instead of x’s (in the data they are x5 through x8). One new piece is equation 9, the beta matrix containing the regression coefficient beta(12) for the hypothesized direct effect. The other is equation 6, the structural equation in which textual (xi2) has a direct effect on visual (xi1), with zeta(1) as the disturbance term for visual.

model2 = 'visual =~ x1 + x2 + x3 + x4
          textual =~ x5 + x6 + x7 + x8
          visual ~ textual'
fit2 = sem(model2, data = dat)
summary(fit2)
## lavaan (0.5-23.1097) converged normally after  36 iterations
## 
##   Number of observations                           301
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              207.212
##   Degrees of freedom                                19
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual =~                                           
##     x1                1.000                           
##     x2                0.496    0.154    3.222    0.001
##     x3                0.480    0.148    3.242    0.001
##     x4                2.033    0.283    7.187    0.000
##   textual =~                                          
##     x5                1.000                           
##     x6                0.840    0.050   16.676    0.000
##     x7                0.164    0.060    2.714    0.007
##     x8                0.164    0.056    2.939    0.003
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual ~                                            
##     textual           0.439    0.063    6.975    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                1.103    0.093   11.860    0.000
##    .x2                1.319    0.108   12.185    0.000
##    .x3                1.216    0.100   12.184    0.000
##    .x4                0.297    0.087    3.406    0.001
##    .x5                0.456    0.059    7.676    0.000
##    .x6                0.347    0.043    8.023    0.000
##    .x7                1.151    0.094   12.224    0.000
##    .x8                0.989    0.081   12.216    0.000
##    .visual            0.023    0.020    1.119    0.263
##     textual           1.204    0.138    8.715    0.000
semPaths(fit2, title = FALSE, curvePivot = TRUE)
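To put the direct effect and the loadings on a common metric, we can also ask lavaan for the fully standardized estimates:

standardizedSolution(fit2)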

\[{x_{1} = \lambda_{11}(\xi_{1})+ \delta_{1}}~~~ (5.1)\]
\[{x_{2} = \lambda_{21}(\xi_{1})+ \delta_{2}}~~~ (5.2)\]
\[{x_{3} = \lambda_{31}(\xi_{1})+ \delta_{3}}~~~ (5.3)\]
\[{x_{4} = \lambda_{41}(\xi_{1})+ \delta_{4}}~~~ (5.4)\]
\[{y_{1} = \lambda_{12}(\xi_{2})+ \delta_{5}}~~~ (5.5)\]
\[{y_{2} = \lambda_{22}(\xi_{2})+ \delta_{6}}~~~ (5.6)\]
\[{y_{3} = \lambda_{32}(\xi_{2})+ \delta_{7}}~~~ (5.7)\]
\[{y_{4} = \lambda_{42}(\xi_{2})+ \delta_{8}}~~~ (5.8)\]
\[{\xi_{1} = \beta_{12}(\xi_{2})+ \zeta_{1}}~~~ (6)\]
\[ \begin{bmatrix} \theta_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & \theta_{22} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \theta_{33} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta_{44} & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \theta_{55} & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \theta_{66} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \theta_{77} & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \theta_{88}\\ \end{bmatrix} {(7)} \]

\[ \begin{bmatrix} \phi_{11} & 0 \\ 0 & \phi_{22} \\ \end{bmatrix} {(8)} \]

\[ \begin{bmatrix} 0 & \beta_{12} \\ 0 & 0 \\ \end{bmatrix} {(9)} \]

\[ \begin{bmatrix} 1 & 0\\ \lambda_{21} & 0 \\ \lambda_{31} & 0\\ \lambda_{41} & 0\\ 0 & 1\\ 0 & \lambda_{22}\\ 0 & \lambda_{32}\\ 0 & \lambda_{42}\\ \end{bmatrix} {(10)} \]
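As with the CFA, matrices 7 through 10 combine into the model-implied covariance matrix. Treating the phi matrix in (8) as the covariance matrix of the latent disturbances and B as the beta matrix in (9), the standard identity is:

\[ \Sigma = \Lambda (I - B)^{-1} \Phi \left[ (I - B)^{-1} \right]^{\top} \Lambda^{\top} + \Theta \]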