Here we have a simple example of a confirmatory factor analysis (CFA) with four items that compose a single construct, visual. The first box to check is the model's identification: we cannot assess how well the model is doing unless we know that it is identified. A model is underidentified when we try to estimate more parameters than there are unique pieces of information in the data. The number of unique variances and covariances is q(q+1)/2, where q = the number of items. Therefore we have 4*(4+1)/2 = 10 unique values and are estimating a total of 8 parameters (described below), leaving 10 - 8 = 2 degrees of freedom, so the model is overidentified.
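As a quick check, that counting argument takes only a line of arithmetic in R:

q = 4
q * (q + 1) / 2 # 10 unique variances and covariances
# 3 free loadings + 4 error variances + 1 latent variance = 8 parameters
# 10 - 8 = 2 degrees of freedom

The result matches the 2 degrees of freedom reported in the lavaan output below.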
library(lavaan)
data(HolzingerSwineford1939)
dat = HolzingerSwineford1939
head(dat)
## id sex ageyr agemo school grade x1 x2 x3 x4 x5
## 1 1 1 13 1 Pasteur 7 3.333333 7.75 0.375 2.333333 5.75
## 2 2 2 13 7 Pasteur 7 5.333333 5.25 2.125 1.666667 3.00
## 3 3 2 13 1 Pasteur 7 4.500000 5.25 1.875 1.000000 1.75
## 4 4 1 13 2 Pasteur 7 5.333333 7.75 3.000 2.666667 4.50
## 5 5 2 12 2 Pasteur 7 4.833333 4.75 0.875 2.666667 4.00
## 6 6 2 14 1 Pasteur 7 5.333333 5.00 2.250 1.000000 3.00
## x6 x7 x8 x9
## 1 1.2857143 3.391304 5.75 6.361111
## 2 1.2857143 3.782609 6.25 7.916667
## 3 0.4285714 3.260870 3.90 4.416667
## 4 2.4285714 3.000000 5.30 4.861111
## 5 2.5714286 3.695652 6.30 5.916667
## 6 0.8571429 4.347826 6.65 7.500000
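The 10 unique values counted above are the 4 variances and 6 covariances of the observed items, which we can inspect before fitting anything:

# The observed covariance matrix the CFA will try to reproduce
round(cov(dat[, c("x1", "x2", "x3", "x4")]), 3)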
\[{x_{1} = \lambda_{11}(\xi_{1}) + \delta_{1}}~~~ (1.1)\]
\[{x_{2} = \lambda_{21}(\xi_{1}) + \delta_{2}}~~~ (1.2)\]
\[{x_{3} = \lambda_{31}(\xi_{1}) + \delta_{3}}~~~ (1.3)\]
\[{x_{4} = \lambda_{41}(\xi_{1}) + \delta_{4}}~~~ (1.4)\]
\[
\begin{bmatrix}
\theta_{11} & 0 & 0 & 0 \\
0 & \theta_{22} & 0 & 0 \\
0 & 0 & \theta_{33} & 0 \\
0 & 0 & 0 & \theta_{44} \\
\end{bmatrix} {(2)}
\]
\[
\begin{bmatrix}
\phi_{11} \\
\end{bmatrix} {(3)}
\]
\[
\begin{bmatrix}
1 \\
\lambda_{21} \\
\lambda_{31} \\
\lambda_{41} \\
\end{bmatrix} {(4)}
\]
Equations 1.1 through 1.4 are the measurement equations for the items. Each item is the product of a regression weight (lambda) and the latent variable (xi), plus that item's unique error term (delta).
Equation 2 is the variance-covariance matrix for the deltas (i.e., the error terms). In this model, I am assuming the error terms are uncorrelated, which is why the off-diagonal elements are zero, while the error variances on the diagonal are estimated by the model.
Equation 3 is the variance-covariance matrix for the phis. In this case it contains a single element, the variance of the latent construct visual, because there are no other latent constructs in this example and therefore no other latent variances or covariances to estimate.
Equation 4 contains the four factor loadings from the latent variable to the observed x variables (i.e., the items). The first lambda is fixed at 1, which sets the metric of the latent variable and allows the other lambda values to be estimated.
Now we can conduct the CFA, with the four items loading onto one construct, using the code below.
model1 = 'visual =~ x1 + x2 + x3 + x4'
fit1 = cfa(model1, data = dat)
summary(fit1, fit.measures = TRUE)
## lavaan (0.5-23.1097) converged normally after 22 iterations
##
## Number of observations 301
##
## Estimator ML
## Minimum Function Test Statistic 11.798
## Degrees of freedom 2
## P-value (Chi-square) 0.003
##
## Model test baseline model:
##
## Minimum Function Test Statistic 157.076
## Degrees of freedom 6
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.935
## Tucker-Lewis Index (TLI) 0.805
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1812.314
## Loglikelihood unrestricted model (H1) -1806.415
##
## Number of free parameters 8
## Akaike (AIC) 3640.629
## Bayesian (BIC) 3670.286
## Sample-size adjusted Bayesian (BIC) 3644.914
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.128
## 90 Percent Confidence Interval 0.064 0.202
## P-value RMSEA <= 0.05 0.025
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.045
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 1.000
## x2 0.547 0.105 5.208 0.000
## x3 0.704 0.119 5.930 0.000
## x4 0.540 0.104 5.206 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 0.527 0.132 3.993 0.000
## .x2 1.133 0.103 11.018 0.000
## .x3 0.863 0.096 9.028 0.000
## .x4 1.108 0.101 11.020 0.000
## visual 0.831 0.161 5.156 0.000
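Before walking through the output, note that the estimated matrices from equations 2 through 4 can be recovered directly from the fitted object (lavaan labels the latent variance matrix psi rather than phi):

# $lambda holds the loadings (equation 4), $theta the error
# variances (equation 2), and $psi the latent variance (equation 3)
lavInspect(fit1, "est")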
Number of observations = the total number of data points, which is 301.

Estimator = maximum likelihood (ML) was used to find the parameter estimates.

Minimum Function Test Statistic = a chi-square test of whether the model-implied and observed covariance matrices differ statistically. This statistic is almost always significant at reasonable sample sizes, meaning we need to look to other fit indices such as the CFI, TLI, RMSEA, and SRMR to assess the model's fit.

Model test baseline model = tests whether the model is a better fit for the data than a null or empty model.

CFI/TLI = indices that compare our model to the baseline model; a CFI of .935 means that our model improves on the null model by roughly 93.5%.

RMSEA = the average amount of residual, per degree of freedom, between the observed covariance matrix and the covariance structure implied by the model.

SRMR = the standardized average residual between the observed and model-implied correlation matrices.
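If we only want these headline fit indices rather than the full summary, lavaan's fitMeasures() can pull them out by name; the values match those in the summary above:

# Extract just the CFI, TLI, RMSEA, and SRMR
fitMeasures(fit1, c("cfi", "tli", "rmsea", "srmr"))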
Latent variables = the factor loadings are interpreted much like coefficients in a regression model. The loading for the first item is fixed at 1 to set the scale of the latent variable, which is why we estimate 8 parameters rather than 9.
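Fixing the first loading is not the only way to identify the model. lavaan's std.lv argument instead fixes the latent variance at 1 and frees all four loadings; the overall fit is identical, only the scaling of the estimates changes. A minimal sketch (fit1b is just a name introduced here):

# Alternative identification: fix the latent variance at 1
# and freely estimate all four loadings
fit1b = cfa(model1, data = dat, std.lv = TRUE)
summary(fit1b)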
Finally, we can plot the model to help us visualize it, using the semPlot package.
library(semPlot)
semPaths(fit1, title = FALSE, curvePivot = TRUE)
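By default this draws the path diagram without numbers on the arrows; semPlot's whatLabels argument will add the unstandardized estimates as edge labels:

# Same diagram, labeled with the parameter estimates
semPaths(fit1, whatLabels = "est", title = FALSE, curvePivot = TRUE)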
Now we can move to an example using SEM. Here we will expand the model to include an additional latent variable, textual, and then add the structural part, which evaluates the direct effect that textual has on visual (in lavaan syntax, visual ~ textual regresses visual on textual).

The measurement equations are the same, except that the four textual items (x5 through x8 in the data) are written as y's in the notation below (equations 5.5 through 5.8). Two pieces are new. Equation 6 writes out the structural relation: visual (xi1) is predicted by textual (xi2) through the regression coefficient beta(12), plus a disturbance zeta(1). Equation 9 is the beta matrix that holds this coefficient.
model2 = 'visual =~ x1 + x2 + x3 + x4
textual =~ x5 + x6 + x7 + x8
visual ~ textual'
fit2 = sem(model2, data = dat)
summary(fit2)
## lavaan (0.5-23.1097) converged normally after 36 iterations
##
## Number of observations 301
##
## Estimator ML
## Minimum Function Test Statistic 207.212
## Degrees of freedom 19
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 1.000
## x2 0.496 0.154 3.222 0.001
## x3 0.480 0.148 3.242 0.001
## x4 2.033 0.283 7.187 0.000
## textual =~
## x5 1.000
## x6 0.840 0.050 16.676 0.000
## x7 0.164 0.060 2.714 0.007
## x8 0.164 0.056 2.939 0.003
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## visual ~
## textual 0.439 0.063 6.975 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 1.103 0.093 11.860 0.000
## .x2 1.319 0.108 12.185 0.000
## .x3 1.216 0.100 12.184 0.000
## .x4 0.297 0.087 3.406 0.001
## .x5 0.456 0.059 7.676 0.000
## .x6 0.347 0.043 8.023 0.000
## .x7 1.151 0.094 12.224 0.000
## .x8 0.989 0.081 12.216 0.000
## .visual 0.023 0.020 1.119 0.263
## textual 1.204 0.138 8.715 0.000
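Since the structural coefficient is the main question in this model, it can be convenient to pull just the regression row of the parameter table, along with its confidence interval, using lavaan's parameterEstimates():

# Keep only the regression (~) rows
subset(parameterEstimates(fit2), op == "~")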
semPaths(fit2, title = FALSE, curvePivot = TRUE)
\[{x_{1} = \lambda_{11}(\xi_{1}) + \delta_{1}}~~~ (5.1)\]
\[{x_{2} = \lambda_{21}(\xi_{1}) + \delta_{2}}~~~ (5.2)\]
\[{x_{3} = \lambda_{31}(\xi_{1}) + \delta_{3}}~~~ (5.3)\]
\[{x_{4} = \lambda_{41}(\xi_{1}) + \delta_{4}}~~~ (5.4)\]
\[{y_{1} = \lambda_{12}(\xi_{2}) + \delta_{5}}~~~ (5.5)\]
\[{y_{2} = \lambda_{22}(\xi_{2}) + \delta_{6}}~~~ (5.6)\]
\[{y_{3} = \lambda_{32}(\xi_{2}) + \delta_{7}}~~~ (5.7)\]
\[{y_{4} = \lambda_{42}(\xi_{2}) + \delta_{8}}~~~ (5.8)\]
\[{\xi_{1} = \beta_{12}(\xi_{2}) + \zeta_{1}}~~~ (6)\]
\[
\begin{bmatrix}
\theta_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & \theta_{22} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \theta_{33} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \theta_{44} & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & \theta_{55} & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & \theta_{66} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & \theta_{77} & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \theta_{88}\\
\end{bmatrix} {(7)}
\]
\[ \begin{bmatrix} \phi_{11} & 0 \\ 0 & \phi_{22} \\ \end{bmatrix} {(8)} \]
\[ \begin{bmatrix} 0 & \beta_{12} \\ 0 & 0 \\ \end{bmatrix} {(9)} \]
\[ \begin{bmatrix} 1 & 0\\ \lambda_{21} & 0 \\ \lambda_{31} & 0\\ \lambda_{41} & 0\\ 0 & 1\\ 0 & \lambda_{22}\\ 0 & \lambda_{32}\\ 0 & \lambda_{42}\\ \end{bmatrix} {(10)} \]
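As with the CFA, the estimated matrices behind these equations can be checked against the fitted object; lavaan again labels the latent variance-covariance matrix psi rather than phi:

# $lambda corresponds to equation 10, $theta to equation 7,
# $psi to equation 8, and $beta to equation 9
lavInspect(fit2, "est")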