We will be using a built-in dataset from the lavaan package, HolzingerSwineford1939. Confirmatory factor analysis (CFA) is typically used to test hypothesized relationships between observed (manifest) variables and latent variables, where those relationships are specified in advance on the basis of established theory.
In lavaan, we describe the model in a text-based model syntax before fitting it. For more information on writing model syntax, see the lavaan documentation.
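Before specifying a model, it can help to look at the data. The sketch below is a minimal check, assuming lavaan 0.6 or later is installed; the object names `hs` and `indicators` are our own and not part of lavaan. It confirms the sample size and summarizes the nine test scores (x1 to x9) that serve as indicators.

```r
library(lavaan)

# HolzingerSwineford1939 ships with lavaan; load it explicitly
data("HolzingerSwineford1939", package = "lavaan")
hs <- HolzingerSwineford1939

# The nine test scores used as indicators in this tutorial
indicators <- paste0("x", 1:9)

dim(hs)                          # number of students (rows) and variables (columns)
summary(hs[, indicators])        # descriptive statistics for x1..x9
round(cor(hs[, indicators]), 2)  # correlations among the indicators
```

CFA models the covariance structure of these indicators, so strong correlations within each block of three (x1 to x3, x4 to x6, x7 to x9) are what the three-factor model expects.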
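# setwd() is not required: HolzingerSwineford1939 ships with lavaan
# setwd("D:/Class Materials & Work/Summer 2020 practice/SEM/CFA")
library(lavaan)   # CFA/SEM model fitting
library(semPlot)  # path diagrams of fitted models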
The model consists of nine observed variables (x1 to x9), with three indicators for each of the three latent variables (factors): visual, textual, and speed. See the picture below for a visual representation of the model:
In lavaan model syntax, the model is written as:
# specify the model
# "=~" reads "is measured by": each factor is measured by three indicators
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
We can now fit the model as follows:
# fit the model
fit <- cfa(HS.model, data=HolzingerSwineford1939)
The cfa() function is dedicated to fitting confirmatory factor analysis models. Its first argument is the user-specified model, and its second argument is the dataset containing the observed variables. Once the model has been fitted, the summary() function displays the results:
# display summary output
summary(fit, fit.measures=TRUE)
## lavaan 0.6-6 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 21
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 85.306
## Degrees of freedom 24
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.931
## Tucker-Lewis Index (TLI) 0.896
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3737.745
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7517.490
## Bayesian (BIC) 7595.339
## Sample-size adjusted Bayesian (BIC) 7528.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.092
## 90 Percent confidence interval - lower 0.071
## 90 Percent confidence interval - upper 0.114
## P-value RMSEA <= 0.05 0.001
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.065
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual =~
##     x1                1.000
##     x2                0.554    0.100    5.554    0.000
##     x3                0.729    0.109    6.685    0.000
##   textual =~
##     x4                1.000
##     x5                1.113    0.065   17.014    0.000
##     x6                0.926    0.055   16.703    0.000
##   speed =~
##     x7                1.000
##     x8                1.180    0.165    7.152    0.000
##     x9                1.082    0.151    7.155    0.000
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual ~~
##     textual           0.408    0.074    5.552    0.000
##     speed             0.262    0.056    4.660    0.000
##   textual ~~
##     speed             0.173    0.049    3.518    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.549    0.114    4.833    0.000
##    .x2                1.134    0.102   11.146    0.000
##    .x3                0.844    0.091    9.317    0.000
##    .x4                0.371    0.048    7.779    0.000
##    .x5                0.446    0.058    7.642    0.000
##    .x6                0.356    0.043    8.277    0.000
##    .x7                0.799    0.081    9.823    0.000
##    .x8                0.488    0.074    6.573    0.000
##    .x9                0.566    0.071    8.003    0.000
##     visual            0.809    0.145    5.564    0.000
##     textual           0.979    0.112    8.737    0.000
##     speed             0.384    0.086    4.451    0.000
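The output reports 21 free parameters and 24 degrees of freedom for the user model, and 36 for the baseline model. The sketch below reproduces that arithmetic, assuming lavaan 0.6 or later; the helper variable names (`n_moments`, `n_free`, and so on) are our own. With 9 indicators there are 9 × 10 / 2 = 45 unique variances and covariances, and the degrees of freedom are 45 minus the number of free parameters.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Information available: unique variances and covariances of the 9 indicators
p <- 9
n_moments <- p * (p + 1) / 2          # 45

# Count free parameters from lavaan's own parameter table (free > 0 means estimated)
pt <- as.data.frame(parTable(fit))
free <- pt[pt$free > 0, ]
table(free$op)                        # "=~" loadings vs "~~" variances/covariances
n_free <- nrow(free)                  # 6 loadings + 9 residual variances
                                      # + 3 factor variances + 3 factor covariances

df_model    <- n_moments - n_free     # degrees of freedom of the CFA model
df_baseline <- n_moments - p          # baseline model estimates only the 9 variances

c(n_free = n_free, df_model = df_model, df_baseline = df_baseline)

# The same quantities, plus common fit indices, as reported by lavaan
fitMeasures(fit, c("npar", "df", "baseline.df", "chisq", "cfi", "tli", "rmsea", "srmr"))
```

fitMeasures() is a convenient way to pull individual values out of the fitted object instead of reading them off the printed summary.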
# path diagram of the fitted model; what = 'std' labels edges with standardized estimates
semPaths(fit, what = 'std', layout = 'tree', edge.label.cex = .9, curvePivot = TRUE)
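Dashed lines represent parameters that are fixed to a constant rather than estimated. Such constraints are necessary to make the model identifiable and to give each latent factor a scale. In this model, cfa() fixes the first loading of each factor (x1, x4, and x7) to 1, which is why those rows in the output show an estimate of 1.000 without a standard error. In summary(), the argument fit.measures = TRUE adds the fit information shown from "Model Test Baseline Model" through "SRMR".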
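Fixing the first loading is only one way to scale the factors. A common alternative, sketched below under the assumption of lavaan 0.6 or later, is `std.lv = TRUE`, which fixes each factor variance to 1 and estimates all nine loadings. The object names (`fit_marker`, `fit_stdlv`, `cmp`) are our own. Because both versions describe the same model with a different scale, the chi-square and the fully standardized loadings should agree.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# Default scaling (marker variable): first loading of each factor fixed to 1
fit_marker <- cfa(HS.model, data = HolzingerSwineford1939)

# Alternative scaling: factor variances fixed to 1, all nine loadings estimated
fit_stdlv <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)

# Fixed parameters (free == 0) under each choice
pt_marker <- as.data.frame(parTable(fit_marker))
pt_stdlv  <- as.data.frame(parTable(fit_stdlv))
pt_marker[pt_marker$free == 0, c("lhs", "op", "rhs", "est")]  # three loadings = 1
pt_stdlv[pt_stdlv$free == 0, c("lhs", "op", "rhs", "est")]    # three factor variances = 1

# Rescaling does not change the model's fit ...
chisq <- c(marker = as.numeric(fitMeasures(fit_marker, "chisq")),
           stdlv  = as.numeric(fitMeasures(fit_stdlv, "chisq")))
chisq

# ... nor the fully standardized loadings
std_marker <- standardizedSolution(fit_marker)
std_stdlv  <- standardizedSolution(fit_stdlv)
is_load_m <- std_marker$op == "=~"
is_load_s <- std_stdlv$op == "=~"
cmp <- data.frame(factor    = std_marker$lhs[is_load_m],
                  indicator = std_marker$rhs[is_load_m],
                  marker    = std_marker$est.std[is_load_m],
                  stdlv     = std_stdlv$est.std[is_load_s])
cmp
```

Which scaling to report is mostly a matter of convention; the marker-variable default keeps loadings in the metric of x1, x4, and x7, while `std.lv = TRUE` puts factors on a unit-variance scale.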
The typical workflow in the lavaan package is as follows:
1. Specify your model using the lavaan model syntax.
2. Fit the model to the dataset with a fitting function such as cfa(), sem(), or growth().
3. Extract information from the fitted model, for example with summary(), parameterEstimates(), standardizedSolution(), or fitMeasures().
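The three steps above can be sketched end to end as follows. This is a minimal example, assuming lavaan 0.6 or later; the names `pe`, `loadings`, `std_loadings`, and `fit_idx` are our own. Step 3 uses extractor functions that return data frames or named vectors, which are easier to reuse in tables and reports than the printed summary.

```r
library(lavaan)

# 1. Specify the model in lavaan syntax
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# 2. Fit it with a dedicated fitting function
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# 3. Extract information from the fitted object
pe <- parameterEstimates(fit)     # estimates, SEs, z-values, p-values, 95% CIs
loadings <- pe[pe$op == "=~", c("lhs", "rhs", "est", "se", "ci.lower", "ci.upper")]
loadings

std <- standardizedSolution(fit)  # fully standardized estimates
std_loadings <- std[std$op == "=~", c("lhs", "rhs", "est.std")]
std_loadings

fit_idx <- fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
round(fit_idx, 3)
```

The standardized loadings here should correspond to the values drawn on the path diagram with `what = 'std'`, although how semPlot labels them is worth checking against its documentation.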