Confirmatory Factor Analysis

We will be using a built-in dataset from lavaan package, the HolzingerSwineford1939 dataset. Confirmatory Factor Analysis (CFA) is usually conducted to test the hypothesis of relationships between observed (manifest) variables and latent variables based on the already established theory.

Formulating model syntax is how we describe the model before computing. For more information on syntax writing, see here

setwd("D:/Class Materials & Work/Summer 2020 practice/SEM/CFA")

library(lavaan)
library(semPlot)

The model consists of nine variables, three of each indicate one of the three latent variables (factor) of visual, textual, and speed. See picture below for visual representation of the model:

The model of mental ability test score
The model can be syntactically described as:

# specify the model
HS.model <- ' visual  =~ x1 + x2 + x3      
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

We can now fit the model as follows:

# fit the model
fit <- cfa(HS.model, data=HolzingerSwineford1939)

The cfa() function is a dedicated function for fitting confirmatory factor analysis models. The first argument is the user-specified model. The second argument is the dataset that contains the observed variables. Once the model has been fitted, the summary() function provides a nice summary of the fitted model:

# display summary output
summary(fit, fit.measures=TRUE)

## lavaan 0.6-6 ended normally after 35 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         21
##                                                       
##   Number of observations                           301
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                85.306
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               918.852
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.931
##   Tucker-Lewis Index (TLI)                       0.896
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3737.745
##   Loglikelihood unrestricted model (H1)      -3695.092
##                                                       
##   Akaike (AIC)                                7517.490
##   Bayesian (BIC)                              7595.339
##   Sample-size adjusted Bayesian (BIC)         7528.739
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.092
##   90 Percent confidence interval - lower         0.071
##   90 Percent confidence interval - upper         0.114
##   P-value RMSEA <= 0.05                          0.001
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.065
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual =~                                           
##     x1                1.000                           
##     x2                0.554    0.100    5.554    0.000
##     x3                0.729    0.109    6.685    0.000
##   textual =~                                          
##     x4                1.000                           
##     x5                1.113    0.065   17.014    0.000
##     x6                0.926    0.055   16.703    0.000
##   speed =~                                            
##     x7                1.000                           
##     x8                1.180    0.165    7.152    0.000
##     x9                1.082    0.151    7.155    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual ~~                                           
##     textual           0.408    0.074    5.552    0.000
##     speed             0.262    0.056    4.660    0.000
##   textual ~~                                          
##     speed             0.173    0.049    3.518    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.549    0.114    4.833    0.000
##    .x2                1.134    0.102   11.146    0.000
##    .x3                0.844    0.091    9.317    0.000
##    .x4                0.371    0.048    7.779    0.000
##    .x5                0.446    0.058    7.642    0.000
##    .x6                0.356    0.043    8.277    0.000
##    .x7                0.799    0.081    9.823    0.000
##    .x8                0.488    0.074    6.573    0.000
##    .x9                0.566    0.071    8.003    0.000
##     visual            0.809    0.145    5.564    0.000
##     textual           0.979    0.112    8.737    0.000
##     speed             0.384    0.086    4.451    0.000

semPaths(fit, what = 'std', layout = 'tree', edge.label.cex=.9, curvePivot = TRUE)

Dotted lines represent estimates that are constrained to a certain value, which is necessary to make the model identifiable or to give a latent factor a certain scale. The function fit.measures = TRUE adds additional information to the result from Model test baseline model to SRMR.

The typical workflow in lavaan package is as follows:
1. Specify your model using the lavaan model syntax.
2. Fit the model in accordance with the dataset, e.g., cfa(), sem(), or growth().
3. Extract information from the fitted model. See this for more information on data extraction.

Confirmatory Factor Analysis

Tarid Wongvorachan

August 25th, 2020