Confirmatory Factor Analysis using lavaan in R

Author

Long Bui

Introduction

Confirmatory Factor Analysis (CFA) is a statistical technique used to test the hypothesis that the relationships between observed variables and their underlying latent constructs are consistent with the researcher’s theoretical understanding.

In this tutorial, we will perform a CFA using the lavaan package in R.

Step-by-Step Guide:

Step 1: Install and Load the lavaan Package

First, we need to install and load the lavaan package. This package is designed for latent variable analysis, including CFA.

# Set up the environment and install lavaan if necessary
if (!require("lavaan")) {
  install.packages("lavaan", dependencies = TRUE)
}
library(lavaan)
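
If the package loaded without errors, you can confirm which version is installed (any reasonably recent version of lavaan works for this tutorial):

# Confirm that lavaan is available and print the installed version
packageVersion("lavaan")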

Step 2: Load the Data

The example data are stored in an SPSS .sav file, so we use the haven package to read it into R.

library(haven) # for loading the SPSS .sav file
data <- read_sav("data.sav") # replace with your actual path
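
Before specifying the model, it is worth a quick check that the file was imported correctly and that the item names match those used in the model syntax below (the HLQ1.x names are the ones used in this tutorial; adjust them to your own questionnaire):

# Basic checks on the imported data
dim(data)          # number of respondents and variables
head(names(data))  # first few variable names
# Count missing values for the items used in the CFA (names taken from the model below)
colSums(is.na(data[, c("HLQ1.2", "HLQ1.8", "HLQ1.17", "HLQ1.22")]))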

Step 3: Define the CFA Model

We specify the measurement model using lavaan's model syntax. The =~ operator means "is measured by": the latent variable subscale1 is defined by the four observed HLQ items.

model <- '
  subscale1 =~ HLQ1.2 + HLQ1.8 + HLQ1.17 + HLQ1.22
'
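
A model with several subscales follows the same pattern, with one factor per line; lavaan then estimates the covariances between the factors by default. A sketch with a second, purely illustrative factor (the subscale2 item names below are hypothetical placeholders, not variables in this data set):

# Illustrative two-factor syntax (subscale2 and its items are hypothetical)
model_two_factor <- '
  subscale1 =~ HLQ1.2 + HLQ1.8 + HLQ1.17 + HLQ1.22
  subscale2 =~ HLQ2.1 + HLQ2.5 + HLQ2.9
'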

Step 4: Fit the CFA Model

We fit the model with lavaan's default maximum likelihood estimator. For a simple measurement model like this one, lavaan's cfa() function would apply the same defaults as sem().

# Fit the CFA model
fit <- sem(model, data = data)
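
The call above uses lavaan's defaults: maximum likelihood estimation, listwise deletion of missing data, and the first loading fixed to 1 for identification. A sketch of common variations, none of which were used to produce the output shown below:

# Common variations (not used for the output in this tutorial)
fit_robust <- sem(model, data = data, estimator = "MLR")  # robust (Huber-White) standard errors and a scaled test statistic
fit_fiml   <- sem(model, data = data, missing = "fiml")   # full-information ML instead of listwise deletion
fit_stdlv  <- sem(model, data = data, std.lv = TRUE)      # identify the model by fixing the factor variance to 1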

Step 5: Summarize the Results

Finally, we summarize the results of the CFA model fit. This includes fit measures and standardized estimates. We will also create a nicely formatted results table using the kableExtra package.

# Load packages for a nicer results table (kable() comes from knitr; kableExtra adds styling)
if (!require("kableExtra")) {
  install.packages("kableExtra", dependencies = TRUE)
}
library(knitr)
library(kableExtra)
# Summarize the fit of the model
fit_summary <- summary(fit, fit.measures = TRUE, standardized = TRUE)

# Extract relevant parts for a nice results table
estimates <- parameterEstimates(fit, standardized = TRUE)

# Create a nice table
kable(estimates, format = "html", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
The resulting table of parameter estimates (unstandardized and standardized):

lhs        op  rhs         est        se         z          pvalue  ci.lower   ci.upper   std.lv     std.all
subscale1  =~  HLQ1.2      1.0000000  0.0000000         NA      NA  1.0000000  1.0000000  0.5629426  0.8560607
subscale1  =~  HLQ1.8      1.0077205  0.0403007  25.005031       0  0.9287325  1.0867084  0.5672888  0.8942465
subscale1  =~  HLQ1.17     0.5109890  0.0424015  12.051187       0  0.4278835  0.5940945  0.2876575  0.4856309
subscale1  =~  HLQ1.22     0.8492850  0.0393102  21.604683       0  0.7722384  0.9263316  0.4780987  0.7713611
HLQ1.2     ~~  HLQ1.2      0.1155289  0.0108463  10.651441       0  0.0942705  0.1367873  0.1155289  0.2671601
HLQ1.8     ~~  HLQ1.8      0.0806168  0.0097828   8.240689       0  0.0614429  0.0997906  0.0806168  0.2003233
HLQ1.17    ~~  HLQ1.17     0.2681171  0.0160711  16.683177       0  0.2366183  0.2996159  0.2681171  0.7641626
HLQ1.22    ~~  HLQ1.22     0.1555883  0.0110488  14.081949       0  0.1339331  0.1772435  0.1555883  0.4050020
subscale1  ~~  subscale1   0.3169044  0.0255341  12.411013       0  0.2668584  0.3669504  1.0000000  1.0000000
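
The global fit statistics discussed in the next section can also be extracted programmatically with lavaan's fitMeasures() function, which is convenient when reporting results:

# Pull out the fit indices referred to in the explanation below
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "aic", "bic",
                   "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))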

Detailed Explanation of Results

Model Test User Model:

  • Chi-Square Test Statistic: 7.778

  • Degrees of Freedom (df): 2

  • P-value (Chi-square): 0.020

The chi-square test is significant (p < 0.05), so exact fit is rejected. However, the chi-square test is sensitive to sample size, which is why the approximate fit indices below are also examined.

Model Test Baseline Model:

  • Chi-Square Test Statistic: 1116.539

  • Degrees of Freedom (df): 6

  • P-value: 0.000

The baseline model, which assumes no relationships among the observed variables, fits very poorly, as expected. It serves as the reference point for the incremental fit indices (CFI and TLI) reported next.

User Model versus Baseline Model:

  • Comparative Fit Index (CFI): 0.995

  • Tucker-Lewis Index (TLI): 0.984

Both CFI and TLI exceed not only the commonly accepted threshold of 0.90 but also the stricter 0.95 cutoff, indicating a good fit.

Loglikelihood and Information Criteria:

  • Loglikelihood user model (H0): -1725.295

  • Loglikelihood unrestricted model (H1): NA (not reported in this output)

  • Akaike Information Criterion (AIC): 3466.591

  • Bayesian Information Criterion (BIC): 3501.766

  • Sample-size adjusted BIC (SABIC): 3476.369

Lower AIC, BIC, and SABIC values indicate a better fit relative to other models, but since we only have one model here, these values help in future model comparisons.

Root Mean Square Error of Approximation (RMSEA):

  • RMSEA: 0.069

  • 90% Confidence Interval (Lower): 0.023

  • 90% Confidence Interval (Upper): 0.124

  • P-value H_0: RMSEA <= 0.050: 0.205

  • P-value H_0: RMSEA >= 0.080: 0.434

The RMSEA point estimate of 0.069 is below the conventional 0.08 cutoff, suggesting acceptable fit. However, the 90% confidence interval (0.023 to 0.124) is fairly wide and its upper bound exceeds 0.10, and neither the close-fit hypothesis (RMSEA <= 0.05, p = 0.205) nor the poor-fit hypothesis (RMSEA >= 0.08, p = 0.434) can be rejected. The RMSEA should therefore be interpreted with some caution, likely reflecting the small number of degrees of freedom in this model.

Standardized Root Mean Square Residual (SRMR):

  • SRMR: 0.018

An SRMR value below 0.08 indicates a good fit, and 0.018 suggests an excellent fit.
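
Because the SRMR summarizes the residual correlations (the differences between the observed and model-implied correlations), it can be useful to inspect those residuals directly to see whether any particular pair of items drives the misfit:

# Residual correlations; values far from zero point to local misfit
residuals(fit, type = "cor")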

Parameter Estimates (a code sketch after this list shows how to extract the standardized values directly):

Latent Variable Loadings (subscale1 =~):

  1. HLQ1.2:

    • Estimate: 1.000 (fixed for identification)

    • Standardized Loading (Std.lv): 0.563

    • Standardized Loading (Std.all): 0.856

  2. HLQ1.8:

    • Estimate: 1.008

    • Standard Error (Std.Err): 0.040

    • z-value: 25.005 (significant)

    • P-value: 0.000

    • Standardized Loading (Std.lv): 0.567

    • Standardized Loading (Std.all): 0.894

  3. HLQ1.17:

    • Estimate: 0.511

    • Standard Error (Std.Err): 0.042

    • z-value: 12.051 (significant)

    • P-value: 0.000

    • Standardized Loading (Std.lv): 0.288

    • Standardized Loading (Std.all): 0.486

  4. HLQ1.22:

    • Estimate: 0.849

    • Standard Error (Std.Err): 0.039

    • z-value: 21.605 (significant)

    • P-value: 0.000

    • Standardized Loading (Std.lv): 0.478

    • Standardized Loading (Std.all): 0.771

Variances:

  1. HLQ1.2:

    • Estimate: 0.116

    • Standard Error (Std.Err): 0.011

    • z-value: 10.651 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.116

    • Standardized Variance (Std.all): 0.267

  2. HLQ1.8:

    • Estimate: 0.081

    • Standard Error (Std.Err): 0.010

    • z-value: 8.241 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.081

    • Standardized Variance (Std.all): 0.200

  3. HLQ1.17:

    • Estimate: 0.268

    • Standard Error (Std.Err): 0.016

    • z-value: 16.683 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.268

    • Standardized Variance (Std.all): 0.764

  4. HLQ1.22:

    • Estimate: 0.156

    • Standard Error (Std.Err): 0.011

    • z-value: 14.082 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.156

    • Standardized Variance (Std.all): 0.405

  5. subscale1:

    • Estimate: 0.317

    • Standard Error (Std.Err): 0.026

    • z-value: 12.411 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 1.000 (equal to 1 by definition once the latent variable is standardized)

    • Standardized Variance (Std.all): 1.000 (likewise 1 by definition)
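
The standardized estimates listed above can also be obtained directly as a data frame with lavaan's standardizedSolution() function, which is handy for building your own reporting tables:

# Standardized solution as a data frame; keep only the factor loadings
std_est <- standardizedSolution(fit)
std_est[std_est$op == "=~", c("lhs", "rhs", "est.std", "se", "pvalue")]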

Summary:

  • Model Fit: Although the exact-fit chi-square test is significant, the approximate fit indices (CFI, TLI, RMSEA, and SRMR) indicate a good fit between the model and the data.

  • Factor Loadings: The loadings of the observed variables on the latent variable “subscale1” are all statistically significant. Three items load strongly (standardized loadings of 0.77 to 0.89), while HLQ1.17 is a noticeably weaker indicator (standardized loading of 0.49).

  • Variances: The residual variances of the observed variables are all significant; the standardized residual variance is largest for HLQ1.17 (0.764), consistent with its weaker loading.

  • Standardized Estimates: These provide a way to compare the relative strengths of the relationships in a standardized form.

In conclusion, the CFA results suggest that the specified model has a good fit to the data, with significant factor loadings and acceptable fit indices, supporting the construct validity of the latent variable “subscale1”.