Confirmatory Factor Analysis using lavaan in R

Author

Long Bui

Introduction

Confirmatory Factor Analysis (CFA) is a statistical technique used to test the hypothesis that the relationships between observed variables and their underlying latent constructs are consistent with the researcher’s theoretical understanding.

In this tutorial, we will perform a CFA using the lavaan package in R.

Step-by-Step Guide:

Step 1: Install and Load the lavaan Package

First, we need to install and load the lavaan package. This package is designed for latent variable analysis, including CFA.

# Set up the environment and install lavaan if necessary
if (!require("lavaan")) {
  install.packages("lavaan", dependencies = TRUE)
}
library(lavaan)
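
If the package loaded without errors, you can confirm which version is installed (any reasonably recent version of lavaan works for this tutorial):

# Confirm that lavaan is available and print the installed version
packageVersion("lavaan")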

Step 2: Load the Data

The example data are stored in an SPSS .sav file, so we use the haven package to read it into R.

library(haven) # for loading the SPSS .sav file
data <- read_sav("data.sav") # replace with your actual path
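
Before specifying the model, it is worth a quick check that the file was imported correctly and that the item names match those used in the model syntax below (the HLQ1.x names are the ones used in this tutorial; adjust them to your own questionnaire):

# Basic checks on the imported data
dim(data)          # number of respondents and variables
head(names(data))  # first few variable names
# Count missing values for the items used in the CFA (names taken from the model below)
colSums(is.na(data[, c("HLQ1.2", "HLQ1.8", "HLQ1.17", "HLQ1.22")]))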

Step 3: Define the CFA Model

We specify the measurement model using lavaan's model syntax. The =~ operator means "is measured by": the latent variable subscale1 is defined by the four observed HLQ items.

model <- '
  subscale1 =~ HLQ1.2 + HLQ1.8 + HLQ1.17 + HLQ1.22
'
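
A model with several subscales follows the same pattern, with one factor per line; lavaan then estimates the covariances between the factors by default. A sketch with a second, purely illustrative factor (the subscale2 item names below are hypothetical placeholders, not variables in this data set):

# Illustrative two-factor syntax (subscale2 and its items are hypothetical)
model_two_factor <- '
  subscale1 =~ HLQ1.2 + HLQ1.8 + HLQ1.17 + HLQ1.22
  subscale2 =~ HLQ2.1 + HLQ2.5 + HLQ2.9
'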

Step 4: Fit the CFA Model

We fit the model with lavaan's default maximum likelihood estimator. For a simple measurement model like this one, lavaan's cfa() function would apply the same defaults as sem().

# Fit the CFA model
fit <- sem(model, data = data)
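
The call above uses lavaan's defaults: maximum likelihood estimation, listwise deletion of missing data, and the first loading fixed to 1 for identification. A sketch of common variations, none of which were used to produce the output shown below:

# Common variations (not used for the output in this tutorial)
fit_robust <- sem(model, data = data, estimator = "MLR")  # robust (Huber-White) standard errors and a scaled test statistic
fit_fiml   <- sem(model, data = data, missing = "fiml")   # full-information ML instead of listwise deletion
fit_stdlv  <- sem(model, data = data, std.lv = TRUE)      # identify the model by fixing the factor variance to 1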

Step 5: Summarize the Results

Finally, we summarize the results of the CFA model fit. This includes fit measures and standardized estimates. We will also create a nicely formatted results table using the kableExtra package.

# Load packages for a nicer results table (kable() comes from knitr; kableExtra adds styling)
if (!require("kableExtra")) {
  install.packages("kableExtra", dependencies = TRUE)
}
library(knitr)
library(kableExtra)
# Summarize the fit of the model
fit_summary <- summary(fit, fit.measures = TRUE, standardized = TRUE)

# Extract relevant parts for a nice results table
estimates <- parameterEstimates(fit, standardized = TRUE)

# Create a nice table
kable(estimates, format = "html", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
The resulting table of parameter estimates (unstandardized and standardized):

lhs        op  rhs         est        se         z          pvalue  ci.lower   ci.upper   std.lv     std.all
subscale1  =~  HLQ1.2      1.0000000  0.0000000         NA      NA  1.0000000  1.0000000  0.5629426  0.8560607
subscale1  =~  HLQ1.8      1.0077205  0.0403007  25.005031       0  0.9287325  1.0867084  0.5672888  0.8942465
subscale1  =~  HLQ1.17     0.5109890  0.0424015  12.051187       0  0.4278835  0.5940945  0.2876575  0.4856309
subscale1  =~  HLQ1.22     0.8492850  0.0393102  21.604683       0  0.7722384  0.9263316  0.4780987  0.7713611
HLQ1.2     ~~  HLQ1.2      0.1155289  0.0108463  10.651441       0  0.0942705  0.1367873  0.1155289  0.2671601
HLQ1.8     ~~  HLQ1.8      0.0806168  0.0097828   8.240689       0  0.0614429  0.0997906  0.0806168  0.2003233
HLQ1.17    ~~  HLQ1.17     0.2681171  0.0160711  16.683177       0  0.2366183  0.2996159  0.2681171  0.7641626
HLQ1.22    ~~  HLQ1.22     0.1555883  0.0110488  14.081949       0  0.1339331  0.1772435  0.1555883  0.4050020
subscale1  ~~  subscale1   0.3169044  0.0255341  12.411013       0  0.2668584  0.3669504  1.0000000  1.0000000
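
The global fit statistics discussed in the next section can also be extracted programmatically with lavaan's fitMeasures() function, which is convenient when reporting results:

# Pull out the fit indices referred to in the explanation below
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "aic", "bic",
                   "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))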

Detailed Explanation of Results

Model Test User Model:

  • Chi-Square Test Statistic: 7.778

  • Degrees of Freedom (df): 2

  • P-value (Chi-square): 0.020

The chi-square test is significant (p < 0.05), so exact fit is rejected. However, the chi-square test is sensitive to sample size, which is why the approximate fit indices below are also examined.

Model Test Baseline Model:

  • Chi-Square Test Statistic: 1116.539

  • Degrees of Freedom (df): 6

  • P-value: 0.000

The baseline model, which assumes no relationships among the observed variables, fits very poorly, as expected. It serves as the reference point for the incremental fit indices (CFI and TLI) reported next.

User Model versus Baseline Model:

  • Comparative Fit Index (CFI): 0.995

  • Tucker-Lewis Index (TLI): 0.984

Both CFI and TLI exceed not only the commonly accepted threshold of 0.90 but also the stricter 0.95 cutoff, indicating a good fit.

Loglikelihood and Information Criteria:

  • Loglikelihood user model (H0): -1725.295

  • Loglikelihood unrestricted model (H1): NA (not reported in this output)

  • Akaike Information Criterion (AIC): 3466.591

  • Bayesian Information Criterion (BIC): 3501.766

  • Sample-size adjusted BIC (SABIC): 3476.369

Lower AIC, BIC, and SABIC values indicate a better fit relative to other models, but since we only have one model here, these values help in future model comparisons.

Root Mean Square Error of Approximation (RMSEA):

  • RMSEA: 0.069

  • 90% Confidence Interval (Lower): 0.023

  • 90% Confidence Interval (Upper): 0.124

  • P-value H_0: RMSEA <= 0.050: 0.205

  • P-value H_0: RMSEA >= 0.080: 0.434

The RMSEA point estimate of 0.069 is below the conventional 0.08 cutoff, suggesting acceptable fit. However, the 90% confidence interval (0.023 to 0.124) is fairly wide and its upper bound exceeds 0.10, and neither the close-fit hypothesis (RMSEA <= 0.05, p = 0.205) nor the poor-fit hypothesis (RMSEA >= 0.08, p = 0.434) can be rejected. The RMSEA should therefore be interpreted with some caution, likely reflecting the small number of degrees of freedom in this model.

Standardized Root Mean Square Residual (SRMR):

  • SRMR: 0.018

An SRMR value below 0.08 indicates a good fit, and 0.018 suggests an excellent fit.
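
Because the SRMR summarizes the residual correlations (the differences between the observed and model-implied correlations), it can be useful to inspect those residuals directly to see whether any particular pair of items drives the misfit:

# Residual correlations; values far from zero point to local misfit
residuals(fit, type = "cor")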

Parameter Estimates (a code sketch after this list shows how to extract the standardized values directly):

Latent Variable Loadings (subscale1 =~):

  1. HLQ1.2:

    • Estimate: 1.000 (fixed for identification)

    • Standardized Loading (Std.lv): 0.563

    • Standardized Loading (Std.all): 0.856

  2. HLQ1.8:

    • Estimate: 1.008

    • Standard Error (Std.Err): 0.040

    • z-value: 25.005 (significant)

    • P-value: 0.000

    • Standardized Loading (Std.lv): 0.567

    • Standardized Loading (Std.all): 0.894

  3. HLQ1.17:

    • Estimate: 0.511

    • Standard Error (Std.Err): 0.042

    • z-value: 12.051 (significant)

    • P-value: 0.000

    • Standardized Loading (Std.lv): 0.288

    • Standardized Loading (Std.all): 0.486

  4. HLQ1.22:

    • Estimate: 0.849

    • Standard Error (Std.Err): 0.039

    • z-value: 21.605 (significant)

    • P-value: 0.000

    • Standardized Loading (Std.lv): 0.478

    • Standardized Loading (Std.all): 0.771

Variances:

  1. HLQ1.2:

    • Estimate: 0.116

    • Standard Error (Std.Err): 0.011

    • z-value: 10.651 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.116

    • Standardized Variance (Std.all): 0.267

  2. HLQ1.8:

    • Estimate: 0.081

    • Standard Error (Std.Err): 0.010

    • z-value: 8.241 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.081

    • Standardized Variance (Std.all): 0.200

  3. HLQ1.17:

    • Estimate: 0.268

    • Standard Error (Std.Err): 0.016

    • z-value: 16.683 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.268

    • Standardized Variance (Std.all): 0.764

  4. HLQ1.22:

    • Estimate: 0.156

    • Standard Error (Std.Err): 0.011

    • z-value: 14.082 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 0.156

    • Standardized Variance (Std.all): 0.405

  5. subscale1:

    • Estimate: 0.317

    • Standard Error (Std.Err): 0.026

    • z-value: 12.411 (significant)

    • P-value: 0.000

    • Standardized Variance (Std.lv): 1.000 (equal to 1 by definition once the latent variable is standardized)

    • Standardized Variance (Std.all): 1.000 (likewise 1 by definition)
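
The standardized estimates listed above can also be obtained directly as a data frame with lavaan's standardizedSolution() function, which is handy for building your own reporting tables:

# Standardized solution as a data frame; keep only the factor loadings
std_est <- standardizedSolution(fit)
std_est[std_est$op == "=~", c("lhs", "rhs", "est.std", "se", "pvalue")]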

Summary:

  • Model Fit: Although the exact-fit chi-square test is significant, the approximate fit indices (CFI, TLI, RMSEA, and SRMR) indicate a good fit between the model and the data.

  • Factor Loadings: The loadings of the observed variables on the latent variable “subscale1” are all statistically significant. Three items load strongly (standardized loadings of 0.77 to 0.89), while HLQ1.17 is a noticeably weaker indicator (standardized loading of 0.49).

  • Variances: The residual variances of the observed variables are all significant; the standardized residual variance is largest for HLQ1.17 (0.764), consistent with its weaker loading.

  • Standardized Estimates: These provide a way to compare the relative strengths of the relationships in a standardized form.

In conclusion, the CFA results suggest that the specified model has a good fit to the data, with significant factor loadings and acceptable fit indices, supporting the construct validity of the latent variable “subscale1”.