Confirmatory Factor Analysis using lavaan in R
Introduction
Confirmatory Factor Analysis (CFA) is a statistical technique used to test the hypothesis that the relationships between observed variables and their underlying latent constructs are consistent with the researcher’s theoretical understanding.
In this tutorial, we will perform a CFA using the lavaan package in R.
Step by Step Guide:
Step 1: Install and Load the lavaan Package
First, we need to install and load the lavaan package. This package is designed for latent variable analysis, including CFA.
# Set up the environment: install lavaan only if it is not already available
if (!requireNamespace("lavaan", quietly = TRUE)) {
  install.packages("lavaan", dependencies = TRUE)
}
library(lavaan)
Step 2: Load data
library(haven) # for loading SPSS .sav files
data <- read_sav("data.sav") # replace "data.sav" with the path to your file
Step 3: Define the CFA model
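In equation form, the one-factor model defined below states that each item score $x_j$ is a linear function of the latent variable $\eta$ (subscale1) plus a unique error $\varepsilon_j$. The symbols are our own notation, not lavaan output: $\lambda_j$ are the loadings, $\psi$ is the factor variance, $\theta_j$ are the residual variances, and the errors are assumed uncorrelated with each other and with $\eta$, which is what lavaan assumes for this syntax unless told otherwise.

$$
\begin{aligned}
x_j &= \nu_j + \lambda_j\,\eta + \varepsilon_j, \qquad j = 1,\dots,4 \ (\text{HLQ1.2, HLQ1.8, HLQ1.17, HLQ1.22}),\\
\operatorname{Var}(\eta) &= \psi, \qquad \operatorname{Var}(\varepsilon_j) = \theta_j,\\
\Sigma &= \boldsymbol{\lambda}\,\psi\,\boldsymbol{\lambda}^{\top} + \operatorname{diag}(\theta_1,\dots,\theta_4), \qquad \lambda_1 = 1 .
\end{aligned}
$$

The constraint $\lambda_1 = 1$ (on HLQ1.2) sets the scale of $\eta$; this is the marker-variable identification that lavaan applies by default. Fitting then chooses the free parameters so that the model-implied covariance matrix $\Sigma$ is as close as possible to the sample covariance matrix.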
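# One latent factor (subscale1) measured by four HLQ items;
# the operator "=~" reads "is measured by"
model <- '
subscale1 =~ HLQ1.2 + HLQ1.8 + HLQ1.17 + HLQ1.22
'
Step 4: Fit the CFA model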
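If the HLQ data file is not at hand, the workflow can be rehearsed on simulated data. The sketch below is illustrative only: the population model, its values, the variable names y1 to y4, the sample size, and the seed are all hypothetical. It uses lavaan's simulateData() and cfa() (for a measurement-only model, cfa() has the same defaults as sem()) and also shows that the alternative identification std.lv = TRUE, which fixes the factor variance to 1 instead of the first loading, leaves the model fit unchanged.

```r
library(lavaan)

# Hypothetical population model (illustrative values, not estimates)
pop_model <- '
  f =~ 0.8*y1 + 0.8*y2 + 0.4*y3 + 0.7*y4
  f ~~ 1*f
  y1 ~~ 0.3*y1
  y2 ~~ 0.3*y2
  y3 ~~ 0.8*y3
  y4 ~~ 0.5*y4
'
set.seed(2024)
sim_data <- simulateData(pop_model, sample.nobs = 600)

# Analysis model with the same structure as subscale1
sim_model <- 'f =~ y1 + y2 + y3 + y4'

# Default (marker-variable) identification: loading of y1 fixed to 1
fit_marker <- cfa(sim_model, data = sim_data)
# Alternative identification: factor variance fixed to 1, all loadings free
fit_stdlv <- cfa(sim_model, data = sim_data, std.lv = TRUE)

# The fit statistics should agree under both identifications
fm <- c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr")
fit_table <- rbind(marker = unclass(fitMeasures(fit_marker, fm)),
                   std_lv = unclass(fitMeasures(fit_stdlv, fm)))
round(fit_table, 3)
```

The two identifications only rescale the latent variable, so loadings and the factor variance change while every fit statistic stays the same.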
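# Fit the CFA model with maximum likelihood (lavaan's default estimator).
# cfa() is lavaan's dedicated wrapper; for a measurement-only model like this
# it uses the same defaults as sem() and gives identical results.
fit <- sem(model, data = data)
Step 5: Summarize the Results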
Finally, we summarize the results of the CFA model fit. This includes fit measures and standardized estimates. We will also create a nicely formatted results table using the kableExtra package.
# Load packages for table presentation (kable() comes from knitr)
if (!requireNamespace("kableExtra", quietly = TRUE)) {
  install.packages("kableExtra", dependencies = TRUE)
}
library(knitr)
library(kableExtra) # also provides the %>% pipe
# Print the model summary with fit measures and standardized estimates
summary(fit, fit.measures = TRUE, standardized = TRUE)
# Extract the parameter estimates (including std.lv and std.all) as a data frame
estimates <- parameterEstimates(fit, standardized = TRUE)
# Create an HTML table (booktabs only affects LaTeX output, so it is omitted)
kable(estimates, format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

| lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all |
|---|---|---|---|---|---|---|---|---|---|---|
| subscale1 | =~ | HLQ1.2 | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5629426 | 0.8560607 |
| subscale1 | =~ | HLQ1.8 | 1.0077205 | 0.0403007 | 25.005031 | 0 | 0.9287325 | 1.0867084 | 0.5672888 | 0.8942465 |
| subscale1 | =~ | HLQ1.17 | 0.5109890 | 0.0424015 | 12.051187 | 0 | 0.4278835 | 0.5940945 | 0.2876575 | 0.4856309 |
| subscale1 | =~ | HLQ1.22 | 0.8492850 | 0.0393102 | 21.604683 | 0 | 0.7722384 | 0.9263316 | 0.4780987 | 0.7713611 |
| HLQ1.2 | ~~ | HLQ1.2 | 0.1155289 | 0.0108463 | 10.651441 | 0 | 0.0942705 | 0.1367873 | 0.1155289 | 0.2671601 |
| HLQ1.8 | ~~ | HLQ1.8 | 0.0806168 | 0.0097828 | 8.240689 | 0 | 0.0614429 | 0.0997906 | 0.0806168 | 0.2003233 |
| HLQ1.17 | ~~ | HLQ1.17 | 0.2681171 | 0.0160711 | 16.683177 | 0 | 0.2366183 | 0.2996159 | 0.2681171 | 0.7641626 |
| HLQ1.22 | ~~ | HLQ1.22 | 0.1555883 | 0.0110488 | 14.081949 | 0 | 0.1339331 | 0.1772435 | 0.1555883 | 0.4050020 |
| subscale1 | ~~ | subscale1 | 0.3169044 | 0.0255341 | 12.411013 | 0 | 0.2668584 | 0.3669504 | 1.0000000 | 1.0000000 |
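The two standardized columns can be reproduced from the unstandardized estimates (est) in the table, which makes their meaning concrete. A minimal base-R sketch, with values copied from the table above: std.lv multiplies each loading by the factor's standard deviation, and std.all additionally divides by the model-implied standard deviation of the indicator.

```r
# Unstandardized estimates copied from the parameterEstimates() table
lambda <- c(HLQ1.2 = 1.0000000, HLQ1.8 = 1.0077205,
            HLQ1.17 = 0.5109890, HLQ1.22 = 0.8492850)  # loadings (est)
theta  <- c(HLQ1.2 = 0.1155289, HLQ1.8 = 0.0806168,
            HLQ1.17 = 0.2681171, HLQ1.22 = 0.1555883)  # residual variances
psi    <- 0.3169044                                     # factor variance

# std.lv: rescale the latent variable to unit variance
std_lv <- lambda * sqrt(psi)

# Model-implied variance of each indicator: lambda^2 * psi + theta
implied_var <- lambda^2 * psi + theta

# std.all: additionally rescale each indicator to unit variance
std_all <- std_lv / sqrt(implied_var)

round(cbind(std_lv, std_all), 3)
```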
Detailed Explanation of Results
Model Test User Model:
Chi-Square Test Statistic: 7.778
Degrees of Freedom (df): 2
P-value (Chi-square): 0.020
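The chi-square test is significant (p = 0.020 < 0.05), so the hypothesis of exact fit is rejected: the model does not reproduce the observed covariances perfectly. However, the chi-square test is sensitive to sample size; in large samples even trivial misfit becomes significant, which is why approximate fit indices are also examined.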
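The degrees of freedom and the p-value above can be checked by hand. A minimal base-R sketch: with four indicators, the data supply 4 × 5 / 2 = 10 variances and covariances, and the model estimates 8 free parameters (three loadings, since the first is fixed to 1, four residual variances, and one factor variance), leaving df = 2. Intercepts do not change this count, because each estimated intercept is matched by one observed mean.

```r
# Number of observed indicators
p <- 4
# Unique variances and covariances in the data: p(p + 1) / 2
n_moments <- p * (p + 1) / 2
# Free parameters: 3 loadings + 4 residual variances + 1 factor variance
n_free <- 3 + 4 + 1
df_model <- n_moments - n_free

# Upper-tail p-value of the reported chi-square statistic
chisq_model <- 7.778
p_value <- pchisq(chisq_model, df = df_model, lower.tail = FALSE)

c(df = df_model, p = round(p_value, 3))
```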
Model Test Baseline Model:
Chi-Square Test Statistic: 1116.539
Degrees of Freedom (df): 6
P-value: 0.000
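The baseline (independence) model, which assumes that all observed variables are uncorrelated, fits very poorly (chi-square = 1116.539 on 6 df, p < .001). It serves as the reference point for the incremental fit indices CFI and TLI.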
User Model versus Baseline Model:
Comparative Fit Index (CFI): 0.995
Tucker-Lewis Index (TLI): 0.984
Both CFI and TLI exceed the conventional cutoff of 0.90, and also the stricter cutoff of 0.95 that is widely used, indicating good fit.
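CFI and TLI are computed from the user-model and baseline chi-square tests reported above. A minimal sketch of the standard formulas using those values; the baseline model has 6 df because it estimates only the four variances (10 - 4 = 6).

```r
chisq_m <- 7.778;    df_m <- 2   # user model
chisq_b <- 1116.539; df_b <- 6   # baseline (independence) model

# CFI: 1 - (misfit of user model) / (misfit of baseline model),
# where misfit is the chi-square in excess of its degrees of freedom
cfi <- 1 - max(chisq_m - df_m, 0) /
  max(chisq_b - df_b, chisq_m - df_m, 0)

# TLI: compares the chi-square/df ratios of the two models
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

round(c(CFI = cfi, TLI = tli), 3)
```

TLI is slightly lower than CFI here because it penalizes the user model's small df (2) more heavily through the chi-square/df ratio.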
Loglikelihood and Information Criteria:
Loglikelihood user model (H0): -1725.295
Loglikelihood unrestricted model (H1): NA (not reported in this output)
Akaike Information Criterion (AIC): 3466.591
Bayesian Information Criterion (BIC): 3501.766
Sample-size adjusted BIC (SABIC): 3476.369
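AIC, BIC, and SABIC have no meaning on their own: lower values indicate a better trade-off between fit and parsimony, but only when comparing models fitted to the same data. Since there is only one model here, these values serve as reference points for future model comparisons.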
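The information criteria follow directly from the log-likelihood and the number of free parameters, k = 8 for this model (three free loadings, four residual variances, one factor variance). The sample size N is not stated in the tutorial, so this sketch back-solves it from the reported BIC; that is an inference, not a reported value, and the SABIC line serves as a cross-check.

```r
loglik <- -1725.295   # H0 log-likelihood reported above
k <- 8                # free parameters in the one-factor model

# AIC = -2 logL + 2k
aic <- -2 * loglik + 2 * k

# BIC = -2 logL + k log(N); N is not reported, so solve for it
bic_reported <- 3501.766
n_implied <- exp((bic_reported + 2 * loglik) / k)

# Cross-check: SABIC = -2 logL + k log((N + 2) / 24), using the rounded N
sabic <- -2 * loglik + k * log((round(n_implied) + 2) / 24)

round(c(AIC = aic, N = n_implied, SABIC = sabic), 3)
```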
Root Mean Square Error of Approximation (RMSEA):
RMSEA: 0.069
90% Confidence Interval (Lower): 0.023
90% Confidence Interval (Upper): 0.124
P-value H_0: RMSEA <= 0.050: 0.205
P-value H_0: RMSEA >= 0.080: 0.434
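The RMSEA point estimate of 0.069 is below the common cutoff of 0.08, indicating acceptable (though not close) fit. However, the 90% confidence interval is wide (0.023 to 0.124): it includes values that indicate close fit as well as values above 0.10 that indicate poor fit. Consistent with this, neither test is significant: the data reject neither close fit (p = 0.205 for H_0: RMSEA <= 0.05) nor RMSEA >= 0.08 (p = 0.434). Such wide intervals are typical for models with very few degrees of freedom, such as this one (df = 2).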
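The RMSEA values above can be reproduced with the noncentral chi-square distribution. A base-R sketch of the usual single-group formulas (to our understanding, the ones lavaan applies for an ML fit); N = 600 is an assumption, because the tutorial does not report the sample size and 600 is the value implied by the reported BIC with 8 free parameters.

```r
# RMSEA point estimate, confidence interval and the two RMSEA tests
# for a single-group ML fit. N is supplied by the user (assumed here).
rmsea_summary <- function(chisq, df, N, level = 0.90,
                          close = 0.05, notclose = 0.08) {
  rmsea <- sqrt(max(chisq - df, 0) / (df * N))
  a <- (1 - level) / 2                        # 0.05 in each tail
  cdf <- function(ncp) pchisq(chisq, df, ncp = ncp)
  hi_bracket <- 10 * (chisq + df) + 100       # generous upper search bound
  # Lower limit: noncentrality at which chisq is the (1 - a) quantile
  lam_lo <- if (cdf(0) > 1 - a) {
    uniroot(function(l) cdf(l) - (1 - a), c(0, hi_bracket), tol = 1e-10)$root
  } else 0
  # Upper limit: noncentrality at which chisq is the a quantile
  lam_hi <- if (cdf(0) > a) {
    uniroot(function(l) cdf(l) - a, c(0, hi_bracket), tol = 1e-10)$root
  } else 0
  c(rmsea      = rmsea,
    ci_lower   = sqrt(lam_lo / (df * N)),
    ci_upper   = sqrt(lam_hi / (df * N)),
    # H0: RMSEA <= close; a small p-value is evidence against close fit
    p_close    = 1 - cdf(df * N * close^2),
    # H0: RMSEA >= notclose; a small p-value is evidence against poor fit
    p_notclose = cdf(df * N * notclose^2))
}

res <- rmsea_summary(chisq = 7.778, df = 2, N = 600)
round(res, 3)
```

With df = 2, the noncentral chi-square distribution is very spread out, which is why the interval is so wide even though the point estimate looks acceptable.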
Standardized Root Mean Square Residual (SRMR):
SRMR: 0.018
An SRMR value below 0.08 indicates a good fit, and 0.018 suggests an excellent fit.
Parameter Estimates:
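Latent Variable Loadings (subscale1 =~). Std.lv standardizes only the latent variable (its variance is rescaled to 1); Std.all standardizes both the latent variable and the observed indicators: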
HLQ1.2:
Estimate: 1.000 (fixed for identification)
Standardized Loading (Std.lv): 0.563
Standardized Loading (Std.all): 0.856
HLQ1.8:
Estimate: 1.008
Standard Error (Std.Err): 0.040
z-value: 25.005 (significant)
P-value: 0.000
Standardized Loading (Std.lv): 0.567
Standardized Loading (Std.all): 0.894
HLQ1.17:
Estimate: 0.511
Standard Error (Std.Err): 0.042
z-value: 12.051 (significant)
P-value: 0.000
Standardized Loading (Std.lv): 0.288
Standardized Loading (Std.all): 0.486
HLQ1.22:
Estimate: 0.849
Standard Error (Std.Err): 0.039
z-value: 21.605 (significant)
P-value: 0.000
Standardized Loading (Std.lv): 0.478
Standardized Loading (Std.all): 0.771
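Because the model has a single factor and uncorrelated errors, the Std.all loadings determine the correlation the model implies between any two items: the product of their two loadings. A minimal sketch using the rounded loadings above; for a fitted lavaan model, the differences between observed and implied correlations can be inspected with residuals(fit, type = "cor"), and SRMR summarizes those differences.

```r
# Fully standardized loadings (Std.all) from the output above, rounded
lambda_std <- c(HLQ1.2 = 0.856, HLQ1.8 = 0.894,
                HLQ1.17 = 0.486, HLQ1.22 = 0.771)

# Model-implied correlations: r_ij = lambda_i * lambda_j for i != j
implied_cor <- outer(lambda_std, lambda_std)
diag(implied_cor) <- 1   # each item correlates 1 with itself

round(implied_cor, 2)
```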
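Variances (residual variances of the four indicators, followed by the variance of the latent factor). For the indicators, Std.lv equals the unstandardized estimate because only the latent variable is rescaled; Std.all is the proportion of each indicator's variance not explained by subscale1: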
HLQ1.2:
Estimate: 0.116
Standard Error (Std.Err): 0.011
z-value: 10.651 (significant)
P-value: 0.000
Standardized Variance (Std.lv): 0.116
Standardized Variance (Std.all): 0.267
HLQ1.8:
Estimate: 0.081
Standard Error (Std.Err): 0.010
z-value: 8.241 (significant)
P-value: 0.000
Standardized Variance (Std.lv): 0.081
Standardized Variance (Std.all): 0.200
HLQ1.17:
Estimate: 0.268
Standard Error (Std.Err): 0.016
z-value: 16.683 (significant)
P-value: 0.000
Standardized Variance (Std.lv): 0.268
Standardized Variance (Std.all): 0.764
HLQ1.22:
Estimate: 0.156
Standard Error (Std.Err): 0.011
z-value: 14.082 (significant)
P-value: 0.000
Standardized Variance (Std.lv): 0.156
Standardized Variance (Std.all): 0.405
subscale1:
Estimate: 0.317
Standard Error (Std.Err): 0.026
z-value: 12.411 (significant)
P-value: 0.000
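Standardized Variance (Std.lv): 1.000 (1 by construction, because this standardization rescales the latent variable to unit variance)
Standardized Variance (Std.all): 1.000 (likewise 1 by construction; the unstandardized variance of 0.317 is freely estimated, since the model is identified by fixing the HLQ1.2 loading to 1)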
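The Std.all residual variances are the complement of the squared Std.all loadings: in a one-factor model, each indicator's standardized variance splits into the share explained by the factor (R-squared, the squared loading) and the unexplained share. A minimal check with the values from the table; lavaan also reports the R-squared values directly with summary(fit, rsquare = TRUE).

```r
# Fully standardized loadings and residual variances (Std.all) from the table
loading_std  <- c(HLQ1.2 = 0.8560607, HLQ1.8 = 0.8942465,
                  HLQ1.17 = 0.4856309, HLQ1.22 = 0.7713611)
residual_std <- c(HLQ1.2 = 0.2671601, HLQ1.8 = 0.2003233,
                  HLQ1.17 = 0.7641626, HLQ1.22 = 0.4050020)

# R-squared: share of each item's variance explained by subscale1
r_squared <- loading_std^2

# In a one-factor model, explained + unexplained shares sum to 1
stopifnot(all(abs(r_squared + residual_std - 1) < 1e-4))

round(cbind(r_squared, residual_std), 3)
```

HLQ1.17 stands out: subscale1 explains only about 24% of its variance.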
Summary:
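Model Fit: The approximate fit indices (CFI, TLI, and SRMR) indicate good fit, and the RMSEA point estimate indicates acceptable fit, although the exact-fit chi-square test is significant and the RMSEA confidence interval is wide.
Factor Loadings: All loadings of the observed variables on the latent variable “subscale1” are statistically significant. HLQ1.2, HLQ1.8, and HLQ1.22 are strong indicators (standardized loadings of 0.77 to 0.89), whereas HLQ1.17 is noticeably weaker (0.49, so the factor explains only about 24% of its variance), just below the 0.50 level often used as a rule of thumb.
Variances: The residual variances of the observed variables are significantly different from zero, meaning that each item retains unique variance (specific variance plus measurement error) not explained by the latent variable; this unexplained share is largest for HLQ1.17 (76%).
Standardized Estimates: The fully standardized (Std.all) estimates put all loadings on a common scale, so the relative strength of the indicators can be compared directly.
In conclusion, the CFA results suggest that the specified one-factor model fits the data well by the approximate fit indices, with significant factor loadings, supporting the construct validity of the latent variable “subscale1”. The weaker loading of HLQ1.17 and the significant chi-square test are worth reporting alongside this conclusion.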