BioMedical Data Analysis: Latent Variable Modeling (1)
Introduction
In order to get familiar with the general pipeline of conducting EFA/CFA, SEM, and finite mixture (LCA) modeling in R, we will be using one of the well-known Holzinger & Swineford (1939) data sets. The data set is included in the ‘lavaan’ package, so we can load it with the data() function once the package itself is installed. Once we cover all the basics using the current data set, we will move on to a ‘real-world’ one where the factor structure and overall path model might not be as perfectly clear and defined. Here we will cover most of the theory and detailed interpretation, while the second part of this practice session will focus mostly on the data itself.
We begin this practice session by loading (or installing, if necessary) all the packages that we will be using during this lesson.
# Latent Profile Analysis (LPA) / Mixture Modeling
library(tidyLPA) # latent profile analysis
library(tidySEM) # mixture SEM / profile visualization
# Data Import & Wrangling
library(foreign) # import SPSS/Stata/etc. files
library(tidyverse) # data wrangling ecosystem
library(dplyr) # data manipulation
library(fastDummies) # dummy variable creation
# Descriptive Statistics & EFA
library(psych) # reliability, EFA, psychometrics
library(moments) # skewness, kurtosis
library(nFactors) # factor retention decisions
# SEM / CFA / Path Modeling
library(lavaan) # SEM/CFA/path analysis
library(semTools) # additional SEM utilities
library(lavaanPlot) # SEM visualization
library(semPlot) # SEM path diagrams
library(performance) # model performance metrics
# Network & Correlation Visualization
library(qgraph) # network/path visualization
library(corrplot) # correlation matrix plots
library(GGally) # pairplots and correlation visualization
# General Visualization
library(ggplot2) # plotting
library(ggpubr) # publication-ready ggplot tools
# Reporting & Table Formatting
library(knitr) # report/table generation
library(kableExtra) # styled tables
library(formattable) # formatted tables
Dataset and preliminary EDA
Once the required libraries are all set, we proceed to load the aforementioned data set directly from the ‘lavaan’ library we just installed without needing to download any additional files.
data(HolzingerSwineford1939) # Loading the dataset
data <- HolzingerSwineford1939 # This is the name we will be using
summary(HolzingerSwineford1939) # Getting the general summary
## id sex ageyr agemo
## Min. : 1.0 Min. :1.000 Min. :11 Min. : 0.000
## 1st Qu.: 82.0 1st Qu.:1.000 1st Qu.:12 1st Qu.: 2.000
## Median :163.0 Median :2.000 Median :13 Median : 5.000
## Mean :176.6 Mean :1.515 Mean :13 Mean : 5.375
## 3rd Qu.:272.0 3rd Qu.:2.000 3rd Qu.:14 3rd Qu.: 8.000
## Max. :351.0 Max. :2.000 Max. :16 Max. :11.000
##
## school grade x1 x2
## Grant-White:145 Min. :7.000 Min. :0.6667 Min. :2.250
## Pasteur :156 1st Qu.:7.000 1st Qu.:4.1667 1st Qu.:5.250
## Median :7.000 Median :5.0000 Median :6.000
## Mean :7.477 Mean :4.9358 Mean :6.088
## 3rd Qu.:8.000 3rd Qu.:5.6667 3rd Qu.:6.750
## Max. :8.000 Max. :8.5000 Max. :9.250
## NA's :1
## x3 x4 x5 x6
## Min. :0.250 Min. :0.000 Min. :1.000 Min. :0.1429
## 1st Qu.:1.375 1st Qu.:2.333 1st Qu.:3.500 1st Qu.:1.4286
## Median :2.125 Median :3.000 Median :4.500 Median :2.0000
## Mean :2.250 Mean :3.061 Mean :4.341 Mean :2.1856
## 3rd Qu.:3.125 3rd Qu.:3.667 3rd Qu.:5.250 3rd Qu.:2.7143
## Max. :4.500 Max. :6.333 Max. :7.000 Max. :6.1429
##
## x7 x8 x9
## Min. :1.304 Min. : 3.050 Min. :2.778
## 1st Qu.:3.478 1st Qu.: 4.850 1st Qu.:4.750
## Median :4.087 Median : 5.500 Median :5.417
## Mean :4.186 Mean : 5.527 Mean :5.374
## 3rd Qu.:4.913 3rd Qu.: 6.100 3rd Qu.:6.083
## Max. :7.435 Max. :10.000 Max. :9.250
##
dim(data) # Taking a look at the data structure
## [1] 301 15
As can be seen, the data set contains 301 observations and 15 variables. Our variables of interest are x1 through x9 (the ones we will use for the factor analysis specifically), while sex, ageyr, agemo, school and grade can still be useful for later stages of the analysis.
head(data, n = 10) # Let's take a look at the first 10 rows of the dataset
## id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6
## 1 1 1 13 1 Pasteur 7 3.333333 7.75 0.375 2.333333 5.75 1.2857143
## 2 2 2 13 7 Pasteur 7 5.333333 5.25 2.125 1.666667 3.00 1.2857143
## 3 3 2 13 1 Pasteur 7 4.500000 5.25 1.875 1.000000 1.75 0.4285714
## 4 4 1 13 2 Pasteur 7 5.333333 7.75 3.000 2.666667 4.50 2.4285714
## 5 5 2 12 2 Pasteur 7 4.833333 4.75 0.875 2.666667 4.00 2.5714286
## 6 6 2 14 1 Pasteur 7 5.333333 5.00 2.250 1.000000 3.00 0.8571429
## 7 7 1 12 1 Pasteur 7 2.833333 6.00 1.000 3.333333 6.00 2.8571429
## 8 8 2 12 2 Pasteur 7 5.666667 6.25 1.875 3.666667 4.25 1.2857143
## 9 9 2 13 0 Pasteur 7 4.500000 5.75 1.500 2.666667 5.75 2.7142857
## 10 11 2 12 5 Pasteur 7 3.500000 5.25 0.750 2.666667 5.00 2.5714286
## x7 x8 x9
## 1 3.391304 5.75 6.361111
## 2 3.782609 6.25 7.916667
## 3 3.260870 3.90 4.416667
## 4 3.000000 5.30 4.861111
## 5 3.695652 6.30 5.916667
## 6 4.347826 6.65 7.500000
## 7 4.695652 6.20 4.861111
## 8 3.391304 5.15 3.666667
## 9 4.521739 4.65 7.361111
## 10 4.130435 4.55 4.361111
str(data) # Another way to take a look inside of our data file
## 'data.frame': 301 obs. of 15 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 11 ...
## $ sex : int 1 2 2 1 2 2 1 2 2 2 ...
## $ ageyr : int 13 13 13 13 12 14 12 12 13 12 ...
## $ agemo : int 1 7 1 2 2 1 1 2 0 5 ...
## $ school: Factor w/ 2 levels "Grant-White",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ grade : int 7 7 7 7 7 7 7 7 7 7 ...
## $ x1 : num 3.33 5.33 4.5 5.33 4.83 ...
## $ x2 : num 7.75 5.25 5.25 7.75 4.75 5 6 6.25 5.75 5.25 ...
## $ x3 : num 0.375 2.125 1.875 3 0.875 ...
## $ x4 : num 2.33 1.67 1 2.67 2.67 ...
## $ x5 : num 5.75 3 1.75 4.5 4 3 6 4.25 5.75 5 ...
## $ x6 : num 1.286 1.286 0.429 2.429 2.571 ...
## $ x7 : num 3.39 3.78 3.26 3 3.7 ...
## $ x8 : num 5.75 6.25 3.9 5.3 6.3 6.65 6.2 5.15 4.65 4.55 ...
## $ x9 : num 6.36 7.92 4.42 4.86 5.92 ...
Sex is clearly coded as binary (1 and 2), similarly to the ‘school’ variable. Our variables of interest for the factor analysis are all numeric with different ranges (as could be observed above). Technically there is no way for us to know whether the actual scale for all of them ran from 0 to 10, but it does seem like a possibility given the minimum and maximum values observed. Judging by the fact that we have dedicated variables for grade and school, it can be assumed that we are dealing with data about students. We will intentionally keep the nature of the mysterious x1-x9 variables a secret for now so that we bring no prior knowledge into the EFA - let the data speak for itself!
colSums(is.na(select(data, x1:x9))) # Checking for NAs in our columns of interest
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## 0 0 0 0 0 0 0 0 0
Checking factorability of the data
Even with no prior knowledge about our data, we can and should check whether it is suitable for EFA at all.
Significance of the multiple correlations - can be checked by using Bartlett’s test of sphericity. According to the results, multiple correlations among the variables are significant.
Sampling adequacy - can be checked by using Kaiser-Meyer-Olkin test. According to the results, sample size is enough for factor analytic solutions.
check_factorstructure(select(data, x1:x9))
## # Is the data suitable for Factor Analysis?
##
##
## - Sphericity: Bartlett's test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(36) = 904.10, p < .001).
## - KMO: The Kaiser, Meyer, Olkin (KMO) overall measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.75). The individual KMO scores are: x1 (0.81), x2 (0.78), x3 (0.73), x4 (0.76), x5 (0.74), x6 (0.81), x7 (0.59), x8 (0.68), x9 (0.79).
- Bartlett’s test evaluates whether the correlation matrix significantly differs from an identity matrix, where an identity matrix would mean:
- variables are essentially uncorrelated,
- each variable only correlates with itself,
- and there is no underlying latent structure to extract.
In our case, the test is significant (p<.001). Therefore, the null hypothesis that the correlation matrix is an identity matrix is rejected - that is what we are looking for. The observed variables appear to contain common underlying dimensions (latent factors), rather than representing completely independent constructs.
- The KMO statistic assesses whether the patterns of correlations are compact enough for factor analysis to produce reliable factors. It compares: observed correlations vs. partial correlations. Good factor structures typically show:
- high correlations,
- low partial correlations.
In our case, a KMO of 0.75 indicates good/moderately strong sampling adequacy.
The general threshold for KMO value interpretation would be:
- < 0.50 - Unacceptable
- 0.50–0.59 - Poor
- 0.60–0.69 - Mediocre
- 0.70–0.79 - Good
- 0.80–0.89 - Great
- ≥ 0.90 - Excellent
We can also see the values returned for each variable. In particular, x7 and x8 have comparatively lower scores; the rest seem to be doing better.
# Kaiser-Meyer-Olkin factor adequacy test carried out separately
# KMO(select(data, x1:x9))
# cortest.bartlett(select(data, x1:x9), n = 301)
Variable distributions
Correlation Plot
Since we now know our variables are correlated and there is some kind of an underlying latent structure behind it all, we can build a correlation plot and take a look at specific groups of highly correlated variables.
# Correlation matrix
corrplot(
cor(select(data, x1:x9)),
method = "color",
addCoef.col = "darkgrey",
number.cex = 0.7
)
We can clearly see some groups emerge. Specifically, the most highly correlated group of variables is the one in the center: x4, x5, and x6. Apart from those, the first three and the last three variables also form correlation groups of their own.
Distributions for each variable
We proceed to take a look at the distribution of our variables of interest.
# Reshape data
plot_data <- data %>%
select(x1:x9) %>%
pivot_longer(
cols = everything(),
names_to = "Variable",
values_to = "Value"
)
# Summary stats for each variable
stats <- plot_data %>%
group_by(Variable) %>%
summarise(
mean = mean(Value, na.rm = TRUE),
median = median(Value, na.rm = TRUE)
)
ggplot(plot_data, aes(x = Value)) +
# Histogram
geom_histogram(
aes(y = after_stat(density)),
bins = 25,
fill = "seagreen4",
color = "white",
alpha = 0.9
) +
# Density curve
geom_density(
color = "darkgreen",
linewidth = 1.2
) +
# Mean line
geom_vline(
data = stats,
aes(xintercept = mean),
color = "#E63946",
linetype = "solid",
linewidth = 1
) +
# Median line
geom_vline(
data = stats,
aes(xintercept = median),
color = "yellow",
linetype = "solid",
linewidth = 1
) +
facet_wrap(~Variable, scales = "free", ncol = 3) +
labs(
title = "Distribution of Holzinger-Swineford Indicators",
subtitle = "Red line = Mean | Yellow line = Median",
x = NULL,
y = "Density"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(
face = "bold",
size = 20,
hjust = 0.5
),
plot.subtitle = element_text(
size = 11,
color = "gray40",
hjust = 0.5
),
strip.text = element_text(
face = "bold",
size = 12
),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(
color = "gray90",
linewidth = 0.3
)
)
We can take a look at all the distributions at a single glance and compare them to each other, also assessing normality or how skewed they are. For instance, for x1: the distribution appears approximately normal and symmetric, bell-shaped and centered around ~5, with mean, median, and mode closely aligned, and no major skewness or outliers visible. On the other hand, the distribution of x3 lacks that central peak and is rather evenly spread out along the range of possible values. As for the distribution of x6, it is clearly skewed, with far more values falling on the left side of the range and a concentration around ~2.
Checking Normality, Skewness & Kurtosis
What we are truly interested in here is the normality of our distributions. We check normality before conducting Exploratory Factor Analysis (EFA) because many factor extraction methods and statistical tests used in EFA assume that variables are approximately normally distributed. If variables are strongly non-normal, the correlations that factor analysis relies on can become distorted, standard errors may become inaccurate, factor loadings may be biased, significance tests may become unreliable, and final model fit statistics can be affected.
As for skewness and kurtosis, we want to check the actual numbers instead of just judging by the distribution plots, since these statistics help evaluate how far variables deviate from normality.
# Shapiro-Wilk Test
options(scipen = 999)
result <- sapply(data[, paste0("x", 1:9)], function(x) {
shapiro.test(x)$p.value
})
data.frame(
Variable = names(result),
Test = "Shapiro-Wilk",
p_value = result
)
## Variable Test p_value
## x1 x1 Shapiro-Wilk 0.15820085214896
## x2 x2 Shapiro-Wilk 0.00000584813069
## x3 x3 Shapiro-Wilk 0.00000002486883
## x4 x4 Shapiro-Wilk 0.00105113053486
## x5 x5 Shapiro-Wilk 0.00008864986799
## x6 x6 Shapiro-Wilk 0.00000003830633
## x7 x7 Shapiro-Wilk 0.05604021089974
## x8 x8 Shapiro-Wilk 0.00044776137026
## x9 x9 Shapiro-Wilk 0.30696527363525
Interpretation:
- p > .05 → normality not rejected
- p < .05 → non-normal
And as we can see here, x1, x7 and x9 are the only ones considered normal according to the test (normality not rejected). What we need to keep in mind here, however, is that in practice people usually combine:
- skew/kurtosis
- Q-Q plots
- sample size considerations
It is not recommended to rely solely on p-values, partly because with large samples Shapiro-Wilk almost always becomes significant even for small deviations from normality. With N = 301, Shapiro-Wilk is still fairly sensitive, though not as extreme as with very large samples (e.g., thousands). So significant p-values do indicate some deviation from normality, but that does not automatically mean that the analysis is invalid. What matters more is how severe the non-normality is. For SEM specifically, skewness and kurtosis are often more important than strict Shapiro tests.
# Check Normality, Skewness & Kurtosis
result <- data %>%
summarise(across(
x1:x9,
list(
skew = ~moments::skewness(., na.rm = TRUE),
kurtosis = ~moments::kurtosis(., na.rm = TRUE)
)
)) %>%
pivot_longer(
cols = everything(),
names_to = c("Variable", ".value"),
names_sep = "_"
)
print(result)
## # A tibble: 9 × 3
## Variable skew kurtosis
## <chr> <dbl> <dbl>
## 1 x1 -0.256 3.33
## 2 x2 0.472 3.35
## 3 x3 0.385 2.11
## 4 x4 0.269 3.10
## 5 x5 -0.352 2.46
## 6 x6 0.862 3.84
## 7 x7 0.250 2.71
## 8 x8 0.528 4.20
## 9 x9 0.205 3.31
Rules of thumb used:
- |skew| < 2 acceptable
- |kurtosis| < 7 acceptable
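As a quick programmatic check (a minimal sketch using the result tibble computed above, with the rules of thumb just listed as cut-offs), we can flag any variable that would exceed them:
# Flag variables exceeding the |skew| < 2 and |kurtosis| < 7 rules of thumb
result %>%
  mutate(
    skew_ok = abs(skew) < 2,
    kurtosis_ok = abs(kurtosis) < 7
  )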
Variable-by-Variable Interpretation:
- x1 - Slight negative skew (-0.256), meaning responses lean somewhat toward higher values. Kurtosis (3.33) is close to normal, indicating near-normal tail behavior.
- x2 - Mild positive skew (0.472), suggesting somewhat more low-to-middle scores with a right tail. Kurtosis (3.35) indicates slightly heavier tails than normal.
- x3 - Mild positive skew (0.385). Kurtosis (2.11) is noticeably below 3, indicating a flatter-than-normal distribution with lighter tails.
- x4 - Very small positive skew (0.269). Kurtosis (3.10) is extremely close to normal. Distribution appears approximately normal.
- x5 - Slight negative skew (-0.352), indicating somewhat higher responses overall. Kurtosis (2.46) suggests a mildly flat distribution.
- x6 - Largest skewness (0.862), indicating moderate positive skewness. Responses cluster toward lower values with a longer right tail. Kurtosis (3.84) suggests heavier tails and more extreme values than normal.
- x7 - Very mild positive skew (0.250). Kurtosis (2.71) is slightly flatter than normal. No major concerns.
- x8 - Moderate positive skew (0.528). Kurtosis (4.20) is the highest among variables, indicating a somewhat peaked distribution with heavier tails and potentially more extreme observations.
- x9 - Very mild positive skew (0.205). Kurtosis (3.31) is close to normal. Distribution appears acceptable.
Overall, all skewness values fall within: −1 < skewness < 1
This generally indicates:
- acceptable univariate normality,
- no severe asymmetry,
- and no variables with problematic skew.
Still, we can note that the most skewed variables are:
- x6 (0.862)
- x8 (0.528)
- x2 (0.472)
But even these, luckily, remain within commonly accepted thresholds. Now moving on to the Q-Q plot.
# Q-Q Plot
par(mfrow = c(3,3))
for(i in 1:9){
qqnorm(data[[paste0("x", i)]],
main = paste0("x", i))
qqline(data[[paste0("x", i)]])
}
The general rule of thumb here is that if the points roughly follow the line, the distribution is approximately normal - which luckily seems to be the case here.
Thus, overall, after looking at the Q-Q plots and the skewness and kurtosis results:
- Skewness values are all below 1 → very acceptable
- Kurtosis values are around normal-range levels
- None appear severely non-normal
So despite several significant Shapiro tests, our data does not appear seriously non-normal.
Once again, this is extremely common:
- Shapiro detects even small deviations
- Practical SEM assumptions are usually judged using skew/kurtosis and robustness of estimators
For our case: data looks fine for further analysis!
EFA
Choosing the number of factors
Now we finally move on to carrying out the EFA! We already observed that there are three groups of highly correlated variables. Naturally, one might assume we will end up with three factors, but it is still better to check the optimal number of factors using statistical methods.
First we turn to the easiest of the rules to follow when it comes to determining the number of factors - Kaiser Criterion. According to the Kaiser Criterion, factors with eigenvalues greater than 1 are retained, as such factors explain more variance than a single observed variable. Factors with eigenvalues below 1 are considered insufficiently informative and are therefore excluded from the final factor solution.
As Kaiser himself said: “The number of factors problem is easy, I solve it every day before breakfast. The problem is the right number”.
ev = eigen(cor(select(data, x1:x9)))
ev$value
## [1] 3.2163442 1.6387132 1.3651593 0.6989185 0.5843475 0.4996872 0.4731021
## [8] 0.2860024 0.2377257
So, if we take a look at eigenvalues, it becomes clear that three factors are the suggested architecture for our data.
Now let’s try to determine the number of factors using one of the most widely used methods - parallel analysis with a scree plot.
fa.parallel(select(data, x1:x9), fa = "fa", fm = "ml")
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
This indicates that we have 3 factors which are above the level of random noise. 3 seems like a good number to go with!
We can actually make a table of all the possible methods of determining the number of factors for given data and identify which number of factors was suggested by the majority.
n <- parameters::n_factors(select(data, x1:x9))
n
## # Method Agreement Procedure:
##
## The choice of 3 dimensions is supported by 7 (36.84%) methods out of 19 (CNG, Optimal coordinates, Parallel analysis, Kaiser criterion, Scree (SE), VSS complexity 2, BIC).
# Checking the specific methods and suggestions
as.data.frame(n)
## n_Factors Method Family
## 1 1 Acceleration factor Scree
## 2 1 Scree (R2) Scree_SE
## 3 2 Bentler Bentler
## 4 2 t Multiple_regression
## 5 2 p Multiple_regression
## 6 2 Velicer's MAP Velicers_MAP
## 7 3 CNG CNG
## 8 3 Optimal coordinates Scree
## 9 3 Parallel analysis Scree
## 10 3 Kaiser criterion Scree
## 11 3 Scree (SE) Scree_SE
## 12 3 VSS complexity 2 VSS
## 13 3 BIC BIC
## 14 4 beta Multiple_regression
## 15 4 BIC (adjusted) BIC
## 16 7 Bartlett Barlett
## 17 7 Anderson Barlett
## 18 7 Lawley Barlett
## 19 8 VSS complexity 1 VSS
Here we once again get a confirmation that the screeplot suggested a 3-factor solution, and check the suggestions made by the rest of the different methods. It’s always good to have a variety of ‘opinions’ on the number of factors to retain, but luckily our data this time is rather simple and clear in terms of the factor structure. That is precisely why we are using this data set for our demonstration.
summary(n)
## n_Factors n_Methods Variance_Cumulative
## 1 1 2 0.3222717
## 2 2 4 0.4659134
## 3 3 7 0.5666210
## 4 4 2 0.5881727
## 5 7 3 0.6163591
## 6 8 1 0.6172336
plot(n) +
theme_minimal() +
scale_fill_manual(values = c("#1B5E20", "#81C784"))
As can be clearly seen on the bar chart, the majority of methods recommend a three-factor solution.
To extract factors from our variables, we first need to answer the main methodological questions:
- Which extraction method should be used? (The most commonly used methods are Maximum Likelihood (ML) and Principal Axis Factoring (PAF), sometimes also referred to as principal factor extraction.)
- Which type of correlation should be used for the data? (Since all variables in our dataset are continuous, Pearson correlations are appropriate.)
- How many factors should be extracted? (In our case, we extract three factors.)
- Which type of rotation should be used? (This becomes relevant when more than one factor is extracted.)
Once again, to revisit the information discussed during our theory class, rotation is applied to improve the interpretability of the factor solution and to sharpen the pattern of factor loadings. Conceptually, rotation can be understood as a transformation of the factor axes in multidimensional space. The underlying mathematical idea is closely related to eigenvectors and the geometric representation of factors. Without rotation, factor solutions are often mathematically correct but difficult to interpret because variables may load moderately on several factors simultaneously. Rotation helps produce a “cleaner” and more interpretable structure.
There are two main approaches to rotation:
Orthogonal Rotation - used when we assume that the extracted factors will not correlate with one another. The most commonly used orthogonal rotation method is: Varimax. Orthogonal rotation is often preferred when: theoretical constructs are assumed to be unrelated, or when a simpler independent factor structure is desired.
Oblique Rotation - used when we assume that the extracted factors will correlate with one another. The most commonly used oblique rotation method is: Oblimin. This approach is often more realistic in the social sciences because psychological, behavioral, and social constructs are rarely completely independent.
In practice, rotation is almost always better because it makes factors interpretable. For psychology, social sciences, marketing, behavioral data, etc., oblique rotations are usually preferred, because latent constructs are rarely truly independent. In our case we will also apply ‘oblimin’ rotation from the oblique rotation family, allowing our factors to be correlated.
Building the EFA model
# Specifying the factor solution architecture
hs.efa3 <- fa(select(data, x1:x9), nfactors = 3, rotate = "oblimin", fm = "ml", cor = 'cor')
hs.efa3
## Factor Analysis using method = ml
## Call: fa(r = select(data, x1:x9), nfactors = 3, rotate = "oblimin",
## fm = "ml", cor = "cor")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML1 ML3 ML2 h2 u2 com
## x1 0.19 0.60 0.03 0.49 0.51 1.2
## x2 0.04 0.51 -0.12 0.25 0.75 1.1
## x3 -0.07 0.69 0.02 0.46 0.54 1.0
## x4 0.84 0.02 0.01 0.72 0.28 1.0
## x5 0.89 -0.07 0.01 0.76 0.24 1.0
## x6 0.81 0.08 -0.01 0.69 0.31 1.0
## x7 0.04 -0.15 0.72 0.50 0.50 1.1
## x8 -0.03 0.10 0.70 0.53 0.47 1.0
## x9 0.03 0.37 0.46 0.46 0.54 1.9
##
## ML1 ML3 ML2
## SS loadings 2.24 1.34 1.28
## Proportion Var 0.25 0.15 0.14
## Cumulative Var 0.25 0.40 0.54
## Proportion Explained 0.46 0.28 0.26
## Cumulative Proportion 0.46 0.74 1.00
##
## With factor correlations of
## ML1 ML3 ML2
## ML1 1.00 0.33 0.22
## ML3 0.33 1.00 0.27
## ML2 0.22 0.27 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 3.05 with Chi Square = 904.1
## df of the model are 12 and the objective function was 0.08
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic n.obs is 301 with the empirical chi square 8.03 with prob < 0.78
## The total n.obs was 301 with Likelihood Chi Square = 22.38 with prob < 0.034
##
## Tucker Lewis Index of factoring reliability = 0.964
## RMSEA index = 0.053 and the 90 % confidence intervals are 0.015 0.088
## BIC = -46.11
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML1 ML3 ML2
## Correlation of (regression) scores with factors 0.94 0.84 0.85
## Multiple R square of scores with factors 0.89 0.71 0.72
## Minimum correlation of possible factor scores 0.78 0.42 0.45
# Current factor solution diagram with rotation
fa.diagram(hs.efa3)
# Alternative graphical representation
qgraph(
hs.efa3$loadings,
layout = "spring",
# Green palette
posCol = "#2E7D32",
negCol = "#81C784",
color = c("#A5D6A7", "#66BB6A", "#388E3C"),
labels = colnames(select(data, x1:x9)),
edge.labels = TRUE,
theme = "colorblind"
)
The advantage of this last graph is that it also shows that in EFA all variables are allowed to load on all available factors, rather than being restricted the way they are in CFA, where only pre-selected variables are allowed to load on the pre-determined factors we designed them for. Note that to build this second graph we use ‘hs.efa3$loadings’, while the factor correlations are stored separately (usually in ‘Phi’), which is why the correlation shown on the previous graph is not visible here.
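For reference, those stored factor correlations can be printed directly from the fitted object (a quick look at the Phi matrix mentioned above):
# Factor correlation matrix of the oblimin-rotated solution
round(hs.efa3$Phi, 2)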
If we look at the description for our constructed factor solution, we can conclude the following: results suggest that the three-factor solution provides a generally good fit to the data and that the extracted factors are meaningful, interpretable, and statistically adequate for representing the observed variables.
- Mean item complexity = 1.2.
A value close to 1 indicates that variables mainly load on a single factor. Higher values suggest stronger cross-loadings across multiple factors. So our value of 1.2 suggests a relatively simple structure, with low cross-loading, and reasonably clean factor separation. This is desirable because it improves interpretability.
- Hypothesis Test: Are 3 Factors Sufficient?
The model tests whether three factors adequately reproduce the observed correlations. The null model assumes no correlations among variables. The extremely large chi-square of the null model (904.1 in our case) indicates that strong relationships exist among the variables, confirming the need for latent factors.
- RMSR (Residuals)
RMSR measures the average discrepancy between observed correlations and model-implied correlations.
Guidelines:
- < .05 → excellent,
- < .08 → acceptable.
In our case, RMSR = 0.02 and the df-corrected RMSR = 0.03. This means that our RMSR values are extremely low, indicating that the model reproduces the observed correlations very accurately and residual errors are minimal. This is one of the strongest indicators of good fit in our results!
- Empirical Chi-Square: “Does the model reproduce the observed data well enough in practice?”
χ² = 8.03, p = 0.78
This suggests no significant difference between observed and reproduced correlations, therefore the model fits very well empirically. A non-significant chi-square is generally desirable.
- Likelihood Chi-Square: “Is the model perfectly correct?”
χ² = 22.38, p = 0.034
This test is statistically significant, which technically suggests imperfect fit.
However chi-square is highly sensitive to sample size, and with n=301, even small deviations may become significant. Therefore, researchers typically interpret chi-square together with RMSEA, TLI, RMSR, and theoretical interpretability. Given the other indices, this significant chi-square is not especially concerning.
- Tucker–Lewis Index (TLI)
TLI evaluates model fit relative to model complexity.
Guidelines:
- ≥ .90 → good,
- ≥ .95 → excellent.
In our case: TLI=0.964 which indicates an excellent factor solution quality, strong reliability of the latent structure, and very good model fit.
- RMSEA
RMSEA evaluates approximate model fit.
Guidelines:
- < .05 → very good
- .05–.08 → acceptable
- > .10 → poor
In our case RMSEA = 0.053 which indicates good approximate fit, close to excellent. Additionally, the lower CI bound (.015) is extremely favorable. The upper bound (.088) slightly approaches the acceptable threshold, but overall the interval still supports reasonable fit.
- BIC
BIC is most useful when comparing competing models: lower BIC = preferred model. In our case BIC is currently −46.11, but we don’t yet have a model to compare this value with.
- Fit Based on Off-Diagonal Values
In our case the value equals 1.
This indicates a near-perfect reproduction of correlations between variables, exceptionally strong correspondence between observed and modeled covariance structure.
This is an extremely strong fit indicator.
- Factor Score Adequacy
These statistics evaluate how accurately estimated factor scores represent the underlying latent factors.
Correlation of Factor Scores with Factors:
- ML1 - 0.94
- ML3 - 0.84
- ML2 - 0.85
Interpretation: factor scores strongly represent the latent variables, especially ML1, since values above .80 are generally considered very good.
Multiple R² of Scores with Factors:
- ML1 - 0.89
- ML3 - 0.71
- ML2 - 0.72
Interpretation: estimated scores explain substantial variance in the true latent factors, once again, especially for ML1.
ML1 appears to be the strongest and most stable factor!
Minimum Correlation of Possible Factor Scores:
- ML1 - 0.78
- ML3 - 0.42
- ML2 - 0.45
These values estimate the worst-case correlation between factor score estimates.
Interpretation:
- ML1 has highly stable factor scores,
- ML2 and ML3 are somewhat weaker and less stable, but still acceptable.
This suggests that Factor 1 is especially well-defined, while Factors 2 and 3 may contain somewhat more measurement uncertainty.
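If needed, the estimated factor scores themselves can be inspected and carried forward (a minimal illustration; psych::fa() returns regression-method scores by default when raw data are supplied):
# First rows of the estimated factor scores, one column per extracted factor
head(hs.efa3$scores)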
Reliability test
Now moving on to reliability tests for the three separate identified factors. Specifically, we will look at Cronbach’s alpha for each factor (ML3, ML1, ML2) following our EFA grouping. In other words, we will be testing the internal consistency reliability of the variables belonging to each extracted factor.
Cronbach’s alpha evaluates how consistently a set of variables measures the same underlying latent construct. Conceptually, it answers: “Do these items behave as if they belong to the same scale/factor?”. If variables strongly correlate with each other, alpha becomes larger.
The general threshold here is as follows:
- < .50 Poor
- .50–.59 Weak
- .60–.69 Questionable
- .70–.79 Acceptable
- .80–.89 Good
- ≥ .90 Excellent (sometimes too redundant)
ML3 = data %>% select(x1, x2, x3)
psych::alpha(ML3)
##
## Reliability analysis
## Call: psych::alpha(x = ML3)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.63 0.63 0.54 0.36 1.7 0.037 4.4 0.88 0.34
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.55 0.63 0.69
## Duhachek 0.55 0.63 0.70
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x1 0.51 0.51 0.34 0.34 1.03 0.057 NA 0.34
## x2 0.61 0.61 0.44 0.44 1.58 0.045 NA 0.44
## x3 0.46 0.46 0.30 0.30 0.85 0.062 NA 0.30
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## x1 301 0.77 0.77 0.58 0.45 4.9 1.2
## x2 301 0.73 0.72 0.47 0.37 6.1 1.2
## x3 301 0.78 0.78 0.62 0.48 2.3 1.1
ML1 = data %>% select(x4, x5, x6)
psych::alpha(ML1)
##
## Reliability analysis
## Call: psych::alpha(x = ML1)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.88 0.88 0.84 0.72 7.7 0.011 3.2 1.1 0.72
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.86 0.88 0.90
## Duhachek 0.86 0.88 0.91
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x4 0.83 0.84 0.72 0.72 5.1 0.019 NA 0.72
## x5 0.83 0.83 0.70 0.70 4.8 0.020 NA 0.70
## x6 0.84 0.85 0.73 0.73 5.5 0.018 NA 0.73
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## x4 301 0.90 0.90 0.82 0.78 3.1 1.2
## x5 301 0.92 0.91 0.84 0.79 4.3 1.3
## x6 301 0.89 0.90 0.81 0.77 2.2 1.1
ML2 = data %>% select(x7, x8, x9)
psych::alpha(ML2)
##
## Reliability analysis
## Call: psych::alpha(x = ML2)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.69 0.69 0.6 0.43 2.2 0.031 5 0.81 0.45
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.62 0.69 0.74
## Duhachek 0.63 0.69 0.75
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x7 0.62 0.62 0.45 0.45 1.6 0.044 NA 0.45
## x8 0.51 0.51 0.34 0.34 1.0 0.057 NA 0.34
## x9 0.65 0.65 0.49 0.49 1.9 0.040 NA 0.49
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## x7 301 0.79 0.78 0.59 0.49 4.2 1.1
## x8 301 0.82 0.82 0.69 0.57 5.5 1.0
## x9 301 0.75 0.76 0.55 0.46 5.4 1.0
In our case we get:
- ML3 - 0.63
- ML1 - 0.88
- ML2 - 0.69
So it looks good for ML1 with 0.88, but the values for the remaining two factors fall under the ‘Questionable’ category.
An alternative is the family of McDonald’s omega reliability coefficients.
Omega is considered a more modern and theoretically accurate reliability estimate than Cronbach’s alpha, especially for multidimensional scales and factor models. Omega estimates how much variance in observed scores is explained by: the general latent construct, and/or group-specific factors. Unlike alpha, omega: does not assume equal factor loadings, works better with factor analysis, and handles multidimensionality more appropriately.
Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British journal of psychology, 105(3), 399-412.
omega(select(data, x1:x9), rotate = "oblimin", fm = "ml", nfactors = 3, cor = "cor")
## Omega
## Call: omegah(m = m, nfactors = nfactors, fm = fm, key = key, flip = flip,
## digits = digits, title = title, sl = sl, labels = labels,
## plot = plot, n.obs = n.obs, rotate = rotate, Phi = Phi, option = option,
## covar = covar, cor = "cor")
## Alpha: 0.76
## G.6: 0.81
## Omega Hierarchical: 0.45
## Omega H asymptotic: 0.53
## Omega Total 0.85
##
## Schmid Leiman Factor loadings greater than 0.2
## g F1* F2* F3* h2 u2 p2
## x1 0.49 0.46 0.49 0.51 0.50
## x2 0.30 0.39 0.25 0.75 0.35
## x3 0.41 0.53 0.46 0.54 0.38
## x4 0.45 0.72 0.72 0.28 0.28
## x5 0.41 0.76 0.76 0.24 0.23
## x6 0.46 0.69 0.69 0.31 0.30
## x7 0.23 0.65 0.50 0.50 0.11
## x8 0.35 0.64 0.53 0.47 0.23
## x9 0.45 0.28 0.42 0.46 0.54 0.44
##
## With Sums of squares of:
## g F1* F2* F3*
## 1.46 1.62 0.75 1.02
##
## general/max 0.9 max/min = 2.15
## mean percent general = 0.31 with sd = 0.12 and cv of 0.38
## Explained Common Variance of the general factor = 0.3
##
## The degrees of freedom are 12 and the fit is 0.08
## The number of observations was 301 with Chi Square = 22.38 with prob < 0.034
## The root mean square of the residuals is 0.02
## The df corrected root mean square of the residuals is 0.03
## RMSEA index = 0.053 and the 10 % confidence intervals are 0.015 0.088
## BIC = -46.11
##
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 27 and the fit is 1.75
## The number of observations was 301 with Chi Square = 517.19 with prob < 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000044
## The root mean square of the residuals is 0.2
## The df corrected root mean square of the residuals is 0.23
##
## RMSEA index = 0.246 and the 10 % confidence intervals are 0.228 0.265
## BIC = 363.1
##
## Measures of factor score adequacy
## g F1* F2* F3*
## Correlation of scores with factors 0.68 0.84 0.67 0.78
## Multiple R square of scores with factors 0.46 0.71 0.44 0.61
## Minimum correlation of factor score estimates -0.07 0.43 -0.11 0.22
##
## Total, General and Subset omega for each subset
## g F1* F2* F3*
## Omega total for total scores and subscales 0.85 0.89 0.65 0.72
## Omega general for total scores and subscales 0.45 0.24 0.28 0.19
## Omega group for total scores and subscales 0.35 0.65 0.37 0.53
Our omega results suggest that the overall scale is reliable, but the reliability primarily comes from multiple subfactors rather than from one strong general factor.
To be specific, our omega total equals 0.85, which indicates good overall reliability. Omega total interpretation thresholds:
- ≥ .70 Acceptable
- ≥ .80 Good
- ≥ .90 Excellent
Our omega hierarchical equals 0.45, showing that only about 45% of score variance can be attributed to one overarching latent construct, while the remaining reliable variance comes from group/subfactors. Omega hierarchical estimates how much variance is explained specifically by a single general factor.
In other words, our scale is reliable overall, but it is not strongly unidimensional, meaning that the variables do not mainly measure one single latent trait; instead, reliability is substantially driven by the three separate factors we identified earlier. This perfectly matches our EFA findings!
Given the fact that the gap between our omega total and omega hierarchical is 0.40, this means that about 40% of reliable variance comes from the group-specific factors rather than a single global construct. That is strong evidence of multidimensionality.
As for omega H asymptotic (0.53), it is a more theoretical estimate of hierarchical omega under ideal asymptotic conditions. The value can be interpreted as: even under ideal conditions, the general factor explains only about half of the common variance, suggesting that multidimensionality remains substantial. The fact that asymptotic omega is still relatively moderate, rather than very high, further supports the conclusion that: our scale contains several meaningful subdimensions rather than one dominant construct.
Thus, overall omega results strongly reinforce the structure that we established. The findings suggest: there may be some broad overarching construct, but the three subdomains are important and explain substantial unique variance. For example, if these variables measure different cognitive abilities: a general intelligence factor may exist, but its dimensions remain meaningfully distinct.
These results suggest that using separate subscale/factor scores is likely more appropriate than relying only on one total combined score. That is because the total scale is reliable, but much of that reliability comes from distinct subfactors.
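To illustrate this point (a hedged sketch only, not part of the original analysis: the item groupings follow our EFA solution, and simple row means are just one of several possible scoring approaches):
# Illustrative subscale scores as row means of the items belonging to each factor
subscale_scores <- data %>%
  transmute(
    ML3_mean = rowMeans(across(x1:x3)),
    ML1_mean = rowMeans(across(x4:x6)),
    ML2_mean = rowMeans(across(x7:x9))
  )
head(subscale_scores)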
So we move on to the next part with the established model!
CFA
As was explained during the theory lecture, CFA is a way of testing how well measured variables represent a smaller number of constructs. The difference between confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) lies in how the two methods derive factors. While in EFA the factors are derived from statistical analysis based on loadings and eigenvalues, in CFA the factors are derived from theoretical analysis. In this respect, in CFA the researcher must assign and specify both the number of factors that exist within the set of variables and which factor each variable will load highly on before results can be computed. CFA is a model-data fit test based on multivariate regression. Its outputs are path coefficients and fit indices. If the paths are significant and the indices indicate an acceptable or high degree of fit, the structural model is confirmed by the data.
We will be using the same data to run a CFA (our CFA model is much more restricted than the EFA model), but it’s important to note that it is not good practice to use a CFA to ‘confirm’ the findings of an EFA - it’s better to use an EFA to help determine the number of factors, but then to collect more data to test one or more competing models based on that general factorization.
Since we do need theory to back up our model, let’s finally take a look at what our data is all about and what the three factors that were identified could potentially be.
Building the theory-backed CFA model
The Holzinger and Swineford (1939) dataset that we have been using consists of mental ability test scores of seventh- and eighth-grade children from two different schools (Pasteur and Grant-White) - the two levels of the ‘school’ factor we saw earlier. The original dataset contains scores for a total of 26 tests; however, a smaller subset with 9 variables is more widely used.
Following the discovered structure, these were the variables that loaded on each of our factors.
Source: https://rdrr.io/cran/lavaan/man/HolzingerSwineford1939.html
ML3 factor:
x1 - Visual perception test from Spearman VPT Part I (geometric patterns)
x2 - Cubes, Simplification of Brighams Spatial Relations Test
x3 - Lozenges from Thorndike-Shapes flipped over then identify target
ML1 factor:
x4 - Paragraph Comprehension Test
x5 - Sentence Completion Test
x6 - Word Meaning Test
ML2 factor:
x7 - Speeded addition test
x8 - Speeded counting of dots in shape
x9 - Speeded discrimination of straight and curved caps
Now if we try to interpret our factors, it becomes clear that this three-factor architecture makes logical theoretical sense: ML3 seems to be assessing visual abilities, ML1 has to do with language, speech, text comprehension abilities, and ML2 is all about speed. We use the following names in constructing our new model:
# Model Specification
hs.model <-'visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
# Model Fit
hs.fit <- cfa(hs.model, data = data)
Note that here we do not specify factor rotation - by default, cfa() already allows the factors to correlate directly. We will see that on our visualization.
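For comparison only (an illustrative sketch that we do not use further in this session), lavaan's orthogonal argument would instead constrain the factor covariances to zero:
# Orthogonal (uncorrelated-factors) version of the same model, for illustration
hs.fit.orth <- cfa(hs.model, data = data, orthogonal = TRUE)
fitMeasures(hs.fit.orth, c("cfi", "tli", "rmsea", "srmr"))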
summary(hs.fit, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 85.306
## Degrees of freedom 24
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.931
## Tucker-Lewis Index (TLI) 0.896
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3737.745
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7517.490
## Bayesian (BIC) 7595.339
## Sample-size adjusted Bayesian (SABIC) 7528.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.092
## 90 Percent confidence interval - lower 0.071
## 90 Percent confidence interval - upper 0.114
## P-value H_0: RMSEA <= 0.050 0.001
## P-value H_0: RMSEA >= 0.080 0.840
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.065
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 1.000
## x2 0.554 0.100 5.554 0.000
## x3 0.729 0.109 6.685 0.000
## textual =~
## x4 1.000
## x5 1.113 0.065 17.014 0.000
## x6 0.926 0.055 16.703 0.000
## speed =~
## x7 1.000
## x8 1.180 0.165 7.152 0.000
## x9 1.082 0.151 7.155 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## visual ~~
## textual 0.408 0.074 5.552 0.000
## speed 0.262 0.056 4.660 0.000
## textual ~~
## speed 0.173 0.049 3.518 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 0.549 0.114 4.833 0.000
## .x2 1.134 0.102 11.146 0.000
## .x3 0.844 0.091 9.317 0.000
## .x4 0.371 0.048 7.779 0.000
## .x5 0.446 0.058 7.642 0.000
## .x6 0.356 0.043 8.277 0.000
## .x7 0.799 0.081 9.823 0.000
## .x8 0.488 0.074 6.573 0.000
## .x9 0.566 0.071 8.003 0.000
## visual 0.809 0.145 5.564 0.000
## textual 0.979 0.112 8.737 0.000
## speed 0.384 0.086 4.451 0.000
# alternative
# summary(hs.fit, fit.measures = TRUE, standardized = TRUE, rsquare = T)
fits <- fitMeasures(hs.fit)
data.frame(fits = round(fits, 2))
## fits
## npar 21.00
## fmin 0.14
## chisq 85.31
## df 24.00
## pvalue 0.00
## baseline.chisq 918.85
## baseline.df 36.00
## baseline.pvalue 0.00
## cfi 0.93
## tli 0.90
## nnfi 0.90
## rfi 0.86
## nfi 0.91
## pnfi 0.60
## ifi 0.93
## rni 0.93
## logl -3737.74
## unrestricted.logl -3695.09
## aic 7517.49
## bic 7595.34
## ntotal 301.00
## bic2 7528.74
## rmsea 0.09
## rmsea.ci.lower 0.07
## rmsea.ci.upper 0.11
## rmsea.ci.level 0.90
## rmsea.pvalue 0.00
## rmsea.close.h0 0.05
## rmsea.notclose.pvalue 0.84
## rmsea.notclose.h0 0.08
## rmr 0.08
## rmr_nomean 0.08
## srmr 0.07
## srmr_bentler 0.07
## srmr_bentler_nomean 0.07
## crmr 0.07
## crmr_nomean 0.07
## srmr_mplus 0.07
## srmr_mplus_nomean 0.07
## cn_05 129.49
## cn_01 152.65
## gfi 0.94
## agfi 0.89
## pgfi 0.50
## mfi 0.90
## ecvi 0.42
Now if we were to interpret the results we got from our newly constructed CFA model summary:
- Model Chi-Square Test: χ²(24) = 85.31, p < .001
The chi-square test is statistically significant, suggesting that the model does not perfectly reproduce the observed covariance matrix. However, chi-square is sensitive to sample size, so additional fit indices should also be considered.
- Comparative Fit Index (CFI): CFI = 0.931
This indicates acceptable to good model fit, with the value approaching the recommended .95 threshold.
- Tucker–Lewis Index (TLI): TLI = 0.896
The TLI is slightly below the conventional .90 cutoff, suggesting that the model fit is marginally acceptable but could potentially be improved.
- RMSEA: RMSEA = 0.092, 90% CI = [0.071, 0.114]
The RMSEA indicates mediocre to relatively poor approximate fit. The upper confidence interval exceeds .10, suggesting some model misfit may still be present.
- SRMR: SRMR = 0.065
The SRMR value indicates acceptable fit, as it remains below the recommended .08 threshold. Residual discrepancies between observed and reproduced correlations are reasonably small.
- Information Criteria
AIC = 7517.49 BIC = 7595.34 SABIC = 7528.74
These indices could be used later for comparing competing models. Lower values would indicate a preferable balance between model fit and complexity.
- Factor Loadings
All factor loadings are statistically significant (p < .001), indicating that all observed variables meaningfully contribute to their respective latent constructs.
- Visual factor loadings range from 0.554 to 0.729, indicating moderate relationships.
- Textual factor loadings range from 0.926 to 1.113, indicating very strong relationships and a particularly well-defined factor.
- Speed factor loadings range from 1.082 to 1.180, also indicating strong measurement quality.
- Latent Factor Covariances
All latent factor covariances are statistically significant:
Visual ↔︎ Textual = 0.408 Visual ↔︎ Speed = 0.262 Textual ↔︎ Speed = 0.173
This suggests that the latent constructs are related but still sufficiently distinct from one another (the standardized correlations are examined in the construct validity section below).
- Overall Interpretation
Overall, the three-factor CFA model demonstrates acceptable but not excellent fit. CFI and SRMR support reasonable model performance, while TLI and especially RMSEA suggest some remaining model misfit. Nevertheless, all latent constructs are clearly represented by significant factor loadings, and the factor structure appears theoretically meaningful and interpretable.
Overall, there is definitely some room for improvement! That is exactly what we plan to do later on.
Construct validity
One of the advantages of SEM/CFA is its ability to assess the construct validity of a proposed measurement theory. Construct validity is the extent to which a set of measurement items reflects the theoretical latent construct it is designed to measure.
Convergent validity : indicators of a specific construct should converge or share a high proportion of variance in common.
Discriminant validity : the extent to which a construct is truly distinct from other constructs
Face Validity : assessment of the correspondence of the variable to be included in a summated scale and its conceptual definition
Nomological validity : examining whether the correlations among the constructs in a measurement theory make sense.
Standardized factor loadings are used to compute the variance extracted and construct reliability in order to assess convergent validity.
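A minimal sketch of how these quantities can be obtained with semTools (assuming a reasonably recent semTools version; older releases expose the same information through reliability()):
# Average variance extracted (convergent validity) per latent factor
semTools::AVE(hs.fit)
# Composite reliability per latent factor
semTools::compRelSEM(hs.fit)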
inspect(hs.fit, what = "std")
## $lambda
## visual textul speed
## x1 0.772 0.000 0.000
## x2 0.424 0.000 0.000
## x3 0.581 0.000 0.000
## x4 0.000 0.852 0.000
## x5 0.000 0.855 0.000
## x6 0.000 0.838 0.000
## x7 0.000 0.000 0.570
## x8 0.000 0.000 0.723
## x9 0.000 0.000 0.665
##
## $theta
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## x1 0.404
## x2 0.000 0.821
## x3 0.000 0.000 0.662
## x4 0.000 0.000 0.000 0.275
## x5 0.000 0.000 0.000 0.000 0.269
## x6 0.000 0.000 0.000 0.000 0.000 0.298
## x7 0.000 0.000 0.000 0.000 0.000 0.000 0.676
## x8 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.477
## x9 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.558
##
## $psi
## visual textul speed
## visual 1.000
## textual 0.459 1.000
## speed 0.471 0.283 1.000
Standardized Factor Loadings
The standardized loadings indicate how strongly each observed variable reflects its latent construct.
- Visual factor
x1 = 0.772 → strong loading x2 = 0.424 → relatively weak loading x3 = 0.581 → moderate loading
This suggests that x1 is the strongest indicator of the visual construct, while x2 contributes less strongly.
- Textual factor
x4 = 0.852 x5 = 0.855 x6 = 0.838
All textual indicators show very strong loadings, indicating a highly reliable and well-defined latent factor.
- Speed factor
x7 = 0.570 → moderate loading x8 = 0.723 → strong loading x9 = 0.665 → moderately strong loading
The speed factor is reasonably well measured, with x8 serving as the strongest indicator.
Residual Variances
Residual variances represent the portion of variance in each observed variable not explained by the latent factor.
- x2 has a particularly high residual variance (0.821), suggesting that much of its variance is unexplained by the visual factor.
- x4–x6 have relatively low residual variances (0.269–0.298), indicating that the textual factor explains these indicators very well.
- x7–x9 show moderate residual variances, suggesting acceptable but less perfect measurement quality compared to the textual factor.
Overall, the textual construct appears to be measured most accurately.
Latent Factor Correlations
Visual ↔︎ Textual = 0.459 Visual ↔︎ Speed = 0.471 Textual ↔︎ Speed = 0.283
These correlations indicate that the latent constructs are positively related but still distinct from one another. The strongest relationship exists between visual and speed abilities.
# Easy graph example
lavaanPlot(model = hs.fit,
stand = TRUE, coefs = T)
# Alternative method of illustrating the model structure
semPaths(hs.fit, "std", layout = "tree", intercepts = F, residuals = T, nDigits = 2,
label.cex = 1, edge.label.cex = .95, fade = F)
Now that we know there is room for improvement in our model, we need to consider the different ways to achieve it.
Model Modification
Residual covariance indicates the degree of interrelationship between residuals (errors). Covariances with absolute values greater than 0.1 are generally considered concerning (this is also often associated with a high SRMR).
Ideally, all residual covariances should have absolute values below 0.1, so we proceed to check them with a condition.
resid_cov = lavResiduals(hs.fit)$cov
(resid_cov > 0.1) | (resid_cov < -0.1)
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## x1 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
## x2 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## x3 FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## x4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x5 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## x6 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x7 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x8 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x9 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Given our condition, a TRUE label here indicates a potentially problematic residual covariance between two observed variables, meaning the model may not fully account for their shared variance. This can suggest omitted relationships, correlated measurement errors, or model misspecification.
The problematic residual covariance pairs are:
- x1 ↔︎ x7
- x1 ↔︎ x9
- x2 ↔︎ x7
- x3 ↔︎ x5
- x3 ↔︎ x9
Overall, while most residual covariances are below the ±0.1 threshold, which is desirable, the variable pairs listed above still share unexplained covariance after accounting for the latent factors. This suggests that the current CFA model does not completely capture all relationships among the observed variables. In particular, x1, x3, x7, and x9 appear repeatedly, indicating these indicators may contribute most strongly to the remaining model misfit. Those should thus be our focus when it comes to trying to improve our model.
We move on to assessing the potential areas for improvement with the ‘modindices’ function. Modification indices indicate which additional parameters, if freely estimated, would most improve model fit. Larger MI values suggest stronger potential model misspecification.
Key columns:
- mi → expected drop in chi-square if the parameter is added
- epc → expected parameter estimate
- sepc.all → fully standardized expected effect size (usually the easiest to interpret)
modifs <- modindices(hs.fit, sort = TRUE)
head(modifs, 15)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 30 visual =~ x9 36.411 0.577 0.519 0.515 0.515
## 76 x7 ~~ x8 34.145 0.536 0.536 0.859 0.859
## 28 visual =~ x7 18.631 -0.422 -0.380 -0.349 -0.349
## 78 x8 ~~ x9 14.946 -0.423 -0.423 -0.805 -0.805
## 33 textual =~ x3 9.151 -0.272 -0.269 -0.238 -0.238
## 55 x2 ~~ x7 8.918 -0.183 -0.183 -0.192 -0.192
## 31 textual =~ x1 8.903 0.350 0.347 0.297 0.297
## 51 x2 ~~ x3 8.532 0.218 0.218 0.223 0.223
## 59 x3 ~~ x5 7.858 -0.130 -0.130 -0.212 -0.212
## 26 visual =~ x5 7.441 -0.210 -0.189 -0.147 -0.147
## 50 x1 ~~ x9 7.335 0.138 0.138 0.247 0.247
## 65 x4 ~~ x6 6.220 -0.235 -0.235 -0.646 -0.646
## 66 x4 ~~ x7 5.920 0.098 0.098 0.180 0.180
## 48 x1 ~~ x7 5.420 -0.129 -0.129 -0.195 -0.195
## 77 x7 ~~ x9 5.183 -0.187 -0.187 -0.278 -0.278
Interpretation of the most important results:
- visual =~ x9, MI = 36.41, standardized loading = 0.515
This suggests x9 may also load substantially on the visual factor, even though it currently belongs only to the speed factor. This indicates a possible cross-loading and overlap between the visual and speed constructs. As we know from the variable description, x9 is a speeded test of discrimination of straight and curved caps, so it is rather easy to hypothesize how it could also load on the visual factor.
- x7 ~~ x8, MI = 34.15, standardized residual covariance = 0.859
This is a very large residual covariance. x7 and x8 share substantial unexplained variance beyond the latent speed factor. This may indicate very similar item wording, shared method effects, or an omitted subdimension. Let’s look at these two: x7 is a speeded addition test and x8 is speeded counting of dots in a shape - it looks like both have to do not only with speed but also with one’s math abilities.
- visual =~ x7, MI = 18.63, standardized loading = −0.349
x7 may also negatively load on the visual factor. This again suggests overlap or model misspecification involving the visual and speed constructs.
- x8 ~~ x9, MI = 14.95, standardized residual covariance = −0.805
x8 and x9 show a strong residual association not captured by the current model. This is another sign that the speed indicators may contain additional shared variance. If we look at these two: x8 is speeded counting of dots in a shape and x9 is speeded discrimination of straight and curved caps - it looks like both have something to do with geometric shapes and forms.
- textual =~ x3, MI = 9.15, standardized loading = −0.238
x3 may partially relate to the textual factor, suggesting a weak cross-loading between the visual and textual constructs. Our variable x3 is the ‘Lozenges from Thorndike’ test (shapes flipped over, then identify the target), which is a test of spatial ability, so it is a bit puzzling why there is a potential loading on the textual factor.
- x2 ~~ x7 and x2 ~~ x3
These suggest localized residual associations among indicators that are not fully explained by the latent factors.
- x4 ~~ x6, MI = 6.22, standardized covariance = −0.646
Strong residual covariance within the textual factor. This may indicate that x4 and x6 are especially similar or share unique variance beyond the latent construct.
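If we want to focus only on the largest indices, the output of modindices() can also be filtered directly. A minimal sketch (the cutoff of 10 is an arbitrary illustrative choice, not a formal rule):
# Keep only modification indices above an (arbitrary) cutoff of 10,
# split into suggested cross-loadings (=~) and residual covariances (~~)
mi_large <- modindices(hs.fit, minimum.value = 10, sort = TRUE)
subset(mi_large, op == "=~")   # suggested additional loadings
subset(mi_large, op == "~~")   # suggested residual covariances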
Let’s make one of the modifications to our model based on point #2. In particular, x7 and x8 shared substantial unexplained variance beyond the latent speed factor, which pointed to very similar item wording, shared method effects, or the like. We concluded that the two might also both draw on one’s math abilities. x7 was also one of our ‘problematic variables’ in terms of residual covariance.
We proceed to refit the model anew.
model = '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
x7 ~~ x8
'
fit_final = cfa(model, data)
summary(fit_final, fit.measures = TRUE, standardized = TRUE, rsquare = T)## lavaan 0.6-19 ended normally after 43 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 22
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 53.272
## Degrees of freedom 23
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.966
## Tucker-Lewis Index (TLI) 0.946
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3721.728
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7487.457
## Bayesian (BIC) 7569.013
## Sample-size adjusted Bayesian (SABIC) 7499.242
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.066
## 90 Percent confidence interval - lower 0.043
## 90 Percent confidence interval - upper 0.090
## P-value H_0: RMSEA <= 0.050 0.118
## P-value H_0: RMSEA >= 0.080 0.175
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.047
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual =~
## x1 1.000 0.885 0.759
## x2 0.576 0.098 5.898 0.000 0.509 0.433
## x3 0.752 0.103 7.289 0.000 0.665 0.589
## textual =~
## x4 1.000 0.989 0.851
## x5 1.115 0.066 17.015 0.000 1.103 0.856
## x6 0.926 0.056 16.682 0.000 0.916 0.838
## speed =~
## x7 1.000 0.383 0.352
## x8 1.244 0.194 6.414 0.000 0.477 0.471
## x9 2.515 0.641 3.924 0.000 0.963 0.956
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x7 ~~
## .x8 0.353 0.067 5.239 0.000 0.353 0.389
## visual ~~
## textual 0.400 0.073 5.511 0.000 0.457 0.457
## speed 0.184 0.054 3.423 0.001 0.544 0.544
## textual ~~
## speed 0.102 0.036 2.854 0.004 0.270 0.270
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.576 0.101 5.678 0.000 0.576 0.424
## .x2 1.122 0.100 11.171 0.000 1.122 0.812
## .x3 0.832 0.087 9.552 0.000 0.832 0.653
## .x4 0.372 0.048 7.791 0.000 0.372 0.276
## .x5 0.444 0.058 7.600 0.000 0.444 0.267
## .x6 0.357 0.043 8.287 0.000 0.357 0.298
## .x7 1.036 0.090 11.501 0.000 1.036 0.876
## .x8 0.795 0.080 9.988 0.000 0.795 0.778
## .x9 0.088 0.188 0.466 0.641 0.088 0.086
## visual 0.783 0.135 5.810 0.000 1.000 1.000
## textual 0.978 0.112 8.729 0.000 1.000 1.000
## speed 0.147 0.056 2.615 0.009 1.000 1.000
##
## R-Square:
## Estimate
## x1 0.576
## x2 0.188
## x3 0.347
## x4 0.724
## x5 0.733
## x6 0.702
## x7 0.124
## x8 0.222
## x9 0.914
The new model demonstrates substantially better fit than the original model after adding the residual covariance between x7 and x8 (x7 ~~ x8). Nearly all major fit indices improved, suggesting that the modification successfully captured previously unexplained shared variance between these two indicators.
fit_comparison <- data.frame(
`Fit Index` = c("χ²", "df", "CFI", "TLI", "RMSEA", "SRMR", "AIC", "BIC"),
`Old Model` = c(85.31, 24, 0.931, 0.896, 0.092, 0.065, 7517.49, 7595.34),
`New Model` = c(53.27, 23, 0.966, 0.946, 0.066, 0.047, 7487.46, 7569.01),
Interpretation = c(
"Large reduction in model misfit",
"One additional free parameter added",
"Improved from acceptable to excellent fit",
"Improved from marginal to very good fit",
"Improved from mediocre/poor to acceptable",
"Improved residual fit",
"Lower → better model",
"Lower → preferred model"
)
)
kable(
fit_comparison,
caption = "Global Fit Improvement: Old vs New CFA Model",
align = "c"
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed")
)| Fit.Index | Old.Model | New.Model | Interpretation |
|---|---|---|---|
| χ² | 85.310 | 53.270 | Large reduction in model misfit |
| df | 24.000 | 23.000 | One additional free parameter added |
| CFI | 0.931 | 0.966 | Improved from acceptable to excellent fit |
| TLI | 0.896 | 0.946 | Improved from marginal to very good fit |
| RMSEA | 0.092 | 0.066 | Improved from mediocre/poor to acceptable |
| SRMR | 0.065 | 0.047 | Improved residual fit |
| AIC | 7517.490 | 7487.460 | Lower → better model |
| BIC | 7595.340 | 7569.010 | Lower → preferred model |
The new model clearly outperforms the original model. The strongest improvements are visible in RMSEA, CFI, and TLI, indicating that the added covariance resolved an important area of localized misfit.
The new model added: x7 ~~ x8 = 0.353 (standardized = 0.389, p < .001); x7 and x8 share substantial variance beyond what is explained by the latent speed factor.
As for the factor loadings:
- the visual factor structure remains mostly unchanged: x2 continues to be the weakest indicator
- the textual factor remains the strongest and most reliable latent construct in both models
- after accounting for the covariance between x7 and x8, x7 and x8 became much weaker indicators of speed, while x9 became an extremely dominant indicator. This suggests that much of the relationship between x7 and x8 was previously being incorrectly attributed to the latent speed factor itself.
Residual variances: a major change happens for x9, which now has a residual of 0.088 (non-significant, p = .641). The speed factor now explains almost all of the variance in x9, which is also reflected in its standardized loading of 0.956 and R² of 0.914. Thus, x9 is now an exceptionally strong indicator of the speed construct. Meanwhile, x7 and x8 retain high residual variances, suggesting weaker measurement quality for those indicators.
As for R-squared, it remains small for x7, x2, and x8, meaning that large portions of variance remain unexplained for these indicators.
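As a side note, if we only need the R² values, they can also be extracted directly from the fitted object; a minimal sketch:
# Extract R-squared values for all observed indicators from the fitted model
lavInspect(fit_final, "rsquare")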
Now we plot the new model.
lavaanPlot(model = fit_final,
           stand = TRUE, coefs = T)
semPlot::semPaths(fit_final, what = "est", rotation = 2,
                  style = "lisrel", font = 2)
title("Raw Estimations")
semPlot::semPaths(fit_final, what = "std", rotation = 2,
                  style = "lisrel", font = 2, title = TRUE)
title("Standardized Estimations")
resid_cov = lavResiduals(fit_final)$cov
(resid_cov > 0.1) | (resid_cov < -0.1)## x1 x2 x3 x4 x5 x6 x7 x8 x9
## x1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x2 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## x3 FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## x4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x5 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## x6 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x7 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x8 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x9 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Our problematic pairings now are:
- x2 ↔︎ x7
- x3 ↔︎ x5
Reliability Test
- Reliability of Latent Constructs
We first evaluate the reliability of the latent variables estimated in the CFA model. Composite reliability assesses the internal consistency of the indicators measuring each latent construct.
compRelSEM(fit_final)## visual textual speed
## 0.615 0.885 0.558
- Values above .70 are generally considered acceptable.
- Higher values indicate that the observed indicators consistently measure the same latent construct.
- This helps confirm that the CFA factors are reliable enough for further analysis.
Unfortunately, in our case only the textual factor exceeds .70.
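For comparison, one could also compute Cronbach’s alpha for each set of indicators with the psych package; a small sketch (alpha and composite reliability are related but will generally not be identical):
# Cronbach's alpha per indicator set (sketch)
psych::alpha(data[, c("x1", "x2", "x3")])$total   # visual indicators
psych::alpha(data[, c("x4", "x5", "x6")])$total   # textual indicators
psych::alpha(data[, c("x7", "x8", "x9")])$total   # speed indicators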
- Extract Latent Factor Scores
Next, we estimate factor scores for each participant using the fitted CFA model.
factor_scores <- as.data.frame(
lavPredict(
fit_final,
type = "lv",
method = "regression"
)
)
head(factor_scores)## visual textual speed
## 1 -0.6176105 -0.12062052 0.3193135
## 2 0.3072522 -0.99794707 0.8844514
## 3 -0.7355160 -1.86325594 -0.3802884
## 4 0.3803586 0.02137191 -0.1758895
## 5 -0.3662708 -0.12287656 0.1847846
## 6 0.1987489 -1.32304330 0.7486605
lavPredict() estimates latent variable scores for each observation. This creates estimated values for the visual, textual, and speed factors. These scores represent each participant’s position on the latent constructs estimated by the CFA model. Compared to observed variables, factor scores reduce measurement error, combine information from multiple indicators, and provide cleaner estimates of the latent traits.
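Note that method = "regression" is only one option; lavPredict() can also produce Bartlett factor scores, which are computed differently and tend to be somewhat more variable. A minimal sketch:
# Bartlett factor scores as an alternative to regression-based scores (sketch)
bartlett_scores <- as.data.frame(
  lavPredict(fit_final, type = "lv", method = "Bartlett")
)
head(bartlett_scores)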
ggplot(factor_scores, aes(x = textual)) +
geom_histogram(
bins = 20,
fill = "#1B5E20",
color = "black"
) +
labs(
title = "Distribution of Textual Factor Scores",
x = "Factor Scores",
y = "Frequency"
) +
  theme_minimal()
We can take a look at the distribution of the textual factor scores and then proceed to using them for further analysis as one of the predictors.
Model Comparison
One more thing to briefly mention before we move on to SEM! We need to know how to compare models with different structures.
So let’s say for the sake of comparison we build a model with only two factors: cognitive (constructed from textual and visual) and speed (since this one has to do with speed of one’s reaction).
hs.model2 <- " cognitive =~ x1 + x2 + x3 + x4 + x5 + x6
speed =~ x7 + x8 + x9 "
hs.fit2 <- cfa(hs.model2, data = data)
hs.fit2## lavaan 0.6-19 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 181.337
## Degrees of freedom 26
## P-value (Chi-square) 0.000
summary(hs.fit2, fit.measures = T)## lavaan 0.6-19 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 181.337
## Degrees of freedom 26
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.824
## Tucker-Lewis Index (TLI) 0.756
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3785.760
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7609.521
## Bayesian (BIC) 7679.956
## Sample-size adjusted Bayesian (SABIC) 7619.699
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.141
## 90 Percent confidence interval - lower 0.122
## 90 Percent confidence interval - upper 0.161
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.113
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## cognitive =~
## x1 1.000
## x2 0.509 0.156 3.254 0.001
## x3 0.477 0.150 3.189 0.001
## x4 1.992 0.272 7.314 0.000
## x5 2.194 0.300 7.303 0.000
## x6 1.852 0.254 7.294 0.000
## speed =~
## x7 1.000
## x8 1.148 0.163 7.054 0.000
## x9 0.891 0.124 7.189 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## cognitive ~~
## speed 0.095 0.029 3.287 0.001
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 1.112 0.093 11.922 0.000
## .x2 1.318 0.108 12.193 0.000
## .x3 1.219 0.100 12.196 0.000
## .x4 0.373 0.047 7.853 0.000
## .x5 0.474 0.059 8.059 0.000
## .x6 0.351 0.043 8.221 0.000
## .x7 0.732 0.083 8.821 0.000
## .x8 0.428 0.082 5.201 0.000
## .x9 0.658 0.071 9.300 0.000
## cognitive 0.246 0.067 3.661 0.000
## speed 0.451 0.095 4.733 0.000
# Way of visualization 1
semPaths(hs.fit2)
# Way of visualization 2
lavaan.diagram(hs.fit2)
Although we can look at specific fit measures to determine whether either model is good, we can also compare them directly with each other!
anova(hs.fit, hs.fit2)##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff
## hs.fit 24 7517.5 7595.3 85.305
## hs.fit2 26 7609.5 7680.0 181.337 96.031 0.39522 2
## Pr(>Chisq)
## hs.fit
## hs.fit2 < 0.00000000000000022 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First things first: this shows that the two models are significantly different.
As it can be seen, hs.fit2 has higher chi-square, AIC, and BIC values, indicating substantially worse fit - the larger first model is a better fit! BIC is 7680 for the smaller model vs 7595 for the larger model, and AIC has a similar difference.
The larger model (hs.fit) accounts for the data better than the smaller model (hs.fit2), which justifies the 3-factor model over the alternative 2-factor model.
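Besides anova(), the semTools package provides a convenience wrapper that collects the chi-square difference test and several fit indices for nested models in a single overview; a minimal sketch:
# Compare the nested 3-factor and 2-factor models in one table (sketch)
comparison <- semTools::compareFit(hs.fit, hs.fit2, nested = TRUE)
summary(comparison)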
SEM
Confirmatory factor analysis typically identifies a single set of factors and tries to model the data in that way. But the lavaan library offers more complex structural equation modeling. For instance, one could combine several observable measures to create a factor, which then in turn can be used as a regression predictor or outcome variable to establish relationships between hidden/latent variables. So our factors become a part of a bigger picture just as it was explained during the lecture.
The lavaan syntax lets us describe models with 4 operators:
- ~ Regression (structural model)
- =~ Latent variable definition (also measurement model)
- ~~ Specifying and fixing residual correlations and covariances
- ~1 Specifying intercepts
The most common use of the third operator is to allow measured variables to be correlated (rather than uncorrelated), or to force them to be independent, capturing something not otherwise captured by the latent variables. That’s exactly what we did with our two variables x7 and x8 above!
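As a purely illustrative example (this syntax is not fitted anywhere in this session), all four operators could appear together in a single model string:
# Illustrative lavaan syntax only - not estimated here
example_syntax <- '
  visual =~ x1 + x2 + x3    # =~ latent variable definition (measurement model)
  speed  =~ x7 + x8 + x9
  visual ~  speed           # ~  regression (structural model)
  x7     ~~ x8              # ~~ residual covariance
  x1     ~  1               # ~1 intercept (mean structure)
'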
Structural Equation Modeling (SEM) attempts to divide the model into two parts: the measurement model and an underlying causal model relating the latent variables. The measurement model is what we most commonly think of in terms of it being a result of a factor analysis: we have measured variables we want to organize into a single latent construct. On the other hand, the causal model is a path model that showcases relationships between these latent variables.
SEM: Initial CFA structure inserted
We can fit our CFA model into a SEM path model to practice. First, without changing anything about it and inserting it exactly as it was.
hs.sem <- "visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
x7 ~~ x8 "
fit_sem <- sem(hs.sem, data = data)
summary(fit_sem, fit.measures = T, standardized = T)## lavaan 0.6-19 ended normally after 43 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 22
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 53.272
## Degrees of freedom 23
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.966
## Tucker-Lewis Index (TLI) 0.946
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3721.728
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7487.457
## Bayesian (BIC) 7569.013
## Sample-size adjusted Bayesian (SABIC) 7499.242
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.066
## 90 Percent confidence interval - lower 0.043
## 90 Percent confidence interval - upper 0.090
## P-value H_0: RMSEA <= 0.050 0.118
## P-value H_0: RMSEA >= 0.080 0.175
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.047
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual =~
## x1 1.000 0.885 0.759
## x2 0.576 0.098 5.898 0.000 0.509 0.433
## x3 0.752 0.103 7.289 0.000 0.665 0.589
## textual =~
## x4 1.000 0.989 0.851
## x5 1.115 0.066 17.015 0.000 1.103 0.856
## x6 0.926 0.056 16.682 0.000 0.916 0.838
## speed =~
## x7 1.000 0.383 0.352
## x8 1.244 0.194 6.414 0.000 0.477 0.471
## x9 2.515 0.641 3.924 0.000 0.963 0.956
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x7 ~~
## .x8 0.353 0.067 5.239 0.000 0.353 0.389
## visual ~~
## textual 0.400 0.073 5.511 0.000 0.457 0.457
## speed 0.184 0.054 3.423 0.001 0.544 0.544
## textual ~~
## speed 0.102 0.036 2.854 0.004 0.270 0.270
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.576 0.101 5.678 0.000 0.576 0.424
## .x2 1.122 0.100 11.171 0.000 1.122 0.812
## .x3 0.832 0.087 9.552 0.000 0.832 0.653
## .x4 0.372 0.048 7.791 0.000 0.372 0.276
## .x5 0.444 0.058 7.600 0.000 0.444 0.267
## .x6 0.357 0.043 8.287 0.000 0.357 0.298
## .x7 1.036 0.090 11.501 0.000 1.036 0.876
## .x8 0.795 0.080 9.988 0.000 0.795 0.778
## .x9 0.088 0.188 0.466 0.641 0.088 0.086
## visual 0.783 0.135 5.810 0.000 1.000 1.000
## textual 0.978 0.112 8.729 0.000 1.000 1.000
## speed 0.147 0.056 2.615 0.009 1.000 1.000
# Path diagram
semPaths(fit_sem, what = "col", whatLabels = "std", style = "mx", rotation = 1, layout = "spring", nCharNodes = 7, edge.label.cex = 0.9, shapeMan = "rectangle", sizeMan = 8, sizeMan2 = 5)
SEM: Relations between factors
Now let’s say we want to specify the relationships between these latent factors, not just allow them to be correlated. Given our specific factors for this data, we can try to think of which factors could plausibly predict others. For instance, can one’s performance on textual tasks be used to predict the same individual’s performance on visual tasks? Or is it the other way around? Is speed purely physical, or does it have something to do with the other factors? Is performance on textual tasks completely unrelated to performance on visual tasks? Which of the three operates most like a predictor?
Let’s work with the following structure first: we are here assuming that speed will be used as a predictor for the other two factors. The reasoning behind that is that processing speed is often considered a more fundamental cognitive resource due to the fact that faster information processing may improve visual reasoning, reading/comprehension, and task efficiency.
hs.sem2 <- " visual =~ x2 + x1 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
x7 ~~ x8
textual ~ speed
visual ~ speed
"
fit_sem2 <- sem(hs.sem2, data = data)
summary(fit_sem2, fit.measures = T, standardized = T)## lavaan 0.6-19 ended normally after 48 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 22
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 53.272
## Degrees of freedom 23
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.966
## Tucker-Lewis Index (TLI) 0.946
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3721.728
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7487.457
## Bayesian (BIC) 7569.013
## Sample-size adjusted Bayesian (SABIC) 7499.242
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.066
## 90 Percent confidence interval - lower 0.043
## 90 Percent confidence interval - upper 0.090
## P-value H_0: RMSEA <= 0.050 0.118
## P-value H_0: RMSEA >= 0.080 0.175
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.047
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual =~
## x2 1.000 0.509 0.433
## x1 1.737 0.295 5.897 0.000 0.885 0.759
## x3 1.307 0.229 5.695 0.000 0.665 0.589
## textual =~
## x4 1.000 0.989 0.851
## x5 1.115 0.066 17.015 0.000 1.103 0.856
## x6 0.926 0.056 16.682 0.000 0.916 0.838
## speed =~
## x7 1.000 0.383 0.352
## x8 1.244 0.194 6.414 0.000 0.477 0.471
## x9 2.515 0.641 3.924 0.000 0.963 0.956
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## textual ~
## speed 0.697 0.196 3.546 0.000 0.270 0.270
## visual ~
## speed 0.724 0.178 4.070 0.000 0.544 0.544
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x7 ~~
## .x8 0.353 0.067 5.239 0.000 0.353 0.389
## .visual ~~
## .textual 0.156 0.043 3.594 0.000 0.384 0.384
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x2 1.122 0.100 11.171 0.000 1.122 0.812
## .x1 0.576 0.101 5.678 0.000 0.576 0.424
## .x3 0.832 0.087 9.552 0.000 0.832 0.653
## .x4 0.372 0.048 7.791 0.000 0.372 0.276
## .x5 0.444 0.058 7.600 0.000 0.444 0.267
## .x6 0.357 0.043 8.287 0.000 0.357 0.298
## .x7 1.036 0.090 11.501 0.000 1.036 0.876
## .x8 0.795 0.080 9.988 0.000 0.795 0.778
## .x9 0.088 0.188 0.466 0.641 0.088 0.086
## .visual 0.182 0.058 3.155 0.002 0.704 0.704
## .textual 0.907 0.106 8.569 0.000 0.927 0.927
## speed 0.147 0.056 2.615 0.009 1.000 1.000
semPaths(fit_sem2, what = "col", whatLabels = "std", style = "mx", rotation = 1,
         layout = "tree", nCharNodes = 7, shapeMan = "rectangle", sizeMan = 8, sizeMan2 = 5)
SEM: Using factor scores in regression
Since in linear regression we cannot use both visual and textual abilities as outcomes at the same time with speed as their predictor, we can use one of the two for now and demonstrate how a simple linear regression model can be built. We will proceed with students’ visual scores as our outcome variable, assuming that speed-related abilities are strongly associated with one’s ability to complete shape-related visual tasks. The basic logic here would be: visual_score = β0 + β1(speed_score) + error. Naturally, we can control for age (directly related to grade) and gender or school type.
# Simpler SEM
hs.sem_simple <- "
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual =~ x2 + x1 + x3
x7 ~~ x8
visual ~ speed
visual ~~ textual
speed ~~ textual
"
# We specify that there is a correlation between visual and textual abilities as well as between textual and speed abilities, but textual abilities are not used as a predictor in this specific model
fit_sem_simple <- sem(hs.sem_simple, data = data)
# Extract latent factor scores from SEM/CFA model
factor_scores <- as.data.frame(
lavPredict(
fit_final,
type = "lv",
method = "regression"
)
)
# Combine original control variables with factor scores
regression_data <- cbind(
factor_scores,
data[, c("ageyr", "grade", "sex", "school")]
)
# Convert categorical controls to factors
regression_data$sex <- as.factor(regression_data$sex)
regression_data$school <- as.factor(regression_data$school)
# Linear regression with controls
lm_textual_controls <- lm(
visual ~ speed + ageyr + sex + school,
data = regression_data
)
# Regression summary
summary(lm_textual_controls)##
## Call:
## lm(formula = visual ~ speed + ageyr + sex + school, data = regression_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.65241 -0.36637 0.00073 0.42336 1.45676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.34772 0.42822 3.147 0.00182 **
## speed 1.36511 0.08974 15.212 < 0.0000000000000002 ***
## ageyr -0.09636 0.03287 -2.932 0.00364 **
## sex2 -0.21717 0.06654 -3.264 0.00123 **
## schoolPasteur 0.03176 0.06775 0.469 0.63954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5673 on 296 degrees of freedom
## Multiple R-squared: 0.4462, Adjusted R-squared: 0.4387
## F-statistic: 59.62 on 4 and 296 DF, p-value: < 0.00000000000000022
semPaths(fit_sem_simple, what = "col", whatLabels = "std", style = "mx", rotation = 1,
         layout = "tree", nCharNodes = 7, shapeMan = "rectangle", sizeMan = 8, sizeMan2 = 5)
Let’s see if we can interpret this model output. First, judging by the R-squared and adjusted R-squared, our model explains more than 40% of the variance in visual factor scores.
- Intercept: β₀ = 1.34
The intercept represents the expected visual factor score when all predictors equal zero and for the reference categories of the categorical variables. While statistically significant, the intercept itself is usually not substantively important in this context.
- Speed: β = 1.36511
Speed appears to be a strong and statistically significant positive predictor of visual performance. We can say that higher processing speed is associated with higher performance on visual, shape-related tasks. Specifically, a one-unit increase in the latent speed factor is associated with an estimated 1.36511-unit increase in the visual factor score, holding age, sex, and school type constant. This also supports the overall theoretical assumption that processing speed contributes positively to visual cognitive performance.
- Age: β = -0.09636
Age is also a statistically significant negative predictor of visual performance. After controlling for the other predictors, older students tend to have slightly lower visual factor scores. This negative effect may reflect overlap between age and other educational or developmental variables in the data set.
- Sex: β = -0.21717
Sex in our case is also a statistically significant predictor of visual performance. Specifically, the negative coefficient suggests that the group coded as the higher category in the data set (e.g., sex = 2) demonstrates lower visual factor scores compared to the reference group, while holding other predictors constant. This indicates that differences in visual task performance may exist between the two gender groups included in the data set.
- School Type: β = 0.03176
School type is not a statistically significant predictor of visual performance. This suggests that, after controlling for the other variables in the model, there is no strong evidence that students from different schools differ meaningfully in their visual task performance.
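To complement the p-values, one could also report confidence intervals for these regression coefficients; a minimal sketch:
# 95% confidence intervals for the regression coefficients (sketch)
confint(lm_textual_controls, level = 0.95)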
SEM: CFA structure inserted with factor predictor
We can then also add other variables from the data set to our path model structure. We do not have many variables to choose from for this specific task, so we will only construct a very simple structure. To be specific, once again, we have students’ age (years and months), their gender, the school they attend, and their grade. These variables may theoretically influence cognitive task performance and therefore can be included as predictors or control variables within the SEM framework. However, it is rather hard to imagine any of these variables being an outcome in our path model. While this is specific to our data, with other variables in a data set it would be entirely possible to include some of them as outcome variables as well.
In our case:
- Age and grade level may relate to cognitive development and learning experience.
- School type may capture differences in educational environment or instruction quality.
- Gender may account for potential group differences in cognitive task performance.
For instance, we may specify a model where:
Speed predicts visual performance, while age, gender, and school type are included as additional predictors of visual ability. This allows us to move beyond pure measurement modeling and toward a more explanatory structural model.
# SEM with latent predictor + observed control variables
hs.sem_controls <- "
# Measurement model
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual =~ x2 + x1 + x3
visual ~~ textual
speed ~~ textual
# Residual covariance
x7 ~~ x8
# Structural model
visual ~ speed + ageyr + sex + school
"
# Fit SEM model
fit_sem_controls <- sem(
hs.sem_controls,
data = data
)
# Model summary
summary(
fit_sem_controls,
fit.measures = TRUE,
standardized = TRUE,
rsquare = TRUE
)## lavaan 0.6-19 ended normally after 45 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 25
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 175.220
## Degrees of freedom 47
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1065.151
## Degrees of freedom 63
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.872
## Tucker-Lewis Index (TLI) 0.829
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3709.552
## Loglikelihood unrestricted model (H1) -3621.943
##
## Akaike (AIC) 7469.105
## Bayesian (BIC) 7561.782
## Sample-size adjusted Bayesian (SABIC) 7482.497
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.095
## 90 Percent confidence interval - lower 0.080
## 90 Percent confidence interval - upper 0.110
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 0.954
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.093
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## textual =~
## x4 1.000 0.991 0.853
## x5 1.110 0.065 17.038 0.000 1.100 0.854
## x6 0.925 0.055 16.735 0.000 0.916 0.838
## speed =~
## x7 1.000 0.377 0.347
## x8 1.238 0.193 6.427 0.000 0.467 0.462
## x9 2.601 0.668 3.894 0.000 0.981 0.974
## visual =~
## x2 1.000 0.521 0.441
## x1 1.626 0.259 6.268 0.000 0.846 0.717
## x3 1.439 0.234 6.140 0.000 0.749 0.657
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual ~
## speed 0.742 0.176 4.208 0.000 0.538 0.538
## ageyr -0.043 0.031 -1.402 0.161 -0.082 -0.086
## sex -0.264 0.070 -3.752 0.000 -0.507 -0.253
## school 0.210 0.068 3.087 0.002 0.404 0.202
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## textual ~~
## .visual 0.177 0.044 4.006 0.000 0.437 0.437
## speed 0.099 0.035 2.822 0.005 0.264 0.264
## .x7 ~~
## .x8 0.359 0.067 5.350 0.000 0.359 0.392
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x4 0.369 0.048 7.757 0.000 0.369 0.273
## .x5 0.449 0.058 7.699 0.000 0.449 0.270
## .x6 0.356 0.043 8.298 0.000 0.356 0.298
## .x7 1.041 0.090 11.569 0.000 1.041 0.880
## .x8 0.804 0.079 10.146 0.000 0.804 0.786
## .x9 0.052 0.196 0.264 0.792 0.052 0.051
## .x2 1.124 0.100 11.262 0.000 1.124 0.806
## .x1 0.676 0.089 7.590 0.000 0.676 0.486
## .x3 0.740 0.083 8.949 0.000 0.740 0.569
## textual 0.982 0.112 8.756 0.000 1.000 1.000
## speed 0.142 0.055 2.599 0.009 1.000 1.000
## .visual 0.167 0.053 3.178 0.001 0.616 0.616
##
## R-Square:
## Estimate
## x4 0.727
## x5 0.730
## x6 0.702
## x7 0.120
## x8 0.214
## x9 0.949
## x2 0.194
## x1 0.514
## x3 0.431
## visual 0.384
semPaths(fit_sem_controls, what = "col", whatLabels = "std", style = "mx", rotation = 1,
         layout = "tree", nCharNodes = 7, shapeMan = "rectangle", sizeMan = 8, sizeMan2 = 5)
We can see that the additional predictors now appear on our path diagram. Let’s interpret the results of our model.
- χ²(47) = 175.22, p < .001 The significant chi-square suggests that the model does not perfectly reproduce the observed covariance matrix.
- CFI = 0.872, TLI = 0.829 Both indices fall slightly below the commonly recommended .90 threshold, indicating insufficient overall model fit.
- RMSEA = 0.095 (90% CI: 0.080–0.110) This suggests relatively poor approximate fit, as values above .08 are generally considered problematic.
- SRMR = 0.093 Slightly above the preferred .08 threshold, suggesting elevated residual discrepancies between observed and model-implied correlations.
Overall, the model fit is weaker than desirable and suggests that the specified structural relationships may not fully capture the covariance structure of the data.
Structural Model Interpretation (fully standardized coefficients):
- Speed → Visual: β = 0.538, p < .001
Processing speed is a strong and statistically significant positive predictor of visual performance, thus implying that individuals with higher processing speed tend to demonstrate higher visual task performance. This supports the theoretical assumption that cognitive processing efficiency contributes positively to visual reasoning ability.
- Age → Visual: β = −0.086, p = .161
Age is not a statistically significant predictor of visual performance. After controlling for the other variables in the model, age does not appear to meaningfully influence visual ability.
- Sex → Visual: β = −0.253, p < .001
Sex is a statistically significant negative predictor of visual performance. The group coded as the higher category in the data set demonstrates lower visual factor scores compared to the reference group.
- School Type → Visual: β = 0.202, p = .002
School type is a statistically significant positive predictor of visual performance. Students from one school category demonstrate higher visual performance compared to the reference school group. This suggests that school environment or educational context may influence visual cognitive performance.
Approximately 38.4% of the variance in visual ability is explained by: speed, age, sex, and school type.
Note here that although the logic behind our earlier linear model and SEM model is similar, the models are actually not mathematically identical.
For linear regression we are using estimated factor scores from lavPredict() as if they were perfectly observed variables. So regression treats visual_score and speed_score as error-free observed data. Factor scores from lavPredict() are not “true” latent variables. They are rather regression-based approximations, shrinkage estimates, imperfect representations of the latent construct. In SEM, on the other hand, we are not using factor scores directly. Instead, SEM estimates latent variables internally, models measurement error explicitly, and estimates structural paths simultaneously with the measurement model. This thus changes the coefficients.
So which of the two should we choose? There is no single correct answer; they are different models used for different purposes in different ways. In our case, we should generally prefer the SEM model over the regression on factor scores, simply because our entire analysis is built around latent constructs. Regression is still useful and can seem a lot more straightforward to interpret.
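If we want to place the two approaches side by side, one option (a sketch, assuming the objects fitted above) is to extract the standardized structural path from the SEM and compare it with a standardized version of the factor-score regression; the numbers are not expected to match exactly:
# Standardized structural paths from the SEM (sketch)
std_paths <- standardizedSolution(fit_sem_simple)
subset(std_paths, op == "~")
# Standardized coefficient from the factor-score regression (sketch)
coef(lm(scale(visual) ~ scale(speed), data = regression_data))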
SEM: Gender differences
Now let’s take gender out as a control variable and instead visualize how the path diagram of the same model differs between the two genders. In this case, we are not carrying out LCA, since the groups are pre-determined and do not form based on patterns in the data. The point of this part is to show that the SEM structure of the same theoretical model can differ by group - something that we will later be exploring using LCA.
# Fit the model separately by gender
hs.sem_gender <- "
# Measurement model
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual =~ x2 + x1 + x3
visual ~~ textual
speed ~~ textual
# Residual covariance
x7 ~~ x8
# Structural model
visual ~ speed + ageyr + school
"
fit_sem_gender <- sem(
hs.sem_gender,
data = data,
group = "sex"
)
# Plot side by side
semPaths(
fit_sem_gender,
what = "std",
whatLabels = "std",
style = "mx",
layout = "tree",
residuals = FALSE,
intercepts = FALSE,
nCharNodes = 0,
sizeMan = 7,
sizeLat = 9,
panelGroups = TRUE
)
Now let’s interpret the differences.
While the two diagrams represent SEM/CFA models with the same latent structure (textual, speed, and visual), the difference lies in different estimated relationships and strengths such as factor loadings, latent correlations/paths, and the role of the observed covariates (ageyr, school).
- Stronger relationship between textual and speed in the model on the right (0.15 vs 0.32)
- Measurement loadings are more balanced in the right model
The factor loadings in the right model appear more consistent, more moderate, and less extreme. To be specific, in the left model the x9 loading is very large (~1.12), which is unusual, since standardized loadings are normally expected to lie between -1 and 1. Our earlier models already showed that x9 had an extremely strong loading and an extremely small residual variance. When we split the sample by gender, the sample size per group becomes smaller, estimation becomes less stable, and x9 becomes too dominant within one subgroup (on the left), which pushes its standardized loading above 1. Even apart from that, the balance between x9 and x7 in the left model is poor: the x9 loading is very large (~1.12) while x7 appears weak (~0.20), implying that x9 dominates the factor while x7 contributes very little. In the right model, on the other hand, x7 increases (~0.27), x8 is stronger (~0.54), and x9 remains strong (~0.95). Thus the right model has a more stable measurement structure in which all indicators contribute more evenly to the latent construct.
- The visual factor is stronger in the second model
In case of the left model, the visual factor has weaker loadings overall, some nearly negligible paths, very faint paths involving ageyr and school. On the other hand, for the right model the loadings from visual to x1, x2, x3 are visibly stronger and clearer.
- Covariate effects (ageyr, school) differ
In both models effects involving ageyr are weak, school has a small association. However, in case of the right model, some paths are almost absent/faded, while in case of the left model the school relationship is slightly clearer and more positive.
- The right model appears statistically cleaner
Visually, the right model shows fewer extremely weak paths, more coherent loading patterns, less imbalance among indicators. The factor loadings overall are stronger and more balanced, despite this model having weak effects of our two additional variables.
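A more formal way to check whether the measurement part really differs by gender would be a measurement invariance test, for example constraining the factor loadings to be equal across groups and comparing fit. A minimal sketch:
# Configural model: all parameters free across the two sex groups (sketch)
fit_configural <- sem(hs.sem_gender, data = data, group = "sex")
# Metric model: factor loadings constrained to be equal across groups
fit_metric <- sem(hs.sem_gender, data = data, group = "sex",
                  group.equal = "loadings")
# A non-significant chi-square difference would support equal loadings (metric invariance)
anova(fit_configural, fit_metric)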
LCA / LPA
We move on to the last part of this class - LCA (latent class analysis). Up until now, our CFA/SEM models assumed continuous latent dimensions: textual ability, visual ability, and speed. LCA, by contrast, assumes that there are distinct hidden groups (classes) of students, where students inside each class share similar response/performance patterns across the 9 tests. In other words, instead of asking “How much latent ability does a student have?”, we will now be asking “What type/profile of student does this person belong to?”.
To be specific, while still using the same cognitive ability tests x1–x9, we could potentially identify profiles such as:
- High performers across all domains
- Strong textual but weak speed
- Strong visual but average textual
- Generally low-performing group etc.
Ideally for LCA our variables would not be continuous, so what we are doing here is theoretically closer to LPA (Latent Profile Analysis), which is the version of LCA for continuous indicators. Likert-scale results are most commonly used for LCA, while for abilities we could use students’ skill level ranked as low / medium / high. Technically we could convert our current variables into such categories and proceed with LCA as usual (see the sketch below), but converting continuous scores into categories leads to loss of information, reduced precision, and potentially reduced statistical power. So for our data set it is preferable to stick to LPA and keep the variables as they are. Naturally, when conducting LCA the functions that we use might differ, but the overall logic remains the same. We could also switch to an entirely different data set more appropriate for LCA, but we will keep working with the current data for the sake of keeping the overall practice session coherent.
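For illustration only, a categorical recoding for a classical LCA could look roughly like the sketch below (tertile-based low/medium/high categories, which we will not actually use):
# Sketch: convert the continuous test scores into low/medium/high categories
lca_data <- data %>%
  mutate(across(x1:x9, ~ cut(.x,
                             breaks = quantile(.x, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
                             labels = c("low", "medium", "high"),
                             include.lowest = TRUE)))
head(lca_data[, paste0("x", 1:9)])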
LPA: 5-class solution
# Select intelligence variables
lpa_data <- data %>%
select(x1:x9)
# Estimate models with different numbers of classes
lpa_results <- lpa_data %>%
estimate_profiles(1:5)
# Compare Fit
compare_solutions(lpa_results)## Compare tidyLPA solutions:
##
## Model Classes BIC
## 1 1 8411.764
## 1 2 8098.389
## 1 3 7946.965
## 1 4 7925.971
## 1 5 7896.040
##
## Best model according to BIC is Model 1 with 5 classes.
##
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 5 classes.
We get the following warning: “Warning message: The solution with the maximum number of classes under consideration was considered to be the best solution according to one or more fit indices. Examine your results with care and consider estimating more classes”. This warning is very common in LPA/LCA and does not automatically mean something is wrong. It simply means that the “best” model according to at least one fit index was the model with the largest number of classes we allowed the algorithm to estimate (the 5-class solution looked best). Fit indices in mixture models often improve as more classes are added, because more classes mean more flexibility, and more parameters imply a better fit to the data. AIC in particular tends to keep decreasing, while BIC penalizes complexity more heavily to prevent overfitting and tiny, meaningless classes. We can absolutely examine a larger number of classes as R suggests (see the sketch below), but we can also look at our five classes first and see whether they are meaningful to begin with.
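A sketch of how one might extend the search space (at some computational cost) and inspect the information criteria directly:
# Sketch: estimate solutions with up to 8 classes and compare fit indices
lpa_results_ext <- lpa_data %>%
  estimate_profiles(1:8)
tidyLPA::get_fit(lpa_results_ext) %>%
  select(Classes, AIC, BIC, SABIC, Entropy)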
# Extract Best Model
model_5 <- lpa_results[[5]]
# View Class Membership
class_profiles <- tidyLPA::get_data(model_5) %>%
dplyr::group_by(Class) %>%
dplyr::summarise(dplyr::across(x1:x9, mean))
class_profiles## # A tibble: 5 × 10
## Class x1 x2 x3 x4 x5 x6 x7 x8 x9
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4.94 5.82 2.46 2.39 3.46 1.45 4.82 6.30 5.98
## 2 2 4.48 6.46 2.45 1.25 1.96 0.903 3.74 5.07 5.01
## 3 3 5.86 6.68 2.89 4.82 5.98 4.14 4.27 5.90 6.09
## 4 4 5.15 6.23 2.27 3.42 4.95 2.40 4.31 5.53 5.42
## 5 5 3.94 5.35 1.41 2.47 3.73 1.58 3.45 4.75 4.34
We are now looking at the mean results on each of the nine tests for all five identified classes. Overall, our solution suggests that the classes differ both quantitatively (overall performance level) and somewhat qualitatively (relative strengths/weaknesses across domains). The strongest differences between classes appear in textual performance (x4–x6), while the speed variables are comparatively more stable. This suggests that verbal/textual ability is one of the major sources of heterogeneity among students.
If we look at each class in detail, we discover the following:
- Class 1 — Strong Speed / Weak Textual Profile
- High speed scores (x8 = 6.30, x9 = 5.98)
- Moderate visual
- Low textual
Students in this class appear cognitively efficient and relatively fast, but not exceptionally strong verbally.
- Class 2 — Weak Textual Profile
- Moderate speed
- Moderate visual
- Very low textual scores (x4 = 1.25, x5 = 1.96, x6 = 0.903)
This class appears to struggle specifically with textual/verbal tasks while maintaining moderate performance in other domains. This is one of the clearest qualitative profiles in the solution.
- Class 3 — High Overall Performers
- Highest scores almost everywhere
- Especially strong textual ability (x4 = 4.82, x5 = 5.98, x6 = 4.14.)
This class represents the strongest overall cognitive profile, with especially strong textual/verbal abilities alongside good visual and speed performance. This is likely our “high-achieving” subgroup.
- Class 4 — Balanced Moderate-High Performers
- Fairly balanced across domains
- No major weaknesses
Students in this class demonstrate relatively stable and balanced cognitive performance across visual, textual, and speed domains without extreme strengths or weaknesses. This could be our “typical above-average” group.
- Class 5 — Low Overall Performance Profile
- Lowest scores across most variables (x3 = 1.41, x9 = 4.34, x7 = 3.45)
This class appears to represent students with generally weaker performance across all cognitive domains - our lowest-performing subgroup.
Let’s look at the graphs that showcase these group differences more clearly, together with a summary table.
# Plot profiles with a basic plot function
plot_data <- tidyLPA::get_data(model_5) %>%
dplyr::group_by(Class) %>%
  dplyr::summarise(dplyr::across(x1:x9, mean))
# Plot Profiles with a ggplot
plot_data <- tidyLPA::get_data(model_5) %>%
group_by(Class) %>%
summarise(across(x1:x9, mean)) %>%
pivot_longer(x1:x9,
names_to = "Test",
values_to = "Mean")
ggplot(plot_data,
aes(x = Test,
y = Mean,
group = factor(Class),
color = factor(Class))) +
geom_line(size = 1.2) +
geom_point(size = 3) +
labs(
title = "Latent Profile Means Across Intelligence Tests",
color = "Class"
) +
  theme_minimal()
# Create interpretation summary table
class_summary <- tibble(
Class = c(
"Class 1",
"Class 2",
"Class 3",
"Class 4",
"Class 5"
),
Profile = c(
"Strong Speed / Weak Textual",
"Weak Textual",
"High Overall Performers",
"Balanced Moderate-High",
"Low Overall Performance"
),
Main_Findings = c(
"High speed scores (x8 = 6.30, x9 = 5.98), moderate visual, low textual",
"Very low textual scores (x4 = 1.25, x5 = 1.96, x6 = 0.903), moderate visual and speed",
"Highest scores across most variables, especially textual ability (x4 = 4.82, x5 = 5.98, x6 = 4.14)",
"Balanced performance across domains with no major weaknesses",
"Lowest scores across most variables (x3 = 1.41, x7 = 3.45, x9 = 4.34)"
),
Interpretation = c(
"Students appear cognitively efficient and relatively fast, but not exceptionally strong verbally.",
"Students struggle specifically with textual/verbal tasks while maintaining moderate performance in other domains.",
"Represents the strongest cognitive profile with especially strong textual/verbal abilities.",
"Represents a relatively stable and balanced above-average performance profile.",
"Represents the generally weakest-performing subgroup across cognitive domains."
)
)
# Display table
kable(
class_summary,
caption = "Interpretation of Latent Student Profiles",
align = "l"
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed")
)| Class | Profile | Main_Findings | Interpretation |
|---|---|---|---|
| Class 1 | Strong Speed / Weak Textual | High speed scores (x8 = 6.30, x9 = 5.98), moderate visual, low textual | Students appear cognitively efficient and relatively fast, but not exceptionally strong verbally. |
| Class 2 | Weak Textual | Very low textual scores (x4 = 1.25, x5 = 1.96, x6 = 0.903), moderate visual and speed | Students struggle specifically with textual/verbal tasks while maintaining moderate performance in other domains. |
| Class 3 | High Overall Performers | Highest scores across most variables, especially textual ability (x4 = 4.82, x5 = 5.98, x6 = 4.14) | Represents the strongest cognitive profile with especially strong textual/verbal abilities. |
| Class 4 | Balanced Moderate-High | Balanced performance across domains with no major weaknesses | Represents a relatively stable and balanced above-average performance profile. |
| Class 5 | Low Overall Performance | Lowest scores across most variables (x3 = 1.41, x7 = 3.45, x9 = 4.34) | Represents the generally weakest-performing subgroup across cognitive domains. |
LPA: 3-class solution
Let’s see if we can build a model with less classes, see how the interpretation changes and compare the models in terms of their fit.
# Extract New Model
model_3 <- lpa_results[[3]]
# View Class Membership
class_profiles <- tidyLPA::get_data(model_3) %>%
group_by(Class) %>%
summarise(across(x1:x9, mean))
class_profiles## # A tibble: 3 × 10
## Class x1 x2 x3 x4 x5 x6 x7 x8 x9
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4.98 6.13 2.15 3.21 4.70 2.19 4.21 5.48 5.35
## 2 2 4.34 5.69 2.13 1.88 2.76 1.21 4.05 5.43 5.06
## 3 3 5.87 6.68 2.85 4.74 5.96 3.99 4.34 5.89 6.04
# Plot Profiles with a ggplot
plot_data <- tidyLPA::get_data(model_3) %>%
group_by(Class) %>%
summarise(across(x1:x9, mean)) %>%
pivot_longer(x1:x9,
names_to = "Test",
values_to = "Mean")
ggplot(plot_data,
aes(x = Test,
y = Mean,
group = factor(Class),
color = factor(Class))) +
geom_line(size = 1.2) +
geom_point(size = 3) +
labs(
title = "Latent Profile Means Across Intelligence Tests",
color = "Class"
) +
  theme_minimal()
Class-by-class interpretation for our new model
- Class 1 — Moderate / Average Performance Profile
- Moderate scores across all domains
- Relatively balanced profile
- No major weaknesses
This class appears to represent students with generally average and balanced cognitive performance across domains.
- Class 2 — Weak Textual Performance Profile
- Lowest textual scores
- Moderate visual and speed performance
- Most domain-specific weakness
Students in this class appear to struggle specifically with textual/verbal tasks while maintaining comparatively moderate performance in visual and speed domains.
- Class 3 — High Overall Performance Profile
- Highest scores across nearly all variables
- Especially strong textual ability
- Strongest overall cognitive functioning
This class represents the strongest overall cognitive profile, combining strong visual, textual, and speed performance - our “high-achieving” subgroup.
For all three groups we can still see a drop in textual abilities. Out of the three variables in that factor, all three groups perform best on x5, worse on x4, and even worse on x6. Similarly, for the visual variables, all classes perform best on x2, worse on x1, and worst on x3. However, class 1’s performance on x3 drops all the way down to the level of class 2. As for the last group, the speed tests, class 3 performs best on x9, while the other two classes record their best performance on x8. Performance on x7 remains the lowest for all three classes.
# Create summary table for 3-class LPA solution
class_summary_3 <- tibble(
Class = c(
"Class 1",
"Class 2",
"Class 3"
),
Profile = c(
"Moderate / Average Performance",
"Weak Textual Performance",
"High Overall Performance"
),
Main_Findings = c(
"Moderate scores across all domains; relatively balanced profile; no major weaknesses",
"Lowest textual scores; moderate visual and speed performance; strongest domain-specific weakness",
"Highest scores across nearly all variables; especially strong textual ability; strongest overall cognitive functioning"
),
Interpretation = c(
"Represents students with generally average and balanced cognitive performance across domains.",
"Represents students who struggle specifically with textual/verbal tasks while maintaining moderate visual and speed performance.",
"Represents the strongest overall cognitive profile combining strong visual, textual, and speed abilities — the high-achieving subgroup."
)
)
# Display formatted table
kable(
class_summary_3,
caption = "Interpretation of the 3-Class Latent Profile Solution",
align = "l"
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed")
)| Class | Profile | Main_Findings | Interpretation |
|---|---|---|---|
| Class 1 | Moderate / Average Performance | Moderate scores across all domains; relatively balanced profile; no major weaknesses | Represents students with generally average and balanced cognitive performance across domains. |
| Class 2 | Weak Textual Performance | Lowest textual scores; moderate visual and speed performance; strongest domain-specific weakness | Represents students who struggle specifically with textual/verbal tasks while maintaining moderate visual and speed performance. |
| Class 3 | High Overall Performance | Highest scores across nearly all variables; especially strong textual ability; strongest overall cognitive functioning | Represents the strongest overall cognitive profile combining strong visual, textual, and speed abilities — the high-achieving subgroup. |
Now let’s compare the performance of the two models.
# Extract fit statistics for all estimated models
model_fit <- tidyLPA::get_fit(lpa_results)
# Keep only 3-class and 5-class solutions
compare_models <- model_fit %>%
filter(Classes %in% c(3, 5)) %>%
select(
Model,
Classes,
AIC,
BIC,
SABIC,
Entropy,
prob_min,
prob_max,
n_min,
n_max
)
# Display comparison table
kable(
compare_models,
caption = "Comparison of 3-Class and 5-Class LPA Solutions",
digits = 3,
align = "c"
) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed")
)| Model | Classes | AIC | BIC | SABIC | Entropy | prob_min | prob_max | n_min | n_max |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 7806.095 | 7946.965 | 7826.451 | 0.851 | 0.909 | 0.948 | 0.153 | 0.561 |
| 1 | 5 | 7681.028 | 7896.040 | 7712.097 | 0.824 | 0.822 | 0.937 | 0.093 | 0.429 |
Overall, the 5-class solution demonstrates better numerical fit according to the information criteria, while the 3-class solution appears somewhat cleaner and more parsimonious overall. To be specific, all fit indices decrease substantially in the 5-class model, indicating improved statistical fit. There is even an improvement in BIC. When it comes to entropy, both values are good and exceed .80, indicating reasonably strong class separation. However, the 3-class model shows slightly cleaner classification. As for posterior probabilities, the 3-class model produces more certain class assignments overall, while the 5-class model introduces slightly more overlap between classes. Additionally, the 5-class model creates smaller subgroups, but they are still above common practical cutoffs (~5%).
Since the three-class model will be easier to illustrate and interpret, we will now carry it over to our SEM path model.
LPA and SEM: separate
We might be asking: how different will our SEM path diagram be for these three classes? How different will the loadings and the correlations with the included additional variables be? We can carry out this analysis in a similar way to how we did it for gender earlier.
# Extract class assignments
class_membership <- tidyLPA::get_data(model_3)$Class
# Add to original dataset
data$class_3 <- factor(class_membership)
# Number of students in each class
table(data$class_3)##
## 1 2 3
## 169 86 46
Note that we will make our model a bit simpler for this specific case, because our third class consists of only 46 observations, which is relatively few for a complex SEM model. We will still keep the basic three-factor measurement structure.
hs.sem_class3 <- "
# Measurement model
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual =~ x2 + x1 + x3
"fit1 <- sem(
hs.sem_class3,
data = subset(data, class_3 == "1")
)
fit2 <- sem(
hs.sem_class3,
data = subset(data, class_3 == "2")
)
fit3 <- sem(
hs.sem_class3,
data = subset(data, class_3 == "3")
)
par(mfrow = c(1,3))
semPaths(fit1,
what = "std",
whatLabels = "std",
style = "mx",
layout = "tree",
residuals = FALSE,
intercepts = FALSE,
edge.label.cex = 1.4)
title("Class 1: Average Profile")
semPaths(fit2,
what = "std",
whatLabels = "std",
style = "mx",
layout = "tree",
residuals = FALSE,
intercepts = FALSE,
edge.label.cex = 1.4)
title("Class 2: Weak Textual Profile")
semPaths(fit3,
what = "std",
whatLabels = "std",
style = "mx",
layout = "tree",
residuals = FALSE,
intercepts = FALSE,
edge.label.cex = 1.4)
title("Class 3: High Performance Profile")- Class 1
This model appears relatively stable and interpretable.
- Strong loading for: x6 on textual (1.24)
- Moderate loadings for: speed indicators (0.51–0.68) and visual indicators (0.57–0.83)
- Very weak loading for: x4 on textual (0.19)
The latent structure is reasonably coherent in this class:
- textual ability is primarily represented by x6,
- speed and visual factors are moderately stable,
- overall measurement quality is acceptable.
- Class 2
This class shows the clearest and strongest factor structure overall.
- Very strong textual loading: x5 = 0.96
- Strong speed factor: especially x8 = 0.89
- Strong visual factor: x3 = 0.65
- Moderate latent correlations: textual ↔︎ visual (0.41) textual ↔︎ speed (0.16)
- Negative correlation between speed and visual: −0.31
This class demonstrates the most differentiated latent structure, clearer separation between cognitive domains, relatively reliable factor measurement.
- Class 3
This model is unstable and likely poorly identified.
- Only one visible loading: x4 = 112.50
- Most paths are absent
- Factor structure collapsed
This is a classic sign of SEM estimation problems:
- non-convergence,
- improper solutions,
- or factor underidentification.
The extremely large loading (112.50) is not substantively meaningful and indicates model instability rather than a real effect. This likely occurs because the class is too small (n = 46), the indicators are highly correlated, or the variance within the class is too low for reliable latent estimation. Therefore, the SEM solution for Class 3 should not be interpreted substantively. If there were more people in Class 3, there is a chance that we would observe a structure more similar to that of the two remaining classes.
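A few quick checks can confirm that the Class 3 solution should be treated with caution; a minimal sketch:
# Basic diagnostics for the unstable Class 3 model (sketch)
lavInspect(fit3, "converged")    # did the optimizer report convergence?
lavInspect(fit3, "post.check")   # are the estimated variances/covariances admissible?
table(data$class_3)              # class sizes; Class 3 contains only 46 students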
Thus, we can see that we clearly benefit from looking at the between-class differences here instead of assuming a single structure for the entire sample. All of a sudden, the correlation between textual and visual test performance that was positive in Class 1 appears negative and stronger elsewhere, while the structure for Class 3 collapses entirely. The strength of the comparable correlations also differs across the three classes.
Now if we try to replicate the same result without carrying out LPA separately from SEM, the model fails to converge because we move from a simple clustering problem (LPA) to a much more complex mixture SEM.
In LPA, classes are defined only by differences in observed means and variances, which makes the model relatively stable and easy to estimate. In mixture SEM, however, each class must simultaneously estimate:
- a full measurement model (factor loadings and latent variables)
- structural paths
- and class-specific variances/covariances
This creates several issues:
- Too many parameters per class relative to the sample size → weak identification
- Flat or irregular likelihood surface → non-convex Hessian
- Reduced class size → unstable covariance estimation
- High model complexity → local maxima and saddle points
In short, LPA works because it clusters observed data directly, while mixture SEM often fails here because the data does not provide enough information to reliably support separate latent-factor structures across classes. We already saw that there were some problems with class differences even at our LPA + SEM stage.