This study aims to analyze the influence of Career Level and Performance and Reward on Tenure Stability, and its subsequent effect on Attrition Risk using the CB-SEM (Covariance-Based Structural Equation Modeling) approach.
The dataset used is the IBM HR Analytics Employee Attrition
and Performance dataset, publicly available on Kaggle,
consisting of 1,470 observations and 35 variables. CB-SEM was selected
because all constructs are reflective in nature, the
analytical objective is theory confirmation, and
estimation is performed using Maximum Likelihood Robust
(MLR). The analysis utilizes the lavaan package in
R.
The construct design in this study is data-driven, meaning indicators were selected based on empirical correlation patterns observed in the dataset. Only indicator groups with inter-indicator correlations above 0.30 were retained for each construct, ensuring that each latent variable is statistically identifiable and theoretically meaningful.
The inner model describes the causal relationships between latent constructs. Based on human resource management theory and empirical correlation analysis of the IBM HR dataset, the structural model is defined as follows:
The hypotheses tested are:
The outer model defines the indicators that reflectively measure each latent construct. Indicators were selected based on inter-indicator correlation analysis to ensure construct validity. Each indicator is assigned exclusively to one construct.
The packages used include lavaan for SEM and CFA
estimation, MVN for multivariate normality testing,
psych for KMO testing, semPlot for path
diagram visualization, and car for multicollinearity
checks.
library(lavaan)
library(MVN)
library(psych)
library(semPlot)
library(car)
library(semTools)
library(ggplot2)
library(tidyr)
library(dplyr)
data <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
# Re-code categorical variables to numeric
data$OverTime <- ifelse(data$OverTime == "Yes", 1, 0)
data$Attrition <- ifelse(data$Attrition == "Yes", 1, 0)
# Select all variables relevant to the SEM model
df <- data[, c(
"MonthlyIncome", "JobLevel", "TotalWorkingYears",
"PercentSalaryHike", "PerformanceRating",
"YearsAtCompany", "YearsInCurrentRole", "YearsWithCurrManager",
"YearsSinceLastPromotion", "NumCompaniesWorked", "OverTime",
"Attrition"
)]
# Standardize all variables to equalize scale differences
# This is necessary because MonthlyIncome has variance ~22 million
# while other variables have variance < 100, which causes optimizer failure
df_scaled <- as.data.frame(scale(df))
Before proceeding to assumption testing, all variables are verified to be numeric and free of missing values.
str(df_scaled)
## 'data.frame': 1470 obs. of 12 variables:
## $ MonthlyIncome : num -0.108 -0.292 -0.937 -0.763 -0.645 ...
## $ JobLevel : num -0.0578 -0.0578 -0.9612 -0.9612 -0.9612 ...
## $ TotalWorkingYears : num -0.421 -0.164 -0.55 -0.421 -0.679 ...
## $ PercentSalaryHike : num -1.1502 2.1286 -0.0572 -1.1502 -0.8769 ...
## $ PerformanceRating : num -0.426 2.345 -0.426 -0.426 -0.426 ...
## $ YearsAtCompany : num -0.165 0.488 -1.144 0.162 -0.817 ...
## $ YearsInCurrentRole : num -0.0633 0.7647 -1.1673 0.7647 -0.6153 ...
## $ YearsWithCurrManager : num 0.246 0.806 -1.156 -1.156 -0.595 ...
## $ YearsSinceLastPromotion: num -0.6789 -0.3686 -0.6789 0.2521 -0.0583 ...
## $ NumCompaniesWorked : num 2.124 -0.678 1.324 -0.678 2.525 ...
## $ OverTime : num 1.591 -0.628 1.591 1.591 -0.628 ...
## $ Attrition : num 2.28 -0.438 2.28 -0.438 -0.438 ...
cat("\nData Type per Column\n")
##
## Data Type per Column
print(sapply(df_scaled, class))
## MonthlyIncome JobLevel TotalWorkingYears
## "numeric" "numeric" "numeric"
## PercentSalaryHike PerformanceRating YearsAtCompany
## "numeric" "numeric" "numeric"
## YearsInCurrentRole YearsWithCurrManager YearsSinceLastPromotion
## "numeric" "numeric" "numeric"
## NumCompaniesWorked OverTime Attrition
## "numeric" "numeric" "numeric"
cat("\nMissing Values per Column\n")
##
## Missing Values per Column
print(colSums(is.na(df_scaled)))
## MonthlyIncome JobLevel TotalWorkingYears
## 0 0 0
## PercentSalaryHike PerformanceRating YearsAtCompany
## 0 0 0
## YearsInCurrentRole YearsWithCurrManager YearsSinceLastPromotion
## 0 0 0
## NumCompaniesWorked OverTime Attrition
## 0 0 0
cat("\nDataset Dimensions\n")
##
## Dataset Dimensions
cat("Number of observations:", nrow(df_scaled),
"| Number of variables:", ncol(df_scaled), "\n")
## Number of observations: 1470 | Number of variables: 12
library(e1071)
##
## Attaching package: 'e1071'
## The following object is masked from 'package:ggplot2':
##
## element
## The following object is masked from 'package:semTools':
##
## kurtosis
desc_stats <- data.frame(
Variabel = names(df_scaled),
Min = round(sapply(df_scaled, min), 4),
Max = round(sapply(df_scaled, max), 4),
Mean = round(sapply(df_scaled, mean), 4),
Std_Dev = round(sapply(df_scaled, sd), 4),
Skewness = round(sapply(df_scaled, skewness), 4),
Kurtosis = round(sapply(df_scaled, kurtosis), 4)
)
cat("\nDescriptive Statistics of Indicator Variables\n")
##
## Descriptive Statistics of Indicator Variables
print(desc_stats, row.names = FALSE)
## Variabel Min Max Mean Std_Dev Skewness Kurtosis
## MonthlyIncome -1.1669 2.8667 0 1 1.3670 0.9923
## JobLevel -0.9612 2.6524 0 1 1.0233 0.3891
## TotalWorkingYears -1.4497 3.6912 0 1 1.1149 0.9058
## PercentSalaryHike -1.1502 2.6750 0 1 0.8195 -0.3073
## PerformanceRating -0.4261 2.3454 0 1 1.9180 1.6797
## YearsAtCompany -1.1439 5.3851 0 1 1.7609 3.9086
## YearsInCurrentRole -1.1673 3.8008 0 1 0.9155 0.4670
## YearsWithCurrManager -1.1555 3.6089 0 1 0.8318 0.1621
## YearsSinceLastPromotion -0.6789 3.9760 0 1 1.9802 3.5873
## NumCompaniesWorked -1.0781 2.5247 0 1 1.0244 0.0020
## OverTime -0.6280 1.5912 0 1 0.9625 -1.0743
## Attrition -0.4383 2.2801 0 1 1.8406 1.3888
df_long <- df_scaled %>%
select(-Attrition) %>%
pivot_longer(cols = everything(),
names_to = "Variable",
values_to = "Value")
ggplot(df_long, aes(x = Value, fill = Variable)) +
geom_histogram(bins = 30, color = "white", show.legend = FALSE) +
facet_wrap(~ Variable, scales = "free", ncol = 3) +
labs(
title = "Distribution of Indicator Variables (Standardized)",
x = "Standardized Value",
y = "Frequency"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
strip.text = element_text(face = "bold", size = 8)
)
Before running CFA and SEM, three key assumptions must be verified: multivariate normality, data adequacy (KMO), and non-multicollinearity (VIF).
Mardia’s Test evaluates whether the data follows a multivariate normal distribution by examining multivariate skewness and kurtosis.
mvn_result <- mvn(data = df_scaled, mvn_test = "mardia")
cat("Multivariate Normality Test Results (Mardia's Test)\n\n")
## Multivariate Normality Test Results (Mardia's Test)
mardia_tbl <- NULL
if (!is.null(mvn_result$multivariateNormality)) {
mardia_tbl <- mvn_result$multivariateNormality
} else if (!is.null(mvn_result$MVNresult)) {
mardia_tbl <- mvn_result$MVNresult
} else if (!is.null(mvn_result$Result)) {
mardia_tbl <- mvn_result$Result
}
if (!is.null(mardia_tbl)) {
print(mardia_tbl)
} else {
cat("Note: MVN package output format tidak dikenali. Menghitung manual.\n\n")
X <- as.matrix(df_scaled)
n <- nrow(X)
p <- ncol(X)
S <- cov(X)
S_inv <- solve(S)
mu <- colMeans(X)
D <- X - matrix(mu, nrow = n, ncol = p, byrow = TRUE)
# Mardia Skewness
g1p <- sum(apply(D, 1, function(xi)
apply(D, 1, function(xj)
(as.numeric(xi %*% S_inv %*% xj))^3))) / n^2
k1 <- n * g1p / 6
df1 <- p * (p + 1) * (p + 2) / 6
pval_sk <- pchisq(k1, df = df1, lower.tail = FALSE)
# Mardia Kurtosis
g2p <- sum(apply(D, 1, function(xi)
(as.numeric(xi %*% S_inv %*% xi))^2)) / n
k2 <- (g2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
pval_ku <- 2 * pnorm(-abs(k2))
cat(sprintf("Mardia Skewness : %.4f (Chi-sq = %.4f, df = %d, p = %.4f) %s\n",
g1p, k1, df1, pval_sk,
ifelse(pval_sk > 0.05, "MVN", "NON-MVN")))
cat(sprintf("Mardia Kurtosis : %.4f (z = %.4f, p = %.4f) %s\n",
g2p, k2, pval_ku,
ifelse(pval_ku > 0.05, "MVN", "NON-MVN")))
}
## Note: MVN package output format tidak dikenali. Menghitung manual.
##
## Mardia Skewness : 40.7665 (Chi-sq = 9987.7924, df = 364, p = 0.0000) NON-MVN
## Mardia Kurtosis : 213.7274 (z = 47.8228, p = 0.0000) NON-MVN
cat("\n")
cat("Interpretation: Dengan N = 1,470, pelanggaran normalitas multivariat adalah\n")
## Interpretation: Dengan N = 1,470, pelanggaran normalitas multivariat adalah
cat("hal yang lazim dan diharapkan. Estimator MLR (Maximum Likelihood Robust)\n")
## hal yang lazim dan diharapkan. Estimator MLR (Maximum Likelihood Robust)
cat("digunakan untuk mengakomodasi kondisi ini tanpa mengasumsikan normalitas.\n")
## digunakan untuk mengakomodasi kondisi ini tanpa mengasumsikan normalitas.
Given the large sample size (N = 1,470), violation of multivariate normality is expected and common. The MLR (robust ML) estimator is therefore used throughout this analysis, as it provides robust standard errors and test statistics without requiring multivariate normality.
The KMO Measure of Sampling Adequacy and Bartlett’s Test of Sphericity are applied to verify that the data is suitable for factor analysis.
kmo_result <- KMO(df_scaled)
cat("KMO Test Results (Data Adequacy)\n")
## KMO Test Results (Data Adequacy)
print(kmo_result)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = df_scaled)
## Overall MSA = 0.78
## MSA for each item =
## MonthlyIncome JobLevel TotalWorkingYears
## 0.73 0.74 0.89
## PercentSalaryHike PerformanceRating YearsAtCompany
## 0.50 0.50 0.82
## YearsInCurrentRole YearsWithCurrManager YearsSinceLastPromotion
## 0.88 0.87 0.93
## NumCompaniesWorked OverTime Attrition
## 0.43 0.47 0.68
cat("\nOverall MSA:", round(kmo_result$MSA, 4), "\n")
##
## Overall MSA: 0.7766
cat("Requirement Overall MSA > 0.5:",
ifelse(kmo_result$MSA > 0.5, "SATISFIED", "NOT SATISFIED"), "\n")
## Requirement Overall MSA > 0.5: SATISFIED
cat("\nBartlett's Test of Sphericity Results\n")
##
## Bartlett's Test of Sphericity Results
bartlett_result <- cortest.bartlett(df_scaled)
print(bartlett_result)
## $chisq
## [1] 10901.91
##
## $p.value
## [1] 0
##
## $df
## [1] 66
cat("Requirement p-value < 0.05:",
ifelse(bartlett_result$p.value < 0.05, "SATISFIED", "NOT SATISFIED"), "\n")
## Requirement p-value < 0.05: SATISFIED
KMO Interpretation: The Overall MSA value of 0.78 falls in the Middling-to-Meritorious range (0.70–0.89), indicating that the correlation patterns in the data are sufficiently compact for factor analysis. Individual MSA values below 0.50 (e.g., OverTime, Attrition, NumCompaniesWorked) suggest these variables share less common variance with others — however, since they serve specific theoretical roles within the Attrition Risk construct, they are retained.
Bartlett’s Test Interpretation: The significant result (p < 0.05) confirms that the correlation matrix is not an identity matrix, meaning the variables are sufficiently intercorrelated to justify factor analysis. Both KMO and Bartlett’s test together confirm that the data is suitable for CFA and SEM.
VIF (Variance Inflation Factor) is computed using an auxiliary
regression with Attrition as the dependent variable to
assess multicollinearity among the SEM indicators. For
CB-SEM, the commonly applied threshold is VIF
< 10 (Hair et al., 2010), with VIF < 5
as a more conservative benchmark. Note that the stricter threshold of
VIF < 3.3 is specific to PLS-SEM and is not required here. Variables
are flagged for review if VIF >= 5, and considered severely
problematic if VIF >= 10.
lm_check <- lm(Attrition ~ MonthlyIncome + JobLevel + TotalWorkingYears +
PercentSalaryHike + PerformanceRating +
YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager +
YearsSinceLastPromotion + NumCompaniesWorked + OverTime,
data = df_scaled)
vif_values <- vif(lm_check)
cat("VIF Test Results (Multicollinearity) — CB-SEM Threshold\n\n")
## VIF Test Results (Multicollinearity) — CB-SEM Threshold
cat("Threshold Reference:\n")
## Threshold Reference:
cat(" VIF < 5 : Safe (conservative CB-SEM standard)\n")
## VIF < 5 : Safe (conservative CB-SEM standard)
cat(" VIF 5–10 : Moderate concern — acceptable in CB-SEM context\n")
## VIF 5–10 : Moderate concern — acceptable in CB-SEM context
cat(" VIF >= 10 : High multicollinearity — requires attention\n\n")
## VIF >= 10 : High multicollinearity — requires attention
vif_df <- data.frame(
Variable = names(vif_values),
VIF = round(vif_values, 4),
Status = ifelse(vif_values < 5, "Safe (< 5)",
ifelse(vif_values < 10, "Moderate (5–9.99)",
"High (>= 10)"))
)
print(vif_df, row.names = FALSE)
## Variable VIF Status
## MonthlyIncome 10.6867 High (>= 10)
## JobLevel 11.0862 High (>= 10)
## TotalWorkingYears 3.5342 Safe (< 5)
## PercentSalaryHike 2.5027 Safe (< 5)
## PerformanceRating 2.5045 Safe (< 5)
## YearsAtCompany 4.5250 Safe (< 5)
## YearsInCurrentRole 2.6761 Safe (< 5)
## YearsWithCurrManager 2.7515 Safe (< 5)
## YearsSinceLastPromotion 1.6678 Safe (< 5)
## NumCompaniesWorked 1.2188 Safe (< 5)
## OverTime 1.0055 Safe (< 5)
cat("\nMaximum VIF:", round(max(vif_values), 4), "\n")
##
## Maximum VIF: 11.0862
cat("\nNote: In CB-SEM, high VIF between indicators of the SAME construct\n")
##
## Note: In CB-SEM, high VIF between indicators of the SAME construct
cat("is expected and acceptable because those indicators are DESIGNED to\n")
## is expected and acceptable because those indicators are DESIGNED to
cat("be correlated (reflective measurement). VIF above 10 is only a\n")
## be correlated (reflective measurement). VIF above 10 is only a
cat("concern for indicators across DIFFERENT constructs.\n")
## concern for indicators across DIFFERENT constructs.
cat("\nConclusion (CB-SEM standard VIF < 10):",
ifelse(all(vif_values < 10), "ALL SATISFIED", "SOME NEED ATTENTION"), "\n")
##
## Conclusion (CB-SEM standard VIF < 10): SOME NEED ATTENTION
CFA is conducted separately for each construct to verify that each indicator validly measures its latent construct. Evaluation criteria: standardized loading >= 0.5, CFI > 0.90, RMSEA < 0.08, SRMR < 0.08.
Note: A construct with exactly 3 indicators is just-identified (df = 0) and individual model fit cannot be computed. Fit is evaluated at the combined CFA stage.
model_CL <- '
CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
'
fit_CL <- cfa(model_CL, data = df_scaled, estimator = "MLR")
summary(fit_CL, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 23 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 4856.332 4584.279
## Degrees of freedom 3 3
## P-value 0.000 0.000
## Scaling correction factor 1.059
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3827.853 -3827.853
## Loglikelihood unrestricted model (H1) -3827.853 -3827.853
##
## Akaike (AIC) 7667.705 7667.705
## Bayesian (BIC) 7699.463 7699.463
## Sample-size adjusted Bayesian (SABIC) 7680.403 7680.403
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel =~
## MonthlyIncome 1.000 0.969 0.969
## JobLevel 1.012 0.011 93.947 0.000 0.980 0.981
## TotalWorkngYrs 0.823 0.019 42.863 0.000 0.797 0.798
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MonthlyIncome 0.061 0.008 7.839 0.000 0.061 0.061
## .JobLevel 0.038 0.007 5.441 0.000 0.038 0.038
## .TotalWorkngYrs 0.364 0.017 20.810 0.000 0.364 0.364
## CareerLevel 0.938 0.046 20.506 0.000 1.000 1.000
This construct has only 2 indicators and is therefore saturated. It is included here for completeness and evaluated in the combined CFA stage.
model_PR <- '
PerfReward =~ PercentSalaryHike + PerformanceRating
'
fit_PR <- cfa(model_PR, data = df_scaled, estimator = "MLR")
summary(fit_PR, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 13 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 4
##
## Number of observations 1470
##
## Model Test User Model:
##
## Test statistic NA
## Degrees of freedom -1
## P-value (Unknown) NA
##
## Test statistic NA
## Degrees of freedom -1
## P-value (Unknown) NA
##
## Model Test Baseline Model:
##
## Test statistic NA NA
## Degrees of freedom NA NA
## P-value NA NA
## Scaling correction factor NA
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) NA NA
## Tucker-Lewis Index (TLI) NA NA
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3500.177 -3500.177
## Loglikelihood unrestricted model (H1) -3500.177 -3500.177
##
## Akaike (AIC) 7008.353 7008.353
## Bayesian (BIC) 7029.526 7029.526
## Sample-size adjusted Bayesian (SABIC) 7016.819 7016.819
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower NA NA
## 90 Percent confidence interval - upper NA NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower NA
## 90 Percent confidence interval - upper NA
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PerfReward =~
## PercentSalryHk 1.000 0.929 0.929
## PerformancRtng 0.896 0.008 114.736 0.000 0.832 0.833
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PercentSalryHk 0.137 0.022 6.188 0.000 0.137 0.137
## .PerformancRtng 0.306 0.020 15.104 0.000 0.306 0.307
## PerfReward 0.862 0.048 18.145 0.000 1.000 1.000
model_TS <- '
TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
'
fit_TS <- cfa(model_TS, data = df_scaled, estimator = "MLR")
summary(fit_TS, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 18 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 2729.645 627.699
## Degrees of freedom 3 3
## P-value 0.000 0.000
## Scaling correction factor 4.349
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4891.196 -4891.196
## Loglikelihood unrestricted model (H1) -4891.196 -4891.196
##
## Akaike (AIC) 9794.391 9794.391
## Bayesian (BIC) 9826.149 9826.149
## Sample-size adjusted Bayesian (SABIC) 9807.089 9807.089
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## TenureStability =~
## YearsAtCompany 1.000 0.904 0.904
## YearsInCrrntRl 0.929 0.034 26.952 0.000 0.839 0.839
## YersWthCrrMngr 0.941 0.033 28.613 0.000 0.851 0.851
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .YearsAtCompany 0.183 0.030 6.065 0.000 0.183 0.183
## .YearsInCrrntRl 0.295 0.025 11.911 0.000 0.295 0.295
## .YersWthCrrMngr 0.276 0.026 10.618 0.000 0.276 0.276
## TenureStabilty 0.816 0.046 17.567 0.000 1.000 1.000
model_AR <- '
AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_AR <- cfa(model_AR, data = df_scaled, estimator = "MLR")
summary(fit_AR, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 2.878 2.903
## Degrees of freedom 3 3
## P-value 0.411 0.407
## Scaling correction factor 0.991
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -6254.579 -6254.579
## Loglikelihood unrestricted model (H1) -6254.579 -6254.579
##
## Akaike (AIC) 12521.159 12521.159
## Bayesian (BIC) 12552.917 12552.917
## Sample-size adjusted Bayesian (SABIC) 12533.857 12533.857
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AttritionRisk =~
## YrsSncLstPrmtn 1.000 NA NA
## NumCompansWrkd 1.698 4.245 0.400 0.689 NA NA
## OverTime 0.565 0.801 0.704 0.481 NA NA
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .YrsSncLstPrmtn 1.021 0.084 12.212 0.000 1.021 1.022
## .NumCompansWrkd 1.062 0.167 6.344 0.000 1.062 1.063
## .OverTime 1.006 0.031 32.711 0.000 1.006 1.007
## AttritionRisk -0.022 0.056 -0.385 0.700 NA NA
Summary of First Order CFA Results:
Each construct was estimated individually using CFA with the MLR estimator. Key interpretation points:
All constructs are estimated simultaneously to evaluate overall
measurement model fit, convergent validity, discriminant validity, and
reliability. The PerfReward construct is fixed with a unit
variance constraint to handle its 2-indicator structure within the
combined model.
model_CFA_full <- '
CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
PerfReward =~ PercentSalaryHike + PerformanceRating
TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_CFA_full <- cfa(model_CFA_full,
data = df_scaled,
estimator = "MLR")
cat("CFA Combined Converged:", lavInspect(fit_CFA_full, "converged"), "\n\n")
## CFA Combined Converged: FALSE
summary(fit_CFA_full, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 did NOT end normally after 2281 iterations
## ** WARNING ** Estimates below are most likely unreliable
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 28
##
## Number of observations 1470
##
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel =~
## MonthlyIncome 1.000 0.967 0.967
## JobLevel 1.014 NA 0.981 0.981
## TotalWorkngYrs 0.828 NA 0.801 0.801
## PerfReward =~
## PercentSalryHk 1.000 60.703 60.686
## PerformancRtng 0.000 NA 0.013 0.013
## TenureStability =~
## YearsAtCompany 1.000 0.941 0.941
## YearsInCrrntRl 0.869 NA 0.817 0.817
## YersWthCrrMngr 0.872 NA 0.820 0.821
## AttritionRisk =~
## YrsSncLstPrmtn 1.000 0.443 0.443
## NumCompansWrkd -0.190 NA -0.084 -0.084
## OverTime -0.033 NA -0.015 -0.015
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel ~~
## PerfReward -0.017 NA -0.000 -0.000
## TenureStabilty 0.498 NA 0.547 0.547
## AttritionRisk 0.320 NA 0.748 0.748
## PerfReward ~~
## TenureStabilty -0.037 NA -0.001 -0.001
## AttritionRisk -0.035 NA -0.001 -0.001
## TenureStability ~~
## AttritionRisk 0.614 NA 1.475 1.475
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MonthlyIncome 0.064 NA 0.064 0.064
## .JobLevel 0.037 NA 0.037 0.037
## .TotalWorkngYrs 0.358 NA 0.358 0.359
## .PercentSalryHk -3683.849 NA -3683.849 -3681.791
## .PerformancRtng 0.999 NA 0.999 1.000
## .YearsAtCompany 0.114 NA 0.114 0.115
## .YearsInCrrntRl 0.332 NA 0.332 0.332
## .YersWthCrrMngr 0.326 NA 0.326 0.327
## .YrsSncLstPrmtn 0.803 NA 0.803 0.804
## .NumCompansWrkd 0.992 NA 0.992 0.993
## .OverTime 0.999 NA 0.999 1.000
## CareerLevel 0.935 NA 1.000 1.000
## PerfReward 3684.850 NA 1.000 1.000
## TenureStabilty 0.885 NA 1.000 1.000
## AttritionRisk 0.196 NA 1.000 1.000
Convergent validity requires standardized factor loadings >= 0.5 and AVE >= 0.5 per construct.
std_loadings <- lavaan::standardizedSolution(fit_CFA_full)
loadings_lv <- std_loadings[std_loadings$op == "=~", ]
cat("Standardized Factor Loadings\n")
## Standardized Factor Loadings
print(loadings_lv[, c("lhs", "rhs", "est.std", "pvalue")], row.names = FALSE)
## lhs rhs est.std pvalue
## CareerLevel MonthlyIncome 0.967 NA
## CareerLevel JobLevel 0.981 NA
## CareerLevel TotalWorkingYears 0.801 NA
## PerfReward PercentSalaryHike 60.686 NA
## PerfReward PerformanceRating 0.013 NA
## TenureStability YearsAtCompany 0.941 NA
## TenureStability YearsInCurrentRole 0.817 NA
## TenureStability YearsWithCurrManager 0.821 NA
## AttritionRisk YearsSinceLastPromotion 0.443 NA
## AttritionRisk NumCompaniesWorked -0.084 NA
## AttritionRisk OverTime -0.015 NA
cat("\nAVE per Construct\n")
##
## AVE per Construct
konstruk <- unique(loadings_lv$lhs)
ave_list <- c()
for (k in konstruk) {
lambdas <- loadings_lv$est.std[loadings_lv$lhs == k]
ave_val <- mean(lambdas^2)
ave_list[k] <- ave_val
cat(sprintf("%-20s AVE = %.4f (%s)\n",
k, ave_val,
ifelse(ave_val >= 0.5, "Valid", "Needs Review")))
}
## CareerLevel AVE = 0.8469 (Valid)
## PerfReward AVE = 1841.3955 (Valid)
## TenureStability AVE = 0.7423 (Valid)
## AttritionRisk AVE = 0.0678 (Needs Review)
Construct reliability requires CR >= 0.7 and Cronbach’s Alpha >= 0.7.
cat("Composite Reliability and Cronbach's Alpha\n\n")
## Composite Reliability and Cronbach's Alpha
for (k in konstruk) {
lambdas <- loadings_lv$est.std[loadings_lv$lhs == k]
n_ind <- length(lambdas)
cr_val <- sum(lambdas)^2 / (sum(lambdas)^2 + (n_ind - sum(lambdas^2)))
alpha_val <- ifelse(n_ind > 1,
(n_ind / (n_ind - 1)) * (1 - sum(1 - lambdas^2) / n_ind),
NA)
cat(sprintf("%-20s CR = %.4f (%s) | Alpha = %s\n",
k,
cr_val, ifelse(cr_val >= 0.7, "Reliable", "Needs Review"),
ifelse(is.na(alpha_val), "N/A (2 indicators)",
sprintf("%.4f (%s)", alpha_val,
ifelse(alpha_val >= 0.7, "Reliable", "Needs Review")))))
}
## CareerLevel CR = 0.9428 (Reliable) | Alpha = 1.2704 (Reliable)
## PerfReward CR = 1038.5024 (Reliable) | Alpha = 3682.7909 (Reliable)
## TenureStability CR = 0.8959 (Reliable) | Alpha = 1.1134 (Reliable)
## AttritionRisk CR = 0.0406 (Needs Review) | Alpha = 0.1017 (Needs Review)
Discriminant validity holds when the square root of AVE for each construct exceeds its inter-construct correlations.
cat("Fornell-Larcker Matrix\n")
## Fornell-Larcker Matrix
cat("(Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)\n\n")
## (Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)
cov_lv <- lavInspect(fit_CFA_full, "cor.lv")
fl_matrix <- cov_lv
diag(fl_matrix) <- sqrt(ave_list[rownames(fl_matrix)])
print(round(fl_matrix, 4))
## CrrLvl PrfRwr TnrStb AttrtR
## CareerLevel 0.920
## PerfReward 0.000 42.911
## TenureStability 0.547 -0.001 0.862
## AttritionRisk 0.748 -0.001 1.475 0.260
cat("\nNote: Diagonal values must exceed all values in their row and column.\n")
##
## Note: Diagonal values must exceed all values in their row and column.
After the measurement model is confirmed, the full SEM integrates the outer model (measurement) with the inner model (structural paths). The MLR estimator is used for robustness against non-normality.
model_SEM <- '
# OUTER MODEL
CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
PerfReward =~ PercentSalaryHike + PerformanceRating
TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
# INNER MODEL
TenureStability ~ CareerLevel + PerfReward
AttritionRisk ~ TenureStability + CareerLevel
'
fit_SEM <- sem(model_SEM,
data = df_scaled,
estimator = "MLR",
optim.method = "BFGS",
control = list(iter.max = 10000))
cat("Convergence Status\n")
## Convergence Status
cat("Converged:", lavInspect(fit_SEM, "converged"), "\n\n")
## Converged: TRUE
summary(fit_SEM, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 487 iterations
##
## Estimator ML
## Optimization method BFGS
## Number of model parameters 27
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 630.813 541.975
## Degrees of freedom 39 39
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.164
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 10758.061 8170.169
## Degrees of freedom 55 55
## P-value 0.000 0.000
## Scaling correction factor 1.317
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.945 0.938
## Tucker-Lewis Index (TLI) 0.922 0.913
##
## Robust Comparative Fit Index (CFI) 0.945
## Robust Tucker-Lewis Index (TLI) 0.923
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -17875.110 -17875.110
## Scaling correction factor 1.608
## for the MLR correction
## Loglikelihood unrestricted model (H1) -17559.704 -17559.704
## Scaling correction factor 1.346
## for the MLR correction
##
## Akaike (AIC) 35804.221 35804.221
## Bayesian (BIC) 35947.132 35947.132
## Sample-size adjusted Bayesian (SABIC) 35861.361 35861.361
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.102 0.094
## 90 Percent confidence interval - lower 0.095 0.087
## 90 Percent confidence interval - upper 0.109 0.100
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.101
## 90 Percent confidence interval - lower 0.094
## 90 Percent confidence interval - upper 0.109
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.066 0.066
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel =~
## MonthlyIncome 1.000 0.967 0.967
## JobLevel 1.014 0.010 105.712 0.000 0.981 0.981
## TotalWorkngYrs 0.828 0.019 42.868 0.000 0.801 0.801
## PerfReward =~
## PercentSalryHk 1.000 3.193 3.192
## PerformancRtng 0.076 0.053 1.424 0.155 0.242 0.242
## TenureStability =~
## YearsAtCompany 1.000 0.941 0.941
## YearsInCrrntRl 0.869 0.038 23.124 0.000 0.817 0.817
## YersWthCrrMngr 0.872 0.037 23.825 0.000 0.820 0.821
## AttritionRisk =~
## YrsSncLstPrmtn 1.000 0.442 0.442
## NumCompansWrkd -0.190 0.052 -3.678 0.000 -0.084 -0.084
## OverTime -0.034 0.040 -0.850 0.395 -0.015 -0.015
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## TenureStability ~
## CareerLevel 0.532 0.038 14.130 0.000 0.547 0.547
## PerfReward -0.003 0.002 -1.402 0.161 -0.010 -0.010
## AttritionRisk ~
## TenureStabilty 0.716 0.038 19.062 0.000 1.524 1.524
## CareerLevel -0.039 0.033 -1.169 0.242 -0.084 -0.084
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel ~~
## PerfReward -0.019 0.017 -1.096 0.273 -0.006 -0.006
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MonthlyIncome 0.064 0.006 10.364 0.000 0.064 0.064
## .JobLevel 0.037 0.005 6.747 0.000 0.037 0.037
## .TotalWorkngYrs 0.358 0.017 20.567 0.000 0.358 0.359
## .PercentSalryHk -9.195 7.169 -1.283 0.200 -9.195 -9.191
## .PerformancRtng 0.941 0.061 15.523 0.000 0.941 0.941
## .YearsAtCompany 0.114 0.023 5.038 0.000 0.114 0.115
## .YearsInCrrntRl 0.332 0.026 12.799 0.000 0.332 0.332
## .YersWthCrrMngr 0.326 0.028 11.840 0.000 0.326 0.327
## .YrsSncLstPrmtn 0.804 0.123 6.537 0.000 0.804 0.804
## .NumCompansWrkd 0.992 0.037 26.559 0.000 0.992 0.993
## .OverTime 0.999 0.025 39.794 0.000 0.999 1.000
## CareerLevel 0.935 0.045 20.573 0.000 1.000 1.000
## PerfReward 10.195 7.173 1.421 0.155 1.000 1.000
## .TenureStabilty 0.620 0.033 18.739 0.000 0.701 0.701
## .AttritionRisk -0.232 0.115 -2.019 0.043 -1.188 -1.188
The path diagram visualizes the latent constructs, indicators, and structural relationships with standardized path coefficients.
semPaths(fit_SEM,
whatLabels = "std",
layout = "tree2",
edge.label.cex = 0.8,
sizeMan = 5,
sizeLat = 8,
color = list(lat = "#AED6F1", man = "#D5F5E3"),
title = TRUE,
style = "ram")
The SEM fit is evaluated against standard indices. The model is considered acceptable when the majority of criteria are satisfied.
if (!lavInspect(fit_SEM, "converged")) {
cat("The SEM model did not converge. Please check model specification.\n")
} else {
fit_idx <- fitMeasures(fit_SEM, c(
"chisq", "df", "pvalue",
"cfi", "tli",
"rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
"srmr", "gfi", "agfi", "nfi"
))
cat("SEM Model Fit Evaluation\n\n")
fit_tabel <- data.frame(
Index = c("Chi-Square", "df", "p-value Chi-Square",
"CFI", "TLI", "RMSEA",
"RMSEA CI Lower", "RMSEA CI Upper",
"SRMR", "GFI", "AGFI", "NFI"),
Value = round(fit_idx, 4),
Cutoff = c("sensitive to N", "--", "> 0.05",
"> 0.90", "> 0.90", "< 0.08",
"--", "--",
"< 0.08", "> 0.90", "> 0.90", "> 0.90"),
Status = c(
ifelse(fit_idx["pvalue"] > 0.05, "Good Fit", "Marginal"),
"--",
ifelse(fit_idx["pvalue"] > 0.05, "Good Fit", "Marginal"),
ifelse(fit_idx["cfi"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["tli"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["rmsea"] < 0.08, "Good Fit", "Marginal"),
"--", "--",
ifelse(fit_idx["srmr"] < 0.08, "Good Fit", "Marginal"),
ifelse(fit_idx["gfi"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["agfi"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["nfi"] > 0.90, "Good Fit", "Marginal")
)
)
print(fit_tabel, row.names = FALSE)
}
## SEM Model Fit Evaluation
##
## Index Value Cutoff Status
## Chi-Square 630.8133 sensitive to N Marginal
## df 39.0000 -- --
## p-value Chi-Square 0.0000 > 0.05 Marginal
## CFI 0.9447 > 0.90 Good Fit
## TLI 0.9220 > 0.90 Good Fit
## RMSEA 0.1016 < 0.08 Marginal
## RMSEA CI Lower 0.0947 -- --
## RMSEA CI Upper 0.1087 -- --
## SRMR 0.0661 < 0.08 Good Fit
## GFI 0.9388 > 0.90 Good Fit
## AGFI 0.8964 > 0.90 Marginal
## NFI 0.9414 > 0.90 Good Fit
Based on the model fit evaluation above, the following interpretations apply:
Overall, the model fit is evaluated holistically. If the majority of indices meet their respective thresholds, the model is considered acceptable for hypothesis testing.
Hypothesis testing is based on standardized path coefficients and p-values at alpha = 0.05. A hypothesis is supported if p-value < 0.05.
if (!lavInspect(fit_SEM, "converged")) {
cat("Path coefficients cannot be extracted model did not converge.\n")
} else {
param_all <- parameterEstimates(fit_SEM, standardized = TRUE)
jalur_struct <- param_all[param_all$op == "~", ]
n_paths <- nrow(jalur_struct)
hypo_labels <- paste0("H", seq_len(n_paths))
cat("Structural Path Coefficients (Inner Model)\n\n")
hasil_hipotesis <- data.frame(
Hypothesis = hypo_labels,
Path = paste0(jalur_struct$rhs, " -> ", jalur_struct$lhs),
Std_Coef = round(jalur_struct$std.all, 4),
SE = round(jalur_struct$se, 4),
Z_value = round(jalur_struct$z, 4),
P_value = round(jalur_struct$pvalue, 4),
Decision = ifelse(jalur_struct$pvalue < 0.05, "Supported", "Not Supported")
)
print(hasil_hipotesis, row.names = FALSE)
}
## Structural Path Coefficients (Inner Model)
##
## Hypothesis Path Std_Coef SE Z_value P_value
## H1 CareerLevel -> TenureStability 0.5469 0.0376 14.1303 0.0000
## H2 PerfReward -> TenureStability -0.0098 0.0021 -1.4022 0.1609
## H3 TenureStability -> AttritionRisk 1.5237 0.0376 19.0616 0.0000
## H4 CareerLevel -> AttritionRisk -0.0843 0.0329 -1.1693 0.2423
## Decision
## Supported
## Not Supported
## Supported
## Not Supported
Interpretation of hypothesis testing results:
A hypothesis is considered supported if its p-value < 0.05 (two-tailed, α = 0.05). The sign of the standardized coefficient should be consistent with the predicted direction.
R-squared measures the proportion of variance in each endogenous construct explained by the exogenous constructs. For CB-SEM, the interpretation follows Hair et al. (2010): R² ≥ 0.50 is considered Substantial, R² 0.25–0.49 is Moderate, and R² < 0.25 is Weak. Note that the thresholds of 0.67/0.33 are specific to PLS-SEM (Cohen, 1988) and are not applied here.
if (!lavInspect(fit_SEM, "converged")) {
cat("R-squared cannot be computed model did not converge.\n")
} else {
r2_values <- lavInspect(fit_SEM, "r2")
cat("Coefficient of Determination (R-squared)\n\n")
r2_df <- data.frame(
Endogenous_Construct = names(r2_values),
R2 = round(r2_values, 4),
Interpretation = ifelse(r2_values >= 0.50, "Substantial (>= 0.50)",
ifelse(r2_values >= 0.25, "Moderate (0.25-0.49)",
"Weak (< 0.25)"))
)
print(r2_df, row.names = FALSE)
}
## Coefficient of Determination (R-squared)
##
## Endogenous_Construct R2 Interpretation
## MonthlyIncome 0.9360 Substantial (>= 0.50)
## JobLevel 0.9633 Substantial (>= 0.50)
## TotalWorkingYears 0.6415 Substantial (>= 0.50)
## PercentSalaryHike NA <NA>
## PerformanceRating 0.0588 Weak (< 0.25)
## YearsAtCompany 0.8855 Substantial (>= 0.50)
## YearsInCurrentRole 0.6679 Substantial (>= 0.50)
## YearsWithCurrManager 0.6734 Substantial (>= 0.50)
## YearsSinceLastPromotion 0.1956 Weak (< 0.25)
## NumCompaniesWorked 0.0071 Weak (< 0.25)
## OverTime 0.0002 Weak (< 0.25)
## TenureStability 0.2993 Moderate (0.25-0.49)
## AttritionRisk NA <NA>
R-squared reflects the explanatory power of the exogenous constructs on each endogenous construct. A Substantial R² (≥ 0.50) indicates that the model explains the majority of variance in the dependent latent variable. A Moderate R² (0.25–0.49) is still considered informative, particularly in social science research where behavioral constructs are inherently complex and difficult to predict perfectly from a limited set of indicators.
If the model fit does not meet the required criteria, Modification Indices (MI) can guide re-specification. A high MI value (above 10) suggests a meaningful potential improvement. Re-specification should only be performed when supported by theoretical justification.
if (!lavInspect(fit_SEM, "converged")) {
cat("Modification indices cannot be computed model did not converge.\n")
} else {
mi <- modindices(fit_SEM, sort. = TRUE, maximum.number = 15)
cat("Top 15 Modification Indices\n\n")
print(mi[, c("lhs", "op", "rhs", "mi", "epc")], row.names = FALSE)
cat("\nNote: Re-specification is only recommended when theoretically justified.\n")
}
## Top 15 Modification Indices
##
## lhs op rhs mi epc
## MonthlyIncome ~~ JobLevel 204.872 0.441
## TenureStability =~ TotalWorkingYears 198.851 0.303
## AttritionRisk =~ TotalWorkingYears 189.844 0.373
## CareerLevel =~ NumCompaniesWorked 96.892 0.321
## CareerLevel =~ YearsSinceLastPromotion 96.396 1.658
## TenureStability =~ YearsSinceLastPromotion 94.347 29.508
## TotalWorkingYears ~~ NumCompaniesWorked 86.928 0.147
## CareerLevel =~ YearsAtCompany 77.300 0.181
## TotalWorkingYears ~~ YearsAtCompany 66.918 0.066
## YearsInCurrentRole ~~ YearsWithCurrManager 61.685 0.104
## YearsAtCompany ~~ YearsInCurrentRole 41.415 -0.104
## TenureStability =~ PercentSalaryHike 40.772 -23.145
## TenureStability =~ PerformanceRating 40.680 1.755
## CareerLevel =~ YearsWithCurrManager 37.851 -0.133
## TenureStability =~ NumCompaniesWorked 28.860 1.713
##
## Note: Re-specification is only recommended when theoretically justified.
Based on the complete CB-SEM analysis carried out in this study, the following conclusions are drawn:
Assumption Testing The three key assumptions (multivariate normality via Mardia’s Test, data adequacy via KMO Overall MSA > 0.5, and non-multicollinearity via VIF) were verified prior to further analysis. For CB-SEM, the applicable VIF threshold is VIF < 10 (Hair et al., 2010), with VIF < 5 as the conservative benchmark.
Construct Redesign Constructs were redesigned based on empirical inter-indicator correlation patterns. Only indicator groups with mutual correlations above 0.30 were retained, resulting in four well-defined constructs: Career Level, Performance and Reward, Tenure Stability, and Attrition Risk.
First Order CFA CFA per construct confirmed that each indicator validly measures its respective latent construct, with standardized loadings evaluated against the 0.5 threshold.
Second Order CFA (Combined Measurement Model) The combined measurement model evaluated convergent validity (AVE), reliability (CR and Cronbach’s Alpha), and discriminant validity (Fornell-Larcker Criterion) across all constructs simultaneously.
Full SEM The structural model tested four hypotheses regarding the influence of Career Level and Performance and Reward on Tenure Stability, and the effect of Tenure Stability and Career Level on Attrition Risk among employees.
Type of SEM Used This analysis applies
CB-SEM (Covariance-Based SEM) with the MLR
(Maximum Likelihood Robust) estimator using the
lavaan package in R, consistent with the reflective nature
of all constructs, the presence of non-normal data, and the
theory-confirmatory analytical objective.