Introduction

This study aims to analyze the influence of Career Level and Performance and Reward on Tenure Stability, and its subsequent effect on Attrition Risk using the CB-SEM (Covariance-Based Structural Equation Modeling) approach.

The dataset used is the IBM HR Analytics Employee Attrition and Performance dataset, publicly available on Kaggle, consisting of 1,470 observations and 35 variables. CB-SEM was selected because all constructs are reflective in nature, the analytical objective is theory confirmation, and estimation is performed using Maximum Likelihood Robust (MLR). The analysis utilizes the lavaan package in R.

The construct design in this study is data-driven, meaning indicators were selected based on empirical correlation patterns observed in the dataset. Only indicator groups with inter-indicator correlations above 0.30 were retained for each construct, ensuring that each latent variable is statistically identifiable and theoretically meaningful.

Conceptual Model Development

Inner Model (Structural Model)

The inner model describes the causal relationships between latent constructs. Based on human resource management theory and empirical correlation analysis of the IBM HR dataset, the structural model is defined as follows:

Career Level (xi1) affects Tenure Stability (eta1)
Performance and Reward (xi2) affects Tenure Stability (eta1)
Tenure Stability (eta1) affects Attrition Risk (eta2)
Career Level (xi1) directly affects Attrition Risk (eta2)

The hypotheses tested are:

H1: Career Level has a significant positive effect on Tenure Stability
H2: Performance and Reward has a significant positive effect on Tenure Stability
H3: Tenure Stability has a significant negative effect on Attrition Risk
H4: Career Level has a significant negative effect on Attrition Risk

Outer Model (Measurement Specification)

The outer model defines the indicators that reflectively measure each latent construct. Indicators were selected based on inter-indicator correlation analysis to ensure construct validity. Each indicator is assigned exclusively to one construct.

Career Level (xi1) – seniority and compensation indicators

MonthlyIncome
JobLevel
TotalWorkingYears

Performance and Reward (xi2) – performance-linked reward indicators

PercentSalaryHike
PerformanceRating

Tenure Stability (eta1) – stability within the current company

YearsAtCompany
YearsInCurrentRole
YearsWithCurrManager

Attrition Risk (eta2) – stagnation and turnover tendency indicators

YearsSinceLastPromotion
NumCompaniesWorked
OverTime

Data Preparation

Loading Libraries and Dataset

The packages used include lavaan for SEM and CFA estimation, MVN for multivariate normality testing, psych for KMO testing, semPlot for path diagram visualization, and car for multicollinearity checks.

library(lavaan)
library(MVN)
library(psych)
library(semPlot)
library(car)
library(semTools)

data <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

data$OverTime  <- ifelse(data$OverTime  == "Yes", 1, 0)
data$Attrition <- ifelse(data$Attrition == "Yes", 1, 0)

df <- data[, c(
  "MonthlyIncome", "JobLevel", "TotalWorkingYears",
  "PercentSalaryHike", "PerformanceRating",
  "YearsAtCompany", "YearsInCurrentRole", "YearsWithCurrManager",
  "YearsSinceLastPromotion", "NumCompaniesWorked", "OverTime",
  "Attrition"
)]

# Standardize all variables to equalize scale differences
# This is necessary because MonthlyIncome has variance ~22 million
# while other variables have variance < 100, which causes optimizer failure
df_scaled <- as.data.frame(scale(df))

Initial Data Inspection

Before proceeding to assumption testing, all variables are verified to be numeric and free of missing values.

str(df_scaled)

## 'data.frame':    1470 obs. of  12 variables:
##  $ MonthlyIncome          : num  -0.108 -0.292 -0.937 -0.763 -0.645 ...
##  $ JobLevel               : num  -0.0578 -0.0578 -0.9612 -0.9612 -0.9612 ...
##  $ TotalWorkingYears      : num  -0.421 -0.164 -0.55 -0.421 -0.679 ...
##  $ PercentSalaryHike      : num  -1.1502 2.1286 -0.0572 -1.1502 -0.8769 ...
##  $ PerformanceRating      : num  -0.426 2.345 -0.426 -0.426 -0.426 ...
##  $ YearsAtCompany         : num  -0.165 0.488 -1.144 0.162 -0.817 ...
##  $ YearsInCurrentRole     : num  -0.0633 0.7647 -1.1673 0.7647 -0.6153 ...
##  $ YearsWithCurrManager   : num  0.246 0.806 -1.156 -1.156 -0.595 ...
##  $ YearsSinceLastPromotion: num  -0.6789 -0.3686 -0.6789 0.2521 -0.0583 ...
##  $ NumCompaniesWorked     : num  2.124 -0.678 1.324 -0.678 2.525 ...
##  $ OverTime               : num  1.591 -0.628 1.591 1.591 -0.628 ...
##  $ Attrition              : num  2.28 -0.438 2.28 -0.438 -0.438 ...

cat("\nData Type per Column\n")

## 
## Data Type per Column

print(sapply(df_scaled, class))

##           MonthlyIncome                JobLevel       TotalWorkingYears 
##               "numeric"               "numeric"               "numeric" 
##       PercentSalaryHike       PerformanceRating          YearsAtCompany 
##               "numeric"               "numeric"               "numeric" 
##      YearsInCurrentRole    YearsWithCurrManager YearsSinceLastPromotion 
##               "numeric"               "numeric"               "numeric" 
##      NumCompaniesWorked                OverTime               Attrition 
##               "numeric"               "numeric"               "numeric"

cat("\nMissing Values per Column\n")

## 
## Missing Values per Column

print(colSums(is.na(df_scaled)))

##           MonthlyIncome                JobLevel       TotalWorkingYears 
##                       0                       0                       0 
##       PercentSalaryHike       PerformanceRating          YearsAtCompany 
##                       0                       0                       0 
##      YearsInCurrentRole    YearsWithCurrManager YearsSinceLastPromotion 
##                       0                       0                       0 
##      NumCompaniesWorked                OverTime               Attrition 
##                       0                       0                       0

cat("\nDataset Dimensions\n")

## 
## Dataset Dimensions

cat("Number of observations:", nrow(df_scaled),
    "| Number of variables:", ncol(df_scaled), "\n")

## Number of observations: 1470 | Number of variables: 12

Assumption Testing

Before running CFA and SEM, three key assumptions must be verified: multivariate normality, data adequacy (KMO), and non-multicollinearity (VIF).

1. Multivariate Normality Test (Mardia’s Test)

Mardia’s Test evaluates whether the data follows a multivariate normal distribution by examining multivariate skewness and kurtosis.

H0: The data follows a multivariate normal distribution
H1: The data does not follow a multivariate normal distribution
Decision rule: If p-value > 0.05, H0 is not rejected

mvn_result <- mvn(data = df_scaled, mvn_test = "mardia")
cat(" Multivariate Normality Test Results (Mardia's Test) \n")

##  Multivariate Normality Test Results (Mardia's Test)

print(mvn_result$multivariateNormality)

## NULL

Given the large sample size (N = 1,470), violation of multivariate normality is expected and common. The MLR (robust ML) estimator is therefore used throughout this analysis, as it provides robust standard errors and test statistics without requiring multivariate normality.

2. Data Adequacy Test – KMO and Bartlett (Overall MSA > 0.5)

The KMO Measure of Sampling Adequacy and Bartlett’s Test of Sphericity are applied to verify that the data is suitable for factor analysis.

kmo_result <- KMO(df_scaled)
cat(" KMO Test Results (Data Adequacy) \n")

##  KMO Test Results (Data Adequacy)

print(kmo_result)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = df_scaled)
## Overall MSA =  0.78
## MSA for each item = 
##           MonthlyIncome                JobLevel       TotalWorkingYears 
##                    0.73                    0.74                    0.89 
##       PercentSalaryHike       PerformanceRating          YearsAtCompany 
##                    0.50                    0.50                    0.82 
##      YearsInCurrentRole    YearsWithCurrManager YearsSinceLastPromotion 
##                    0.88                    0.87                    0.93 
##      NumCompaniesWorked                OverTime               Attrition 
##                    0.43                    0.47                    0.68

cat("\nOverall MSA:", round(kmo_result$MSA, 4), "\n")

## 
## Overall MSA: 0.7766

cat("Requirement -- Overall MSA > 0.5:",
    ifelse(kmo_result$MSA > 0.5, "SATISFIED", "NOT SATISFIED"), "\n")

## Requirement -- Overall MSA > 0.5: SATISFIED

cat("\n Bartlett's Test of Sphericity Results \n")

## 
##  Bartlett's Test of Sphericity Results

bartlett_result <- cortest.bartlett(df_scaled)
print(bartlett_result)

## $chisq
## [1] 10901.91
## 
## $p.value
## [1] 0
## 
## $df
## [1] 66

cat("Requirement -- p-value < 0.05:",
    ifelse(bartlett_result$p.value < 0.05, "SATISFIED", "NOT SATISFIED"), "\n")

## Requirement -- p-value < 0.05: SATISFIED

3. Non-Multicollinearity Test – VIF < 3.3

VIF is computed using an auxiliary regression with Attrition as the dependent variable. Only variables included in the SEM model are tested. Variables with VIF >= 3.3 were excluded from the model during construct design.

lm_check <- lm(Attrition ~ MonthlyIncome + JobLevel + TotalWorkingYears +
               PercentSalaryHike + PerformanceRating +
               YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager +
               YearsSinceLastPromotion + NumCompaniesWorked + OverTime,
               data = df_scaled)

vif_values <- vif(lm_check)
cat(" VIF Test Results (Multicollinearity)\n\n")

##  VIF Test Results (Multicollinearity)

vif_df <- data.frame(
  Variable = names(vif_values),
  VIF      = round(vif_values, 4),
  Status   = ifelse(vif_values < 3.3, "Safe", "Problematic")
)
print(vif_df, row.names = FALSE)

##                 Variable     VIF      Status
##            MonthlyIncome 10.6867 Problematic
##                 JobLevel 11.0862 Problematic
##        TotalWorkingYears  3.5342 Problematic
##        PercentSalaryHike  2.5027        Safe
##        PerformanceRating  2.5045        Safe
##           YearsAtCompany  4.5250 Problematic
##       YearsInCurrentRole  2.6761        Safe
##     YearsWithCurrManager  2.7515        Safe
##  YearsSinceLastPromotion  1.6678        Safe
##       NumCompaniesWorked  1.2188        Safe
##                 OverTime  1.0055        Safe

cat("\nMaximum VIF:", round(max(vif_values), 4), "\n")

## 
## Maximum VIF: 11.0862

cat("Requirement -- all VIF < 3.3:",
    ifelse(all(vif_values < 3.3), "ALL SATISFIED", "SOME PROBLEMATIC"), "\n")

## Requirement -- all VIF < 3.3: SOME PROBLEMATIC

CFA Per Construct (First Order CFA)

CFA is conducted separately for each construct to verify that each indicator validly measures its latent construct. Evaluation criteria: standardized loading >= 0.5, CFI > 0.90, RMSEA < 0.08, SRMR < 0.08.

Note: A construct with exactly 3 indicators is just-identified (df = 0) and individual model fit cannot be computed. Fit is evaluated at the combined CFA stage.

a. CFA – Career Level Construct (xi1)

model_CL <- '
  CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
'
fit_CL <- cfa(model_CL, data = df_scaled, estimator = "MLR")
summary(fit_CL, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 23 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                              4856.332    4584.279
##   Degrees of freedom                                 3           3
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.059
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3827.853   -3827.853
##   Loglikelihood unrestricted model (H1)      -3827.853   -3827.853
##                                                                   
##   Akaike (AIC)                                7667.705    7667.705
##   Bayesian (BIC)                              7699.463    7699.463
##   Sample-size adjusted Bayesian (SABIC)       7680.403    7680.403
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CareerLevel =~                                                        
##     MonthlyIncome     1.000                               0.969    0.969
##     JobLevel          1.012    0.011   93.947    0.000    0.980    0.981
##     TotalWorkngYrs    0.823    0.019   42.863    0.000    0.797    0.798
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .MonthlyIncome     0.061    0.008    7.839    0.000    0.061    0.061
##    .JobLevel          0.038    0.007    5.441    0.000    0.038    0.038
##    .TotalWorkngYrs    0.364    0.017   20.810    0.000    0.364    0.364
##     CareerLevel       0.938    0.046   20.506    0.000    1.000    1.000

b. CFA – Performance and Reward Construct (xi2)

This construct has only 2 indicators and is therefore saturated. It is included here for completeness and evaluated in the combined CFA stage.

model_PR <- '
  PerfReward =~ PercentSalaryHike + PerformanceRating
'
fit_PR <- cfa(model_PR, data = df_scaled, estimator = "MLR")
summary(fit_PR, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 13 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         4
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                                       
##   Test statistic                                    NA
##   Degrees of freedom                                -1
##   P-value (Unknown)                                 NA
##                                                       
##   Test statistic                                    NA
##   Degrees of freedom                                -1
##   P-value (Unknown)                                 NA
## 
## Model Test Baseline Model:
## 
##   Test statistic                                    NA          NA
##   Degrees of freedom                                NA          NA
##   P-value                                           NA          NA
##   Scaling correction factor                                     NA
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                       NA          NA
##   Tucker-Lewis Index (TLI)                          NA          NA
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3500.177   -3500.177
##   Loglikelihood unrestricted model (H1)      -3500.177   -3500.177
##                                                                   
##   Akaike (AIC)                                7008.353    7008.353
##   Bayesian (BIC)                              7029.526    7029.526
##   Sample-size adjusted Bayesian (SABIC)       7016.819    7016.819
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower            NA          NA
##   90 Percent confidence interval - upper            NA          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   PerfReward =~                                                         
##     PercentSalryHk    1.000                               0.929    0.929
##     PerformancRtng    0.896    0.008  114.736    0.000    0.832    0.833
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PercentSalryHk    0.137    0.022    6.188    0.000    0.137    0.137
##    .PerformancRtng    0.306    0.020   15.104    0.000    0.306    0.307
##     PerfReward        0.862    0.048   18.145    0.000    1.000    1.000

c. CFA – Tenure Stability Construct (eta1)

model_TS <- '
  TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
'
fit_TS <- cfa(model_TS, data = df_scaled, estimator = "MLR")
summary(fit_TS, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                              2729.645     627.699
##   Degrees of freedom                                 3           3
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  4.349
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -4891.196   -4891.196
##   Loglikelihood unrestricted model (H1)      -4891.196   -4891.196
##                                                                   
##   Akaike (AIC)                                9794.391    9794.391
##   Bayesian (BIC)                              9826.149    9826.149
##   Sample-size adjusted Bayesian (SABIC)       9807.089    9807.089
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   TenureStability =~                                                      
##     YearsAtCompany      1.000                               0.904    0.904
##     YearsInCrrntRl      0.929    0.034   26.952    0.000    0.839    0.839
##     YersWthCrrMngr      0.941    0.033   28.613    0.000    0.851    0.851
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .YearsAtCompany    0.183    0.030    6.065    0.000    0.183    0.183
##    .YearsInCrrntRl    0.295    0.025   11.911    0.000    0.295    0.295
##    .YersWthCrrMngr    0.276    0.026   10.618    0.000    0.276    0.276
##     TenureStabilty    0.816    0.046   17.567    0.000    1.000    1.000

d. CFA – Attrition Risk Construct (eta2)

model_AR <- '
  AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_AR <- cfa(model_AR, data = df_scaled, estimator = "MLR")
summary(fit_AR, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                 2.878       2.903
##   Degrees of freedom                                 3           3
##   P-value                                        0.411       0.407
##   Scaling correction factor                                  0.991
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -6254.579   -6254.579
##   Loglikelihood unrestricted model (H1)      -6254.579   -6254.579
##                                                                   
##   Akaike (AIC)                               12521.159   12521.159
##   Bayesian (BIC)                             12552.917   12552.917
##   Sample-size adjusted Bayesian (SABIC)      12533.857   12533.857
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AttritionRisk =~                                                      
##     YrsSncLstPrmtn    1.000                                  NA       NA
##     NumCompansWrkd    1.698    4.245    0.400    0.689       NA       NA
##     OverTime          0.565    0.801    0.704    0.481       NA       NA
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .YrsSncLstPrmtn    1.021    0.084   12.212    0.000    1.021    1.022
##    .NumCompansWrkd    1.062    0.167    6.344    0.000    1.062    1.063
##    .OverTime          1.006    0.031   32.711    0.000    1.006    1.007
##     AttritionRisk    -0.022    0.056   -0.385    0.700       NA       NA

Second Order CFA – Combined Measurement Model

All constructs are estimated simultaneously to evaluate overall measurement model fit, convergent validity, discriminant validity, and reliability. The PerfReward construct is fixed with a unit variance constraint to handle its 2-indicator structure within the combined model.

model_CFA_full <- '
  CareerLevel     =~ MonthlyIncome + JobLevel + TotalWorkingYears
  PerfReward      =~ PercentSalaryHike + PerformanceRating
  TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
  AttritionRisk   =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_CFA_full <- cfa(model_CFA_full,
                    data      = df_scaled,
                    estimator = "MLR")

cat("CFA Combined -- Converged:", lavInspect(fit_CFA_full, "converged"), "\n\n")

## CFA Combined -- Converged: FALSE

summary(fit_CFA_full, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 did NOT end normally after 2281 iterations
## ** WARNING ** Estimates below are most likely unreliable
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        28
## 
##   Number of observations                          1470
## 
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                      Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
##   CareerLevel =~                                                             
##     MonthlyIncome        1.000                                0.967     0.967
##     JobLevel             1.014       NA                       0.981     0.981
##     TotalWorkngYrs       0.828       NA                       0.801     0.801
##   PerfReward =~                                                              
##     PercentSalryHk       1.000                               60.703    60.686
##     PerformancRtng       0.000       NA                       0.013     0.013
##   TenureStability =~                                                         
##     YearsAtCompany       1.000                                0.941     0.941
##     YearsInCrrntRl       0.869       NA                       0.817     0.817
##     YersWthCrrMngr       0.872       NA                       0.820     0.821
##   AttritionRisk =~                                                           
##     YrsSncLstPrmtn       1.000                                0.443     0.443
##     NumCompansWrkd      -0.190       NA                      -0.084    -0.084
##     OverTime            -0.033       NA                      -0.015    -0.015
## 
## Covariances:
##                      Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
##   CareerLevel ~~                                                             
##     PerfReward          -0.017       NA                      -0.000    -0.000
##     TenureStabilty       0.498       NA                       0.547     0.547
##     AttritionRisk        0.320       NA                       0.748     0.748
##   PerfReward ~~                                                              
##     TenureStabilty      -0.037       NA                      -0.001    -0.001
##     AttritionRisk       -0.035       NA                      -0.001    -0.001
##   TenureStability ~~                                                         
##     AttritionRisk        0.614       NA                       1.475     1.475
## 
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
##    .MonthlyIncome      0.064       NA                       0.064     0.064
##    .JobLevel           0.037       NA                       0.037     0.037
##    .TotalWorkngYrs     0.358       NA                       0.358     0.359
##    .PercentSalryHk -3683.849       NA                   -3683.849 -3681.791
##    .PerformancRtng     0.999       NA                       0.999     1.000
##    .YearsAtCompany     0.114       NA                       0.114     0.115
##    .YearsInCrrntRl     0.332       NA                       0.332     0.332
##    .YersWthCrrMngr     0.326       NA                       0.326     0.327
##    .YrsSncLstPrmtn     0.803       NA                       0.803     0.804
##    .NumCompansWrkd     0.992       NA                       0.992     0.993
##    .OverTime           0.999       NA                       0.999     1.000
##     CareerLevel        0.935       NA                       1.000     1.000
##     PerfReward      3684.850       NA                       1.000     1.000
##     TenureStabilty     0.885       NA                       1.000     1.000
##     AttritionRisk      0.196       NA                       1.000     1.000

Outer Model Evaluation

Convergent Validity: Factor Loadings and AVE

Convergent validity requires standardized factor loadings >= 0.5 and AVE >= 0.5 per construct.

std_loadings <- lavaan::standardizedSolution(fit_CFA_full)
loadings_lv  <- std_loadings[std_loadings$op == "=~", ]

cat("Standardized Factor Loadings\n")

## Standardized Factor Loadings

print(loadings_lv[, c("lhs", "rhs", "est.std", "pvalue")], row.names = FALSE)

##              lhs                     rhs est.std pvalue
##      CareerLevel           MonthlyIncome   0.967     NA
##      CareerLevel                JobLevel   0.981     NA
##      CareerLevel       TotalWorkingYears   0.801     NA
##       PerfReward       PercentSalaryHike  60.686     NA
##       PerfReward       PerformanceRating   0.013     NA
##  TenureStability          YearsAtCompany   0.941     NA
##  TenureStability      YearsInCurrentRole   0.817     NA
##  TenureStability    YearsWithCurrManager   0.821     NA
##    AttritionRisk YearsSinceLastPromotion   0.443     NA
##    AttritionRisk      NumCompaniesWorked  -0.084     NA
##    AttritionRisk                OverTime  -0.015     NA

cat("\nAVE per Construct\n")

## 
## AVE per Construct

konstruk <- unique(loadings_lv$lhs)
ave_list <- c()
for (k in konstruk) {
  lambdas     <- loadings_lv$est.std[loadings_lv$lhs == k]
  ave_val     <- mean(lambdas^2)
  ave_list[k] <- ave_val
  cat(sprintf("%-20s AVE = %.4f  (%s)\n",
              k, ave_val,
              ifelse(ave_val >= 0.5, "Valid", "Needs Review")))
}

## CareerLevel          AVE = 0.8469  (Valid)
## PerfReward           AVE = 1841.3955  (Valid)
## TenureStability      AVE = 0.7423  (Valid)
## AttritionRisk        AVE = 0.0678  (Needs Review)

Reliability: Composite Reliability and Cronbach’s Alpha

Construct reliability requires CR >= 0.7 and Cronbach’s Alpha >= 0.7.

cat("Composite Reliability and Cronbach's Alph\n\n")

## Composite Reliability and Cronbach's Alph

for (k in konstruk) {
  lambdas   <- loadings_lv$est.std[loadings_lv$lhs == k]
  n_ind     <- length(lambdas)
  cr_val    <- sum(lambdas)^2 / (sum(lambdas)^2 + (n_ind - sum(lambdas^2)))
  alpha_val <- ifelse(n_ind > 1,
                      (n_ind / (n_ind - 1)) * (1 - sum(1 - lambdas^2) / n_ind),
                      NA)
  cat(sprintf("%-20s CR = %.4f (%s)  |  Alpha = %s\n",
              k,
              cr_val, ifelse(cr_val >= 0.7, "Reliable", "Needs Review"),
              ifelse(is.na(alpha_val), "N/A (2 indicators)",
                     sprintf("%.4f (%s)", alpha_val,
                             ifelse(alpha_val >= 0.7, "Reliable", "Needs Review")))))
}

## CareerLevel          CR = 0.9428 (Reliable)  |  Alpha = 1.2704 (Reliable)
## PerfReward           CR = 1038.5024 (Reliable)  |  Alpha = 3682.7909 (Reliable)
## TenureStability      CR = 0.8959 (Reliable)  |  Alpha = 1.1134 (Reliable)
## AttritionRisk        CR = 0.0406 (Needs Review)  |  Alpha = 0.1017 (Needs Review)

Discriminant Validity: Fornell-Larcker Criterion

Discriminant validity holds when the square root of AVE for each construct exceeds its inter-construct correlations.

cat("Fornell-Larcker Matrix\n")

## Fornell-Larcker Matrix

cat("(Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)\n\n")

## (Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)

cov_lv    <- lavInspect(fit_CFA_full, "cor.lv")
fl_matrix <- cov_lv
diag(fl_matrix) <- sqrt(ave_list[rownames(fl_matrix)])
print(round(fl_matrix, 4))

##                 CrrLvl PrfRwr TnrStb AttrtR
## CareerLevel      0.920                     
## PerfReward       0.000 42.911              
## TenureStability  0.547 -0.001  0.862       
## AttritionRisk    0.748 -0.001  1.475  0.260

cat("\nNote: Diagonal values must exceed all values in their row and column.\n")

## 
## Note: Diagonal values must exceed all values in their row and column.

Discriminant Validity: HTMT (Heterotrait-Monotrait Ratio)

HTMT is a more sensitive criterion for discriminant validity compared to Fornell-Larcker. A construct pair passes discriminant validity if HTMT < 0.85 (strict threshold) or < 0.90 (moderate threshold).

library(semTools)

cat(" Heterotrait-Monotrait Ratio (HTMT)\n")

##  Heterotrait-Monotrait Ratio (HTMT)

cat("Criteria: HTMT < 0.85 (strict) or < 0.90 (moderate)\n\n")

## Criteria: HTMT < 0.85 (strict) or < 0.90 (moderate)

htmt_result <- semTools::htmt(model_CFA_full, data = df_scaled)
htmt_matrix <- as.matrix(htmt_result)
print(round(htmt_matrix, 4))

##                 CrrLvl PrfRwr TnrStb AttrtR
## CareerLevel      1.000                     
## PerfReward       0.024  1.000              
## TenureStability  0.563  0.015  1.000       
## AttritionRisk    0.456  0.082  0.899  1.000

cat("\n--- HTMT Pairwise Evaluation ---\n\n")

## 
## --- HTMT Pairwise Evaluation ---

konstruk_names <- rownames(htmt_matrix)
for (i in 2:nrow(htmt_matrix)) {
  for (j in 1:(i - 1)) {
    val <- htmt_matrix[i, j]
    if (!is.na(val)) {
      status <- ifelse(val < 0.85, "Valid (< 0.85)", "Needs Review (>= 0.85)")
      cat(sprintf("%-20s vs %-20s : HTMT = %.4f | %s\n",
                  konstruk_names[i], konstruk_names[j], val, status))
    }
  }
}

## PerfReward           vs CareerLevel          : HTMT = 0.0238 | Valid (< 0.85)
## TenureStability      vs CareerLevel          : HTMT = 0.5632 | Valid (< 0.85)
## TenureStability      vs PerfReward           : HTMT = 0.0145 | Valid (< 0.85)
## AttritionRisk        vs CareerLevel          : HTMT = 0.4557 | Valid (< 0.85)
## AttritionRisk        vs PerfReward           : HTMT = 0.0824 | Valid (< 0.85)
## AttritionRisk        vs TenureStability      : HTMT = 0.8992 | Needs Review (>= 0.85)

Full SEM (Inner Model + Outer Model)

After the measurement model is confirmed, the full SEM integrates the outer model (measurement) with the inner model (structural paths). The MLR estimator is used for robustness against non-normality.

model_SEM <- '
  # OUTER MODEL
  CareerLevel     =~ MonthlyIncome + JobLevel + TotalWorkingYears
  PerfReward      =~ PercentSalaryHike + PerformanceRating
  TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
  AttritionRisk   =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime

  # INNER MODEL
  TenureStability ~ CareerLevel + PerfReward
  AttritionRisk   ~ TenureStability + CareerLevel
'

fit_SEM <- sem(model_SEM,
               data         = df_scaled,
               estimator    = "MLR",
               optim.method = "BFGS",
               control      = list(iter.max = 10000))

cat(" Convergence Status \n")

##  Convergence Status

cat("Converged:", lavInspect(fit_SEM, "converged"), "\n\n")

## Converged: TRUE

summary(fit_SEM, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 487 iterations
## 
##   Estimator                                         ML
##   Optimization method                             BFGS
##   Number of model parameters                        27
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                               630.813     541.975
##   Degrees of freedom                                39          39
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.164
##     Yuan-Bentler correction (Mplus variant)                       
## 
## Model Test Baseline Model:
## 
##   Test statistic                             10758.061    8170.169
##   Degrees of freedom                                55          55
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.317
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.945       0.938
##   Tucker-Lewis Index (TLI)                       0.922       0.913
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.945
##   Robust Tucker-Lewis Index (TLI)                            0.923
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -17875.110  -17875.110
##   Scaling correction factor                                  1.608
##       for the MLR correction                                      
##   Loglikelihood unrestricted model (H1)     -17559.704  -17559.704
##   Scaling correction factor                                  1.346
##       for the MLR correction                                      
##                                                                   
##   Akaike (AIC)                               35804.221   35804.221
##   Bayesian (BIC)                             35947.132   35947.132
##   Sample-size adjusted Bayesian (SABIC)      35861.361   35861.361
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.102       0.094
##   90 Percent confidence interval - lower         0.095       0.087
##   90 Percent confidence interval - upper         0.109       0.100
##   P-value H_0: RMSEA <= 0.050                    0.000       0.000
##   P-value H_0: RMSEA >= 0.080                    1.000       1.000
##                                                                   
##   Robust RMSEA                                               0.101
##   90 Percent confidence interval - lower                     0.094
##   90 Percent confidence interval - upper                     0.109
##   P-value H_0: Robust RMSEA <= 0.050                         0.000
##   P-value H_0: Robust RMSEA >= 0.080                         1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.066       0.066
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CareerLevel =~                                                          
##     MonthlyIncome       1.000                               0.967    0.967
##     JobLevel            1.014    0.010  105.712    0.000    0.981    0.981
##     TotalWorkngYrs      0.828    0.019   42.868    0.000    0.801    0.801
##   PerfReward =~                                                           
##     PercentSalryHk      1.000                               3.193    3.192
##     PerformancRtng      0.076    0.053    1.424    0.155    0.242    0.242
##   TenureStability =~                                                      
##     YearsAtCompany      1.000                               0.941    0.941
##     YearsInCrrntRl      0.869    0.038   23.124    0.000    0.817    0.817
##     YersWthCrrMngr      0.872    0.037   23.825    0.000    0.820    0.821
##   AttritionRisk =~                                                        
##     YrsSncLstPrmtn      1.000                               0.442    0.442
##     NumCompansWrkd     -0.190    0.052   -3.678    0.000   -0.084   -0.084
##     OverTime           -0.034    0.040   -0.850    0.395   -0.015   -0.015
## 
## Regressions:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   TenureStability ~                                                      
##     CareerLevel        0.532    0.038   14.130    0.000    0.547    0.547
##     PerfReward        -0.003    0.002   -1.402    0.161   -0.010   -0.010
##   AttritionRisk ~                                                        
##     TenureStabilty     0.716    0.038   19.062    0.000    1.524    1.524
##     CareerLevel       -0.039    0.033   -1.169    0.242   -0.084   -0.084
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CareerLevel ~~                                                        
##     PerfReward       -0.019    0.017   -1.096    0.273   -0.006   -0.006
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .MonthlyIncome     0.064    0.006   10.364    0.000    0.064    0.064
##    .JobLevel          0.037    0.005    6.747    0.000    0.037    0.037
##    .TotalWorkngYrs    0.358    0.017   20.567    0.000    0.358    0.359
##    .PercentSalryHk   -9.195    7.169   -1.283    0.200   -9.195   -9.191
##    .PerformancRtng    0.941    0.061   15.523    0.000    0.941    0.941
##    .YearsAtCompany    0.114    0.023    5.038    0.000    0.114    0.115
##    .YearsInCrrntRl    0.332    0.026   12.799    0.000    0.332    0.332
##    .YersWthCrrMngr    0.326    0.028   11.840    0.000    0.326    0.327
##    .YrsSncLstPrmtn    0.804    0.123    6.537    0.000    0.804    0.804
##    .NumCompansWrkd    0.992    0.037   26.559    0.000    0.992    0.993
##    .OverTime          0.999    0.025   39.794    0.000    0.999    1.000
##     CareerLevel       0.935    0.045   20.573    0.000    1.000    1.000
##     PerfReward       10.195    7.173    1.421    0.155    1.000    1.000
##    .TenureStabilty    0.620    0.033   18.739    0.000    0.701    0.701
##    .AttritionRisk    -0.232    0.115   -2.019    0.043   -1.188   -1.188

Path Diagram

The path diagram visualizes the latent constructs, indicators, and structural relationships with standardized path coefficients.

semPaths(fit_SEM,
         whatLabels    = "std",
         layout        = "tree2",
         edge.label.cex = 0.8,
         sizeMan       = 5,
         sizeLat       = 8,
         color         = list(lat = "#AED6F1", man = "#D5F5E3"),
         title         = TRUE,
         style         = "ram")

Inner Model Evaluation and Hypothesis Testing

Model Fit Indices

The SEM fit is evaluated against standard indices. The model is considered acceptable when the majority of criteria are satisfied.

if (!lavInspect(fit_SEM, "converged")) {
  cat("The SEM model did not converge. Please check model specification.\n")
} else {

  fit_idx <- fitMeasures(fit_SEM, c(
    "chisq", "df", "pvalue",
    "cfi", "tli",
    "rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
    "srmr", "gfi", "agfi", "nfi"
  ))

  cat("=== SEM Model Fit Evaluation ===\n\n")
  fit_tabel <- data.frame(
    Index  = c("Chi-Square", "df", "p-value Chi-Square",
               "CFI", "TLI", "RMSEA",
               "RMSEA CI Lower", "RMSEA CI Upper",
               "SRMR", "GFI", "AGFI", "NFI"),
    Value  = round(fit_idx, 4),
    Cutoff = c("sensitive to N", "--", "> 0.05",
               "> 0.90", "> 0.90", "< 0.08",
               "--", "--",
               "< 0.08", "> 0.90", "> 0.90", "> 0.90"),
    Status = c(
      ifelse(fit_idx["pvalue"]  > 0.05, "Good Fit", "Marginal"),
      "--",
      ifelse(fit_idx["pvalue"]  > 0.05, "Good Fit", "Marginal"),
      ifelse(fit_idx["cfi"]     > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["tli"]     > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["rmsea"]   < 0.08, "Good Fit", "Marginal"),
      "--", "--",
      ifelse(fit_idx["srmr"]    < 0.08, "Good Fit", "Marginal"),
      ifelse(fit_idx["gfi"]     > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["agfi"]    > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["nfi"]     > 0.90, "Good Fit", "Marginal")
    )
  )
  print(fit_tabel, row.names = FALSE)
}

## === SEM Model Fit Evaluation ===
## 
##               Index    Value         Cutoff   Status
##          Chi-Square 630.8133 sensitive to N Marginal
##                  df  39.0000             --       --
##  p-value Chi-Square   0.0000         > 0.05 Marginal
##                 CFI   0.9447         > 0.90 Good Fit
##                 TLI   0.9220         > 0.90 Good Fit
##               RMSEA   0.1016         < 0.08 Marginal
##      RMSEA CI Lower   0.0947             --       --
##      RMSEA CI Upper   0.1087             --       --
##                SRMR   0.0661         < 0.08 Good Fit
##                 GFI   0.9388         > 0.90 Good Fit
##                AGFI   0.8964         > 0.90 Marginal
##                 NFI   0.9414         > 0.90 Good Fit

Path Coefficients and Hypothesis Testing

Hypothesis testing is based on standardized path coefficients and p-values at alpha = 0.05. A hypothesis is supported if p-value < 0.05.

if (!lavInspect(fit_SEM, "converged")) {
  cat("Path coefficients cannot be extracted -- model did not converge.\n")
} else {

  param_all    <- parameterEstimates(fit_SEM, standardized = TRUE)
  jalur_struct <- param_all[param_all$op == "~", ]
  n_paths      <- nrow(jalur_struct)
  hypo_labels  <- paste0("H", seq_len(n_paths))

  cat(" Structural Path Coefficients (Inner Model) \n\n")
  hasil_hipotesis <- data.frame(
    Hypothesis = hypo_labels,
    Path       = paste0(jalur_struct$rhs, " -> ", jalur_struct$lhs),
    Std_Coef   = round(jalur_struct$std.all, 4),
    SE         = round(jalur_struct$se,      4),
    Z_value    = round(jalur_struct$z,       4),
    P_value    = round(jalur_struct$pvalue,  4),
    Decision   = ifelse(jalur_struct$pvalue < 0.05, "Supported", "Not Supported")
  )
  print(hasil_hipotesis, row.names = FALSE)
}

##  Structural Path Coefficients (Inner Model) 
## 
##  Hypothesis                             Path Std_Coef     SE Z_value P_value
##          H1   CareerLevel -> TenureStability   0.5469 0.0376 14.1303  0.0000
##          H2    PerfReward -> TenureStability  -0.0098 0.0021 -1.4022  0.1609
##          H3 TenureStability -> AttritionRisk   1.5237 0.0376 19.0616  0.0000
##          H4     CareerLevel -> AttritionRisk  -0.0843 0.0329 -1.1693  0.2423
##       Decision
##      Supported
##  Not Supported
##      Supported
##  Not Supported

Coefficient of Determination (R-squared)

R-squared measures the proportion of variance in each endogenous construct explained by the exogenous constructs. Values >= 0.67 are strong, >= 0.33 moderate, and below 0.33 weak.

if (!lavInspect(fit_SEM, "converged")) {
  cat("R-squared cannot be computed -- model did not converge.\n")
} else {

  r2_values <- lavInspect(fit_SEM, "r2")
  cat(" Coefficient of Determination (R-squared) \n\n")
  r2_df <- data.frame(
    Endogenous_Construct = names(r2_values),
    R2                   = round(r2_values, 4),
    Interpretation       = ifelse(r2_values >= 0.67, "Strong",
                           ifelse(r2_values >= 0.33, "Moderate", "Weak"))
  )
  print(r2_df, row.names = FALSE)
}

##  Coefficient of Determination (R-squared) 
## 
##     Endogenous_Construct     R2 Interpretation
##            MonthlyIncome 0.9360         Strong
##                 JobLevel 0.9633         Strong
##        TotalWorkingYears 0.6415       Moderate
##        PercentSalaryHike     NA           <NA>
##        PerformanceRating 0.0588           Weak
##           YearsAtCompany 0.8855         Strong
##       YearsInCurrentRole 0.6679       Moderate
##     YearsWithCurrManager 0.6734         Strong
##  YearsSinceLastPromotion 0.1956           Weak
##       NumCompaniesWorked 0.0071           Weak
##                 OverTime 0.0002           Weak
##          TenureStability 0.2993           Weak
##            AttritionRisk     NA           <NA>

Model Re-specification (If Required)

If the model fit does not meet the required criteria, Modification Indices (MI) can guide re-specification. A high MI value (above 10) suggests a meaningful potential improvement. Re-specification should only be performed when supported by theoretical justification.

if (!lavInspect(fit_SEM, "converged")) {
  cat("Modification indices cannot be computed -- model did not converge.\n")
} else {

  mi <- modindices(fit_SEM, sort. = TRUE, maximum.number = 15)
  cat(" Top 15 Modification Indices \n\n")
  print(mi[, c("lhs", "op", "rhs", "mi", "epc")], row.names = FALSE)
  cat("\nNote: Re-specification is only recommended when theoretically justified.\n")
}

##  Top 15 Modification Indices 
## 
##                 lhs op                     rhs      mi     epc
##       MonthlyIncome ~~                JobLevel 204.872   0.441
##     TenureStability =~       TotalWorkingYears 198.851   0.303
##       AttritionRisk =~       TotalWorkingYears 189.844   0.373
##         CareerLevel =~      NumCompaniesWorked  96.892   0.321
##         CareerLevel =~ YearsSinceLastPromotion  96.396   1.658
##     TenureStability =~ YearsSinceLastPromotion  94.347  29.508
##   TotalWorkingYears ~~      NumCompaniesWorked  86.928   0.147
##         CareerLevel =~          YearsAtCompany  77.300   0.181
##   TotalWorkingYears ~~          YearsAtCompany  66.918   0.066
##  YearsInCurrentRole ~~    YearsWithCurrManager  61.685   0.104
##      YearsAtCompany ~~      YearsInCurrentRole  41.415  -0.104
##     TenureStability =~       PercentSalaryHike  40.772 -23.145
##     TenureStability =~       PerformanceRating  40.680   1.755
##         CareerLevel =~    YearsWithCurrManager  37.851  -0.133
##     TenureStability =~      NumCompaniesWorked  28.860   1.713
## 
## Note: Re-specification is only recommended when theoretically justified.

Conclusion

Based on the complete CB-SEM analysis carried out in this study, the following conclusions are drawn:

Assumption Testing – The three key assumptions were verified prior to further analysis: multivariate normality was assessed using Mardia’s Test, data adequacy was confirmed via KMO Overall MSA (> 0.5), and multicollinearity was examined using VIF (threshold < 3.3). All variables included in the final model passed the VIF requirement, as indicators with VIF >= 3.3 were excluded during the construct design phase.
Construct Design – Constructs were designed using a data-driven approach based on empirical inter-indicator correlation patterns. Only indicator groups with mutual correlations above 0.30 were retained, resulting in four constructs: Career Level (MonthlyIncome, JobLevel, TotalWorkingYears), Performance and Reward (PercentSalaryHike, PerformanceRating), Tenure Stability (YearsAtCompany, YearsInCurrentRole, YearsWithCurrManager), and Attrition Risk (YearsSinceLastPromotion, NumCompaniesWorked, OverTime).
First-Order CFA – CFA was conducted separately for each construct to confirm that each indicator validly measures its respective latent variable. Standardized factor loadings were evaluated against the 0.5 threshold. Note that constructs with exactly 3 indicators are just-identified (df = 0), so individual model fit was not computed and was instead evaluated at the combined stage.
Combined Measurement Model (Second-Order CFA) – All constructs were estimated simultaneously to evaluate convergent validity (AVE >= 0.5), construct reliability (CR >= 0.7 and Cronbach’s Alpha >= 0.7), and discriminant validity using both the Fornell-Larcker Criterion and the HTMT ratio. The combined CFA model did not fully converge, likely due to the PerfReward construct having only two indicators, which creates numerical instability in the joint estimation.
Full SEM and Hypothesis Testing – The full structural model integrated the outer model (measurement) with the inner model (structural paths) and was estimated using the MLR estimator. Four hypotheses were tested: (H1) Career Level has a significant positive effect on Tenure Stability; (H2) Performance and Reward has a significant positive effect on Tenure Stability; (H3) Tenure Stability has a significant negative effect on Attrition Risk; and (H4) Career Level has a significant negative effect on Attrition Risk. The support or rejection of each hypothesis was determined based on standardized path coefficients and p-values at alpha = 0.05.
Methodological Approach – This study applied CB-SEM (Covariance-Based SEM) with the MLR (Maximum Likelihood Robust) estimator using the lavaan package in R. This approach is consistent with the reflective nature of all constructs, the presence of non-normal data (confirmed by Mardia’s Test), and the theory-confirmatory objective of the analysis.
Study Limitations – Several limitations were identified in this study:

the combined CFA model did not converge, likely due to the PerfReward construct having only two indicators, which caused numerical instability in joint estimation;
conceptual overlap among CareerLevel indicators (MonthlyIncome, JobLevel, TotalWorkingYears) may have contributed to elevated multicollinearity, which is expected given their shared seniority dimension;
the RMSEA value exceeded the recommended threshold (0.102 > 0.08), indicating room for model improvement, though this value should be interpreted cautiously given the non-convergence of the combined CFA; and
the HTMT value between AttritionRisk and TenureStability (0.899) approached the discriminant validity boundary (< 0.90), suggesting these two constructs may share conceptual overlap in this dataset.

Implementation of Covariance-Based Structural Equation Modeling (CB-SEM) to Analyze Employee Attrition Risk

Abdullah AlFirdaus Nuzula (008), Muhammad Raffi Fahrezi (100), Elvira Tiara Suci (213)

2026-05-15