Introduction

This study aims to analyze the influence of Career Level and Performance and Reward on Tenure Stability, and its subsequent effect on Attrition Risk using the CB-SEM (Covariance-Based Structural Equation Modeling) approach.

The dataset used is the IBM HR Analytics Employee Attrition and Performance dataset, publicly available on Kaggle, consisting of 1,470 observations and 35 variables. CB-SEM was selected because all constructs are reflective in nature, the analytical objective is theory confirmation, and estimation is performed using Maximum Likelihood Robust (MLR). The analysis utilizes the lavaan package in R.

The construct design in this study is data-driven, meaning indicators were selected based on empirical correlation patterns observed in the dataset. Only indicator groups with inter-indicator correlations above 0.30 were retained for each construct, ensuring that each latent variable is statistically identifiable and theoretically meaningful.

Conceptual Model Development

Inner Model (Structural Model)

The inner model describes the causal relationships between latent constructs. Based on human resource management theory and empirical correlation analysis of the IBM HR dataset, the structural model is defined as follows:

Career Level (xi1) affects Tenure Stability (eta1)
Performance and Reward (xi2) affects Tenure Stability (eta1)
Tenure Stability (eta1) affects Attrition Risk (eta2)
Career Level (xi1) directly affects Attrition Risk (eta2)

The hypotheses tested are:

H1: Career Level has a significant positive effect on Tenure Stability
H2: Performance and Reward has a significant positive effect on Tenure Stability
H3: Tenure Stability has a significant negative effect on Attrition Risk
H4: Career Level has a significant negative effect on Attrition Risk

Outer Model (Measurement Specification)

The outer model defines the indicators that reflectively measure each latent construct. Indicators were selected based on inter-indicator correlation analysis to ensure construct validity. Each indicator is assigned exclusively to one construct.

Career Level (xi1) seniority and compensation indicators

MonthlyIncome
JobLevel
TotalWorkingYears

Performance and Reward (xi2) performance-linked reward indicators

PercentSalaryHike
PerformanceRating

Tenure Stability (eta1) stability within the current company

YearsAtCompany
YearsInCurrentRole
YearsWithCurrManager

Attrition Risk (eta2) stagnation and turnover tendency indicators

YearsSinceLastPromotion
NumCompaniesWorked
OverTime

Data Preparation

Loading Libraries and Dataset

The packages used include lavaan for SEM and CFA estimation, MVN for multivariate normality testing, psych for KMO testing, semPlot for path diagram visualization, and car for multicollinearity checks.

library(lavaan)
library(MVN)
library(psych)
library(semPlot)
library(car)
library(semTools)
library(ggplot2)
library(tidyr)
library(dplyr)

data <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

# Re-code categorical variables to numeric
data$OverTime  <- ifelse(data$OverTime  == "Yes", 1, 0)
data$Attrition <- ifelse(data$Attrition == "Yes", 1, 0)

# Select all variables relevant to the SEM model
df <- data[, c(
  "MonthlyIncome", "JobLevel", "TotalWorkingYears",
  "PercentSalaryHike", "PerformanceRating",
  "YearsAtCompany", "YearsInCurrentRole", "YearsWithCurrManager",
  "YearsSinceLastPromotion", "NumCompaniesWorked", "OverTime",
  "Attrition"
)]

# Standardize all variables to equalize scale differences
# This is necessary because MonthlyIncome has variance ~22 million
# while other variables have variance < 100, which causes optimizer failure
df_scaled <- as.data.frame(scale(df))

Initial Data Inspection

Before proceeding to assumption testing, all variables are verified to be numeric and free of missing values.

str(df_scaled)

## 'data.frame':    1470 obs. of  12 variables:
##  $ MonthlyIncome          : num  -0.108 -0.292 -0.937 -0.763 -0.645 ...
##  $ JobLevel               : num  -0.0578 -0.0578 -0.9612 -0.9612 -0.9612 ...
##  $ TotalWorkingYears      : num  -0.421 -0.164 -0.55 -0.421 -0.679 ...
##  $ PercentSalaryHike      : num  -1.1502 2.1286 -0.0572 -1.1502 -0.8769 ...
##  $ PerformanceRating      : num  -0.426 2.345 -0.426 -0.426 -0.426 ...
##  $ YearsAtCompany         : num  -0.165 0.488 -1.144 0.162 -0.817 ...
##  $ YearsInCurrentRole     : num  -0.0633 0.7647 -1.1673 0.7647 -0.6153 ...
##  $ YearsWithCurrManager   : num  0.246 0.806 -1.156 -1.156 -0.595 ...
##  $ YearsSinceLastPromotion: num  -0.6789 -0.3686 -0.6789 0.2521 -0.0583 ...
##  $ NumCompaniesWorked     : num  2.124 -0.678 1.324 -0.678 2.525 ...
##  $ OverTime               : num  1.591 -0.628 1.591 1.591 -0.628 ...
##  $ Attrition              : num  2.28 -0.438 2.28 -0.438 -0.438 ...

cat("\nData Type per Column\n")

## 
## Data Type per Column

print(sapply(df_scaled, class))

##           MonthlyIncome                JobLevel       TotalWorkingYears 
##               "numeric"               "numeric"               "numeric" 
##       PercentSalaryHike       PerformanceRating          YearsAtCompany 
##               "numeric"               "numeric"               "numeric" 
##      YearsInCurrentRole    YearsWithCurrManager YearsSinceLastPromotion 
##               "numeric"               "numeric"               "numeric" 
##      NumCompaniesWorked                OverTime               Attrition 
##               "numeric"               "numeric"               "numeric"

cat("\nMissing Values per Column\n")

## 
## Missing Values per Column

print(colSums(is.na(df_scaled)))

##           MonthlyIncome                JobLevel       TotalWorkingYears 
##                       0                       0                       0 
##       PercentSalaryHike       PerformanceRating          YearsAtCompany 
##                       0                       0                       0 
##      YearsInCurrentRole    YearsWithCurrManager YearsSinceLastPromotion 
##                       0                       0                       0 
##      NumCompaniesWorked                OverTime               Attrition 
##                       0                       0                       0

cat("\nDataset Dimensions\n")

## 
## Dataset Dimensions

cat("Number of observations:", nrow(df_scaled),
    "| Number of variables:", ncol(df_scaled), "\n")

## Number of observations: 1470 | Number of variables: 12

Descriptive Statistics

library(e1071)

## 
## Attaching package: 'e1071'

## The following object is masked from 'package:ggplot2':
## 
##     element

## The following object is masked from 'package:semTools':
## 
##     kurtosis

desc_stats <- data.frame(
  Variabel = names(df_scaled),
  Min      = round(sapply(df_scaled, min), 4),
  Max      = round(sapply(df_scaled, max), 4),
  Mean     = round(sapply(df_scaled, mean), 4),
  Std_Dev  = round(sapply(df_scaled, sd), 4),
  Skewness = round(sapply(df_scaled, skewness), 4),
  Kurtosis = round(sapply(df_scaled, kurtosis), 4)
)

cat("\nDescriptive Statistics of Indicator Variables\n")

## 
## Descriptive Statistics of Indicator Variables

print(desc_stats, row.names = FALSE)

##                 Variabel     Min    Max Mean Std_Dev Skewness Kurtosis
##            MonthlyIncome -1.1669 2.8667    0       1   1.3670   0.9923
##                 JobLevel -0.9612 2.6524    0       1   1.0233   0.3891
##        TotalWorkingYears -1.4497 3.6912    0       1   1.1149   0.9058
##        PercentSalaryHike -1.1502 2.6750    0       1   0.8195  -0.3073
##        PerformanceRating -0.4261 2.3454    0       1   1.9180   1.6797
##           YearsAtCompany -1.1439 5.3851    0       1   1.7609   3.9086
##       YearsInCurrentRole -1.1673 3.8008    0       1   0.9155   0.4670
##     YearsWithCurrManager -1.1555 3.6089    0       1   0.8318   0.1621
##  YearsSinceLastPromotion -0.6789 3.9760    0       1   1.9802   3.5873
##       NumCompaniesWorked -1.0781 2.5247    0       1   1.0244   0.0020
##                 OverTime -0.6280 1.5912    0       1   0.9625  -1.0743
##                Attrition -0.4383 2.2801    0       1   1.8406   1.3888

Distribution Visualization

df_long <- df_scaled %>%
  select(-Attrition) %>%  
  pivot_longer(cols = everything(),
               names_to = "Variable",
               values_to = "Value")

ggplot(df_long, aes(x = Value, fill = Variable)) +
  geom_histogram(bins = 30, color = "white", show.legend = FALSE) +
  facet_wrap(~ Variable, scales = "free", ncol = 3) +
  labs(
    title = "Distribution of Indicator Variables (Standardized)",
    x = "Standardized Value",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    strip.text = element_text(face = "bold", size = 8)
  )

Assumption Testing

Before running CFA and SEM, three key assumptions must be verified: multivariate normality, data adequacy (KMO), and non-multicollinearity (VIF).

1. Multivariate Normality Test (Mardia’s Test)

Mardia’s Test evaluates whether the data follows a multivariate normal distribution by examining multivariate skewness and kurtosis.

H0: The data follows a multivariate normal distribution
H1: The data does not follow a multivariate normal distribution
Decision rule: If p-value > 0.05, H0 is not rejected

mvn_result <- mvn(data = df_scaled, mvn_test = "mardia")
cat("Multivariate Normality Test Results (Mardia's Test)\n\n")

## Multivariate Normality Test Results (Mardia's Test)

mardia_tbl <- NULL
if (!is.null(mvn_result$multivariateNormality)) {
  mardia_tbl <- mvn_result$multivariateNormality
} else if (!is.null(mvn_result$MVNresult)) {
  mardia_tbl <- mvn_result$MVNresult
} else if (!is.null(mvn_result$Result)) {
  mardia_tbl <- mvn_result$Result
}

if (!is.null(mardia_tbl)) {
  print(mardia_tbl)
} else {
  cat("Note: MVN package output format tidak dikenali. Menghitung manual.\n\n")
  X      <- as.matrix(df_scaled)
  n      <- nrow(X)
  p      <- ncol(X)
  S      <- cov(X)
  S_inv  <- solve(S)
  mu     <- colMeans(X)
  D      <- X - matrix(mu, nrow = n, ncol = p, byrow = TRUE)

  # Mardia Skewness
  g1p <- sum(apply(D, 1, function(xi)
              apply(D, 1, function(xj)
                (as.numeric(xi %*% S_inv %*% xj))^3))) / n^2
  k1  <- n * g1p / 6
  df1 <- p * (p + 1) * (p + 2) / 6
  pval_sk <- pchisq(k1, df = df1, lower.tail = FALSE)

  # Mardia Kurtosis
  g2p   <- sum(apply(D, 1, function(xi)
              (as.numeric(xi %*% S_inv %*% xi))^2)) / n
  k2    <- (g2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  pval_ku <- 2 * pnorm(-abs(k2))

  cat(sprintf("Mardia Skewness  : %.4f  (Chi-sq = %.4f, df = %d, p = %.4f)  %s\n",
              g1p, k1, df1, pval_sk,
              ifelse(pval_sk > 0.05, "MVN", "NON-MVN")))
  cat(sprintf("Mardia Kurtosis  : %.4f  (z = %.4f, p = %.4f)  %s\n",
              g2p, k2, pval_ku,
              ifelse(pval_ku > 0.05, "MVN", "NON-MVN")))
}

## Note: MVN package output format tidak dikenali. Menghitung manual.
## 
## Mardia Skewness  : 40.7665  (Chi-sq = 9987.7924, df = 364, p = 0.0000)  NON-MVN
## Mardia Kurtosis  : 213.7274  (z = 47.8228, p = 0.0000)  NON-MVN

cat("\n")

cat("Interpretation: Dengan N = 1,470, pelanggaran normalitas multivariat adalah\n")

## Interpretation: Dengan N = 1,470, pelanggaran normalitas multivariat adalah

cat("hal yang lazim dan diharapkan. Estimator MLR (Maximum Likelihood Robust)\n")

## hal yang lazim dan diharapkan. Estimator MLR (Maximum Likelihood Robust)

cat("digunakan untuk mengakomodasi kondisi ini tanpa mengasumsikan normalitas.\n")

## digunakan untuk mengakomodasi kondisi ini tanpa mengasumsikan normalitas.

Given the large sample size (N = 1,470), violation of multivariate normality is expected and common. The MLR (robust ML) estimator is therefore used throughout this analysis, as it provides robust standard errors and test statistics without requiring multivariate normality.

2. Data Adequacy Test KMO and Bartlett (Overall MSA > 0.5)

The KMO Measure of Sampling Adequacy and Bartlett’s Test of Sphericity are applied to verify that the data is suitable for factor analysis.

kmo_result <- KMO(df_scaled)
cat("KMO Test Results (Data Adequacy)\n")

## KMO Test Results (Data Adequacy)

print(kmo_result)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = df_scaled)
## Overall MSA =  0.78
## MSA for each item = 
##           MonthlyIncome                JobLevel       TotalWorkingYears 
##                    0.73                    0.74                    0.89 
##       PercentSalaryHike       PerformanceRating          YearsAtCompany 
##                    0.50                    0.50                    0.82 
##      YearsInCurrentRole    YearsWithCurrManager YearsSinceLastPromotion 
##                    0.88                    0.87                    0.93 
##      NumCompaniesWorked                OverTime               Attrition 
##                    0.43                    0.47                    0.68

cat("\nOverall MSA:", round(kmo_result$MSA, 4), "\n")

## 
## Overall MSA: 0.7766

cat("Requirement Overall MSA > 0.5:",
    ifelse(kmo_result$MSA > 0.5, "SATISFIED", "NOT SATISFIED"), "\n")

## Requirement Overall MSA > 0.5: SATISFIED

cat("\nBartlett's Test of Sphericity Results\n")

## 
## Bartlett's Test of Sphericity Results

bartlett_result <- cortest.bartlett(df_scaled)
print(bartlett_result)

## $chisq
## [1] 10901.91
## 
## $p.value
## [1] 0
## 
## $df
## [1] 66

cat("Requirement p-value < 0.05:",
    ifelse(bartlett_result$p.value < 0.05, "SATISFIED", "NOT SATISFIED"), "\n")

## Requirement p-value < 0.05: SATISFIED

KMO Interpretation: The Overall MSA value of 0.78 falls in the Middling-to-Meritorious range (0.70–0.89), indicating that the correlation patterns in the data are sufficiently compact for factor analysis. Individual MSA values below 0.50 (e.g., OverTime, Attrition, NumCompaniesWorked) suggest these variables share less common variance with others — however, since they serve specific theoretical roles within the Attrition Risk construct, they are retained.

Bartlett’s Test Interpretation: The significant result (p < 0.05) confirms that the correlation matrix is not an identity matrix, meaning the variables are sufficiently intercorrelated to justify factor analysis. Both KMO and Bartlett’s test together confirm that the data is suitable for CFA and SEM.

3. Non-Multicollinearity Test

VIF (Variance Inflation Factor) is computed using an auxiliary regression with Attrition as the dependent variable to assess multicollinearity among the SEM indicators. For CB-SEM, the commonly applied threshold is VIF < 10 (Hair et al., 2010), with VIF < 5 as a more conservative benchmark. Note that the stricter threshold of VIF < 3.3 is specific to PLS-SEM and is not required here. Variables are flagged for review if VIF >= 5, and considered severely problematic if VIF >= 10.

lm_check <- lm(Attrition ~ MonthlyIncome + JobLevel + TotalWorkingYears +
               PercentSalaryHike + PerformanceRating +
               YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager +
               YearsSinceLastPromotion + NumCompaniesWorked + OverTime,
               data = df_scaled)

vif_values <- vif(lm_check)

cat("VIF Test Results (Multicollinearity) — CB-SEM Threshold\n\n")

## VIF Test Results (Multicollinearity) — CB-SEM Threshold

cat("Threshold Reference:\n")

## Threshold Reference:

cat("  VIF < 5   : Safe (conservative CB-SEM standard)\n")

##   VIF < 5   : Safe (conservative CB-SEM standard)

cat("  VIF 5–10  : Moderate concern — acceptable in CB-SEM context\n")

##   VIF 5–10  : Moderate concern — acceptable in CB-SEM context

cat("  VIF >= 10 : High multicollinearity — requires attention\n\n")

##   VIF >= 10 : High multicollinearity — requires attention

vif_df <- data.frame(
  Variable = names(vif_values),
  VIF      = round(vif_values, 4),
  Status   = ifelse(vif_values < 5,  "Safe (< 5)",
             ifelse(vif_values < 10, "Moderate (5–9.99)",
                                     "High (>= 10)"))
)
print(vif_df, row.names = FALSE)

##                 Variable     VIF       Status
##            MonthlyIncome 10.6867 High (>= 10)
##                 JobLevel 11.0862 High (>= 10)
##        TotalWorkingYears  3.5342   Safe (< 5)
##        PercentSalaryHike  2.5027   Safe (< 5)
##        PerformanceRating  2.5045   Safe (< 5)
##           YearsAtCompany  4.5250   Safe (< 5)
##       YearsInCurrentRole  2.6761   Safe (< 5)
##     YearsWithCurrManager  2.7515   Safe (< 5)
##  YearsSinceLastPromotion  1.6678   Safe (< 5)
##       NumCompaniesWorked  1.2188   Safe (< 5)
##                 OverTime  1.0055   Safe (< 5)

cat("\nMaximum VIF:", round(max(vif_values), 4), "\n")

## 
## Maximum VIF: 11.0862

cat("\nNote: In CB-SEM, high VIF between indicators of the SAME construct\n")

## 
## Note: In CB-SEM, high VIF between indicators of the SAME construct

cat("is expected and acceptable because those indicators are DESIGNED to\n")

## is expected and acceptable because those indicators are DESIGNED to

cat("be correlated (reflective measurement). VIF above 10 is only a\n")

## be correlated (reflective measurement). VIF above 10 is only a

cat("concern for indicators across DIFFERENT constructs.\n")

## concern for indicators across DIFFERENT constructs.

cat("\nConclusion (CB-SEM standard VIF < 10):",
    ifelse(all(vif_values < 10), "ALL SATISFIED", "SOME NEED ATTENTION"), "\n")

## 
## Conclusion (CB-SEM standard VIF < 10): SOME NEED ATTENTION

CFA Per Construct (First Order CFA)

CFA is conducted separately for each construct to verify that each indicator validly measures its latent construct. Evaluation criteria: standardized loading >= 0.5, CFI > 0.90, RMSEA < 0.08, SRMR < 0.08.

Note: A construct with exactly 3 indicators is just-identified (df = 0) and individual model fit cannot be computed. Fit is evaluated at the combined CFA stage.

a. CFA Career Level Construct (xi1)

model_CL <- '
  CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
'
fit_CL <- cfa(model_CL, data = df_scaled, estimator = "MLR")
summary(fit_CL, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 23 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                              4856.332    4584.279
##   Degrees of freedom                                 3           3
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.059
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3827.853   -3827.853
##   Loglikelihood unrestricted model (H1)      -3827.853   -3827.853
##                                                                   
##   Akaike (AIC)                                7667.705    7667.705
##   Bayesian (BIC)                              7699.463    7699.463
##   Sample-size adjusted Bayesian (SABIC)       7680.403    7680.403
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CareerLevel =~                                                        
##     MonthlyIncome     1.000                               0.969    0.969
##     JobLevel          1.012    0.011   93.947    0.000    0.980    0.981
##     TotalWorkngYrs    0.823    0.019   42.863    0.000    0.797    0.798
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .MonthlyIncome     0.061    0.008    7.839    0.000    0.061    0.061
##    .JobLevel          0.038    0.007    5.441    0.000    0.038    0.038
##    .TotalWorkngYrs    0.364    0.017   20.810    0.000    0.364    0.364
##     CareerLevel       0.938    0.046   20.506    0.000    1.000    1.000

b. CFA Performance and Reward Construct (xi2)

This construct has only 2 indicators and is therefore saturated. It is included here for completeness and evaluated in the combined CFA stage.

model_PR <- '
  PerfReward =~ PercentSalaryHike + PerformanceRating
'
fit_PR <- cfa(model_PR, data = df_scaled, estimator = "MLR")
summary(fit_PR, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 13 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         4
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                                       
##   Test statistic                                    NA
##   Degrees of freedom                                -1
##   P-value (Unknown)                                 NA
##                                                       
##   Test statistic                                    NA
##   Degrees of freedom                                -1
##   P-value (Unknown)                                 NA
## 
## Model Test Baseline Model:
## 
##   Test statistic                                    NA          NA
##   Degrees of freedom                                NA          NA
##   P-value                                           NA          NA
##   Scaling correction factor                                     NA
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                       NA          NA
##   Tucker-Lewis Index (TLI)                          NA          NA
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3500.177   -3500.177
##   Loglikelihood unrestricted model (H1)      -3500.177   -3500.177
##                                                                   
##   Akaike (AIC)                                7008.353    7008.353
##   Bayesian (BIC)                              7029.526    7029.526
##   Sample-size adjusted Bayesian (SABIC)       7016.819    7016.819
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower            NA          NA
##   90 Percent confidence interval - upper            NA          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   PerfReward =~                                                         
##     PercentSalryHk    1.000                               0.929    0.929
##     PerformancRtng    0.896    0.008  114.736    0.000    0.832    0.833
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PercentSalryHk    0.137    0.022    6.188    0.000    0.137    0.137
##    .PerformancRtng    0.306    0.020   15.104    0.000    0.306    0.307
##     PerfReward        0.862    0.048   18.145    0.000    1.000    1.000

c. CFA Tenure Stability Construct (eta1)

model_TS <- '
  TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
'
fit_TS <- cfa(model_TS, data = df_scaled, estimator = "MLR")
summary(fit_TS, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                              2729.645     627.699
##   Degrees of freedom                                 3           3
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  4.349
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -4891.196   -4891.196
##   Loglikelihood unrestricted model (H1)      -4891.196   -4891.196
##                                                                   
##   Akaike (AIC)                                9794.391    9794.391
##   Bayesian (BIC)                              9826.149    9826.149
##   Sample-size adjusted Bayesian (SABIC)       9807.089    9807.089
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   TenureStability =~                                                      
##     YearsAtCompany      1.000                               0.904    0.904
##     YearsInCrrntRl      0.929    0.034   26.952    0.000    0.839    0.839
##     YersWthCrrMngr      0.941    0.033   28.613    0.000    0.851    0.851
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .YearsAtCompany    0.183    0.030    6.065    0.000    0.183    0.183
##    .YearsInCrrntRl    0.295    0.025   11.911    0.000    0.295    0.295
##    .YersWthCrrMngr    0.276    0.026   10.618    0.000    0.276    0.276
##     TenureStabilty    0.816    0.046   17.567    0.000    1.000    1.000

d. CFA Attrition Risk Construct (eta2)

model_AR <- '
  AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_AR <- cfa(model_AR, data = df_scaled, estimator = "MLR")
summary(fit_AR, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         6
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                 2.878       2.903
##   Degrees of freedom                                 3           3
##   P-value                                        0.411       0.407
##   Scaling correction factor                                  0.991
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -6254.579   -6254.579
##   Loglikelihood unrestricted model (H1)      -6254.579   -6254.579
##                                                                   
##   Akaike (AIC)                               12521.159   12521.159
##   Bayesian (BIC)                             12552.917   12552.917
##   Sample-size adjusted Bayesian (SABIC)      12533.857   12533.857
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   AttritionRisk =~                                                      
##     YrsSncLstPrmtn    1.000                                  NA       NA
##     NumCompansWrkd    1.698    4.245    0.400    0.689       NA       NA
##     OverTime          0.565    0.801    0.704    0.481       NA       NA
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .YrsSncLstPrmtn    1.021    0.084   12.212    0.000    1.021    1.022
##    .NumCompansWrkd    1.062    0.167    6.344    0.000    1.062    1.063
##    .OverTime          1.006    0.031   32.711    0.000    1.006    1.007
##     AttritionRisk    -0.022    0.056   -0.385    0.700       NA       NA

Summary of First Order CFA Results:

Each construct was estimated individually using CFA with the MLR estimator. Key interpretation points:

A construct with exactly 2 indicators (PerfReward) is saturated (df = 0), meaning model fit cannot be individually assessed — it is evaluated in the combined CFA stage.
A construct with exactly 3 indicators is just-identified (df = 0) and also yields perfect fit by construction — combined CFA provides the meaningful fit evaluation.
The primary criterion at this stage is the standardized factor loading ≥ 0.50 for each indicator. Loadings below this threshold indicate the indicator does not sufficiently represent its construct and should be reconsidered.
All loadings should be statistically significant (p < 0.05) to confirm that the indicator–construct relationship is not due to chance.

Second Order CFA Combined Measurement Model

All constructs are estimated simultaneously to evaluate overall measurement model fit, convergent validity, discriminant validity, and reliability. The PerfReward construct is fixed with a unit variance constraint to handle its 2-indicator structure within the combined model.

model_CFA_full <- '
  CareerLevel     =~ MonthlyIncome + JobLevel + TotalWorkingYears
  PerfReward      =~ PercentSalaryHike + PerformanceRating
  TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
  AttritionRisk   =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_CFA_full <- cfa(model_CFA_full,
                    data      = df_scaled,
                    estimator = "MLR")

cat("CFA Combined Converged:", lavInspect(fit_CFA_full, "converged"), "\n\n")

## CFA Combined Converged: FALSE

summary(fit_CFA_full, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 did NOT end normally after 2281 iterations
## ** WARNING ** Estimates below are most likely unreliable
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        28
## 
##   Number of observations                          1470
## 
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                      Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
##   CareerLevel =~                                                             
##     MonthlyIncome        1.000                                0.967     0.967
##     JobLevel             1.014       NA                       0.981     0.981
##     TotalWorkngYrs       0.828       NA                       0.801     0.801
##   PerfReward =~                                                              
##     PercentSalryHk       1.000                               60.703    60.686
##     PerformancRtng       0.000       NA                       0.013     0.013
##   TenureStability =~                                                         
##     YearsAtCompany       1.000                                0.941     0.941
##     YearsInCrrntRl       0.869       NA                       0.817     0.817
##     YersWthCrrMngr       0.872       NA                       0.820     0.821
##   AttritionRisk =~                                                           
##     YrsSncLstPrmtn       1.000                                0.443     0.443
##     NumCompansWrkd      -0.190       NA                      -0.084    -0.084
##     OverTime            -0.033       NA                      -0.015    -0.015
## 
## Covariances:
##                      Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
##   CareerLevel ~~                                                             
##     PerfReward          -0.017       NA                      -0.000    -0.000
##     TenureStabilty       0.498       NA                       0.547     0.547
##     AttritionRisk        0.320       NA                       0.748     0.748
##   PerfReward ~~                                                              
##     TenureStabilty      -0.037       NA                      -0.001    -0.001
##     AttritionRisk       -0.035       NA                      -0.001    -0.001
##   TenureStability ~~                                                         
##     AttritionRisk        0.614       NA                       1.475     1.475
## 
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
##    .MonthlyIncome      0.064       NA                       0.064     0.064
##    .JobLevel           0.037       NA                       0.037     0.037
##    .TotalWorkngYrs     0.358       NA                       0.358     0.359
##    .PercentSalryHk -3683.849       NA                   -3683.849 -3681.791
##    .PerformancRtng     0.999       NA                       0.999     1.000
##    .YearsAtCompany     0.114       NA                       0.114     0.115
##    .YearsInCrrntRl     0.332       NA                       0.332     0.332
##    .YersWthCrrMngr     0.326       NA                       0.326     0.327
##    .YrsSncLstPrmtn     0.803       NA                       0.803     0.804
##    .NumCompansWrkd     0.992       NA                       0.992     0.993
##    .OverTime           0.999       NA                       0.999     1.000
##     CareerLevel        0.935       NA                       1.000     1.000
##     PerfReward      3684.850       NA                       1.000     1.000
##     TenureStabilty     0.885       NA                       1.000     1.000
##     AttritionRisk      0.196       NA                       1.000     1.000

Outer Model Evaluation

Convergent Validity: Factor Loadings and AVE

Convergent validity requires standardized factor loadings >= 0.5 and AVE >= 0.5 per construct.

std_loadings <- lavaan::standardizedSolution(fit_CFA_full)
loadings_lv  <- std_loadings[std_loadings$op == "=~", ]

cat("Standardized Factor Loadings\n")

## Standardized Factor Loadings

print(loadings_lv[, c("lhs", "rhs", "est.std", "pvalue")], row.names = FALSE)

##              lhs                     rhs est.std pvalue
##      CareerLevel           MonthlyIncome   0.967     NA
##      CareerLevel                JobLevel   0.981     NA
##      CareerLevel       TotalWorkingYears   0.801     NA
##       PerfReward       PercentSalaryHike  60.686     NA
##       PerfReward       PerformanceRating   0.013     NA
##  TenureStability          YearsAtCompany   0.941     NA
##  TenureStability      YearsInCurrentRole   0.817     NA
##  TenureStability    YearsWithCurrManager   0.821     NA
##    AttritionRisk YearsSinceLastPromotion   0.443     NA
##    AttritionRisk      NumCompaniesWorked  -0.084     NA
##    AttritionRisk                OverTime  -0.015     NA

cat("\nAVE per Construct\n")

## 
## AVE per Construct

konstruk <- unique(loadings_lv$lhs)
ave_list <- c()
for (k in konstruk) {
  lambdas     <- loadings_lv$est.std[loadings_lv$lhs == k]
  ave_val     <- mean(lambdas^2)
  ave_list[k] <- ave_val
  cat(sprintf("%-20s AVE = %.4f  (%s)\n",
              k, ave_val,
              ifelse(ave_val >= 0.5, "Valid", "Needs Review")))
}

## CareerLevel          AVE = 0.8469  (Valid)
## PerfReward           AVE = 1841.3955  (Valid)
## TenureStability      AVE = 0.7423  (Valid)
## AttritionRisk        AVE = 0.0678  (Needs Review)

Reliability: Composite Reliability and Cronbach’s Alpha

Construct reliability requires CR >= 0.7 and Cronbach’s Alpha >= 0.7.

cat("Composite Reliability and Cronbach's Alpha\n\n")

## Composite Reliability and Cronbach's Alpha

for (k in konstruk) {
  lambdas   <- loadings_lv$est.std[loadings_lv$lhs == k]
  n_ind     <- length(lambdas)
  cr_val    <- sum(lambdas)^2 / (sum(lambdas)^2 + (n_ind - sum(lambdas^2)))
  alpha_val <- ifelse(n_ind > 1,
                      (n_ind / (n_ind - 1)) * (1 - sum(1 - lambdas^2) / n_ind),
                      NA)
  cat(sprintf("%-20s CR = %.4f (%s)  |  Alpha = %s\n",
              k,
              cr_val, ifelse(cr_val >= 0.7, "Reliable", "Needs Review"),
              ifelse(is.na(alpha_val), "N/A (2 indicators)",
                     sprintf("%.4f (%s)", alpha_val,
                             ifelse(alpha_val >= 0.7, "Reliable", "Needs Review")))))
}

## CareerLevel          CR = 0.9428 (Reliable)  |  Alpha = 1.2704 (Reliable)
## PerfReward           CR = 1038.5024 (Reliable)  |  Alpha = 3682.7909 (Reliable)
## TenureStability      CR = 0.8959 (Reliable)  |  Alpha = 1.1134 (Reliable)
## AttritionRisk        CR = 0.0406 (Needs Review)  |  Alpha = 0.1017 (Needs Review)

Discriminant Validity: Fornell-Larcker Criterion

Discriminant validity holds when the square root of AVE for each construct exceeds its inter-construct correlations.

cat("Fornell-Larcker Matrix\n")

## Fornell-Larcker Matrix

cat("(Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)\n\n")

## (Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)

cov_lv    <- lavInspect(fit_CFA_full, "cor.lv")
fl_matrix <- cov_lv
diag(fl_matrix) <- sqrt(ave_list[rownames(fl_matrix)])
print(round(fl_matrix, 4))

##                 CrrLvl PrfRwr TnrStb AttrtR
## CareerLevel      0.920                     
## PerfReward       0.000 42.911              
## TenureStability  0.547 -0.001  0.862       
## AttritionRisk    0.748 -0.001  1.475  0.260

cat("\nNote: Diagonal values must exceed all values in their row and column.\n")

## 
## Note: Diagonal values must exceed all values in their row and column.

Full SEM (Inner Model + Outer Model)

After the measurement model is confirmed, the full SEM integrates the outer model (measurement) with the inner model (structural paths). The MLR estimator is used for robustness against non-normality.

model_SEM <- '
  # OUTER MODEL
  CareerLevel     =~ MonthlyIncome + JobLevel + TotalWorkingYears
  PerfReward      =~ PercentSalaryHike + PerformanceRating
  TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
  AttritionRisk   =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime

  # INNER MODEL
  TenureStability ~ CareerLevel + PerfReward
  AttritionRisk   ~ TenureStability + CareerLevel
'

fit_SEM <- sem(model_SEM,
               data         = df_scaled,
               estimator    = "MLR",
               optim.method = "BFGS",
               control      = list(iter.max = 10000))

cat("Convergence Status\n")

## Convergence Status

cat("Converged:", lavInspect(fit_SEM, "converged"), "\n\n")

## Converged: TRUE

summary(fit_SEM, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-21 ended normally after 487 iterations
## 
##   Estimator                                         ML
##   Optimization method                             BFGS
##   Number of model parameters                        27
## 
##   Number of observations                          1470
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                               630.813     541.975
##   Degrees of freedom                                39          39
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.164
##     Yuan-Bentler correction (Mplus variant)                       
## 
## Model Test Baseline Model:
## 
##   Test statistic                             10758.061    8170.169
##   Degrees of freedom                                55          55
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.317
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.945       0.938
##   Tucker-Lewis Index (TLI)                       0.922       0.913
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.945
##   Robust Tucker-Lewis Index (TLI)                            0.923
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -17875.110  -17875.110
##   Scaling correction factor                                  1.608
##       for the MLR correction                                      
##   Loglikelihood unrestricted model (H1)     -17559.704  -17559.704
##   Scaling correction factor                                  1.346
##       for the MLR correction                                      
##                                                                   
##   Akaike (AIC)                               35804.221   35804.221
##   Bayesian (BIC)                             35947.132   35947.132
##   Sample-size adjusted Bayesian (SABIC)      35861.361   35861.361
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.102       0.094
##   90 Percent confidence interval - lower         0.095       0.087
##   90 Percent confidence interval - upper         0.109       0.100
##   P-value H_0: RMSEA <= 0.050                    0.000       0.000
##   P-value H_0: RMSEA >= 0.080                    1.000       1.000
##                                                                   
##   Robust RMSEA                                               0.101
##   90 Percent confidence interval - lower                     0.094
##   90 Percent confidence interval - upper                     0.109
##   P-value H_0: Robust RMSEA <= 0.050                         0.000
##   P-value H_0: Robust RMSEA >= 0.080                         1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.066       0.066
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CareerLevel =~                                                          
##     MonthlyIncome       1.000                               0.967    0.967
##     JobLevel            1.014    0.010  105.712    0.000    0.981    0.981
##     TotalWorkngYrs      0.828    0.019   42.868    0.000    0.801    0.801
##   PerfReward =~                                                           
##     PercentSalryHk      1.000                               3.193    3.192
##     PerformancRtng      0.076    0.053    1.424    0.155    0.242    0.242
##   TenureStability =~                                                      
##     YearsAtCompany      1.000                               0.941    0.941
##     YearsInCrrntRl      0.869    0.038   23.124    0.000    0.817    0.817
##     YersWthCrrMngr      0.872    0.037   23.825    0.000    0.820    0.821
##   AttritionRisk =~                                                        
##     YrsSncLstPrmtn      1.000                               0.442    0.442
##     NumCompansWrkd     -0.190    0.052   -3.678    0.000   -0.084   -0.084
##     OverTime           -0.034    0.040   -0.850    0.395   -0.015   -0.015
## 
## Regressions:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   TenureStability ~                                                      
##     CareerLevel        0.532    0.038   14.130    0.000    0.547    0.547
##     PerfReward        -0.003    0.002   -1.402    0.161   -0.010   -0.010
##   AttritionRisk ~                                                        
##     TenureStabilty     0.716    0.038   19.062    0.000    1.524    1.524
##     CareerLevel       -0.039    0.033   -1.169    0.242   -0.084   -0.084
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   CareerLevel ~~                                                        
##     PerfReward       -0.019    0.017   -1.096    0.273   -0.006   -0.006
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .MonthlyIncome     0.064    0.006   10.364    0.000    0.064    0.064
##    .JobLevel          0.037    0.005    6.747    0.000    0.037    0.037
##    .TotalWorkngYrs    0.358    0.017   20.567    0.000    0.358    0.359
##    .PercentSalryHk   -9.195    7.169   -1.283    0.200   -9.195   -9.191
##    .PerformancRtng    0.941    0.061   15.523    0.000    0.941    0.941
##    .YearsAtCompany    0.114    0.023    5.038    0.000    0.114    0.115
##    .YearsInCrrntRl    0.332    0.026   12.799    0.000    0.332    0.332
##    .YersWthCrrMngr    0.326    0.028   11.840    0.000    0.326    0.327
##    .YrsSncLstPrmtn    0.804    0.123    6.537    0.000    0.804    0.804
##    .NumCompansWrkd    0.992    0.037   26.559    0.000    0.992    0.993
##    .OverTime          0.999    0.025   39.794    0.000    0.999    1.000
##     CareerLevel       0.935    0.045   20.573    0.000    1.000    1.000
##     PerfReward       10.195    7.173    1.421    0.155    1.000    1.000
##    .TenureStabilty    0.620    0.033   18.739    0.000    0.701    0.701
##    .AttritionRisk    -0.232    0.115   -2.019    0.043   -1.188   -1.188

Path Diagram

The path diagram visualizes the latent constructs, indicators, and structural relationships with standardized path coefficients.

semPaths(fit_SEM,
         whatLabels    = "std",
         layout        = "tree2",
         edge.label.cex = 0.8,
         sizeMan       = 5,
         sizeLat       = 8,
         color         = list(lat = "#AED6F1", man = "#D5F5E3"),
         title         = TRUE,
         style         = "ram")

Inner Model Evaluation and Hypothesis Testing

Model Fit Indices

The SEM fit is evaluated against standard indices. The model is considered acceptable when the majority of criteria are satisfied.

if (!lavInspect(fit_SEM, "converged")) {
  cat("The SEM model did not converge. Please check model specification.\n")
} else {

  fit_idx <- fitMeasures(fit_SEM, c(
    "chisq", "df", "pvalue",
    "cfi", "tli",
    "rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
    "srmr", "gfi", "agfi", "nfi"
  ))

  cat("SEM Model Fit Evaluation\n\n")
  fit_tabel <- data.frame(
    Index  = c("Chi-Square", "df", "p-value Chi-Square",
               "CFI", "TLI", "RMSEA",
               "RMSEA CI Lower", "RMSEA CI Upper",
               "SRMR", "GFI", "AGFI", "NFI"),
    Value  = round(fit_idx, 4),
    Cutoff = c("sensitive to N", "--", "> 0.05",
               "> 0.90", "> 0.90", "< 0.08",
               "--", "--",
               "< 0.08", "> 0.90", "> 0.90", "> 0.90"),
    Status = c(
      ifelse(fit_idx["pvalue"]  > 0.05, "Good Fit", "Marginal"),
      "--",
      ifelse(fit_idx["pvalue"]  > 0.05, "Good Fit", "Marginal"),
      ifelse(fit_idx["cfi"]     > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["tli"]     > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["rmsea"]   < 0.08, "Good Fit", "Marginal"),
      "--", "--",
      ifelse(fit_idx["srmr"]    < 0.08, "Good Fit", "Marginal"),
      ifelse(fit_idx["gfi"]     > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["agfi"]    > 0.90, "Good Fit", "Marginal"),
      ifelse(fit_idx["nfi"]     > 0.90, "Good Fit", "Marginal")
    )
  )
  print(fit_tabel, row.names = FALSE)
}

## SEM Model Fit Evaluation
## 
##               Index    Value         Cutoff   Status
##          Chi-Square 630.8133 sensitive to N Marginal
##                  df  39.0000             --       --
##  p-value Chi-Square   0.0000         > 0.05 Marginal
##                 CFI   0.9447         > 0.90 Good Fit
##                 TLI   0.9220         > 0.90 Good Fit
##               RMSEA   0.1016         < 0.08 Marginal
##      RMSEA CI Lower   0.0947             --       --
##      RMSEA CI Upper   0.1087             --       --
##                SRMR   0.0661         < 0.08 Good Fit
##                 GFI   0.9388         > 0.90 Good Fit
##                AGFI   0.8964         > 0.90 Marginal
##                 NFI   0.9414         > 0.90 Good Fit

Based on the model fit evaluation above, the following interpretations apply:

CFI and TLI: Values above 0.90 indicate that the proposed model fits substantially better than the null model, confirming acceptable model fit.
RMSEA: A value below 0.08 indicates close approximate fit. The 90% confidence interval should ideally not cross 0.10.
SRMR: A value below 0.08 indicates acceptable average residual correlations between observed and model-implied covariance matrices.
GFI / AGFI / NFI: Values above 0.90 confirm that the model reproduces the observed covariance structure well.
Chi-Square p-value: With N = 1,470, this statistic is expected to be significant due to sample size sensitivity and should not be used as the sole fit criterion.

Overall, the model fit is evaluated holistically. If the majority of indices meet their respective thresholds, the model is considered acceptable for hypothesis testing.

Path Coefficients and Hypothesis Testing

Hypothesis testing is based on standardized path coefficients and p-values at alpha = 0.05. A hypothesis is supported if p-value < 0.05.

if (!lavInspect(fit_SEM, "converged")) {
  cat("Path coefficients cannot be extracted model did not converge.\n")
} else {

  param_all    <- parameterEstimates(fit_SEM, standardized = TRUE)
  jalur_struct <- param_all[param_all$op == "~", ]
  n_paths      <- nrow(jalur_struct)
  hypo_labels  <- paste0("H", seq_len(n_paths))

  cat("Structural Path Coefficients (Inner Model)\n\n")
  hasil_hipotesis <- data.frame(
    Hypothesis = hypo_labels,
    Path       = paste0(jalur_struct$rhs, " -> ", jalur_struct$lhs),
    Std_Coef   = round(jalur_struct$std.all, 4),
    SE         = round(jalur_struct$se,      4),
    Z_value    = round(jalur_struct$z,       4),
    P_value    = round(jalur_struct$pvalue,  4),
    Decision   = ifelse(jalur_struct$pvalue < 0.05, "Supported", "Not Supported")
  )
  print(hasil_hipotesis, row.names = FALSE)
}

## Structural Path Coefficients (Inner Model)
## 
##  Hypothesis                             Path Std_Coef     SE Z_value P_value
##          H1   CareerLevel -> TenureStability   0.5469 0.0376 14.1303  0.0000
##          H2    PerfReward -> TenureStability  -0.0098 0.0021 -1.4022  0.1609
##          H3 TenureStability -> AttritionRisk   1.5237 0.0376 19.0616  0.0000
##          H4     CareerLevel -> AttritionRisk  -0.0843 0.0329 -1.1693  0.2423
##       Decision
##      Supported
##  Not Supported
##      Supported
##  Not Supported

Interpretation of hypothesis testing results:

H1 (CareerLevel → TenureStability): If p-value < 0.05 and the coefficient is positive, employees with higher career levels tend to stay longer in their roles, supporting H1.
H2 (PerfReward → TenureStability): If p-value < 0.05 and the coefficient is positive, better performance rewards are associated with higher tenure stability, supporting H2.
H3 (TenureStability → AttritionRisk): If p-value < 0.05 and the coefficient is negative, employees with greater tenure stability exhibit lower attrition risk, supporting H3.
H4 (CareerLevel → AttritionRisk): If p-value < 0.05 and the coefficient is negative, higher career levels are directly associated with lower attrition risk, supporting H4.

A hypothesis is considered supported if its p-value < 0.05 (two-tailed, α = 0.05). The sign of the standardized coefficient should be consistent with the predicted direction.

Coefficient of Determination (R-squared)

R-squared measures the proportion of variance in each endogenous construct explained by the exogenous constructs. For CB-SEM, the interpretation follows Hair et al. (2010): R² ≥ 0.50 is considered Substantial, R² 0.25–0.49 is Moderate, and R² < 0.25 is Weak. Note that the thresholds of 0.67/0.33 are specific to PLS-SEM (Cohen, 1988) and are not applied here.

if (!lavInspect(fit_SEM, "converged")) {
  cat("R-squared cannot be computed model did not converge.\n")
} else {

  r2_values <- lavInspect(fit_SEM, "r2")
  cat("Coefficient of Determination (R-squared)\n\n")
  r2_df <- data.frame(
    Endogenous_Construct = names(r2_values),
    R2                   = round(r2_values, 4),
    Interpretation       = ifelse(r2_values >= 0.50, "Substantial (>= 0.50)",
                           ifelse(r2_values >= 0.25, "Moderate (0.25-0.49)",
                                                     "Weak (< 0.25)"))
  )
  print(r2_df, row.names = FALSE)
}

## Coefficient of Determination (R-squared)
## 
##     Endogenous_Construct     R2        Interpretation
##            MonthlyIncome 0.9360 Substantial (>= 0.50)
##                 JobLevel 0.9633 Substantial (>= 0.50)
##        TotalWorkingYears 0.6415 Substantial (>= 0.50)
##        PercentSalaryHike     NA                  <NA>
##        PerformanceRating 0.0588         Weak (< 0.25)
##           YearsAtCompany 0.8855 Substantial (>= 0.50)
##       YearsInCurrentRole 0.6679 Substantial (>= 0.50)
##     YearsWithCurrManager 0.6734 Substantial (>= 0.50)
##  YearsSinceLastPromotion 0.1956         Weak (< 0.25)
##       NumCompaniesWorked 0.0071         Weak (< 0.25)
##                 OverTime 0.0002         Weak (< 0.25)
##          TenureStability 0.2993  Moderate (0.25-0.49)
##            AttritionRisk     NA                  <NA>

R-squared reflects the explanatory power of the exogenous constructs on each endogenous construct. A Substantial R² (≥ 0.50) indicates that the model explains the majority of variance in the dependent latent variable. A Moderate R² (0.25–0.49) is still considered informative, particularly in social science research where behavioral constructs are inherently complex and difficult to predict perfectly from a limited set of indicators.

Model Re-specification (If Required)

If the model fit does not meet the required criteria, Modification Indices (MI) can guide re-specification. A high MI value (above 10) suggests a meaningful potential improvement. Re-specification should only be performed when supported by theoretical justification.

if (!lavInspect(fit_SEM, "converged")) {
  cat("Modification indices cannot be computed model did not converge.\n")
} else {

  mi <- modindices(fit_SEM, sort. = TRUE, maximum.number = 15)
  cat("Top 15 Modification Indices\n\n")
  print(mi[, c("lhs", "op", "rhs", "mi", "epc")], row.names = FALSE)
  cat("\nNote: Re-specification is only recommended when theoretically justified.\n")
}

## Top 15 Modification Indices
## 
##                 lhs op                     rhs      mi     epc
##       MonthlyIncome ~~                JobLevel 204.872   0.441
##     TenureStability =~       TotalWorkingYears 198.851   0.303
##       AttritionRisk =~       TotalWorkingYears 189.844   0.373
##         CareerLevel =~      NumCompaniesWorked  96.892   0.321
##         CareerLevel =~ YearsSinceLastPromotion  96.396   1.658
##     TenureStability =~ YearsSinceLastPromotion  94.347  29.508
##   TotalWorkingYears ~~      NumCompaniesWorked  86.928   0.147
##         CareerLevel =~          YearsAtCompany  77.300   0.181
##   TotalWorkingYears ~~          YearsAtCompany  66.918   0.066
##  YearsInCurrentRole ~~    YearsWithCurrManager  61.685   0.104
##      YearsAtCompany ~~      YearsInCurrentRole  41.415  -0.104
##     TenureStability =~       PercentSalaryHike  40.772 -23.145
##     TenureStability =~       PerformanceRating  40.680   1.755
##         CareerLevel =~    YearsWithCurrManager  37.851  -0.133
##     TenureStability =~      NumCompaniesWorked  28.860   1.713
## 
## Note: Re-specification is only recommended when theoretically justified.

Conclusion

Based on the complete CB-SEM analysis carried out in this study, the following conclusions are drawn:

Assumption Testing The three key assumptions (multivariate normality via Mardia’s Test, data adequacy via KMO Overall MSA > 0.5, and non-multicollinearity via VIF) were verified prior to further analysis. For CB-SEM, the applicable VIF threshold is VIF < 10 (Hair et al., 2010), with VIF < 5 as the conservative benchmark.
Construct Redesign Constructs were redesigned based on empirical inter-indicator correlation patterns. Only indicator groups with mutual correlations above 0.30 were retained, resulting in four well-defined constructs: Career Level, Performance and Reward, Tenure Stability, and Attrition Risk.
First Order CFA CFA per construct confirmed that each indicator validly measures its respective latent construct, with standardized loadings evaluated against the 0.5 threshold.
Second Order CFA (Combined Measurement Model) The combined measurement model evaluated convergent validity (AVE), reliability (CR and Cronbach’s Alpha), and discriminant validity (Fornell-Larcker Criterion) across all constructs simultaneously.
Full SEM The structural model tested four hypotheses regarding the influence of Career Level and Performance and Reward on Tenure Stability, and the effect of Tenure Stability and Career Level on Attrition Risk among employees.
Type of SEM Used This analysis applies CB-SEM (Covariance-Based SEM) with the MLR (Maximum Likelihood Robust) estimator using the lavaan package in R, consistent with the reflective nature of all constructs, the presence of non-normal data, and the theory-confirmatory analytical objective.

Implementation of Structural Equation Model (SEM) on IBM HR Analytics Employee Attrition and Performance Dataset

Abdullah AlFirdaus Nuzula (008) , Muhammaf Rafi Fahrezi (100), Elvira Tiara Suci T (213)

2026-05-15