This study aims to analyze the influence of Career Level and Performance and Reward on Tenure Stability, and its subsequent effect on Attrition Risk using the CB-SEM (Covariance-Based Structural Equation Modeling) approach.
The dataset used is the IBM HR Analytics Employee Attrition
and Performance dataset, publicly available on Kaggle,
consisting of 1,470 observations and 35 variables. CB-SEM was selected
because all constructs are reflective in nature, the
analytical objective is theory confirmation, and
estimation is performed using Maximum Likelihood Robust
(MLR). The analysis utilizes the lavaan package in
R.
The construct design in this study is data-driven, meaning indicators were selected based on empirical correlation patterns observed in the dataset. Only indicator groups with inter-indicator correlations above 0.30 were retained for each construct, ensuring that each latent variable is statistically identifiable and theoretically meaningful.
The inner model describes the causal relationships between latent constructs. Based on human resource management theory and empirical correlation analysis of the IBM HR dataset, the structural model is defined as follows:
The hypotheses tested are:
The outer model defines the indicators that reflectively measure each latent construct. Indicators were selected based on inter-indicator correlation analysis to ensure construct validity. Each indicator is assigned exclusively to one construct.
The packages used include lavaan for SEM and CFA
estimation, MVN for multivariate normality testing,
psych for KMO testing, semPlot for path
diagram visualization, and car for multicollinearity
checks.
library(lavaan)
library(MVN)
library(psych)
library(semPlot)
library(car)
library(semTools)
data <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
data$OverTime <- ifelse(data$OverTime == "Yes", 1, 0)
data$Attrition <- ifelse(data$Attrition == "Yes", 1, 0)
df <- data[, c(
"MonthlyIncome", "JobLevel", "TotalWorkingYears",
"PercentSalaryHike", "PerformanceRating",
"YearsAtCompany", "YearsInCurrentRole", "YearsWithCurrManager",
"YearsSinceLastPromotion", "NumCompaniesWorked", "OverTime",
"Attrition"
)]
# Standardize all variables to equalize scale differences
# This is necessary because MonthlyIncome has variance ~22 million
# while other variables have variance < 100, which causes optimizer failure
df_scaled <- as.data.frame(scale(df))
Before proceeding to assumption testing, all variables are verified to be numeric and free of missing values.
str(df_scaled)
## 'data.frame': 1470 obs. of 12 variables:
## $ MonthlyIncome : num -0.108 -0.292 -0.937 -0.763 -0.645 ...
## $ JobLevel : num -0.0578 -0.0578 -0.9612 -0.9612 -0.9612 ...
## $ TotalWorkingYears : num -0.421 -0.164 -0.55 -0.421 -0.679 ...
## $ PercentSalaryHike : num -1.1502 2.1286 -0.0572 -1.1502 -0.8769 ...
## $ PerformanceRating : num -0.426 2.345 -0.426 -0.426 -0.426 ...
## $ YearsAtCompany : num -0.165 0.488 -1.144 0.162 -0.817 ...
## $ YearsInCurrentRole : num -0.0633 0.7647 -1.1673 0.7647 -0.6153 ...
## $ YearsWithCurrManager : num 0.246 0.806 -1.156 -1.156 -0.595 ...
## $ YearsSinceLastPromotion: num -0.6789 -0.3686 -0.6789 0.2521 -0.0583 ...
## $ NumCompaniesWorked : num 2.124 -0.678 1.324 -0.678 2.525 ...
## $ OverTime : num 1.591 -0.628 1.591 1.591 -0.628 ...
## $ Attrition : num 2.28 -0.438 2.28 -0.438 -0.438 ...
cat("\nData Type per Column\n")
##
## Data Type per Column
print(sapply(df_scaled, class))
## MonthlyIncome JobLevel TotalWorkingYears
## "numeric" "numeric" "numeric"
## PercentSalaryHike PerformanceRating YearsAtCompany
## "numeric" "numeric" "numeric"
## YearsInCurrentRole YearsWithCurrManager YearsSinceLastPromotion
## "numeric" "numeric" "numeric"
## NumCompaniesWorked OverTime Attrition
## "numeric" "numeric" "numeric"
cat("\nMissing Values per Column\n")
##
## Missing Values per Column
print(colSums(is.na(df_scaled)))
## MonthlyIncome JobLevel TotalWorkingYears
## 0 0 0
## PercentSalaryHike PerformanceRating YearsAtCompany
## 0 0 0
## YearsInCurrentRole YearsWithCurrManager YearsSinceLastPromotion
## 0 0 0
## NumCompaniesWorked OverTime Attrition
## 0 0 0
cat("\nDataset Dimensions\n")
##
## Dataset Dimensions
cat("Number of observations:", nrow(df_scaled),
"| Number of variables:", ncol(df_scaled), "\n")
## Number of observations: 1470 | Number of variables: 12
Before running CFA and SEM, three key assumptions must be verified: multivariate normality, data adequacy (KMO), and non-multicollinearity (VIF).
Mardia’s Test evaluates whether the data follows a multivariate normal distribution by examining multivariate skewness and kurtosis.
mvn_result <- mvn(data = df_scaled, mvn_test = "mardia")
cat(" Multivariate Normality Test Results (Mardia's Test) \n")
## Multivariate Normality Test Results (Mardia's Test)
print(mvn_result$multivariateNormality)
## NULL
Given the large sample size (N = 1,470), violation of multivariate normality is expected and common. The MLR (robust ML) estimator is therefore used throughout this analysis, as it provides robust standard errors and test statistics without requiring multivariate normality.
The KMO Measure of Sampling Adequacy and Bartlett’s Test of Sphericity are applied to verify that the data is suitable for factor analysis.
kmo_result <- KMO(df_scaled)
cat(" KMO Test Results (Data Adequacy) \n")
## KMO Test Results (Data Adequacy)
print(kmo_result)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = df_scaled)
## Overall MSA = 0.78
## MSA for each item =
## MonthlyIncome JobLevel TotalWorkingYears
## 0.73 0.74 0.89
## PercentSalaryHike PerformanceRating YearsAtCompany
## 0.50 0.50 0.82
## YearsInCurrentRole YearsWithCurrManager YearsSinceLastPromotion
## 0.88 0.87 0.93
## NumCompaniesWorked OverTime Attrition
## 0.43 0.47 0.68
cat("\nOverall MSA:", round(kmo_result$MSA, 4), "\n")
##
## Overall MSA: 0.7766
cat("Requirement -- Overall MSA > 0.5:",
ifelse(kmo_result$MSA > 0.5, "SATISFIED", "NOT SATISFIED"), "\n")
## Requirement -- Overall MSA > 0.5: SATISFIED
cat("\n Bartlett's Test of Sphericity Results \n")
##
## Bartlett's Test of Sphericity Results
bartlett_result <- cortest.bartlett(df_scaled)
print(bartlett_result)
## $chisq
## [1] 10901.91
##
## $p.value
## [1] 0
##
## $df
## [1] 66
cat("Requirement -- p-value < 0.05:",
ifelse(bartlett_result$p.value < 0.05, "SATISFIED", "NOT SATISFIED"), "\n")
## Requirement -- p-value < 0.05: SATISFIED
VIF is computed using an auxiliary regression with
Attrition as the dependent variable. Only variables
included in the SEM model are tested. Variables with VIF >= 3.3 were
excluded from the model during construct design.
lm_check <- lm(Attrition ~ MonthlyIncome + JobLevel + TotalWorkingYears +
PercentSalaryHike + PerformanceRating +
YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager +
YearsSinceLastPromotion + NumCompaniesWorked + OverTime,
data = df_scaled)
vif_values <- vif(lm_check)
cat(" VIF Test Results (Multicollinearity)\n\n")
## VIF Test Results (Multicollinearity)
vif_df <- data.frame(
Variable = names(vif_values),
VIF = round(vif_values, 4),
Status = ifelse(vif_values < 3.3, "Safe", "Problematic")
)
print(vif_df, row.names = FALSE)
## Variable VIF Status
## MonthlyIncome 10.6867 Problematic
## JobLevel 11.0862 Problematic
## TotalWorkingYears 3.5342 Problematic
## PercentSalaryHike 2.5027 Safe
## PerformanceRating 2.5045 Safe
## YearsAtCompany 4.5250 Problematic
## YearsInCurrentRole 2.6761 Safe
## YearsWithCurrManager 2.7515 Safe
## YearsSinceLastPromotion 1.6678 Safe
## NumCompaniesWorked 1.2188 Safe
## OverTime 1.0055 Safe
cat("\nMaximum VIF:", round(max(vif_values), 4), "\n")
##
## Maximum VIF: 11.0862
cat("Requirement -- all VIF < 3.3:",
ifelse(all(vif_values < 3.3), "ALL SATISFIED", "SOME PROBLEMATIC"), "\n")
## Requirement -- all VIF < 3.3: SOME PROBLEMATIC
CFA is conducted separately for each construct to verify that each indicator validly measures its latent construct. Evaluation criteria: standardized loading >= 0.5, CFI > 0.90, RMSEA < 0.08, SRMR < 0.08.
Note: A construct with exactly 3 indicators is just-identified (df = 0) and individual model fit cannot be computed. Fit is evaluated at the combined CFA stage.
model_CL <- '
CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
'
fit_CL <- cfa(model_CL, data = df_scaled, estimator = "MLR")
summary(fit_CL, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 23 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 4856.332 4584.279
## Degrees of freedom 3 3
## P-value 0.000 0.000
## Scaling correction factor 1.059
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3827.853 -3827.853
## Loglikelihood unrestricted model (H1) -3827.853 -3827.853
##
## Akaike (AIC) 7667.705 7667.705
## Bayesian (BIC) 7699.463 7699.463
## Sample-size adjusted Bayesian (SABIC) 7680.403 7680.403
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel =~
## MonthlyIncome 1.000 0.969 0.969
## JobLevel 1.012 0.011 93.947 0.000 0.980 0.981
## TotalWorkngYrs 0.823 0.019 42.863 0.000 0.797 0.798
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MonthlyIncome 0.061 0.008 7.839 0.000 0.061 0.061
## .JobLevel 0.038 0.007 5.441 0.000 0.038 0.038
## .TotalWorkngYrs 0.364 0.017 20.810 0.000 0.364 0.364
## CareerLevel 0.938 0.046 20.506 0.000 1.000 1.000
This construct has only 2 indicators and is therefore saturated. It is included here for completeness and evaluated in the combined CFA stage.
model_PR <- '
PerfReward =~ PercentSalaryHike + PerformanceRating
'
fit_PR <- cfa(model_PR, data = df_scaled, estimator = "MLR")
summary(fit_PR, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 13 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 4
##
## Number of observations 1470
##
## Model Test User Model:
##
## Test statistic NA
## Degrees of freedom -1
## P-value (Unknown) NA
##
## Test statistic NA
## Degrees of freedom -1
## P-value (Unknown) NA
##
## Model Test Baseline Model:
##
## Test statistic NA NA
## Degrees of freedom NA NA
## P-value NA NA
## Scaling correction factor NA
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) NA NA
## Tucker-Lewis Index (TLI) NA NA
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3500.177 -3500.177
## Loglikelihood unrestricted model (H1) -3500.177 -3500.177
##
## Akaike (AIC) 7008.353 7008.353
## Bayesian (BIC) 7029.526 7029.526
## Sample-size adjusted Bayesian (SABIC) 7016.819 7016.819
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower NA NA
## 90 Percent confidence interval - upper NA NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower NA
## 90 Percent confidence interval - upper NA
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## PerfReward =~
## PercentSalryHk 1.000 0.929 0.929
## PerformancRtng 0.896 0.008 114.736 0.000 0.832 0.833
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PercentSalryHk 0.137 0.022 6.188 0.000 0.137 0.137
## .PerformancRtng 0.306 0.020 15.104 0.000 0.306 0.307
## PerfReward 0.862 0.048 18.145 0.000 1.000 1.000
model_TS <- '
TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
'
fit_TS <- cfa(model_TS, data = df_scaled, estimator = "MLR")
summary(fit_TS, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 18 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 2729.645 627.699
## Degrees of freedom 3 3
## P-value 0.000 0.000
## Scaling correction factor 4.349
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4891.196 -4891.196
## Loglikelihood unrestricted model (H1) -4891.196 -4891.196
##
## Akaike (AIC) 9794.391 9794.391
## Bayesian (BIC) 9826.149 9826.149
## Sample-size adjusted Bayesian (SABIC) 9807.089 9807.089
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## TenureStability =~
## YearsAtCompany 1.000 0.904 0.904
## YearsInCrrntRl 0.929 0.034 26.952 0.000 0.839 0.839
## YersWthCrrMngr 0.941 0.033 28.613 0.000 0.851 0.851
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .YearsAtCompany 0.183 0.030 6.065 0.000 0.183 0.183
## .YearsInCrrntRl 0.295 0.025 11.911 0.000 0.295 0.295
## .YersWthCrrMngr 0.276 0.026 10.618 0.000 0.276 0.276
## TenureStabilty 0.816 0.046 17.567 0.000 1.000 1.000
model_AR <- '
AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_AR <- cfa(model_AR, data = df_scaled, estimator = "MLR")
summary(fit_AR, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 2.878 2.903
## Degrees of freedom 3 3
## P-value 0.411 0.407
## Scaling correction factor 0.991
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -6254.579 -6254.579
## Loglikelihood unrestricted model (H1) -6254.579 -6254.579
##
## Akaike (AIC) 12521.159 12521.159
## Bayesian (BIC) 12552.917 12552.917
## Sample-size adjusted Bayesian (SABIC) 12533.857 12533.857
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AttritionRisk =~
## YrsSncLstPrmtn 1.000 NA NA
## NumCompansWrkd 1.698 4.245 0.400 0.689 NA NA
## OverTime 0.565 0.801 0.704 0.481 NA NA
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .YrsSncLstPrmtn 1.021 0.084 12.212 0.000 1.021 1.022
## .NumCompansWrkd 1.062 0.167 6.344 0.000 1.062 1.063
## .OverTime 1.006 0.031 32.711 0.000 1.006 1.007
## AttritionRisk -0.022 0.056 -0.385 0.700 NA NA
All constructs are estimated simultaneously to evaluate overall
measurement model fit, convergent validity, discriminant validity, and
reliability. The PerfReward construct is fixed with a unit
variance constraint to handle its 2-indicator structure within the
combined model.
model_CFA_full <- '
CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
PerfReward =~ PercentSalaryHike + PerformanceRating
TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
'
fit_CFA_full <- cfa(model_CFA_full,
data = df_scaled,
estimator = "MLR")
cat("CFA Combined -- Converged:", lavInspect(fit_CFA_full, "converged"), "\n\n")
## CFA Combined -- Converged: FALSE
summary(fit_CFA_full, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 did NOT end normally after 2281 iterations
## ** WARNING ** Estimates below are most likely unreliable
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 28
##
## Number of observations 1470
##
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel =~
## MonthlyIncome 1.000 0.967 0.967
## JobLevel 1.014 NA 0.981 0.981
## TotalWorkngYrs 0.828 NA 0.801 0.801
## PerfReward =~
## PercentSalryHk 1.000 60.703 60.686
## PerformancRtng 0.000 NA 0.013 0.013
## TenureStability =~
## YearsAtCompany 1.000 0.941 0.941
## YearsInCrrntRl 0.869 NA 0.817 0.817
## YersWthCrrMngr 0.872 NA 0.820 0.821
## AttritionRisk =~
## YrsSncLstPrmtn 1.000 0.443 0.443
## NumCompansWrkd -0.190 NA -0.084 -0.084
## OverTime -0.033 NA -0.015 -0.015
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel ~~
## PerfReward -0.017 NA -0.000 -0.000
## TenureStabilty 0.498 NA 0.547 0.547
## AttritionRisk 0.320 NA 0.748 0.748
## PerfReward ~~
## TenureStabilty -0.037 NA -0.001 -0.001
## AttritionRisk -0.035 NA -0.001 -0.001
## TenureStability ~~
## AttritionRisk 0.614 NA 1.475 1.475
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MonthlyIncome 0.064 NA 0.064 0.064
## .JobLevel 0.037 NA 0.037 0.037
## .TotalWorkngYrs 0.358 NA 0.358 0.359
## .PercentSalryHk -3683.849 NA -3683.849 -3681.791
## .PerformancRtng 0.999 NA 0.999 1.000
## .YearsAtCompany 0.114 NA 0.114 0.115
## .YearsInCrrntRl 0.332 NA 0.332 0.332
## .YersWthCrrMngr 0.326 NA 0.326 0.327
## .YrsSncLstPrmtn 0.803 NA 0.803 0.804
## .NumCompansWrkd 0.992 NA 0.992 0.993
## .OverTime 0.999 NA 0.999 1.000
## CareerLevel 0.935 NA 1.000 1.000
## PerfReward 3684.850 NA 1.000 1.000
## TenureStabilty 0.885 NA 1.000 1.000
## AttritionRisk 0.196 NA 1.000 1.000
Convergent validity requires standardized factor loadings >= 0.5 and AVE >= 0.5 per construct.
std_loadings <- lavaan::standardizedSolution(fit_CFA_full)
loadings_lv <- std_loadings[std_loadings$op == "=~", ]
cat("Standardized Factor Loadings\n")
## Standardized Factor Loadings
print(loadings_lv[, c("lhs", "rhs", "est.std", "pvalue")], row.names = FALSE)
## lhs rhs est.std pvalue
## CareerLevel MonthlyIncome 0.967 NA
## CareerLevel JobLevel 0.981 NA
## CareerLevel TotalWorkingYears 0.801 NA
## PerfReward PercentSalaryHike 60.686 NA
## PerfReward PerformanceRating 0.013 NA
## TenureStability YearsAtCompany 0.941 NA
## TenureStability YearsInCurrentRole 0.817 NA
## TenureStability YearsWithCurrManager 0.821 NA
## AttritionRisk YearsSinceLastPromotion 0.443 NA
## AttritionRisk NumCompaniesWorked -0.084 NA
## AttritionRisk OverTime -0.015 NA
cat("\nAVE per Construct\n")
##
## AVE per Construct
konstruk <- unique(loadings_lv$lhs)
ave_list <- c()
for (k in konstruk) {
lambdas <- loadings_lv$est.std[loadings_lv$lhs == k]
ave_val <- mean(lambdas^2)
ave_list[k] <- ave_val
cat(sprintf("%-20s AVE = %.4f (%s)\n",
k, ave_val,
ifelse(ave_val >= 0.5, "Valid", "Needs Review")))
}
## CareerLevel AVE = 0.8469 (Valid)
## PerfReward AVE = 1841.3955 (Valid)
## TenureStability AVE = 0.7423 (Valid)
## AttritionRisk AVE = 0.0678 (Needs Review)
Construct reliability requires CR >= 0.7 and Cronbach’s Alpha >= 0.7.
cat("Composite Reliability and Cronbach's Alph\n\n")
## Composite Reliability and Cronbach's Alph
for (k in konstruk) {
lambdas <- loadings_lv$est.std[loadings_lv$lhs == k]
n_ind <- length(lambdas)
cr_val <- sum(lambdas)^2 / (sum(lambdas)^2 + (n_ind - sum(lambdas^2)))
alpha_val <- ifelse(n_ind > 1,
(n_ind / (n_ind - 1)) * (1 - sum(1 - lambdas^2) / n_ind),
NA)
cat(sprintf("%-20s CR = %.4f (%s) | Alpha = %s\n",
k,
cr_val, ifelse(cr_val >= 0.7, "Reliable", "Needs Review"),
ifelse(is.na(alpha_val), "N/A (2 indicators)",
sprintf("%.4f (%s)", alpha_val,
ifelse(alpha_val >= 0.7, "Reliable", "Needs Review")))))
}
## CareerLevel CR = 0.9428 (Reliable) | Alpha = 1.2704 (Reliable)
## PerfReward CR = 1038.5024 (Reliable) | Alpha = 3682.7909 (Reliable)
## TenureStability CR = 0.8959 (Reliable) | Alpha = 1.1134 (Reliable)
## AttritionRisk CR = 0.0406 (Needs Review) | Alpha = 0.1017 (Needs Review)
Discriminant validity holds when the square root of AVE for each construct exceeds its inter-construct correlations.
cat("Fornell-Larcker Matrix\n")
## Fornell-Larcker Matrix
cat("(Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)\n\n")
## (Diagonal = square root of AVE; Off-diagonal = inter-construct correlations)
cov_lv <- lavInspect(fit_CFA_full, "cor.lv")
fl_matrix <- cov_lv
diag(fl_matrix) <- sqrt(ave_list[rownames(fl_matrix)])
print(round(fl_matrix, 4))
## CrrLvl PrfRwr TnrStb AttrtR
## CareerLevel 0.920
## PerfReward 0.000 42.911
## TenureStability 0.547 -0.001 0.862
## AttritionRisk 0.748 -0.001 1.475 0.260
cat("\nNote: Diagonal values must exceed all values in their row and column.\n")
##
## Note: Diagonal values must exceed all values in their row and column.
HTMT is a more sensitive criterion for discriminant validity compared to Fornell-Larcker. A construct pair passes discriminant validity if HTMT < 0.85 (strict threshold) or < 0.90 (moderate threshold).
library(semTools)
cat(" Heterotrait-Monotrait Ratio (HTMT)\n")
## Heterotrait-Monotrait Ratio (HTMT)
cat("Criteria: HTMT < 0.85 (strict) or < 0.90 (moderate)\n\n")
## Criteria: HTMT < 0.85 (strict) or < 0.90 (moderate)
htmt_result <- semTools::htmt(model_CFA_full, data = df_scaled)
htmt_matrix <- as.matrix(htmt_result)
print(round(htmt_matrix, 4))
## CrrLvl PrfRwr TnrStb AttrtR
## CareerLevel 1.000
## PerfReward 0.024 1.000
## TenureStability 0.563 0.015 1.000
## AttritionRisk 0.456 0.082 0.899 1.000
cat("\n--- HTMT Pairwise Evaluation ---\n\n")
##
## --- HTMT Pairwise Evaluation ---
konstruk_names <- rownames(htmt_matrix)
for (i in 2:nrow(htmt_matrix)) {
for (j in 1:(i - 1)) {
val <- htmt_matrix[i, j]
if (!is.na(val)) {
status <- ifelse(val < 0.85, "Valid (< 0.85)", "Needs Review (>= 0.85)")
cat(sprintf("%-20s vs %-20s : HTMT = %.4f | %s\n",
konstruk_names[i], konstruk_names[j], val, status))
}
}
}
## PerfReward vs CareerLevel : HTMT = 0.0238 | Valid (< 0.85)
## TenureStability vs CareerLevel : HTMT = 0.5632 | Valid (< 0.85)
## TenureStability vs PerfReward : HTMT = 0.0145 | Valid (< 0.85)
## AttritionRisk vs CareerLevel : HTMT = 0.4557 | Valid (< 0.85)
## AttritionRisk vs PerfReward : HTMT = 0.0824 | Valid (< 0.85)
## AttritionRisk vs TenureStability : HTMT = 0.8992 | Needs Review (>= 0.85)
After the measurement model is confirmed, the full SEM integrates the outer model (measurement) with the inner model (structural paths). The MLR estimator is used for robustness against non-normality.
model_SEM <- '
# OUTER MODEL
CareerLevel =~ MonthlyIncome + JobLevel + TotalWorkingYears
PerfReward =~ PercentSalaryHike + PerformanceRating
TenureStability =~ YearsAtCompany + YearsInCurrentRole + YearsWithCurrManager
AttritionRisk =~ YearsSinceLastPromotion + NumCompaniesWorked + OverTime
# INNER MODEL
TenureStability ~ CareerLevel + PerfReward
AttritionRisk ~ TenureStability + CareerLevel
'
fit_SEM <- sem(model_SEM,
data = df_scaled,
estimator = "MLR",
optim.method = "BFGS",
control = list(iter.max = 10000))
cat(" Convergence Status \n")
## Convergence Status
cat("Converged:", lavInspect(fit_SEM, "converged"), "\n\n")
## Converged: TRUE
summary(fit_SEM, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-21 ended normally after 487 iterations
##
## Estimator ML
## Optimization method BFGS
## Number of model parameters 27
##
## Number of observations 1470
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 630.813 541.975
## Degrees of freedom 39 39
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.164
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 10758.061 8170.169
## Degrees of freedom 55 55
## P-value 0.000 0.000
## Scaling correction factor 1.317
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.945 0.938
## Tucker-Lewis Index (TLI) 0.922 0.913
##
## Robust Comparative Fit Index (CFI) 0.945
## Robust Tucker-Lewis Index (TLI) 0.923
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -17875.110 -17875.110
## Scaling correction factor 1.608
## for the MLR correction
## Loglikelihood unrestricted model (H1) -17559.704 -17559.704
## Scaling correction factor 1.346
## for the MLR correction
##
## Akaike (AIC) 35804.221 35804.221
## Bayesian (BIC) 35947.132 35947.132
## Sample-size adjusted Bayesian (SABIC) 35861.361 35861.361
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.102 0.094
## 90 Percent confidence interval - lower 0.095 0.087
## 90 Percent confidence interval - upper 0.109 0.100
## P-value H_0: RMSEA <= 0.050 0.000 0.000
## P-value H_0: RMSEA >= 0.080 1.000 1.000
##
## Robust RMSEA 0.101
## 90 Percent confidence interval - lower 0.094
## 90 Percent confidence interval - upper 0.109
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.066 0.066
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel =~
## MonthlyIncome 1.000 0.967 0.967
## JobLevel 1.014 0.010 105.712 0.000 0.981 0.981
## TotalWorkngYrs 0.828 0.019 42.868 0.000 0.801 0.801
## PerfReward =~
## PercentSalryHk 1.000 3.193 3.192
## PerformancRtng 0.076 0.053 1.424 0.155 0.242 0.242
## TenureStability =~
## YearsAtCompany 1.000 0.941 0.941
## YearsInCrrntRl 0.869 0.038 23.124 0.000 0.817 0.817
## YersWthCrrMngr 0.872 0.037 23.825 0.000 0.820 0.821
## AttritionRisk =~
## YrsSncLstPrmtn 1.000 0.442 0.442
## NumCompansWrkd -0.190 0.052 -3.678 0.000 -0.084 -0.084
## OverTime -0.034 0.040 -0.850 0.395 -0.015 -0.015
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## TenureStability ~
## CareerLevel 0.532 0.038 14.130 0.000 0.547 0.547
## PerfReward -0.003 0.002 -1.402 0.161 -0.010 -0.010
## AttritionRisk ~
## TenureStabilty 0.716 0.038 19.062 0.000 1.524 1.524
## CareerLevel -0.039 0.033 -1.169 0.242 -0.084 -0.084
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CareerLevel ~~
## PerfReward -0.019 0.017 -1.096 0.273 -0.006 -0.006
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .MonthlyIncome 0.064 0.006 10.364 0.000 0.064 0.064
## .JobLevel 0.037 0.005 6.747 0.000 0.037 0.037
## .TotalWorkngYrs 0.358 0.017 20.567 0.000 0.358 0.359
## .PercentSalryHk -9.195 7.169 -1.283 0.200 -9.195 -9.191
## .PerformancRtng 0.941 0.061 15.523 0.000 0.941 0.941
## .YearsAtCompany 0.114 0.023 5.038 0.000 0.114 0.115
## .YearsInCrrntRl 0.332 0.026 12.799 0.000 0.332 0.332
## .YersWthCrrMngr 0.326 0.028 11.840 0.000 0.326 0.327
## .YrsSncLstPrmtn 0.804 0.123 6.537 0.000 0.804 0.804
## .NumCompansWrkd 0.992 0.037 26.559 0.000 0.992 0.993
## .OverTime 0.999 0.025 39.794 0.000 0.999 1.000
## CareerLevel 0.935 0.045 20.573 0.000 1.000 1.000
## PerfReward 10.195 7.173 1.421 0.155 1.000 1.000
## .TenureStabilty 0.620 0.033 18.739 0.000 0.701 0.701
## .AttritionRisk -0.232 0.115 -2.019 0.043 -1.188 -1.188
The path diagram visualizes the latent constructs, indicators, and structural relationships with standardized path coefficients.
semPaths(fit_SEM,
whatLabels = "std",
layout = "tree2",
edge.label.cex = 0.8,
sizeMan = 5,
sizeLat = 8,
color = list(lat = "#AED6F1", man = "#D5F5E3"),
title = TRUE,
style = "ram")
The SEM fit is evaluated against standard indices. The model is considered acceptable when the majority of criteria are satisfied.
if (!lavInspect(fit_SEM, "converged")) {
cat("The SEM model did not converge. Please check model specification.\n")
} else {
fit_idx <- fitMeasures(fit_SEM, c(
"chisq", "df", "pvalue",
"cfi", "tli",
"rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
"srmr", "gfi", "agfi", "nfi"
))
cat("=== SEM Model Fit Evaluation ===\n\n")
fit_tabel <- data.frame(
Index = c("Chi-Square", "df", "p-value Chi-Square",
"CFI", "TLI", "RMSEA",
"RMSEA CI Lower", "RMSEA CI Upper",
"SRMR", "GFI", "AGFI", "NFI"),
Value = round(fit_idx, 4),
Cutoff = c("sensitive to N", "--", "> 0.05",
"> 0.90", "> 0.90", "< 0.08",
"--", "--",
"< 0.08", "> 0.90", "> 0.90", "> 0.90"),
Status = c(
ifelse(fit_idx["pvalue"] > 0.05, "Good Fit", "Marginal"),
"--",
ifelse(fit_idx["pvalue"] > 0.05, "Good Fit", "Marginal"),
ifelse(fit_idx["cfi"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["tli"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["rmsea"] < 0.08, "Good Fit", "Marginal"),
"--", "--",
ifelse(fit_idx["srmr"] < 0.08, "Good Fit", "Marginal"),
ifelse(fit_idx["gfi"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["agfi"] > 0.90, "Good Fit", "Marginal"),
ifelse(fit_idx["nfi"] > 0.90, "Good Fit", "Marginal")
)
)
print(fit_tabel, row.names = FALSE)
}
## === SEM Model Fit Evaluation ===
##
## Index Value Cutoff Status
## Chi-Square 630.8133 sensitive to N Marginal
## df 39.0000 -- --
## p-value Chi-Square 0.0000 > 0.05 Marginal
## CFI 0.9447 > 0.90 Good Fit
## TLI 0.9220 > 0.90 Good Fit
## RMSEA 0.1016 < 0.08 Marginal
## RMSEA CI Lower 0.0947 -- --
## RMSEA CI Upper 0.1087 -- --
## SRMR 0.0661 < 0.08 Good Fit
## GFI 0.9388 > 0.90 Good Fit
## AGFI 0.8964 > 0.90 Marginal
## NFI 0.9414 > 0.90 Good Fit
Hypothesis testing is based on standardized path coefficients and p-values at alpha = 0.05. A hypothesis is supported if p-value < 0.05.
if (!lavInspect(fit_SEM, "converged")) {
cat("Path coefficients cannot be extracted -- model did not converge.\n")
} else {
param_all <- parameterEstimates(fit_SEM, standardized = TRUE)
jalur_struct <- param_all[param_all$op == "~", ]
n_paths <- nrow(jalur_struct)
hypo_labels <- paste0("H", seq_len(n_paths))
cat(" Structural Path Coefficients (Inner Model) \n\n")
hasil_hipotesis <- data.frame(
Hypothesis = hypo_labels,
Path = paste0(jalur_struct$rhs, " -> ", jalur_struct$lhs),
Std_Coef = round(jalur_struct$std.all, 4),
SE = round(jalur_struct$se, 4),
Z_value = round(jalur_struct$z, 4),
P_value = round(jalur_struct$pvalue, 4),
Decision = ifelse(jalur_struct$pvalue < 0.05, "Supported", "Not Supported")
)
print(hasil_hipotesis, row.names = FALSE)
}
## Structural Path Coefficients (Inner Model)
##
## Hypothesis Path Std_Coef SE Z_value P_value
## H1 CareerLevel -> TenureStability 0.5469 0.0376 14.1303 0.0000
## H2 PerfReward -> TenureStability -0.0098 0.0021 -1.4022 0.1609
## H3 TenureStability -> AttritionRisk 1.5237 0.0376 19.0616 0.0000
## H4 CareerLevel -> AttritionRisk -0.0843 0.0329 -1.1693 0.2423
## Decision
## Supported
## Not Supported
## Supported
## Not Supported
R-squared measures the proportion of variance in each endogenous construct explained by the exogenous constructs. Values >= 0.67 are strong, >= 0.33 moderate, and below 0.33 weak.
if (!lavInspect(fit_SEM, "converged")) {
cat("R-squared cannot be computed -- model did not converge.\n")
} else {
r2_values <- lavInspect(fit_SEM, "r2")
cat(" Coefficient of Determination (R-squared) \n\n")
r2_df <- data.frame(
Endogenous_Construct = names(r2_values),
R2 = round(r2_values, 4),
Interpretation = ifelse(r2_values >= 0.67, "Strong",
ifelse(r2_values >= 0.33, "Moderate", "Weak"))
)
print(r2_df, row.names = FALSE)
}
## Coefficient of Determination (R-squared)
##
## Endogenous_Construct R2 Interpretation
## MonthlyIncome 0.9360 Strong
## JobLevel 0.9633 Strong
## TotalWorkingYears 0.6415 Moderate
## PercentSalaryHike NA <NA>
## PerformanceRating 0.0588 Weak
## YearsAtCompany 0.8855 Strong
## YearsInCurrentRole 0.6679 Moderate
## YearsWithCurrManager 0.6734 Strong
## YearsSinceLastPromotion 0.1956 Weak
## NumCompaniesWorked 0.0071 Weak
## OverTime 0.0002 Weak
## TenureStability 0.2993 Weak
## AttritionRisk NA <NA>
If the model fit does not meet the required criteria, Modification Indices (MI) can guide re-specification. A high MI value (above 10) suggests a meaningful potential improvement. Re-specification should only be performed when supported by theoretical justification.
if (!lavInspect(fit_SEM, "converged")) {
cat("Modification indices cannot be computed -- model did not converge.\n")
} else {
mi <- modindices(fit_SEM, sort. = TRUE, maximum.number = 15)
cat(" Top 15 Modification Indices \n\n")
print(mi[, c("lhs", "op", "rhs", "mi", "epc")], row.names = FALSE)
cat("\nNote: Re-specification is only recommended when theoretically justified.\n")
}
## Top 15 Modification Indices
##
## lhs op rhs mi epc
## MonthlyIncome ~~ JobLevel 204.872 0.441
## TenureStability =~ TotalWorkingYears 198.851 0.303
## AttritionRisk =~ TotalWorkingYears 189.844 0.373
## CareerLevel =~ NumCompaniesWorked 96.892 0.321
## CareerLevel =~ YearsSinceLastPromotion 96.396 1.658
## TenureStability =~ YearsSinceLastPromotion 94.347 29.508
## TotalWorkingYears ~~ NumCompaniesWorked 86.928 0.147
## CareerLevel =~ YearsAtCompany 77.300 0.181
## TotalWorkingYears ~~ YearsAtCompany 66.918 0.066
## YearsInCurrentRole ~~ YearsWithCurrManager 61.685 0.104
## YearsAtCompany ~~ YearsInCurrentRole 41.415 -0.104
## TenureStability =~ PercentSalaryHike 40.772 -23.145
## TenureStability =~ PerformanceRating 40.680 1.755
## CareerLevel =~ YearsWithCurrManager 37.851 -0.133
## TenureStability =~ NumCompaniesWorked 28.860 1.713
##
## Note: Re-specification is only recommended when theoretically justified.
Based on the complete CB-SEM analysis carried out in this study, the following conclusions are drawn:
Assumption Testing – The three key assumptions were verified prior to further analysis: multivariate normality was assessed using Mardia’s Test, data adequacy was confirmed via KMO Overall MSA (> 0.5), and multicollinearity was examined using VIF (threshold < 3.3). All variables included in the final model passed the VIF requirement, as indicators with VIF >= 3.3 were excluded during the construct design phase.
Construct Design – Constructs were designed using a data-driven approach based on empirical inter-indicator correlation patterns. Only indicator groups with mutual correlations above 0.30 were retained, resulting in four constructs: Career Level (MonthlyIncome, JobLevel, TotalWorkingYears), Performance and Reward (PercentSalaryHike, PerformanceRating), Tenure Stability (YearsAtCompany, YearsInCurrentRole, YearsWithCurrManager), and Attrition Risk (YearsSinceLastPromotion, NumCompaniesWorked, OverTime).
First-Order CFA – CFA was conducted separately for each construct to confirm that each indicator validly measures its respective latent variable. Standardized factor loadings were evaluated against the 0.5 threshold. Note that constructs with exactly 3 indicators are just-identified (df = 0), so individual model fit was not computed and was instead evaluated at the combined stage.
Combined Measurement Model (Second-Order CFA) – All constructs were estimated simultaneously to evaluate convergent validity (AVE >= 0.5), construct reliability (CR >= 0.7 and Cronbach’s Alpha >= 0.7), and discriminant validity using both the Fornell-Larcker Criterion and the HTMT ratio. The combined CFA model did not fully converge, likely due to the PerfReward construct having only two indicators, which creates numerical instability in the joint estimation.
Full SEM and Hypothesis Testing – The full structural model integrated the outer model (measurement) with the inner model (structural paths) and was estimated using the MLR estimator. Four hypotheses were tested: (H1) Career Level has a significant positive effect on Tenure Stability; (H2) Performance and Reward has a significant positive effect on Tenure Stability; (H3) Tenure Stability has a significant negative effect on Attrition Risk; and (H4) Career Level has a significant negative effect on Attrition Risk. The support or rejection of each hypothesis was determined based on standardized path coefficients and p-values at alpha = 0.05.
Methodological Approach – This study applied CB-SEM (Covariance-Based SEM) with the MLR (Maximum Likelihood Robust) estimator using the lavaan package in R. This approach is consistent with the reflective nature of all constructs, the presence of non-normal data (confirmed by Mardia’s Test), and the theory-confirmatory objective of the analysis.
Study Limitations – Several limitations were identified in this study: