This report evaluates three logistic regression models developed to predict employee attrition. The analysis progresses from a baseline model that uses only MonthlyIncome to a full model that uses all available employee data.
The analyst used a dataset named testa.csv. The data was partitioned using a 70/30 split (training vs. testing), stratified so that the ratio of the target variable, Attrition, was preserved across both sets.
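A stratified 70/30 partition of this kind can be sketched as follows. This is a minimal illustration, not the analyst's actual procedure: the function name `stratified_split`, the seed, and the toy labels are all assumptions, and the real testa.csv data is not used.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, test_idx), splitting each class at train_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)                      # randomize within the class
        cut = round(len(members) * train_frac)    # 70% of this class goes to training
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)

def share_left(labels, indices):
    """Fraction of the given rows whose label is "Left"."""
    return sum(labels[i] == "Left" for i in indices) / len(indices)

# Toy labels only (hypothetical): 60 "Stayed" and 40 "Left".
labels = ["Stayed"] * 60 + ["Left"] * 40
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(share_left(labels, train_idx), share_left(labels, test_idx))
```

Because each class is split separately, the share of "Left" cases is the same in both partitions, which is the property the report describes.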
The models were estimated using the training set and validated against the hold-out test set (n = 4,471). A probability threshold of 0.5 was applied to classify outcomes.
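The thresholding step can be illustrated with a single-predictor logistic model. The sketch below assumes the model estimates the probability of "Left"; the intercept and slope are hypothetical values chosen for illustration, not the fitted coefficients.

```python
import math

def predict_proba(x, intercept, slope):
    """Logistic function applied to the linear predictor intercept + slope * x."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))

def classify(prob, threshold=0.5):
    """Label an observation "Left" when its probability reaches the threshold."""
    return "Left" if prob >= threshold else "Stayed"

# Hypothetical coefficients (assumption, not taken from the report).
INTERCEPT = 0.4
SLOPE = -0.0001  # change in log-odds per unit of monthly income

for income in (2000, 6000, 10000):
    p = predict_proba(income, INTERCEPT, SLOPE)
    print(income, round(p, 3), classify(p))
```

With a 0.5 threshold, an observation is labeled "Left" exactly when its linear predictor is non-negative, so a model whose predicted probabilities never reach 0.5 will label every case "Stayed".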
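The initial model (Model 1) attempted to predict attrition based solely on MonthlyIncome. At the 0.5 threshold it classified every test-set employee as Stayed, so its accuracy (2,361 of 4,471, about 52.8%) simply equals the share of employees who stayed.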
##             Reference
## Prediction   Stayed  Left
##     Stayed     2361  2110
##     Left          0     0
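To make the reading of this table concrete, the sketch below recomputes accuracy from the four counts and compares it with the share of the majority class. The counts are copied from the table above; the dictionary layout and function names are illustrative assumptions, and "Left" is treated as the class of interest.

```python
# Model 1 confusion matrix keyed as (predicted, actual); counts from the table above.
model_1 = {
    ("Stayed", "Stayed"): 2361,
    ("Stayed", "Left"): 2110,
    ("Left", "Stayed"): 0,
    ("Left", "Left"): 0,
}

def accuracy(cm):
    """Share of cases where the predicted label matches the actual label."""
    correct = sum(n for (pred, actual), n in cm.items() if pred == actual)
    return correct / sum(cm.values())

def majority_share(cm):
    """Share of the most common actual class (accuracy of always guessing it)."""
    totals = {}
    for (_, actual), n in cm.items():
        totals[actual] = totals.get(actual, 0) + n
    return max(totals.values()) / sum(totals.values())

def recall(cm, label):
    """Share of actual `label` cases that were predicted as `label`."""
    actual_total = sum(n for (_, actual), n in cm.items() if actual == label)
    return cm[(label, label)] / actual_total

print(f"accuracy:       {accuracy(model_1):.3f}")
print(f"majority share: {majority_share(model_1):.3f}")
print(f"recall (Left):  {recall(model_1, 'Left'):.3f}")
```

Accuracy and majority share come out identical (about 0.528) and recall for "Left" is zero, which shows that this model adds nothing beyond always predicting "Stayed".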
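The second iteration (Model 2) added Overtime as a binary predictor. Accuracy rose only slightly, to about 53.8% (2,405 of 4,471), but the model began flagging departures, correctly identifying 748 of the 2,110 employees who left.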
##             Reference
## Prediction   Stayed  Left
##     Stayed     1657  1362
##     Left        704   748
The final model (Model 3) used all available features, including role, satisfaction levels, and tenure. It correctly classified 3,365 of 4,471 test-set employees (about 75.3%).
##             Reference
## Prediction   Stayed  Left
##     Stayed     1795   540
##     Left        566  1570
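The three confusion matrices can be compared side by side with a short script. This is a sketch under the assumption that "Left" is the class of interest; the counts are copied from the three tables, while the tuple layout and the `summarize` helper are illustrative.

```python
import math

# Counts as (TN, FN, FP, TP), with "Left" treated as the positive class:
#   TN = predicted Stayed, actually Stayed    FN = predicted Stayed, actually Left
#   FP = predicted Left, actually Stayed      TP = predicted Left, actually Left
models = {
    "Model 1 (MonthlyIncome)": (2361, 2110, 0, 0),
    "Model 2 (+ Overtime)": (1657, 1362, 704, 748),
    "Model 3 (all features)": (1795, 540, 566, 1570),
}

def summarize(tn, fn, fp, tp):
    """Return (accuracy, precision, recall); NaN where a ratio is undefined."""
    total = tn + fn + fp + tp
    acc = (tn + tp) / total
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    rec = tp / (tp + fn) if (tp + fn) else math.nan
    return acc, precision, rec

print(f"{'model':<26}{'accuracy':>10}{'precision':>11}{'recall':>8}")
for name, counts in models.items():
    acc, precision, rec = summarize(*counts)
    print(f"{name:<26}{acc:>10.3f}{precision:>11.3f}{rec:>8.3f}")
```

Precision is undefined for Model 1 because it never predicts "Left". Recall for leavers rises from 0 to roughly 0.35 with Overtime and to roughly 0.74 with the full feature set.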
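Based on the results generated by the analyst:
1. Complexity Matters: Attrition is not a product of a single factor like salary; it is multi-dimensional. Accuracy on the test set rose from about 52.8% with MonthlyIncome alone to about 75.3% with all features.
2. Focus on Overtime: Adding Overtime was the point at which the model first predicted any departures (748 correctly flagged, versus none with MonthlyIncome alone), although most of the accuracy gain came with the full feature set.
3. Deployment: Model 3 is recommended for internal HR auditing to identify “at-risk” employees during quarterly reviews.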