Executive Summary

This report evaluates three logistic regression models built to predict employee attrition. The analysis progresses from a baseline model using MonthlyIncome alone, through a model that adds Overtime, to a full model using all available employee variables.

1. Data Methodology

The analyst used a dataset named testa.csv. The data were partitioned with a stratified 70/30 split (training vs. testing), so that the proportion of each class of the target variable, Attrition, was preserved in both sets.
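
The report does not show how the split was implemented. As a minimal sketch of the idea, the Python snippet below splits each class separately so that both sets keep the same class proportions. The function name stratified_split, the seed, and the toy records are hypothetical and are not taken from testa.csv.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, train_fraction=0.7, seed=42):
    """Split rows into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)                      # randomize within the class
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])               # 70% of this class
        test.extend(members[cut:])                # remaining 30% of this class
    return train, test

# Hypothetical toy data: 60 "Stayed" and 40 "Left" records.
rows = [{"id": i, "Attrition": "Stayed" if i < 60 else "Left"} for i in range(100)]
train, test = stratified_split(rows, "Attrition")
```

Because the cut is made within each class, the 60/40 ratio of the toy data is carried into both the training and the testing set.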

2. Model Performance Evaluation

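The models were estimated on the training set and evaluated on the hold-out test set (n = 4,471). A probability threshold of 0.5 was applied to classify outcomes. In each confusion matrix below, rows are the model's predictions and columns are the actual (Reference) outcomes.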
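
The classification step can be sketched as follows. This is an illustration only: the probabilities and outcomes are hypothetical, and treating a probability of exactly 0.5 as “Stayed” is an assumption, since the report does not say how ties were handled.

```python
from collections import Counter

THRESHOLD = 0.5  # cut-off stated in the report

def classify(prob_left, threshold=THRESHOLD):
    """Label an employee 'Left' when P(Left) exceeds the threshold.

    Assumption: a probability exactly equal to the threshold counts as 'Stayed'.
    """
    return "Left" if prob_left > threshold else "Stayed"

def confusion_counts(probabilities, actual):
    """Tally (predicted, actual) pairs into confusion-matrix cells."""
    return Counter((classify(p), a) for p, a in zip(probabilities, actual))

# Hypothetical predicted probabilities and true outcomes.
example_probs = [0.12, 0.48, 0.51, 0.93, 0.50]
example_actual = ["Stayed", "Left", "Left", "Left", "Stayed"]

cells = confusion_counts(example_probs, example_actual)
for predicted in ("Stayed", "Left"):
    row = [cells[(predicted, reference)] for reference in ("Stayed", "Left")]
    print(predicted, row)
```

Each printed row corresponds to one Prediction row of the matrices in this section, with the Reference classes in the same column order.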

Model 1: Baseline (Income Only)

The initial model attempted to predict attrition solely based on MonthlyIncome.

##           Reference
## Prediction Stayed Left
##     Stayed   2361 2110
##     Left        0    0
  • Observation: This model has no ability to identify attrition. It classifies all 4,471 test employees as “Stayed,” so its 52.8% accuracy (2,361 of 4,471) simply matches the share of employees who stayed. The likely cause is that income alone never pushes the predicted probability of leaving above the 0.5 threshold.
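
One way to see how a single-predictor logistic model can end up predicting only one class: the predicted probability exceeds 0.5 only when the linear predictor (intercept + slope × income) is positive. If the income at which that happens lies outside the observed range, nobody is classified as “Left.” The sketch below uses hypothetical coefficients and incomes chosen purely for illustration; they are not the fitted values from this analysis.

```python
import math

def predicted_probability(intercept, slope, income):
    """Logistic model: P(Left) = 1 / (1 + exp(-(intercept + slope * income)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * income)))

# Hypothetical coefficients and incomes, NOT the report's fitted values.
INTERCEPT = -0.05
SLOPE = -0.00001
incomes = [1500, 4000, 8000, 12000, 19000]

probabilities = [predicted_probability(INTERCEPT, SLOPE, x) for x in incomes]
labels = ["Left" if p > 0.5 else "Stayed" for p in probabilities]

# Income at which the linear predictor is zero, i.e. where P(Left) = 0.5.
boundary_income = -INTERCEPT / SLOPE

print(labels)
print(boundary_income)
```

With these illustrative numbers the 0.5 boundary falls at a negative income, so every employee in the example is labeled “Stayed.”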

Model 2: Expanded (Income + Overtime)

The second iteration added Overtime as a binary predictor.

##           Reference
## Prediction Stayed Left
##     Stayed   1657 1362
##     Left      704  748
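  • Observation: Adding a work-life balance indicator (Overtime) allows the model to predict departures for the first time: it correctly identified 748 of the 2,110 employees who left (35.5%). The gain in overall accuracy is small, however (53.8% vs. 52.8% for Model 1), because 704 employees who stayed were now misclassified as “Left.” The high number of false negatives (1,362) suggests the model remains underfitted.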

Model 3: Full Predictive Model (All Variables)

The final model utilized all available features, including role, satisfaction levels, and tenure.

##           Reference
## Prediction Stayed Left
##     Stayed   1795  540
##     Left      566 1570
  • Observation: This model performs best of the three. It correctly identified 1,570 of the 2,110 employees who left (74.4%) and 1,795 of the 2,361 who stayed (76.0%), for an overall accuracy of 75.3%. Its detection rates for the two classes are far more balanced than in the previous iterations.
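
The three confusion matrices can be compared on a common footing with a short script. The sketch below copies the cell counts from the matrices in this section and assumes “Left” is the positive class; the report does not state which class the original software treated as positive, so sensitivity and specificity would swap under the opposite convention.

```python
def metrics(tn, fn, fp, tp):
    """Accuracy, sensitivity, and specificity from confusion-matrix cells."""
    total = tn + fn + fp + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of actual "Left" detected
        "specificity": tn / (tn + fp),   # share of actual "Stayed" detected
    }

# Counts copied from the report's matrices, assuming "Left" is the positive class:
# tn = predicted Stayed / actual Stayed, fn = predicted Stayed / actual Left,
# fp = predicted Left / actual Stayed,   tp = predicted Left / actual Left.
models = {
    "Model 1": dict(tn=2361, fn=2110, fp=0, tp=0),
    "Model 2": dict(tn=1657, fn=1362, fp=704, tp=748),
    "Model 3": dict(tn=1795, fn=540, fp=566, tp=1570),
}

results = {name: metrics(**cells) for name, cells in models.items()}
for name, m in results.items():
    print(f"{name}: " + ", ".join(f"{k}={v:.1%}" for k, v in m.items()))
```

Under these assumptions, accuracy comes out at roughly 52.8%, 53.8%, and 75.3% for Models 1 to 3, and only Model 3 detects both classes at a similar rate (about 74% of departures and 76% of stayers).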

3. Final Recommendations

Based on the results generated by the analyst:

  1. Complexity Matters: Attrition is not a product of a single factor like salary; it is multi-dimensional.
  2. Focus on Overtime: Adding Overtime was the point at which the model first began to identify departures, although overall accuracy rose by only about one percentage point (52.8% to 53.8%).
  3. Deployment: Model 3 is recommended for internal HR auditing to identify “at-risk” employees during quarterly reviews.