Makayla Maroney and Annetta Allen
4/26/2020
A study by Employee Benefits News found that the average cost of losing an employee is 33% of their annual salary. And in 2016, the SHRM Benchmarking Report found that the average cost-per-hire is $4,129.
Therefore, our goal is to accurately predict which employees are planning to leave so that companies can prepare and plan appropriately.
Source:
De Leon, L. (2019, September 20). The Costs and Trends of Employee Turnover - Part 1 | Employers Resource. Employers Resource. https://www.employersresource.com/blog/the-costs-and-trends-of-employee-turnover-part-1/
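To identify employees who are likely to leave, we analyzed the predictor variables that appear in the model output below: satisfaction_level, last_evaluation, number_project, average_montly_hours, time_spend_company, Work_accident, promotion_last_5years, sales, and salary.
The team evaluated these variables with three models: logistic regression, a classification and regression tree (CART), and a random forest.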
Logistic Regression Model
Variables for Model
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: left
##
## Terms added sequentially (first to last)
##
##
##                      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
## NULL                                 10512      41983
## satisfaction_level    1   3572.8     10511      38411 < 2.2e-16 ***
## number_project        1     26.4     10510      38384 2.749e-07 ***
## average_montly_hours  1    694.1     10509      37690 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: glm(formula = left ~ satisfaction_level + number_project + average_montly_hours,
## family = binomial(link = "logit"), data = training, weights = time_spend_company)
##
## Coefficients:
##          (Intercept)    satisfaction_level        number_project  average_montly_hours
##             -0.62071              -2.82125              -0.09198               0.00734
##
## Degrees of Freedom: 10512 Total (i.e. Null); 10509 Residual
## Null Deviance: 41980
## Residual Deviance: 37690 AIC: 37700
Statistics of Model
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04552 0.15030 0.21606 0.25197 0.30221 0.69723
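As a rough illustration of where predicted probabilities like those summarized above come from, the sketch below plugs the coefficients printed in the model output into the logistic function. It is written in Python rather than the R used for the analysis. The function name `predict_leave_probability` and the example employee (0.40 satisfaction, 3 projects, 200 monthly hours) are hypothetical and are not taken from the data.

```python
import math

# Coefficients copied from the glm output above.
COEFFICIENTS = {
    "intercept": -0.62071,
    "satisfaction_level": -2.82125,
    "number_project": -0.09198,
    "average_montly_hours": 0.00734,
}


def predict_leave_probability(satisfaction_level, number_project, average_montly_hours):
    """Return the modeled probability that one employee leaves."""
    # Linear predictor on the log-odds (logit) scale.
    log_odds = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["satisfaction_level"] * satisfaction_level
        + COEFFICIENTS["number_project"] * number_project
        + COEFFICIENTS["average_montly_hours"] * average_montly_hours
    )
    # The inverse logit maps log-odds to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical employee, for illustration only.
example_probability = predict_leave_probability(0.40, 3, 200)
print(f"Predicted probability of leaving: {example_probability:.3f}")
```

Because the satisfaction_level coefficient is negative, lowering satisfaction in this sketch raises the predicted probability of leaving, while more monthly hours push it up slightly.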
Cutoff Number
## [1] 1109
Cross Table
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 4486
##
##
## | validation$predicted_left01
## validation$left | 0 | 1 | Row Total |
## ----------------|-----------|-----------|-----------|
## 0 | 2553 | 827 | 3380 |
## | 29.177 | 61.655 | |
## | 0.755 | 0.245 | 0.753 |
## | 0.838 | 0.574 | |
## | 0.569 | 0.184 | |
## ----------------|-----------|-----------|-----------|
## 1 | 492 | 614 | 1106 |
## | 89.168 | 188.421 | |
## | 0.445 | 0.555 | 0.247 |
## | 0.162 | 0.426 | |
## | 0.110 | 0.137 | |
## ----------------|-----------|-----------|-----------|
## Column Total | 3045 | 1441 | 4486 |
## | 0.679 | 0.321 | |
## ----------------|-----------|-----------|-----------|
##
##
Evaluation of Logistic Regression Model:
On the validation set, this model classifies about 70.6% of employees correctly (3,167 of 4,486), which is not precise enough to use as is.
This model cannot definitively predict the next move of employees. It correctly identifies about 55.5% of the employees who left (614 of 1,106) and about 75.5% of the employees who stayed (2,553 of 3,380).
This is not a reliable model: it misses almost as many departing employees (492) as it catches (614), and most of the employees it flags as leaving (827 of 1,441) actually stayed.
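The evaluation of the logistic regression model can be checked directly against its cross table. The sketch below, in Python rather than R, recomputes the overall accuracy, the share of departing employees the model catches, and the share of its "will leave" flags that turn out to be correct. The variable names are illustrative; only the four counts come from the table.

```python
# Counts copied from the logistic regression cross table above.
stayed_predicted_stay = 2553   # actually stayed, predicted to stay
stayed_predicted_leave = 827   # actually stayed, predicted to leave
left_predicted_stay = 492      # actually left, predicted to stay
left_predicted_leave = 614     # actually left, predicted to leave

total = (
    stayed_predicted_stay
    + stayed_predicted_leave
    + left_predicted_stay
    + left_predicted_leave
)

# Share of all validation employees classified correctly.
accuracy = (stayed_predicted_stay + left_predicted_leave) / total

# Share of employees who actually left that the model flagged.
leavers_caught = left_predicted_leave / (left_predicted_leave + left_predicted_stay)

# Share of "will leave" flags that were correct.
flags_correct = left_predicted_leave / (left_predicted_leave + stayed_predicted_leave)

print(f"Accuracy:        {accuracy:.1%}")
print(f"Leavers caught:  {leavers_caught:.1%}")
print(f"Flags correct:   {flags_correct:.1%}")
```

The last figure matters for planning: when fewer than half of the flags are correct, HR would spend most of its retention effort on employees who were not going to leave.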
Classification And Regression Tree Model
Original tree
Importance of Variables
## ct1.variable.importance
## satisfaction_level 2175.241356
## number_project 1084.322610
## last_evaluation 1038.331747
## average_montly_hours 1025.296189
## time_spend_company 741.219323
## Work_accident 37.789442
## sales 1.575106
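The raw importance scores above are easier to compare as shares of the total. The sketch below, in Python rather than R, normalizes the scores printed for the tree so they sum to 100%. The values are copied from the output; the normalization itself is an added illustration and was not part of the original analysis.

```python
# Variable importance scores copied from the CART output above.
importance = {
    "satisfaction_level": 2175.241356,
    "number_project": 1084.322610,
    "last_evaluation": 1038.331747,
    "average_montly_hours": 1025.296189,
    "time_spend_company": 741.219323,
    "Work_accident": 37.789442,
    "sales": 1.575106,
}

total_importance = sum(importance.values())

# Convert each score to its share of the total importance.
shares = {name: score / total_importance for name, score in importance.items()}

# Print the variables from most to least important.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22} {share:6.1%}")
```

Read this way, satisfaction_level alone accounts for roughly a third of the total importance, while Work_accident and sales contribute very little.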
Specified Tree
Cross Table
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 4486
##
##
## | validation$left_predicted
## validation$left | 0 | 1 | Row Total |
## ----------------|-----------|-----------|-----------|
## 0 | 3306 | 74 | 3380 |
## | 222.805 | 686.660 | |
## | 0.978 | 0.022 | 0.753 |
## | 0.976 | 0.067 | |
## | 0.737 | 0.016 | |
## ----------------|-----------|-----------|-----------|
## 1 | 81 | 1025 | 1106 |
## | 680.904 | 2098.474 | |
## | 0.073 | 0.927 | 0.247 |
## | 0.024 | 0.933 | |
## | 0.018 | 0.228 | |
## ----------------|-----------|-----------|-----------|
## Column Total | 3387 | 1099 | 4486 |
## | 0.755 | 0.245 | |
## ----------------|-----------|-----------|-----------|
##
##
Evaluation of CART Model:
This model provides a company with a visual representation of the important variables, along with the levels of each variable that are associated with an employee leaving.
On the validation set, this model is accurate about 96.5% of the time (4,331 of 4,486).
These proportions lead to a fairly reliable prediction of which employees are going to leave. The model correctly identifies about 92.7% of the employees who left (1,025 of 1,106) and about 97.8% of the employees who stayed (3,306 of 3,380).
Random Forest Model
Importance of Variables
## 0 1 MeanDecreaseAccuracy
## satisfaction_level 5.446333e-02 0.616412496 0.1900664705
## last_evaluation 3.381621e-03 0.435970239 0.1077605458
## number_project 1.695594e-02 0.445154688 0.1202630587
## average_montly_hours 1.751295e-02 0.389145079 0.1071978776
## time_spend_company 1.287218e-02 0.358765420 0.0963319358
## Work_accident 3.931738e-04 0.006996927 0.0019856997
## promotion_last_5years 4.891124e-05 0.001022939 0.0002838257
## sales 1.353753e-03 0.017837940 0.0053317769
## salary 1.161440e-03 0.012340326 0.0038593896
## MeanDecreaseGini
## satisfaction_level 1325.776180
## last_evaluation 454.402929
## number_project 668.172244
## average_montly_hours 557.907961
## time_spend_company 700.415139
## Work_accident 21.247696
## promotion_last_5years 3.214526
## sales 63.180888
## salary 31.689225
Cross Table
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 4498
##
##
## | validation2$predicted_left
## validation2$left | 0 | 1 | Row Total |
## -----------------|-----------|-----------|-----------|
## 0 | 3462 | 1 | 3463 |
## | 224.076 | 774.828 | |
## | 1.000 | 0.000 | 0.770 |
## | 0.992 | 0.001 | |
## | 0.770 | 0.000 | |
## -----------------|-----------|-----------|-----------|
## 1 | 27 | 1008 | 1035 |
## | 749.735 | 2592.492 | |
## | 0.026 | 0.974 | 0.230 |
## | 0.008 | 0.999 | |
## | 0.006 | 0.224 | |
## -----------------|-----------|-----------|-----------|
## Column Total | 3489 | 1009 | 4498 |
## | 0.776 | 0.224 | |
## -----------------|-----------|-----------|-----------|
##
##
Evaluation of Random Forest Model:
This model is accurate about 99.4% of the time on its validation set (4,470 of 4,498), which makes it highly reliable.
The proportions of the model are also dependable: it correctly identifies about 97.4% of the employees who left (1,008 of 1,035) and nearly all of the employees who stayed (3,462 of 3,463).
Estimated Accuracy of Models (each value is a share of the actual outcome group, taken from the cross tables above):
| | Logistic Regression | CART | Random Forest |
|---|---|---|---|
| Predicted to leave, actually left | 55.5% | 92.7% | 97.4% |
| Predicted to stay, actually left | 44.5% | 7.3% | 2.6% |
| Predicted to leave, actually stayed | 24.5% | 2.2% | <0.1% |
| Predicted to stay, actually stayed | 75.5% | 97.8% | >99.9% |
As can be seen in the table above, the Random Forest Model most accurately predicts which employees are going to leave and which are going to stay at a company.
The model correctly identifies about 97% of the employees who leave and nearly all of the employees who stay.
To prepare and plan appropriately, companies should use the Random Forest Model, as it gives HR the most accurate estimate of yearly turnover.
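The comparison can be reproduced from the three cross tables. The sketch below, in Python rather than R, computes each model's overall accuracy and its per-group rates, then picks the most accurate model. The helper `summarize` and the dictionary layout are illustrative assumptions; only the counts come from the tables.

```python
# Counts copied from the cross tables, in the order:
# (stayed & predicted stay, stayed & predicted leave,
#  left & predicted stay,   left & predicted leave)
cross_tables = {
    "Logistic Regression": (2553, 827, 492, 614),
    "CART": (3306, 74, 81, 1025),
    "Random Forest": (3462, 1, 27, 1008),
}


def summarize(stay_stay, stay_leave, left_stay, left_leave):
    """Return accuracy and per-group hit rates for one cross table."""
    total = stay_stay + stay_leave + left_stay + left_leave
    return {
        "accuracy": (stay_stay + left_leave) / total,
        "leavers_identified": left_leave / (left_stay + left_leave),
        "stayers_identified": stay_stay / (stay_stay + stay_leave),
    }


results = {name: summarize(*counts) for name, counts in cross_tables.items()}

# The model with the highest validation accuracy.
best_model = max(results, key=lambda name: results[name]["accuracy"])

for name, metrics in results.items():
    print(
        f"{name:<20} accuracy {metrics['accuracy']:.1%}, "
        f"leavers {metrics['leavers_identified']:.1%}, "
        f"stayers {metrics['stayers_identified']:.1%}"
    )
print(f"Most accurate model: {best_model}")
```

Note that the Random Forest cross table covers 4,498 validation employees while the other two cover 4,486, so the rates are compared across slightly different samples.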