Employee Turnover

Makayla Maroney and Annetta Allen

4/26/2020

High employee turnover is unhealthy for an organization

Source:
De Leon, L. (2019, September 20). The Costs and Trends of Employee Turnover - Part 1 | Employers Resource. Employers Resource. https://www.employersresource.com/blog/the-costs-and-trends-of-employee-turnover-part-1/

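To identify employees who are likely to leave, and so pose a turnover risk to their companies, we worked with the variables that appear in the model outputs below: the response left and the predictors satisfaction_level, last_evaluation, number_project, average_montly_hours, time_spend_company, Work_accident, promotion_last_5years, sales, and salary.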

The team evaluated these variables by applying the following models: logistic regression, CART, and random forest.

Logistic Regression Model

Variables for Model

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: left
## 
## Terms added sequentially (first to last)
## 
## 
##                      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                 10512      41983              
## satisfaction_level    1   3572.8     10511      38411 < 2.2e-16 ***
## number_project        1     26.4     10510      38384 2.749e-07 ***
## average_montly_hours  1    694.1     10509      37690 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:  glm(formula = left ~ satisfaction_level + number_project + average_montly_hours, 
##     family = binomial(link = "logit"), data = training, weights = time_spend_company)
## 
## Coefficients:
##          (Intercept)    satisfaction_level        number_project  
##             -0.62071              -2.82125              -0.09198  
## average_montly_hours  
##              0.00734  
## 
## Degrees of Freedom: 10512 Total (i.e. Null);  10509 Residual
## Null Deviance:       41980 
## Residual Deviance: 37690     AIC: 37700
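
To make the fitted coefficients easier to read, the sketch below plugs them into the inverse-logit function to get a predicted probability of leaving. It is written in Python as standalone arithmetic and does not reproduce the authors' R code. The two example employees and the helper name predicted_leave_probability are hypothetical; only the coefficient values come from the output above.

```python
import math

# Coefficients copied from the glm output above.
COEF = {
    "intercept": -0.62071,
    "satisfaction_level": -2.82125,
    "number_project": -0.09198,
    "average_montly_hours": 0.00734,
}


def predicted_leave_probability(satisfaction_level, number_project, average_montly_hours):
    """Apply the inverse logit to the linear predictor of the fitted model."""
    eta = (
        COEF["intercept"]
        + COEF["satisfaction_level"] * satisfaction_level
        + COEF["number_project"] * number_project
        + COEF["average_montly_hours"] * average_montly_hours
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical employees: same workload, different satisfaction levels.
low_satisfaction = predicted_leave_probability(0.2, 4, 200)
high_satisfaction = predicted_leave_probability(0.9, 4, 200)

# Multiplicative change in the odds of leaving when satisfaction_level
# rises by one full unit (from 0 to 1), holding the other predictors fixed.
odds_ratio = math.exp(COEF["satisfaction_level"])

print(f"P(left) at satisfaction 0.2: {low_satisfaction:.3f}")
print(f"P(left) at satisfaction 0.9: {high_satisfaction:.3f}")
print(f"Odds ratio for satisfaction_level: {odds_ratio:.4f}")
```

The negative coefficient on satisfaction_level means that, with the other predictors held fixed, higher satisfaction lowers the predicted probability of leaving.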

Statistics of Model

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.04552 0.15030 0.21606 0.25197 0.30221 0.69723

Cutoff Number

## [1] 1109

Cross Table

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  4486 
## 
##  
##                 | validation$predicted_left01 
## validation$left |         0 |         1 | Row Total | 
## ----------------|-----------|-----------|-----------|
##               0 |      2553 |       827 |      3380 | 
##                 |    29.177 |    61.655 |           | 
##                 |     0.755 |     0.245 |     0.753 | 
##                 |     0.838 |     0.574 |           | 
##                 |     0.569 |     0.184 |           | 
## ----------------|-----------|-----------|-----------|
##               1 |       492 |       614 |      1106 | 
##                 |    89.168 |   188.421 |           | 
##                 |     0.445 |     0.555 |     0.247 | 
##                 |     0.162 |     0.426 |           | 
##                 |     0.110 |     0.137 |           | 
## ----------------|-----------|-----------|-----------|
##    Column Total |      3045 |      1441 |      4486 | 
##                 |     0.679 |     0.321 |           | 
## ----------------|-----------|-----------|-----------|
## 
## 
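
Each cell of the cross table stacks five numbers, as listed under "Cell Contents". The sketch below recomputes them from the four counts alone, as a way to check how the table is read. It is plain Python arithmetic, not the authors' code, and the helper name cell_summary is hypothetical. The chi-square contribution is taken to be (observed - expected)^2 / expected, with the expected count computed from the row and column totals.

```python
# Counts copied from the logistic regression cross table above.
# Rows: actual value of left (0, 1); columns: predicted value (0, 1).
observed = [
    [2553, 827],
    [492, 614],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)


def cell_summary(i, j):
    """Return the five quantities printed in one cell of the cross table."""
    n = observed[i][j]
    # Expected count if actual and predicted values were independent.
    expected = row_totals[i] * col_totals[j] / table_total
    return {
        "N": n,
        "chi_square": (n - expected) ** 2 / expected,
        "row_share": n / row_totals[i],
        "col_share": n / col_totals[j],
        "table_share": n / table_total,
    }


for i in range(2):
    for j in range(2):
        s = cell_summary(i, j)
        print(
            f"actual={i} predicted={j}: N={s['N']}, "
            f"chi-square={s['chi_square']:.3f}, row={s['row_share']:.3f}, "
            f"col={s['col_share']:.3f}, table={s['table_share']:.3f}"
        )
```

The row share of the cell where both actual and predicted values are 1 is the fraction of employees who left that the model flagged as leaving.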

Evaluation of Logistic Regression Model:

CART (Classification and Regression Tree) Model

Original tree

Importance of Variables

##                      ct1.variable.importance
## satisfaction_level               2175.241356
## number_project                   1084.322610
## last_evaluation                  1038.331747
## average_montly_hours             1025.296189
## time_spend_company                741.219323
## Work_accident                      37.789442
## sales                               1.575106
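
The raw importance scores are easier to compare as shares of their total. The sketch below does that rescaling in plain Python, using only the values printed above. The normalization is added here for illustration and is not part of the original analysis.

```python
# Variable importance scores copied from the CART output above.
importance = {
    "satisfaction_level": 2175.241356,
    "number_project": 1084.322610,
    "last_evaluation": 1038.331747,
    "average_montly_hours": 1025.296189,
    "time_spend_company": 741.219323,
    "Work_accident": 37.789442,
    "sales": 1.575106,
}

total_importance = sum(importance.values())

# Share of the total importance attributed to each variable.
shares = {name: value / total_importance for name, value in importance.items()}

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22} {share:6.1%}")
```

On these numbers, satisfaction_level accounts for a little over a third of the total, and the five highest-ranked variables together account for nearly all of it.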

Specified Tree

Cross Table

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  4486 
## 
##  
##                 | validation$left_predicted 
## validation$left |         0 |         1 | Row Total | 
## ----------------|-----------|-----------|-----------|
##               0 |      3306 |        74 |      3380 | 
##                 |   222.805 |   686.660 |           | 
##                 |     0.978 |     0.022 |     0.753 | 
##                 |     0.976 |     0.067 |           | 
##                 |     0.737 |     0.016 |           | 
## ----------------|-----------|-----------|-----------|
##               1 |        81 |      1025 |      1106 | 
##                 |   680.904 |  2098.474 |           | 
##                 |     0.073 |     0.927 |     0.247 | 
##                 |     0.024 |     0.933 |           | 
##                 |     0.018 |     0.228 |           | 
## ----------------|-----------|-----------|-----------|
##    Column Total |      3387 |      1099 |      4486 | 
##                 |     0.755 |     0.245 |           | 
## ----------------|-----------|-----------|-----------|
## 
## 

Evaluation of CART Model:

Random Forest Model

Importance of Variables

##                                  0           1 MeanDecreaseAccuracy
## satisfaction_level    5.446333e-02 0.616412496         0.1900664705
## last_evaluation       3.381621e-03 0.435970239         0.1077605458
## number_project        1.695594e-02 0.445154688         0.1202630587
## average_montly_hours  1.751295e-02 0.389145079         0.1071978776
## time_spend_company    1.287218e-02 0.358765420         0.0963319358
## Work_accident         3.931738e-04 0.006996927         0.0019856997
## promotion_last_5years 4.891124e-05 0.001022939         0.0002838257
## sales                 1.353753e-03 0.017837940         0.0053317769
## salary                1.161440e-03 0.012340326         0.0038593896
##                       MeanDecreaseGini
## satisfaction_level         1325.776180
## last_evaluation             454.402929
## number_project              668.172244
## average_montly_hours        557.907961
## time_spend_company          700.415139
## Work_accident                21.247696
## promotion_last_5years         3.214526
## sales                        63.180888
## salary                       31.689225

Cross Table

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  4498 
## 
##  
##                  | validation2$predicted_left 
## validation2$left |         0 |         1 | Row Total | 
## -----------------|-----------|-----------|-----------|
##                0 |      3462 |         1 |      3463 | 
##                  |   224.076 |   774.828 |           | 
##                  |     1.000 |     0.000 |     0.770 | 
##                  |     0.992 |     0.001 |           | 
##                  |     0.770 |     0.000 |           | 
## -----------------|-----------|-----------|-----------|
##                1 |        27 |      1008 |      1035 | 
##                  |   749.735 |  2592.492 |           | 
##                  |     0.026 |     0.974 |     0.230 | 
##                  |     0.008 |     0.999 |           | 
##                  |     0.006 |     0.224 |           | 
## -----------------|-----------|-----------|-----------|
##     Column Total |      3489 |      1009 |      4498 | 
##                  |     0.776 |     0.224 |           | 
## -----------------|-----------|-----------|-----------|
## 
## 

Evaluation of Random Forest Model:

Conclusion

Estimated Accuracy of Models:

                                      Logistic Regression   CART   Random Forest
Predicted to leave, actually left                     70%    98%             99%
Predicted to stay, actually left                      30%     2%              1%
Predicted to leave, actually stayed                   45%    10%              5%
Predicted to stay, actually stayed                    55%    90%             95%
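
As a cross-check, the validation cross tables printed earlier can be turned into standard rates for each model. The sketch below is plain Python arithmetic on those counts, and the metric names and the helper rates are choices made here, not taken from the original analysis. Two caveats: the Random Forest cross table covers 4,498 observations while the other two cover 4,486, so the comparison is not strictly like-for-like, and the recomputed rates need not match the estimated percentages in the summary table exactly.

```python
# Counts copied from the three validation cross tables, in the order:
# (stayed & predicted stay, stayed & predicted leave,
#  left & predicted stay,   left & predicted leave)
cross_tables = {
    "Logistic Regression": (2553, 827, 492, 614),
    "CART": (3306, 74, 81, 1025),
    "Random Forest": (3462, 1, 27, 1008),
}


def rates(tn, fp, fn, tp):
    """Summarize a 2x2 table, treating 'left' as the positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,      # all correct predictions
        "sensitivity": tp / (tp + fn),      # leavers correctly flagged
        "specificity": tn / (tn + fp),      # stayers correctly flagged
        "precision": tp / (tp + fp),        # flagged employees who left
    }


results = {name: rates(*counts) for name, counts in cross_tables.items()}

for name, r in results.items():
    print(
        f"{name:<20} accuracy={r['accuracy']:.3f} "
        f"sensitivity={r['sensitivity']:.3f} "
        f"specificity={r['specificity']:.3f} "
        f"precision={r['precision']:.3f}"
    )
```

On these counts, the ranking by accuracy is Random Forest first, then CART, then Logistic Regression.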

As the table above shows, the Random Forest Model predicts most accurately which employees will leave a company and which will stay.

By these estimates, the model correctly predicts employees who will stay about 95% of the time and employees who will leave about 99% of the time.

To plan appropriately, companies should use the Random Forest Model, because of the three models it gives HR the most reliable estimate of yearly turnover.