ACKNOWLEDGEMENT & DISCLAIMER



ACKNOWLEDGEMENT


* This dashboard is for educational purposes only and is intended to help readers understand survival analysis techniques. The outputs and analysis displayed in the flexdashboard should not be treated as medical advice or judgement.


DISCLAIMER

* The authors bear no responsibility for any consequences arising from the content covered in the dashboard, its web URL, or its links.

What is Employee Attrition?



Employee attrition occurs when an organization's workforce gradually shrinks over time for reasons such as retirement or resignation on professional or personal grounds.

LinkedIn's 2018 talent turnover report covers attrition across various sectors; interested readers can find it on the LinkedIn site. According to the 2019 Retention Report by Work Institute, the attrition rate is projected to be as high as 35% by 2023. The report also points out that job satisfaction alone cannot keep employees loyal: career development, work-life balance and supervisor behavior are the key reasons for leaving an organization.

The challenge in HR retention is not to create one umbrella HR plan but to offer distinct HR models that can be customized at the micro or individual level. An HR manager needs a predictive model that estimates the propensity of attrition for each individual, so that they can act well before an employee actually walks out. But how can this be done when HR managers are generally not prediction experts and do not come from a mathematical modeling background?

This flexdashboard is an attempt to address this problem. To begin, click the next menu item on the "MENU" tab.

Overview of Dataset

Row

Dataset Table

Variables in the Dataset

 [1] "Age"                      "Attrition"               
 [3] "BusinessTravel"           "DailyRate"               
 [5] "Department"               "DistanceFromHome"        
 [7] "Education"                "EducationField"          
 [9] "EmployeeCount"            "EmployeeNumber"          
[11] "EnvironmentSatisfaction"  "Gender"                  
[13] "HourlyRate"               "JobInvolvement"          
[15] "JobLevel"                 "JobRole"                 
[17] "JobSatisfaction"          "MaritalStatus"           
[19] "MonthlyIncome"            "MonthlyRate"             
[21] "NumCompaniesWorked"       "Over18"                  
[23] "OverTime"                 "PercentSalaryHike"       
[25] "PerformanceRating"        "RelationshipSatisfaction"
[27] "StandardHours"            "StockOptionLevel"        
[29] "TotalWorkingYears"        "TrainingTimesLastYear"   
[31] "WorkLifeBalance"          "YearsAtCompany"          
[33] "YearsInCurrentRole"       "YearsSinceLastPromotion" 
[35] "YearsWithCurrManager"    

Structure of the Dataset

'data.frame':   1470 obs. of  35 variables:
 $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
 $ Attrition               : chr  "Yes" "No" "Yes" "No" ...
 $ BusinessTravel          : chr  "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" "Travel_Frequently" ...
 $ DailyRate               : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
 $ Department              : chr  "Sales" "Research & Development" "Research & Development" "Research & Development" ...
 $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
 $ Education               : int  2 1 2 4 1 2 3 1 3 3 ...
 $ EducationField          : chr  "Life Sciences" "Life Sciences" "Other" "Life Sciences" ...
 $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ EmployeeNumber          : int  1 2 4 5 7 8 10 11 12 13 ...
 $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
 $ Gender                  : chr  "Female" "Male" "Male" "Female" ...
 $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
 $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
 $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
 $ JobRole                 : chr  "Sales Executive" "Research Scientist" "Laboratory Technician" "Research Scientist" ...
 $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
 $ MaritalStatus           : chr  "Single" "Married" "Single" "Married" ...
 $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
 $ MonthlyRate             : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
 $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
 $ Over18                  : chr  "Y" "Y" "Y" "Y" ...
 $ OverTime                : chr  "Yes" "No" "Yes" "Yes" ...
 $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
 $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
 $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
 $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
 $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
 $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
 $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
 $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
 $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
 $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
 $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
 $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...

Summary Statistics of the Dataset

      Age         Attrition         BusinessTravel       DailyRate     
 Min.   :18.00   Length:1470        Length:1470        Min.   : 102.0  
 1st Qu.:30.00   Class :character   Class :character   1st Qu.: 465.0  
 Median :36.00   Mode  :character   Mode  :character   Median : 802.0  
 Mean   :36.92                                         Mean   : 802.5  
 3rd Qu.:43.00                                         3rd Qu.:1157.0  
 Max.   :60.00                                         Max.   :1499.0  
  Department        DistanceFromHome   Education     EducationField    
 Length:1470        Min.   : 1.000   Min.   :1.000   Length:1470       
 Class :character   1st Qu.: 2.000   1st Qu.:2.000   Class :character  
 Mode  :character   Median : 7.000   Median :3.000   Mode  :character  
                    Mean   : 9.193   Mean   :2.913                     
                    3rd Qu.:14.000   3rd Qu.:4.000                     
                    Max.   :29.000   Max.   :5.000                     
 EmployeeCount EmployeeNumber   EnvironmentSatisfaction    Gender         
 Min.   :1     Min.   :   1.0   Min.   :1.000           Length:1470       
 1st Qu.:1     1st Qu.: 491.2   1st Qu.:2.000           Class :character  
 Median :1     Median :1020.5   Median :3.000           Mode  :character  
 Mean   :1     Mean   :1024.9   Mean   :2.722                             
 3rd Qu.:1     3rd Qu.:1555.8   3rd Qu.:4.000                             
 Max.   :1     Max.   :2068.0   Max.   :4.000                             
   HourlyRate     JobInvolvement    JobLevel       JobRole         
 Min.   : 30.00   Min.   :1.00   Min.   :1.000   Length:1470       
 1st Qu.: 48.00   1st Qu.:2.00   1st Qu.:1.000   Class :character  
 Median : 66.00   Median :3.00   Median :2.000   Mode  :character  
 Mean   : 65.89   Mean   :2.73   Mean   :2.064                     
 3rd Qu.: 83.75   3rd Qu.:3.00   3rd Qu.:3.000                     
 Max.   :100.00   Max.   :4.00   Max.   :5.000                     
 JobSatisfaction MaritalStatus      MonthlyIncome    MonthlyRate   
 Min.   :1.000   Length:1470        Min.   : 1009   Min.   : 2094  
 1st Qu.:2.000   Class :character   1st Qu.: 2911   1st Qu.: 8047  
 Median :3.000   Mode  :character   Median : 4919   Median :14236  
 Mean   :2.729                      Mean   : 6503   Mean   :14313  
 3rd Qu.:4.000                      3rd Qu.: 8379   3rd Qu.:20462  
 Max.   :4.000                      Max.   :19999   Max.   :26999  
 NumCompaniesWorked    Over18            OverTime         PercentSalaryHike
 Min.   :0.000      Length:1470        Length:1470        Min.   :11.00    
 1st Qu.:1.000      Class :character   Class :character   1st Qu.:12.00    
 Median :2.000      Mode  :character   Mode  :character   Median :14.00    
 Mean   :2.693                                            Mean   :15.21    
 3rd Qu.:4.000                                            3rd Qu.:18.00    
 Max.   :9.000                                            Max.   :25.00    
 PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel
 Min.   :3.000     Min.   :1.000            Min.   :80    Min.   :0.0000  
 1st Qu.:3.000     1st Qu.:2.000            1st Qu.:80    1st Qu.:0.0000  
 Median :3.000     Median :3.000            Median :80    Median :1.0000  
 Mean   :3.154     Mean   :2.712            Mean   :80    Mean   :0.7939  
 3rd Qu.:3.000     3rd Qu.:4.000            3rd Qu.:80    3rd Qu.:1.0000  
 Max.   :4.000     Max.   :4.000            Max.   :80    Max.   :3.0000  
 TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany  
 Min.   : 0.00     Min.   :0.000         Min.   :1.000   Min.   : 0.000  
 1st Qu.: 6.00     1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000  
 Median :10.00     Median :3.000         Median :3.000   Median : 5.000  
 Mean   :11.28     Mean   :2.799         Mean   :2.761   Mean   : 7.008  
 3rd Qu.:15.00     3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000  
 Max.   :40.00     Max.   :6.000         Max.   :4.000   Max.   :40.000  
 YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
 Min.   : 0.000     Min.   : 0.000          Min.   : 0.000      
 1st Qu.: 2.000     1st Qu.: 0.000          1st Qu.: 2.000      
 Median : 3.000     Median : 1.000          Median : 3.000      
 Mean   : 4.229     Mean   : 2.188          Mean   : 4.123      
 3rd Qu.: 7.000     3rd Qu.: 3.000          3rd Qu.: 7.000      
 Max.   :18.000     Max.   :15.000          Max.   :17.000      

Corrplot of Numeric Variables

Data Manipulation

Row

Data Manipulation

attr$EmployeeCount=NULL  # Removing variable because of constant value
attr$EmployeeNumber=NULL # Removing variable because it is simply a serial number
attr$StandardHours=NULL  # Removing variable because of constant value

# Converting Attrition character into factor

attr$Attrition=as.factor(attr$Attrition)

# Attrition=data.frame(Attrition)

# Converting BusinessTravel into factor variable

attr$BusinessTravel=as.factor(attr$BusinessTravel)

# Creating HighRate variable based on DailyRate

attr$HighRate=as.factor(ifelse(attr$DailyRate>750,"HighRate","LowRate"))

# Converting Department variable into factor variable

attr$Department=as.factor(attr$Department)

# Converting Education into factor variable

attr$Education=as.factor(attr$Education)

# Converting EducationField variable into factor variable

attr$EducationField=as.factor(attr$EducationField)

# Converting Gender into factor variable 

attr$Gender=as.factor(attr$Gender)

# Changing JobRole into factor variable

attr$JobRole=as.factor(attr$JobRole)

# Changing Marital Status into Factor Variable

attr$MaritalStatus=as.factor(attr$MaritalStatus)

# Removing Over18 Variable

attr$Over18=NULL

# Changing OverTime variable into factor variable

attr$OverTime=as.factor(attr$OverTime)

# Changing Attrition variable into a numeric event indicator (1 = left, 0 = still employed)

attr$status=ifelse(attr$Attrition=="Yes",1,0)

attr$Attrition=NULL

# Using YearsAtCompany as the survival time

attr$time=attr$YearsAtCompany

attr$YearsAtCompany=NULL
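The column-by-column steps above can also be written more compactly. The following is a minimal sketch in base R, assuming a hypothetical three-row extract (`toy`) whose values are taken from the first rows of the structure output; it is not the dashboard's own code, only an illustration of the same idea: build the event indicator and time, drop constant columns, and convert the remaining character columns to factors in one pass.

```r
# Hypothetical three-row extract; column names follow the attrition data set.
toy <- data.frame(
  Attrition      = c("Yes", "No", "Yes"),
  OverTime       = c("Yes", "No", "Yes"),
  YearsAtCompany = c(6L, 10L, 0L),
  EmployeeCount  = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Event indicator (1 = left) and follow-up time, as in the steps above.
toy$status <- ifelse(toy$Attrition == "Yes", 1, 0)
toy$time   <- toy$YearsAtCompany
toy$Attrition      <- NULL
toy$YearsAtCompany <- NULL

# Drop columns that hold a single constant value (EmployeeCount here).
is_constant <- vapply(toy, function(x) length(unique(x)) == 1, logical(1))
toy <- toy[, !is_constant, drop = FALSE]

# Convert every remaining character column to a factor in one pass.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

str(toy)
```

Detecting constant and character columns programmatically avoids repeating the same conversion for each variable, at the cost of being less explicit about which columns are touched.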

Data Table after Data Manipulation

Survival Analysis: Explainer

Row

WHAT IS SURVIVAL ANALYSIS
Survival Analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems.

This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?

More generally, survival analysis involves the modelling of time to event data; in this context, death or failure is considered an "event" in the survival analysis literature. Traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research. Survival analysis can also be used in non-traditional fields such as predicting employee attrition, predicting customer churn, predictive maintenance, earthquake prediction and many others.

The case study in focus is estimating employee attrition with survival analysis techniques. In this flexdashboard, we are going to use the Kaplan-Meier model, the Cox proportional hazards model, a random forest model and the Weibull model.

Source: https://en.wikipedia.org/wiki/Survival_analysis

The dataset used in this flexdashboard can be accessed from the following link:

https://www.kaggle.com/patelprashant/employee-attrition
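To make the notion of an "event" concrete for attrition, the sketch below encodes the first four employees of the dataset (years at company 6, 10, 0, 8 and attrition Yes, No, Yes, No, as shown in the structure output) as a `Surv` object from the `survival` package. Employees who have not left are treated as right-censored: their true time to attrition is only known to exceed the observed tenure. The variable names are illustrative, not the dashboard's.

```r
library(survival)

# First four employees from the dataset's structure output.
years_at_company <- c(6, 10, 0, 8)
attrition        <- c("Yes", "No", "Yes", "No")

# status = 1 means the event (leaving) was observed; 0 means censored.
status <- ifelse(attrition == "Yes", 1, 0)

# Surv() pairs each time with its event indicator.
s <- Surv(years_at_company, status)

# Printing marks censored observations with a trailing "+".
print(s)
```

Every model in the dashboard takes this pairing of time and status as its response, which is why the data manipulation step creates the `time` and `status` columns.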

Survival Analysis: Employee Attrition

Row

Kaplan Meier Model

Call: survfit(formula = Surv(time, status) ~ 1, data = attr)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   1470      16    0.989 0.00271        0.984        0.994
    1   1426      59    0.948 0.00583        0.937        0.960
    2   1255      27    0.928 0.00690        0.914        0.941
    3   1128      20    0.911 0.00769        0.896        0.927
    4   1000      19    0.894 0.00851        0.877        0.911
    5    890      21    0.873 0.00947        0.855        0.892
    6    694       9    0.862 0.01007        0.842        0.882
    7    618      11    0.846 0.01091        0.825        0.868
    8    528       9    0.832 0.01173        0.809        0.855
    9    448       8    0.817 0.01264        0.793        0.842
   10    366      18    0.777 0.01516        0.748        0.807
   11    246       2    0.770 0.01568        0.740        0.802
   13    200       2    0.763 0.01644        0.731        0.796
   14    176       2    0.754 0.01736        0.721        0.789
   15    158       1    0.749 0.01789        0.715        0.785
   16    138       1    0.744 0.01857        0.708        0.781
   17    126       1    0.738 0.01934        0.701        0.777
   18    117       1    0.732 0.02018        0.693        0.772
   19    104       1    0.725 0.02117        0.684        0.767
   20     93       1    0.717 0.02233        0.674        0.762
   21     66       1    0.706 0.02449        0.660        0.756
   22     52       1    0.692 0.02753        0.641        0.749
   23     37       1    0.674 0.03253        0.613        0.741
   24     35       1    0.654 0.03686        0.586        0.731
   31     16       1    0.614 0.05256        0.519        0.726
   32     13       1    0.566 0.06641        0.450        0.713
   33     10       1    0.510 0.08037        0.374        0.694
   40      1       1    0.000     NaN           NA           NA
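The survival column above is the Kaplan-Meier (product-limit) estimate: at each event time the running estimate is multiplied by the share of at-risk employees who stayed. As a minimal sketch, the block below recomputes the first six rows by hand from the `n.risk` and `n.event` values in the table, and adds the standard error using Greenwood's formula. The data frame and column names are illustrative.

```r
# First six rows of the Kaplan-Meier table above (time, n.risk, n.event).
km <- data.frame(
  time    = 0:5,
  n_risk  = c(1470, 1426, 1255, 1128, 1000, 890),
  n_event = c(16, 59, 27, 20, 19, 21)
)

# Product-limit estimate: cumulative product of the conditional
# probabilities of staying, 1 - d_i / n_i.
km$survival <- cumprod(1 - km$n_event / km$n_risk)

# Greenwood's formula for the standard error of the estimate.
km$std_err <- km$survival *
  sqrt(cumsum(km$n_event / (km$n_risk * (km$n_risk - km$n_event))))

print(round(km, 5))
```

Rounded to three decimals, the `survival` values reproduce the 0.989, 0.948, 0.928, 0.911, 0.894 and 0.873 reported by `survfit`.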

KM Model Plot

KM Model With Gender Comparison

Call: survfit(formula = Surv(time, status) ~ Gender, data = attr)

                n events median 0.95LCL 0.95UCL
Gender=Female 588     87     32      32      NA
Gender=Male   882    150     NA      31      NA
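The `median` column in tables like this one is read off the Kaplan-Meier curve: roughly, it is the first time at which the estimated survival probability falls to 0.5 or below. An `NA` usually means the curve for that group never drops that far within the observed follow-up. The sketch below illustrates the rule with the last rows of the overall table; `median_survival` is a hypothetical helper, and `survfit` itself treats the case where the curve equals exactly 0.5 slightly differently.

```r
# Hypothetical helper: first time at which the survival curve is <= 0.5.
median_survival <- function(time, surv) {
  reached <- which(surv <= 0.5)
  if (length(reached) == 0) NA_real_ else time[min(reached)]
}

# Tail of the overall Kaplan-Meier table (time and survival columns).
tail_time <- c(31, 32, 33, 40)
tail_surv <- c(0.614, 0.566, 0.510, 0.000)

overall_median <- median_survival(tail_time, tail_surv)
print(overall_median)  # 40: survival is still 0.510 at year 33

# A curve that never reaches 0.5 has no estimable median.
no_median <- median_survival(c(1, 2, 3), c(0.9, 0.8, 0.7))
print(no_median)       # NA
```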

KM Model Plot with Gender Comparison

KM Model With Performance Rating

Call: survfit(formula = Surv(time, status) ~ PerformanceRating, data = attr)

                       n events median 0.95LCL 0.95UCL
PerformanceRating=3 1244    200     33      32      NA
PerformanceRating=4  226     37     NA      NA      NA

KM Model Plot with Performance Rating

KM Model With OverTime

Call: survfit(formula = Surv(time, status) ~ OverTime, data = attr)

                n events median 0.95LCL 0.95UCL
OverTime=No  1054    110     40      32      NA
OverTime=Yes  416    127     24      16      NA
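The table shows a clear gap between the two groups, but it does not say whether the curves differ by more than chance. One common way to check this is a log-rank test, available as `survdiff` in the `survival` package. The following is a sketch on a small hypothetical data frame (`toy`) whose column names mirror the prepared data; on the dashboard's data the equivalent call would be `survdiff(Surv(time, status) ~ OverTime, data = attr)`.

```r
library(survival)

# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  time     = c(1, 2, 3, 5, 8, 10, 12, 15, 20, 24, 2, 4, 6, 9, 30, 33),
  status   = c(1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0),
  OverTime = factor(rep(c("Yes", "No"), times = c(10, 6)))
)

# Log-rank test of equal survival curves across OverTime groups.
lr <- survdiff(Surv(time, status) ~ OverTime, data = toy)
print(lr)

# p-value from the chi-square statistic (groups - 1 degrees of freedom).
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
print(p_value)
```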

KM Model Plot with OverTime

KM Model With Marital Status

Call: survfit(formula = Surv(time, status) ~ MaritalStatus, data = attr)

                         n events median 0.95LCL 0.95UCL
MaritalStatus=Divorced 327     33     NA      NA      NA
MaritalStatus=Married  673     84     33      32      NA
MaritalStatus=Single   470    120     24      19      NA

KM Model Plot with Marital Status

KM Model With Job Role

Call: survfit(formula = Surv(time, status) ~ JobRole, data = attr)

                                    n events median 0.95LCL 0.95UCL
JobRole=Healthcare Representative 131      9     40      NA      NA
JobRole=Human Resources            52     12     20      NA      NA
JobRole=Laboratory Technician     259     62     NA      17      NA
JobRole=Manager                   102      5     NA      NA      NA
JobRole=Manufacturing Director    145     10     33      NA      NA
JobRole=Research Director          80      2     NA      31      NA
JobRole=Research Scientist        292     47     NA      NA      NA
JobRole=Sales Executive           326     57     23      19      NA
JobRole=Sales Representative       83     33      5       3      NA

KM Model Plot with Job Role

KM Model With WorkLifeBalance

Call: survfit(formula = Surv(time, status) ~ WorkLifeBalance, data = attr)

                    n events median 0.95LCL 0.95UCL
WorkLifeBalance=1  80     25     14      10      NA
WorkLifeBalance=2 344     58     40      NA      NA
WorkLifeBalance=3 893    127     33      32      NA
WorkLifeBalance=4 153     27     NA      NA      NA

KM Model Plot with WorkLifeBalance

KM Model With BusinessTravel

Call: survfit(formula = Surv(time, status) ~ BusinessTravel, data = attr)

                                    n events median 0.95LCL 0.95UCL
BusinessTravel=Non-Travel         150     12     NA      NA      NA
BusinessTravel=Travel_Frequently  277     69     NA      21      NA
BusinessTravel=Travel_Rarely     1043    156     33      31      NA

KM Model Plot with BusinessTravel

KM Model With Job Involvement

Call: survfit(formula = Surv(time, status) ~ JobInvolvement, data = attr)

                   n events median 0.95LCL 0.95UCL
JobInvolvement=1  83     28     23      10      NA
JobInvolvement=2 375     71     NA      33      NA
JobInvolvement=3 868    125     40      31      NA
JobInvolvement=4 144     13     NA      19      NA

KM Model Plot with Job Involvement

KM Model With Job Level

Call: survfit(formula = Surv(time, status) ~ JobRole, data = attr)

                                    n events median 0.95LCL 0.95UCL
JobRole=Healthcare Representative 131      9     40      NA      NA
JobRole=Human Resources            52     12     20      NA      NA
JobRole=Laboratory Technician     259     62     NA      17      NA
JobRole=Manager                   102      5     NA      NA      NA
JobRole=Manufacturing Director    145     10     33      NA      NA
JobRole=Research Director          80      2     NA      31      NA
JobRole=Research Scientist        292     47     NA      NA      NA
JobRole=Sales Executive           326     57     23      19      NA
JobRole=Sales Representative       83     33      5       3      NA

KM Model Plot with Job Level

KM Model With Job Satisfaction

Call: survfit(formula = Surv(time, status) ~ JobSatisfaction, data = attr)

                    n events median 0.95LCL 0.95UCL
JobSatisfaction=1 289     66     NA      20      NA
JobSatisfaction=2 280     46     31      31      NA
JobSatisfaction=3 442     73     NA      23      NA
JobSatisfaction=4 459     52     33      32      NA

KM Model Plot with Job Satisfaction

COX PH Model Multivariate Analysis

Call:
coxph(formula = Surv(time, status) ~ ., data = attr)

                                       coef  exp(coef)   se(coef)      z
Age                              -1.847e-02  9.817e-01  1.116e-02 -1.655
BusinessTravelTravel_Frequently   1.466e+00  4.332e+00  3.480e-01  4.212
BusinessTravelTravel_Rarely       9.173e-01  2.503e+00  3.277e-01  2.799
DailyRate                         3.353e-04  1.000e+00  3.451e-04  0.972
DepartmentResearch & Development  1.569e+01  6.486e+06  1.648e+03  0.010
DepartmentSales                   1.585e+01  7.662e+06  1.648e+03  0.010
DistanceFromHome                  2.357e-02  1.024e+00  8.307e-03  2.837
Education2                        4.350e-01  1.545e+00  2.580e-01  1.686
Education3                        3.934e-01  1.482e+00  2.267e-01  1.736
Education4                        1.772e-01  1.194e+00  2.538e-01  0.698
Education5                       -2.248e-01  7.987e-01  5.625e-01 -0.400
EducationFieldLife Sciences      -5.309e-01  5.881e-01  6.282e-01 -0.845
EducationFieldMarketing          -3.597e-02  9.647e-01  6.638e-01 -0.054
EducationFieldMedical            -5.633e-01  5.693e-01  6.249e-01 -0.901
EducationFieldOther              -4.182e-01  6.582e-01  6.823e-01 -0.613
EducationFieldTechnical Degree   -6.625e-02  9.359e-01  6.385e-01 -0.104
EnvironmentSatisfaction          -3.022e-01  7.392e-01  6.252e-02 -4.834
GenderMale                        3.783e-01  1.460e+00  1.465e-01  2.582
HourlyRate                        3.078e-03  1.003e+00  3.515e-03  0.876
JobInvolvement                   -2.948e-01  7.447e-01  9.185e-02 -3.209
JobLevel                         -1.070e-01  8.985e-01  2.716e-01 -0.394
JobRoleHuman Resources            1.667e+01  1.744e+07  1.648e+03  0.010
JobRoleLaboratory Technician      1.511e+00  4.530e+00  4.496e-01  3.360
JobRoleManager                    7.133e-01  2.041e+00  7.914e-01  0.901
JobRoleManufacturing Director     6.167e-01  1.853e+00  5.055e-01  1.220
JobRoleResearch Director         -5.654e-01  5.681e-01  9.647e-01 -0.586
JobRoleResearch Scientist         5.766e-01  1.780e+00  4.651e-01  1.240
JobRoleSales Executive            7.812e-01  2.184e+00  1.068e+00  0.732
JobRoleSales Representative       1.687e+00  5.405e+00  1.096e+00  1.540
JobSatisfaction                  -3.635e-01  6.952e-01  6.316e-02 -5.756
MaritalStatusMarried              1.682e-01  1.183e+00  2.247e-01  0.749
MaritalStatusSingle               7.689e-01  2.157e+00  2.799e-01  2.747
MonthlyIncome                    -2.678e-05  1.000e+00  6.661e-05 -0.402
MonthlyRate                       1.474e-05  1.000e+00  9.880e-06  1.492
NumCompaniesWorked                2.484e-01  1.282e+00  2.958e-02  8.397
OverTimeYes                       1.478e+00  4.385e+00  1.457e-01 10.148
PercentSalaryHike                 1.354e-03  1.001e+00  3.072e-02  0.044
PerformanceRating                 4.181e-03  1.004e+00  3.138e-01  0.013
RelationshipSatisfaction         -2.035e-01  8.158e-01  6.612e-02 -3.078
StockOptionLevel                 -1.994e-01  8.192e-01  1.293e-01 -1.542
TotalWorkingYears                -1.490e-01  8.615e-01  3.063e-02 -4.866
TrainingTimesLastYear            -2.022e-01  8.169e-01  6.074e-02 -3.329
WorkLifeBalance                  -1.326e-01  8.758e-01  9.355e-02 -1.418
YearsInCurrentRole               -3.429e-01  7.097e-01  3.520e-02 -9.743
YearsSinceLastPromotion           5.580e-02  1.057e+00  3.294e-02  1.694
YearsWithCurrManager             -3.343e-01  7.158e-01  3.567e-02 -9.374
HighRateLowRate                   3.898e-01  1.477e+00  2.857e-01  1.364
                                        p
Age                              0.097877
BusinessTravelTravel_Frequently  2.53e-05
BusinessTravelTravel_Rarely      0.005127
DailyRate                        0.331244
DepartmentResearch & Development 0.992405
DepartmentSales                  0.992324
DistanceFromHome                 0.004555
Education2                       0.091746
Education3                       0.082626
Education4                       0.485103
Education5                       0.689376
EducationFieldLife Sciences      0.397995
EducationFieldMarketing          0.956789
EducationFieldMedical            0.367387
EducationFieldOther              0.539932
EducationFieldTechnical Degree   0.917352
EnvironmentSatisfaction          1.34e-06
GenderMale                       0.009820
HourlyRate                       0.381277
JobInvolvement                   0.001331
JobLevel                         0.693507
JobRoleHuman Resources           0.991926
JobRoleLaboratory Technician     0.000780
JobRoleManager                   0.367421
JobRoleManufacturing Director    0.222527
JobRoleResearch Director         0.557789
JobRoleResearch Scientist        0.215075
JobRoleSales Executive           0.464374
JobRoleSales Representative      0.123522
JobSatisfaction                  8.63e-09
MaritalStatusMarried             0.454110
MaritalStatusSingle              0.006014
MonthlyIncome                    0.687687
MonthlyRate                      0.135778
NumCompaniesWorked                < 2e-16
OverTimeYes                       < 2e-16
PercentSalaryHike                0.964844
PerformanceRating                0.989370
RelationshipSatisfaction         0.002082
StockOptionLevel                 0.123135
TotalWorkingYears                1.14e-06
TrainingTimesLastYear            0.000871
WorkLifeBalance                  0.156242
YearsInCurrentRole                < 2e-16
YearsSinceLastPromotion          0.090279
YearsWithCurrManager              < 2e-16
HighRateLowRate                  0.172430

Likelihood ratio test=843.6  on 47 df, p=< 2.2e-16
n= 1470, number of events= 237 
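In the Cox output, `exp(coef)` is the hazard ratio: the multiplicative change in the hazard of leaving for a one-unit increase in the covariate (or relative to the reference level for a factor), with the other covariates held fixed. As a minimal sketch, the block below takes `coef` and `se(coef)` for three rows of the table and recomputes the hazard ratio, the z statistic and an approximate 95% confidence interval. The data frame is illustrative, and small differences from the printed values come from the rounding of the inputs.

```r
# coef and se(coef) copied from three rows of the Cox table above.
cox <- data.frame(
  term = c("OverTimeYes", "JobSatisfaction", "NumCompaniesWorked"),
  coef = c(1.478, -0.3635, 0.2484),
  se   = c(0.1457, 0.06316, 0.02958)
)

# Hazard ratio and Wald z statistic.
cox$hr <- exp(cox$coef)
cox$z  <- cox$coef / cox$se

# Approximate 95% Wald confidence interval on the hazard-ratio scale.
crit <- qnorm(0.975)
cox$lower <- exp(cox$coef - crit * cox$se)
cox$upper <- exp(cox$coef + crit * cox$se)

print(cox)
```

Read this way, employees working overtime have roughly 4.4 times the hazard of leaving compared with those who do not, while each additional point of job satisfaction is associated with a hazard about 30% lower.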

About Us

DASHBOARD PREPARED BY (CONTACT FOR MACHINE LEARNING TRAINING, COACHING & CONSULTING)

* Dr AMITA SHARMA: Post Doc from Erasmus University, Rotterdam, the Netherlands; Assistant Professor, Institute of Agri Business Management, Swami Keshwanand Rajasthan Agricultural University, Bikaner (Raj), India; Blog: www.thinkingai.in

* ARUN KUMAR SHARMA: Machine Learning Enthusiast; 13 Years of Financial Services Marketing Experience; Blogger, Writer and Machine Learning Consultant; Certified Business Analytics Professional; Certified in Predictive Analytics, Indian Institute of Management, IIMx Bangalore; Certified in Macroeconomic Forecasting, International Monetary Fund (IMFx); Certified in Text Analytics, openSAP; Tel: 9468567418

---
title: "Understanding Employee Attrition with Survival Analysis"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    social: ["facebook","twitter", "menu"]
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(survival)
library(ggplot2)
library(ggfortify)
library(ranger)
library(DT)

```


ACKNOWLEDGEMENT & DISCLAIMER
====================================================


ACKNOWLEDGEMENT


* This dashboard is for educational purposes only and is intended to help readers understand survival analysis techniques. The outputs and analysis displayed in the flexdashboard should not be treated as medical advice or judgement.


DISCLAIMER

* > The authors bear no responsibility for any consequences arising from the content covered in the dashboard, its web URL, or its links.

What is Employee Attrition? {data-navmenu="MENU"}
===================================================

Employee attrition occurs when an organization's workforce gradually shrinks over time for reasons such as retirement or resignation on professional or personal grounds.

LinkedIn's 2018 talent turnover report covers attrition across various sectors; interested readers can find it on the LinkedIn site. According to the 2019 Retention Report by Work Institute, the attrition rate is projected to be as high as 35% by 2023. The report also points out that job satisfaction alone cannot keep employees loyal: career development, work-life balance and supervisor behavior are the key reasons for leaving an organization.

The challenge in HR retention is not to create one umbrella HR plan but to offer distinct HR models that can be customized at the micro or individual level. An HR manager needs a predictive model that estimates the propensity of attrition for each individual, so that they can act well before an employee actually walks out. But how can this be done when HR managers are generally not prediction experts and do not come from a mathematical modeling background?

This flexdashboard is an attempt to address this problem. To begin, click the next menu item on the "MENU" tab.

```{r}
setwd("C:/Users/arunkumar/Desktop/R/flexdashboard/HR Employee Attrition")
attr=read.csv("attrition.csv", header=TRUE, stringsAsFactors = FALSE)
```

Overview of Dataset {data-navmenu="MENU"}
=========================================================

Row {.tabset}
---------------------------------------------------------

### Dataset Table

```{r}
DT::datatable(attr, filter="top")
```

### Variables in the Dataset

```{r}
print(colnames(attr))
```

### Structure of the Dataset

```{r}
# str() prints its own output; wrapping it in print() adds a stray NULL
str(attr)
```

### Summary Statistics of the Dataset

```{r}
print(summary(attr))
```

### Corrplot of Numeric Variables

```{r}
cor_data=dplyr::select_if(attr, is.numeric)
# Columns 5 and 18 are EmployeeCount and StandardHours (constant, so no correlation)
cor_data=cor_data[,-c(5,18)]
cormat=cor(cor_data)
corrplot::corrplot(cormat, title = "Correlation Plot of All Numeric Variables")
```

Data Manipulation {data-navmenu="MENU"}
========================================

Row {.tabset}
------------------------------------------

### Data Manipulation

```{r echo=TRUE}
attr$EmployeeCount=NULL  # Removing variable because of constant value
attr$EmployeeNumber=NULL # Removing variable because it is simply a serial number
attr$StandardHours=NULL  # Removing variable because of constant value

# Converting Attrition character into factor

attr$Attrition=as.factor(attr$Attrition)

# Attrition=data.frame(Attrition)

# Converting BusinessTravel into factor variable

attr$BusinessTravel=as.factor(attr$BusinessTravel)

# Creating HighRate variable based on DailyRate

attr$HighRate=as.factor(ifelse(attr$DailyRate>750,"HighRate","LowRate"))

# Converting Department variable into factor variable

attr$Department=as.factor(attr$Department)

# Converting Education into factor variable

attr$Education=as.factor(attr$Education)

# Converting EducationField variable into factor variable

attr$EducationField=as.factor(attr$EducationField)

# Converting Gender into factor variable

attr$Gender=as.factor(attr$Gender)

# Changing JobRole into factor variable

attr$JobRole=as.factor(attr$JobRole)

# Changing Marital Status into Factor Variable

attr$MaritalStatus=as.factor(attr$MaritalStatus)

# Removing Over18 Variable

attr$Over18=NULL

# Changing OverTime variable into factor variable

attr$OverTime=as.factor(attr$OverTime)

# Changing Attrition variable into a numeric event indicator (1 = left, 0 = still employed)

attr$status=ifelse(attr$Attrition=="Yes",1,0)

attr$Attrition=NULL

# Using YearsAtCompany as the survival time

attr$time=attr$YearsAtCompany

attr$YearsAtCompany=NULL
```

### Data Table after Data Manipulation

```{r}
datatable(attr, filter="top")
```

Survival Analysis: Explainer {data-navmenu="MENU"}
===================================================

Row
----------------------------------------

WHAT IS SURVIVAL ANALYSIS

Survival Analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems.

This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?

More generally, survival analysis involves the modelling of time-to-event data; in this context, death or failure is considered an "event" in the survival analysis literature. Traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research. Survival analysis can also be used in non-traditional fields such as predicting employee attrition, predicting customer churn, predictive maintenance, earthquake prediction and many others.
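
For employee attrition, the "event" is an employee leaving, and employees who are still with the company are right-censored: their tenure is known only to be at least the observed value. The sketch below shows one way to encode this, assuming a small hypothetical data frame that reuses the `YearsAtCompany` and `Attrition` column names; the values are invented for illustration.

```r
library(survival)  # provides Surv(); ships with standard R installations

# Hypothetical mini data set reusing the attrition column names
emp <- data.frame(
  YearsAtCompany = c(2, 5, 10, 1, 7),
  Attrition      = c("Yes", "No", "No", "Yes", "No")
)

# Event indicator: 1 = employee left (event observed), 0 = still employed (right-censored)
emp$status <- ifelse(emp$Attrition == "Yes", 1, 0)
emp$time   <- emp$YearsAtCompany

# Surv() pairs each time with its event indicator; censored times print with a "+"
s <- Surv(emp$time, emp$status)
print(s)
```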

The case study in focus is estimating employee attrition with survival analysis techniques. In this flexdashboard, we are going to use the Kaplan-Meier model, the Cox proportional hazards model, a random forest model and the Weibull model.
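
As a rough illustration of how two of these models differ, the sketch below fits a Cox model and a Weibull model to simulated data. Everything here is an assumption made for the example: the data are generated, not taken from the attrition file, and the `overtime` covariate and its effect size are invented. The random forest model is not covered.

```r
library(survival)

set.seed(42)
n <- 200
overtime <- rbinom(n, 1, 0.3)   # hypothetical binary covariate

# Simulated tenure: overtime workers get a smaller Weibull scale, so they tend to leave sooner
true_time <- rweibull(n, shape = 1.5, scale = ifelse(overtime == 1, 5, 10))
cens_time <- runif(n, 0.5, 15)  # time at which observation stops

sim <- data.frame(
  time     = pmin(true_time, cens_time),          # observed time
  status   = as.numeric(true_time <= cens_time),  # 1 = event seen, 0 = censored
  overtime = overtime
)

# Semi-parametric: Cox proportional hazards model
cox_fit <- coxph(Surv(time, status) ~ overtime, data = sim)

# Parametric: Weibull accelerated failure time model
wb_fit <- survreg(Surv(time, status) ~ overtime, data = sim, dist = "weibull")

exp(coef(cox_fit))             # hazard ratio, overtime vs. no overtime
exp(coef(wb_fit)["overtime"])  # time ratio; a value below 1 means shorter tenure
```

The Cox model leaves the baseline hazard unspecified, while the Weibull model assumes a particular shape for it. The Weibull fit needs strictly positive times, so any zero tenures in real data would have to be handled before fitting.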

Source: https://en.wikipedia.org/wiki/Survival_analysis

The dataset used in this flexdashboard can be accessed from the following link:

https://www.kaggle.com/patelprashant/employee-attrition

Survival Analysis: Employee Attrition {data-navmenu="MENU"}
======================================================

Row {.tabset}
------------------------------------------------------

### Kaplan Meier Model

```{r}
fit1=survfit(Surv(time, status)~1, data=attr)
summary(fit1)
```

### KM Model Plot

```{r}
plot(fit1, xlab="Survival Time of Employee in Years", ylab="Survival Probabilities",
     main="Plot of Kaplan Meier Model on Attrition", col="blue", lwd=2, ylim=c(0.4,1))
```

### KM Model With Gender Comparison

```{r}
fit2=survfit(Surv(time, status)~Gender, data=attr)
print(fit2)
```

### KM Model Plot with Gender Comparison

```{r}
plot(fit2, col=1:2, lwd=2, mark.time=TRUE, xlab="Survival Time of Employee in Years",
     ylab="Survival Probabilities", main="KM Model Plot with Gender Comparison")
legend(32, 1, c("Female", "Male"), col=1:2, lwd=2, bty='n')
```

### KM Model With Performance Rating

```{r}
fit3=survfit(Surv(time, status)~PerformanceRating, data=attr)
fit3
```

### KM Model Plot with Performance Rating

```{r}
# fun='event' plots the cumulative probability of attrition, 1 - S(t)
plot(fit3, fun='event', col=1:2, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Performance Rating")
# Curves are drawn solid, so the legend uses lty=1 and distinguishes groups by colour
legend(20, 1, paste("PerformanceRating", 3:4, sep=' = '), col=1:2, lty=1, lwd=2, bty='n')
```

### KM Model With OverTime

```{r}
fit4=survfit(Surv(time, status)~OverTime, data=attr)
fit4
```

### KM Model Plot with OverTime

```{r}
plot(fit4, fun='event', col=3:4, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with OverTime")
# Strata follow the factor level order (No, Yes), so "No OverTime" comes first
legend(20, 1, c("No OverTime","OverTime"), lty=1, col=3:4, lwd=2, bty='n')
```

### KM Model With Marital Status

```{r}
fit5=survfit(Surv(time, status)~MaritalStatus, data=attr)
fit5
```

### KM Model Plot with Marital Status

```{r}
plot(fit5, fun='event', col=5:7, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Marital Status")
legend(20, 1, c("Divorced","Married","Single"), lty=1, col=5:7, lwd=2, bty='n')
```


### KM Model With Job Role

```{r}
fit6=survfit(Surv(time, status)~JobRole, data=attr)
fit6
```

### KM Model Plot with Job Role

```{r}
# fun='event' plots the cumulative probability of attrition, 1 - S(t)
plot(fit6, fun='event', col=1:9, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Role")
# Curves are drawn solid, so the legends below use lty=1 and distinguish groups by colour
legend(25, 1, c("Healthcare Representative","Human Resources","Laboratory Technician",
                "Manager","Manufacturing Director","Research Director","Research Scientist",
                "Sales Executive","Sales Representative"),
       lty=1, col=1:9, lwd=2, bty='n', cex=0.5)
```

### KM Model With WorkLifeBalance

```{r}
fit7=survfit(Surv(time, status)~WorkLifeBalance, data=attr)
fit7
```

### KM Model Plot with WorkLifeBalance

```{r}
plot(fit7, fun='event', col=11:14, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Work Life Balance")
legend(25, 1, c("WLB=1","WLB=2","WLB=3","WLB=4"), lty=1, col=11:14, lwd=2, bty='n', cex=0.5)
```

### KM Model With BusinessTravel

```{r}
fit8=survfit(Surv(time, status)~BusinessTravel, data=attr)
fit8
```

### KM Model Plot with BusinessTravel

```{r}
plot(fit8, fun='event', col=21:23, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Business Travel")
legend(25, 1, c("Non-Travel","Travel-Frequently","Travel-Rarely"),
       lty=1, col=21:23, lwd=2, bty='n', cex=0.5)
```

### KM Model With Job Involvement

```{r}
fit9=survfit(Surv(time, status)~JobInvolvement, data=attr)
fit9
```

### KM Model Plot with Job Involvement

```{r}
plot(fit9, fun='event', col=21:24, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Involvement")
legend(25, 1, c("JI=1","JI=2","JI=3","JI=4"), lty=1, col=21:24, lwd=2, bty='n', cex=0.5)
```

### KM Model With Job Level

```{r}
# Stratifying by JobLevel (the original call reused JobRole here)
fit10=survfit(Surv(time, status)~JobLevel, data=attr)
fit10
```

### KM Model Plot with Job Level

```{r}
plot(fit10, fun='event', col=21:25, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Level")
legend(25, 1, c("JL=1","JL=2","JL=3","JL=4","JL=5"),
       lty=1, col=21:25, lwd=2, bty='n', cex=0.5)
```

### KM Model With Job Satisfaction

```{r}
fit11=survfit(Surv(time, status)~JobSatisfaction, data=attr)
fit11
```

### KM Model Plot with Job Satisfaction

```{r}
plot(fit11, fun='event', col=1:4, xlab="Survival of Employee in Years",
     ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Satisfaction")
legend(25, 1, c("JS=1","JS=2","JS=3","JS=4"), lty=1, col=1:4, lwd=2, bty='n', cex=0.5)
```

### COX PH Model Multivariate Analysis

```{r}
fit12=coxph(Surv(time, status)~., data=attr)
fit12
```

About Us {data-navmenu="MENU"}
==============================================

### DASHBOARD PREPARED BY (CONTACT FOR MACHINE LEARNING TRAINING, COACHING & CONSULTING)

* Dr AMITA SHARMA
  Post Doc from Erasmus University, Rotterdam, the Netherlands
  Assistant Professor
  Institute of Agri Business Management, Swami Keshwanand Rajasthan Agricultural University, Bikaner (Raj), India
  Blog: www.thinkingai.in

* ARUN KUMAR SHARMA
  Machine Learning Enthusiast
  13 Years of Financial Services Marketing Experience
  Blogger, Writer and Machine Learning Consultant
  Certified Business Analytics Professional
  Certified in Predictive Analytics, Indian Institute of Management, IIMx Bangalore
  Certified in Macroeconomic Forecasting, International Monetary Fund (IMFx)
  Certified in Text Analytics, openSAP
  Email: aks10000@gmail.com
  Tel: 9468567418