Objective:

The objective of this project, carried out for a company’s HR department, is to use a data science approach to study employee churn and to develop a model that predicts it. The HR department is seeking answers to the following questions:
  1. What is the current attrition rate?
  2. Where is the attrition happening?
  3. What factors are contributing to attrition? (Employee Job Level, Job Role, Employee Satisfaction, Monthly Salary, etc.)
  4. Can we build a prediction model to find employee churn?
  5. If yes, how well is the model performing?

Analytics Approach:

To develop an employee churn prediction model, we will perform the following steps:

  1. Collect historical data from the HR database for all employees, or for a sample if the number of employees is very large.
  2. Explore data and draw insights from the exploratory data analysis
  3. Prepare data for further analysis (model building, exploration)
  4. Perform exploratory data analysis on the dataset that will be used for model building
  5. Split the dataset into training and test data
  6. Build various machine learning models on the training data
  7. Evaluate the performance of each model and tune it to improve its accuracy
  8. Validate the model performance using the test data
  9. Recommend the model to be used based on accuracy measures

Solution steps:

  1. Read input data from Attrition.csv file

  2. Input Data analysis

    ## [1] 2940   35
    ## 'data.frame':    2940 obs. of  35 variables:
    ##  $ EmployeeNumber          : int  1 2 3 4 5 6 7 8 9 10 ...
    ##  $ Attrition               : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
    ##  $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
    ##  $ BusinessTravel          : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
    ##  $ DailyRate               : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
    ##  $ Department              : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
    ##  $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
    ##  $ Education               : int  2 1 2 4 1 2 3 1 3 3 ...
    ##  $ EducationField          : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
    ##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
    ##  $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
    ##  $ Gender                  : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
    ##  $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
    ##  $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
    ##  $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
    ##  $ JobRole                 : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
    ##  $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
    ##  $ MaritalStatus           : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
    ##  $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
    ##  $ MonthlyRate             : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
    ##  $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
    ##  $ Over18                  : Factor w/ 1 level "Y": 1 1 1 1 1 1 1 1 1 1 ...
    ##  $ OverTime                : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
    ##  $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
    ##  $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
    ##  $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
    ##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
    ##  $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
    ##  $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
    ##  $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
    ##  $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
    ##  $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
    ##  $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
    ##  $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
    ##  $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...
    ##  EmployeeNumber   Attrition       Age                  BusinessTravel
    ##  Min.   :   1.0   No :2466   Min.   :18.00   Non-Travel       : 300  
    ##  1st Qu.: 735.8   Yes: 474   1st Qu.:30.00   Travel_Frequently: 554  
    ##  Median :1470.5              Median :36.00   Travel_Rarely    :2086  
    ##  Mean   :1470.5              Mean   :36.92                           
    ##  3rd Qu.:2205.2              3rd Qu.:43.00                           
    ##  Max.   :2940.0              Max.   :60.00                           
    ##                                                                      
    ##    DailyRate                       Department   DistanceFromHome
    ##  Min.   : 102.0   Human Resources       : 126   Min.   : 1.000  
    ##  1st Qu.: 465.0   Research & Development:1922   1st Qu.: 2.000  
    ##  Median : 802.0   Sales                 : 892   Median : 7.000  
    ##  Mean   : 802.5                                 Mean   : 9.193  
    ##  3rd Qu.:1157.0                                 3rd Qu.:14.000  
    ##  Max.   :1499.0                                 Max.   :29.000  
    ##                                                                 
    ##    Education              EducationField EmployeeCount
    ##  Min.   :1.000   Human Resources :  54   Min.   :1    
    ##  1st Qu.:2.000   Life Sciences   :1212   1st Qu.:1    
    ##  Median :3.000   Marketing       : 318   Median :1    
    ##  Mean   :2.913   Medical         : 928   Mean   :1    
    ##  3rd Qu.:4.000   Other           : 164   3rd Qu.:1    
    ##  Max.   :5.000   Technical Degree: 264   Max.   :1    
    ##                                                       
    ##  EnvironmentSatisfaction    Gender       HourlyRate     JobInvolvement
    ##  Min.   :1.000           Female:1176   Min.   : 30.00   Min.   :1.00  
    ##  1st Qu.:2.000           Male  :1764   1st Qu.: 48.00   1st Qu.:2.00  
    ##  Median :3.000                         Median : 66.00   Median :3.00  
    ##  Mean   :2.722                         Mean   : 65.89   Mean   :2.73  
    ##  3rd Qu.:4.000                         3rd Qu.: 84.00   3rd Qu.:3.00  
    ##  Max.   :4.000                         Max.   :100.00   Max.   :4.00  
    ##                                                                       
    ##     JobLevel                          JobRole    JobSatisfaction
    ##  Min.   :1.000   Sales Executive          :652   Min.   :1.000  
    ##  1st Qu.:1.000   Research Scientist       :584   1st Qu.:2.000  
    ##  Median :2.000   Laboratory Technician    :518   Median :3.000  
    ##  Mean   :2.064   Manufacturing Director   :290   Mean   :2.729  
    ##  3rd Qu.:3.000   Healthcare Representative:262   3rd Qu.:4.000  
    ##  Max.   :5.000   Manager                  :204   Max.   :4.000  
    ##                  (Other)                  :430                  
    ##   MaritalStatus  MonthlyIncome    MonthlyRate    NumCompaniesWorked
    ##  Divorced: 654   Min.   : 1009   Min.   : 2094   Min.   :0.000     
    ##  Married :1346   1st Qu.: 2911   1st Qu.: 8045   1st Qu.:1.000     
    ##  Single  : 940   Median : 4919   Median :14236   Median :2.000     
    ##                  Mean   : 6503   Mean   :14313   Mean   :2.693     
    ##                  3rd Qu.: 8380   3rd Qu.:20462   3rd Qu.:4.000     
    ##                  Max.   :19999   Max.   :26999   Max.   :9.000     
    ##                                                                    
    ##  Over18   OverTime   PercentSalaryHike PerformanceRating
    ##  Y:2940   No :2108   Min.   :11.00     Min.   :3.000    
    ##           Yes: 832   1st Qu.:12.00     1st Qu.:3.000    
    ##                      Median :14.00     Median :3.000    
    ##                      Mean   :15.21     Mean   :3.154    
    ##                      3rd Qu.:18.00     3rd Qu.:3.000    
    ##                      Max.   :25.00     Max.   :4.000    
    ##                                                         
    ##  RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
    ##  Min.   :1.000            Min.   :80    Min.   :0.0000   Min.   : 0.00    
    ##  1st Qu.:2.000            1st Qu.:80    1st Qu.:0.0000   1st Qu.: 6.00    
    ##  Median :3.000            Median :80    Median :1.0000   Median :10.00    
    ##  Mean   :2.712            Mean   :80    Mean   :0.7939   Mean   :11.28    
    ##  3rd Qu.:4.000            3rd Qu.:80    3rd Qu.:1.0000   3rd Qu.:15.00    
    ##  Max.   :4.000            Max.   :80    Max.   :3.0000   Max.   :40.00    
    ##                                                                           
    ##  TrainingTimesLastYear WorkLifeBalance YearsAtCompany   YearsInCurrentRole
    ##  Min.   :0.000         Min.   :1.000   Min.   : 0.000   Min.   : 0.000    
    ##  1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000   1st Qu.: 2.000    
    ##  Median :3.000         Median :3.000   Median : 5.000   Median : 3.000    
    ##  Mean   :2.799         Mean   :2.761   Mean   : 7.008   Mean   : 4.229    
    ##  3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000   3rd Qu.: 7.000    
    ##  Max.   :6.000         Max.   :4.000   Max.   :40.000   Max.   :18.000    
    ##                                                                           
    ##  YearsSinceLastPromotion YearsWithCurrManager
    ##  Min.   : 0.000          Min.   : 0.000      
    ##  1st Qu.: 0.000          1st Qu.: 2.000      
    ##  Median : 1.000          Median : 3.000      
    ##  Mean   : 2.188          Mean   : 4.123      
    ##  3rd Qu.: 3.000          3rd Qu.: 7.000      
    ##  Max.   :15.000          Max.   :17.000      
    ## 
    ## 
    ##   No  Yes 
    ## 2466  474
    ## 
    ##        No       Yes 
    ## 0.8387755 0.1612245

    From the above analysis we see that the employee attrition rate is 16.12%: 474 of the 2940 employees have churned.
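
    The rate quoted above can be re-derived from the two printed counts. The following is a minimal R sketch; the object names (`attrition_counts`, `attrition_share`, `attrition_pct`) are illustrative and are not taken from the original script, which is not shown in this report.

    ```r
    # Counts copied from the frequency table printed above
    attrition_counts <- c(No = 2466, Yes = 474)

    # prop.table() divides each count by the total (2940 employees)
    attrition_share <- prop.table(attrition_counts)

    # Attrition rate as a percentage, rounded to two decimals
    attrition_pct <- round(100 * attrition_share[["Yes"]], 2)
    print(attrition_pct)
    ```

    With a data frame loaded from the CSV, the same proportions would typically come from `prop.table(table(df$Attrition))`, where `df` is a hypothetical name for the loaded data.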

  3. Exploratory Data Analysis: We will explore the

    ## Warning: package 'ggplot2' was built under R version 3.5.1

  4. Let us further explore the other variables and attrition data

    • Attrition by Department

      ##                         
      ##                            No  Yes
      ##   Human Resources         102   24
      ##   Research & Development 1656  266
      ##   Sales                   708  184

      Attrition is highest in the Sales department (184 of 892, about 20.6%), followed by Human Resources (24 of 126, about 19.0%). Research & Development has the lowest rate (266 of 1922, about 13.8%).
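
      Because the departments differ greatly in size, the comparison is easier to read as row-wise rates than as raw counts. A minimal sketch, rebuilding the table from the counts printed above; `dept_tab`, `dept_rates` and `dept_yes_pct` are illustrative names, not objects from the original script.

      ```r
      # Department x Attrition counts, copied from the table above.
      # matrix() fills column by column: first the "No" column, then "Yes".
      dept_tab <- matrix(
        c(102, 1656, 708,
          24,  266,  184),
        nrow = 3,
        dimnames = list(
          Department = c("Human Resources", "Research & Development", "Sales"),
          Attrition  = c("No", "Yes")
        )
      )

      # margin = 1 normalises each row, so every department sums to 1
      dept_rates <- prop.table(dept_tab, margin = 1)

      # Attrition rate per department, in percent, highest first
      dept_yes_pct <- round(100 * dept_rates[, "Yes"], 1)
      print(sort(dept_yes_pct, decreasing = TRUE))
      ```

      The same `prop.table(..., margin = 1)` call applies unchanged to the other two-way tables in this section (job level, job satisfaction, overtime, and so on).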

    • Attrition and Job Levels

      ##    
      ##      No Yes
      ##   1 800 286
      ##   2 964 104
      ##   3 372  64
      ##   4 202  10
      ##   5 128  10

      Attrition is highest at Job Level 1 (286 of 1086, about 26.3%), followed by Level 3 (64 of 436, about 14.7%).

    • Attrition and Job Satisfaction

      ##    
      ##      No Yes
      ##   1 446 132
      ##   2 468  92
      ##   3 738 146
      ##   4 814 104

      The lower the Job Satisfaction index, the higher the attrition (as expected): about 22.8% at level 1 versus about 11.3% at level 4, with levels 2 and 3 both near 16.5%.

    • Attrition and Overtime

      ##      
      ##         No  Yes
      ##   No  1888  220
      ##   Yes  578  254

      Employees who work overtime are more likely to leave than employees who work only normal hours (254 of 832, about 30.5%, versus 220 of 2108, about 10.4%).

    • Monthly Income and Attrition

      From the plot of monthly income against attrition, we see that monthly income has an inverse relationship with attrition in all departments.

    • Attrition and Job Role

      ##                            
      ##                              No Yes
      ##   Healthcare Representative 244  18
      ##   Human Resources            80  24
      ##   Laboratory Technician     394 124
      ##   Manager                   194  10
      ##   Manufacturing Director    270  20
      ##   Research Director         156   4
      ##   Research Scientist        490  94
      ##   Sales Executive           538 114
      ##   Sales Representative      100  66

      Attrition is highest for Sales Representatives (66 of 166, about 39.8%) and lowest for Research Directors (4 of 160, 2.5%).

    • Gender And Attrition

      ##         
      ##            No  Yes
      ##   Female 1002  174
      ##   Male   1464  300

      Attrition is higher for male employees (300 of 1764, about 17.0%) than for female employees (174 of 1176, about 14.8%).

    • Attrition and Education

      ##    
      ##      No Yes
      ##   1 278  62
      ##   2 476  88
      ##   3 946 198
      ##   4 680 116
      ##   5  86  10

      Attrition is highest for Education levels 1 and 3 (about 18.2% and 17.3%) and lowest for Education level 5 (about 10.4%).

    • Age and Attrition

      From the above boxplot, we see that younger employees are more likely to leave than older employees.

  5. Hypothesis Tests using Chi-square test of independence

    ##      
    ##         No  Yes
    ##   No  1888  220
    ##   Yes  578  254
    ## 
    ##  Pearson's Chi-squared test with Yates' continuity correction
    ## 
    ## data:  tabovertime
    ## X-squared = 176.61, df = 1, p-value < 2.2e-16
    ##                            
    ##                              No Yes
    ##   Healthcare Representative 244  18
    ##   Human Resources            80  24
    ##   Laboratory Technician     394 124
    ##   Manager                   194  10
    ##   Manufacturing Director    270  20
    ##   Research Director         156   4
    ##   Research Scientist        490  94
    ##   Sales Executive           538 114
    ##   Sales Representative      100  66
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabjobrole
    ## X-squared = 172.38, df = 8, p-value < 2.2e-16
    ##    
    ##      No Yes
    ##   1 278  62
    ##   2 476  88
    ##   3 946 198
    ##   4 680 116
    ##   5  86  10
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabEducation
    ## X-squared = 6.1479, df = 4, p-value = 0.1884
    ##           
    ##              No  Yes
    ##   Divorced  588   66
    ##   Married  1178  168
    ##   Single    700  240
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabMaritalStatus
    ## X-squared = 92.327, df = 2, p-value < 2.2e-16
    ##    
    ##      No Yes
    ##   1 446 132
    ##   2 468  92
    ##   3 738 146
    ##   4 814 104
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabJobSatisfaction
    ## X-squared = 35.01, df = 3, p-value = 1.212e-07
    ##    
    ##       No  Yes
    ##   1  110   50
    ##   2  572  116
    ##   3 1532  254
    ##   4  252   54
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabWorkLifeBalance
    ## X-squared = 32.65, df = 3, p-value = 3.817e-07
    ##                         
    ##                            No  Yes
    ##   Human Resources         102   24
    ##   Research & Development 1656  266
    ##   Sales                   708  184
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabDepartment
    ## X-squared = 21.592, df = 2, p-value = 2.048e-05
    ##    
    ##      No Yes
    ##   1 438 114
    ##   2 516  90
    ##   3 776 142
    ##   4 736 128
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabRelSatisfaction
    ## X-squared = 10.482, df = 3, p-value = 0.01488
    ##                    
    ##                       No  Yes
    ##   Non-Travel         276   24
    ##   Travel_Frequently  416  138
    ##   Travel_Rarely     1774  312
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabBusTravel
    ## X-squared = 48.365, df = 2, p-value = 3.146e-11
    ##    
    ##       No  Yes
    ##   0  954  308
    ##   1 1080  112
    ##   2  292   24
    ##   3  140   30
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tabStockOptionLevel
    ## X-squared = 121.2, df = 3, p-value < 2.2e-16
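
    The tests above can be re-run from the printed counts alone. The sketch below rebuilds two of the tables (OverTime and Education) and calls `chisq.test()` on them. The object names are illustrative; the original script built `tabovertime` and `tabEducation` from the data frame, and that code is not shown here.

    ```r
    # OverTime x Attrition (rows: OverTime No/Yes; columns: Attrition No/Yes)
    overtime_tab <- matrix(
      c(1888, 578,
        220,  254),
      nrow = 2,
      dimnames = list(OverTime = c("No", "Yes"), Attrition = c("No", "Yes"))
    )

    # For a 2x2 table chisq.test() applies Yates' continuity correction
    # by default (correct = TRUE), matching the first test in the output.
    res_overtime <- chisq.test(overtime_tab)

    # Education x Attrition (rows: Education levels 1 to 5)
    education_tab <- matrix(
      c(278, 476, 946, 680, 86,
        62,  88,  198, 116, 10),
      nrow = 5,
      dimnames = list(Education = as.character(1:5), Attrition = c("No", "Yes"))
    )
    res_education <- chisq.test(education_tab)

    print(res_overtime)
    print(res_education)
    ```

    In the output above, every tested variable except Education (p-value = 0.1884) has a p-value below 0.05, so at the 5% level independence from Attrition is rejected for all of them but Education.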

  6. Correlation Matrix

    ##                              Age DailyRate DistanceFromHome
    ## Age                      1.00000  0.010661         -0.00169
    ## DailyRate                0.01066  1.000000         -0.00499
    ## DistanceFromHome        -0.00169 -0.004985          1.00000
    ## EnvironmentSatisfaction  0.01015  0.018355         -0.01608
    ## HourlyRate               0.02429  0.023381          0.03113
    ## JobLevel                 0.50960  0.002966          0.00530
    ## MonthlyIncome            0.49785  0.007707         -0.01701
    ## MonthlyRate              0.02805 -0.032182          0.02747
    ## NumCompaniesWorked       0.29963  0.038153         -0.02925
    ## PercentSalaryHike        0.00363  0.022704          0.04024
    ## PerformanceRating        0.00190  0.000473          0.02711
    ## TotalWorkingYears        0.68038  0.014515          0.00463
    ## TrainingTimesLastYear   -0.01962  0.002453         -0.03694
    ## YearsAtCompany           0.31131 -0.034055          0.00951
    ## YearsInCurrentRole       0.21290  0.009932          0.01884
    ## YearsSinceLastPromotion  0.21651 -0.033229          0.01003
    ## YearsWithCurrManager     0.20209 -0.026363          0.01441
    ##                         EnvironmentSatisfaction HourlyRate JobLevel
    ## Age                                     0.01015    0.02429  0.50960
    ## DailyRate                               0.01835    0.02338  0.00297
    ## DistanceFromHome                       -0.01608    0.03113  0.00530
    ## EnvironmentSatisfaction                 1.00000   -0.04986  0.00121
    ## HourlyRate                             -0.04986    1.00000 -0.02785
    ## JobLevel                                0.00121   -0.02785  1.00000
    ## MonthlyIncome                          -0.00626   -0.01579  0.95030
    ## MonthlyRate                             0.03760   -0.01530  0.03956
    ## NumCompaniesWorked                      0.01259    0.02216  0.14250
    ## PercentSalaryHike                      -0.03170   -0.00906 -0.03473
    ## PerformanceRating                      -0.02955   -0.00217 -0.02122
    ## TotalWorkingYears                      -0.00269   -0.00233  0.78221
    ## TrainingTimesLastYear                  -0.01936   -0.00855 -0.01819
    ## YearsAtCompany                          0.00146   -0.01958  0.53474
    ## YearsInCurrentRole                      0.01801   -0.02411  0.38945
    ## YearsSinceLastPromotion                 0.01619   -0.02672  0.35389
    ## YearsWithCurrManager                   -0.00500   -0.02012  0.37528
    ##                         MonthlyIncome MonthlyRate NumCompaniesWorked
    ## Age                           0.49785     0.02805             0.2996
    ## DailyRate                     0.00771    -0.03218             0.0382
    ## DistanceFromHome             -0.01701     0.02747            -0.0293
    ## EnvironmentSatisfaction      -0.00626     0.03760             0.0126
    ## HourlyRate                   -0.01579    -0.01530             0.0222
    ## JobLevel                      0.95030     0.03956             0.1425
    ## MonthlyIncome                 1.00000     0.03481             0.1495
    ## MonthlyRate                   0.03481     1.00000             0.0175
    ## NumCompaniesWorked            0.14952     0.01752             1.0000
    ## PercentSalaryHike            -0.02727    -0.00643            -0.0102
    ## PerformanceRating            -0.01712    -0.00981            -0.0141
    ## TotalWorkingYears             0.77289     0.02644             0.2376
    ## TrainingTimesLastYear        -0.02174     0.00147            -0.0661
    ## YearsAtCompany                0.51428    -0.02366            -0.1184
    ## YearsInCurrentRole            0.36382    -0.01281            -0.0908
    ## YearsSinceLastPromotion       0.34498     0.00157            -0.0368
    ## YearsWithCurrManager          0.34408    -0.03675            -0.1103
    ##                         PercentSalaryHike PerformanceRating
    ## Age                               0.00363          0.001904
    ## DailyRate                         0.02270          0.000473
    ## DistanceFromHome                  0.04024          0.027110
    ## EnvironmentSatisfaction          -0.03170         -0.029548
    ## HourlyRate                       -0.00906         -0.002172
    ## JobLevel                         -0.03473         -0.021222
    ## MonthlyIncome                    -0.02727         -0.017120
    ## MonthlyRate                      -0.00643         -0.009811
    ## NumCompaniesWorked               -0.01024         -0.014095
    ## PercentSalaryHike                 1.00000          0.773550
    ## PerformanceRating                 0.77355          1.000000
    ## TotalWorkingYears                -0.02061          0.006744
    ## TrainingTimesLastYear            -0.00522         -0.015579
    ## YearsAtCompany                   -0.03599          0.003435
    ## YearsInCurrentRole               -0.00152          0.034986
    ## YearsSinceLastPromotion          -0.02215          0.017896
    ## YearsWithCurrManager             -0.01199          0.022827
    ##                         TotalWorkingYears TrainingTimesLastYear
    ## Age                               0.68038              -0.01962
    ## DailyRate                         0.01451               0.00245
    ## DistanceFromHome                  0.00463              -0.03694
    ## EnvironmentSatisfaction          -0.00269              -0.01936
    ## HourlyRate                       -0.00233              -0.00855
    ## JobLevel                          0.78221              -0.01819
    ## MonthlyIncome                     0.77289              -0.02174
    ## MonthlyRate                       0.02644               0.00147
    ## NumCompaniesWorked                0.23764              -0.06605
    ## PercentSalaryHike                -0.02061              -0.00522
    ## PerformanceRating                 0.00674              -0.01558
    ## TotalWorkingYears                 1.00000              -0.03566
    ## TrainingTimesLastYear            -0.03566               1.00000
    ## YearsAtCompany                    0.62813               0.00357
    ## YearsInCurrentRole                0.46036              -0.00574
    ## YearsSinceLastPromotion           0.40486              -0.00207
    ## YearsWithCurrManager              0.45919              -0.00410
    ##                         YearsAtCompany YearsInCurrentRole
    ## Age                            0.31131            0.21290
    ## DailyRate                     -0.03405            0.00993
    ## DistanceFromHome               0.00951            0.01884
    ## EnvironmentSatisfaction        0.00146            0.01801
    ## HourlyRate                    -0.01958           -0.02411
    ## JobLevel                       0.53474            0.38945
    ## MonthlyIncome                  0.51428            0.36382
    ## MonthlyRate                   -0.02366           -0.01281
    ## NumCompaniesWorked            -0.11842           -0.09075
    ## PercentSalaryHike             -0.03599           -0.00152
    ## PerformanceRating              0.00344            0.03499
    ## TotalWorkingYears              0.62813            0.46036
    ## TrainingTimesLastYear          0.00357           -0.00574
    ## YearsAtCompany                 1.00000            0.75875
    ## YearsInCurrentRole             0.75875            1.00000
    ## YearsSinceLastPromotion        0.61841            0.54806
    ## YearsWithCurrManager           0.76921            0.71436
    ##                         YearsSinceLastPromotion YearsWithCurrManager
    ## Age                                     0.21651               0.2021
    ## DailyRate                              -0.03323              -0.0264
    ## DistanceFromHome                        0.01003               0.0144
    ## EnvironmentSatisfaction                 0.01619              -0.0050
    ## HourlyRate                             -0.02672              -0.0201
    ## JobLevel                                0.35389               0.3753
    ## MonthlyIncome                           0.34498               0.3441
    ## MonthlyRate                             0.00157              -0.0367
    ## NumCompaniesWorked                     -0.03681              -0.1103
    ## PercentSalaryHike                      -0.02215              -0.0120
    ## PerformanceRating                       0.01790               0.0228
    ## TotalWorkingYears                       0.40486               0.4592
    ## TrainingTimesLastYear                  -0.00207              -0.0041
    ## YearsAtCompany                          0.61841               0.7692
    ## YearsInCurrentRole                      0.54806               0.7144
    ## YearsSinceLastPromotion                 1.00000               0.5102
    ## YearsWithCurrManager                    0.51022               1.0000
    ## Warning: package 'corrplot' was built under R version 3.5.1
    ## corrplot 0.84 loaded
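
    A matrix of this size is hard to scan by eye, so it can help to list only the strongly correlated pairs. The sketch below does this for a four-variable excerpt whose values are copied from the output above. The 0.7 cut-off and the names `cm`, `idx` and `high_pairs` are assumptions made for illustration, not choices from the original analysis.

    ```r
    # Excerpt of the correlation matrix printed above
    vars <- c("Age", "JobLevel", "MonthlyIncome", "TotalWorkingYears")
    cm <- matrix(
      c(1.00000, 0.50960, 0.49785, 0.68038,
        0.50960, 1.00000, 0.95030, 0.78221,
        0.49785, 0.95030, 1.00000, 0.77289,
        0.68038, 0.78221, 0.77289, 1.00000),
      nrow = 4, byrow = TRUE,
      dimnames = list(vars, vars)
    )

    # Keep each pair once (upper triangle only) where |r| exceeds the cut-off
    idx <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)

    high_pairs <- data.frame(
      var1 = rownames(cm)[idx[, "row"]],
      var2 = colnames(cm)[idx[, "col"]],
      r    = cm[idx]
    )
    print(high_pairs)
    ```

    On the full matrix the same code would run on the result of `cor()` over the numeric columns. Pairs such as JobLevel and MonthlyIncome (0.95) are candidates for multicollinearity checks before regression.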

  7. Regression Model: Let us predict the number of years an employee will stay at the company (YearsAtCompany) from the other independent variables in the dataset.

    ##  [1] "EmployeeNumber"           "Attrition"               
    ##  [3] "Age"                      "BusinessTravel"          
    ##  [5] "DailyRate"                "Department"              
    ##  [7] "DistanceFromHome"         "Education"               
    ##  [9] "EducationField"           "EnvironmentSatisfaction" 
    ## [11] "Gender"                   "HourlyRate"              
    ## [13] "JobInvolvement"           "JobLevel"                
    ## [15] "JobRole"                  "JobSatisfaction"         
    ## [17] "MaritalStatus"            "MonthlyIncome"           
    ## [19] "MonthlyRate"              "NumCompaniesWorked"      
    ## [21] "OverTime"                 "PercentSalaryHike"       
    ## [23] "PerformanceRating"        "RelationshipSatisfaction"
    ## [25] "StandardHours"            "StockOptionLevel"        
    ## [27] "TotalWorkingYears"        "TrainingTimesLastYear"   
    ## [29] "WorkLifeBalance"          "YearsAtCompany"          
    ## [31] "YearsInCurrentRole"       "YearsSinceLastPromotion" 
    ## [33] "YearsWithCurrManager"
    ## 
    ## Call:
    ## lm(formula = train$YearsAtCompany ~ train$DistanceFromHome + 
    ##     train$MonthlyIncome + train$PercentSalaryHike + train$TotalWorkingYears + 
    ##     train$YearsInCurrentRole + train$YearsSinceLastPromotion + 
    ##     train$YearsWithCurrManager, data = train)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -9.4257 -1.4584 -0.0825  1.0890 20.2286 
    ## 
    ## Coefficients:
    ##                                 Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)                   -9.519e-02  3.001e-01  -0.317   0.7511    
    ## train$DistanceFromHome         2.188e-03  7.736e-03   0.283   0.7773    
    ## train$MonthlyIncome            8.428e-05  2.087e-05   4.038 5.58e-05 ***
    ## train$PercentSalaryHike       -3.659e-02  1.740e-02  -2.102   0.0356 *  
    ## train$TotalWorkingYears        1.619e-01  1.350e-02  11.991  < 2e-16 ***
    ## train$YearsInCurrentRole       4.980e-01  2.622e-02  18.995  < 2e-16 ***
    ## train$YearsSinceLastPromotion  3.413e-01  2.460e-02  13.874  < 2e-16 ***
    ## train$YearsWithCurrManager     5.848e-01  2.615e-02  22.368  < 2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 2.872 on 2050 degrees of freedom
    ## Multiple R-squared:  0.772,  Adjusted R-squared:  0.7713 
    ## F-statistic: 991.9 on 7 and 2050 DF,  p-value: < 2.2e-16

    ## Warning: package 'car' was built under R version 3.5.1
    ## Loading required package: carData
    ##        train$DistanceFromHome           train$MonthlyIncome 
    ##                      1.002486                      2.345415 
    ##       train$PercentSalaryHike       train$TotalWorkingYears 
    ##                      1.002499                      2.648315 
    ##      train$YearsInCurrentRole train$YearsSinceLastPromotion 
    ##                      2.207798                      1.548921 
    ##    train$YearsWithCurrManager 
    ##                      2.185022
    ## [1] 10193.22
    ## [1] 10243.89
    ## [1] 8.218202
    ## [1] 2.866741
    ## 
    ## Call:
    ## lm(formula = train$YearsAtCompany ~ train$MonthlyIncome + train$YearsInCurrentRole + 
    ##     train$YearsSinceLastPromotion + train$YearsWithCurrManager, 
    ##     data = train)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -9.2123 -1.4223 -0.1086  0.8207 23.1071 
    ## 
    ## Coefficients:
    ##                                 Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)                   -0.3187272  0.1238384  -2.574   0.0101 *  
    ## train$MonthlyIncome            0.0002590  0.0000155  16.710   <2e-16 ***
    ## train$YearsInCurrentRole       0.5262255  0.0270209  19.475   <2e-16 ***
    ## train$YearsSinceLastPromotion  0.3643281  0.0253856  14.352   <2e-16 ***
    ## train$YearsWithCurrManager     0.6355921  0.0266973  23.807   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 2.973 on 2053 degrees of freedom
    ## Multiple R-squared:  0.7555, Adjusted R-squared:  0.755 
    ## F-statistic:  1586 on 4 and 2053 DF,  p-value: < 2.2e-16
    ##           train$MonthlyIncome      train$YearsInCurrentRole 
    ##                      1.207962                      2.189769 
    ## train$YearsSinceLastPromotion    train$YearsWithCurrManager 
    ##                      1.539617                      2.127238
    ## [1] 10331.33
    ## [1] 10365.11
    ## [1] 8.814305
    ## [1] 2.96889
    ## Warning: 'newdata' had 882 rows but variables found have 2058 rows
    ## Warning in test$YearsAtCompany - pred.testMLE: longer object length is not
    ## a multiple of shorter object length
    ## [1] 8.357591
    ## Warning: 'newdata' had 882 rows but variables found have 2058 rows
    ##         1         2         3         4         5         6 
    ##  4.412022  8.446663  8.597393 10.946814  2.744178  4.296685
    ## Warning in test$YearsAtCompany - pred.testMLE: longer object length is not
    ## a multiple of shorter object length
    ## [1] 8.311958
    ## Warning in cbind(test$EmployeeNumber, test$YearsAtCompany, pred.testMLE):
    ## number of rows of result is not a multiple of vector length (arg 1)
    ##    EmployeeNumber YearsAtCompany YearsAtCompany-Predicted
    ## 1               2             10                 4.412022
    ## 2               3              0                 8.446663
    ## 3               6              7                 8.597393
    ## 4              13              5                10.946814
    ## 5              15              4                 2.744178
    ## 6              26             14                 4.296685
    ## 7              28              9                17.174903
    ## 8              31              1                17.027760
    ## 9              32              4                 3.914374
    ## 10             34              1                 3.156766

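    A regression model to predict YearsAtCompany has been built. The error values computed against the test data (8.36 and 8.31) look poor next to the residual standard errors on the training data (2.87 and 2.97), but these test figures are not reliable. The warning "'newdata' had 882 rows but variables found have 2058 rows" shows that predict() did not use the test set: because the formula refers to columns as train$..., it returned 2058 values for the training rows, which were then recycled against the 882 test observations. The model has to be refit with bare column names before its test performance, or any effect of skewness in the data, can be judged, so it should not be used for prediction in its current form.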
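
    The warning "'newdata' had 882 rows but variables found have 2058 rows" in the output above is what `predict()` reports when the model formula names columns as `train$...`: the variables are then looked up in `train` no matter what is passed as `newdata`. Below is a minimal sketch of the same kind of fit written with bare column names. It runs on synthetic data, because the HR data set is not reproduced here; the data frame `emp`, the 70/30 split and the seed are assumptions made for illustration only.

    ```r
    set.seed(42)

    # Synthetic stand-in for the HR data (NOT the real data set)
    n <- 300
    emp <- data.frame(
      YearsInCurrentRole   = rpois(n, 4),
      YearsWithCurrManager = rpois(n, 4)
    )
    emp$YearsAtCompany <- emp$YearsInCurrentRole +
      emp$YearsWithCurrManager + rnorm(n, sd = 1)

    # 70/30 train/test split
    train_idx <- sample(seq_len(n), size = round(0.7 * n))
    train <- emp[train_idx, ]
    test  <- emp[-train_idx, ]

    # Bare column names in the formula; the data frame goes in `data =`
    fit <- lm(YearsAtCompany ~ YearsInCurrentRole + YearsWithCurrManager,
              data = train)

    # predict() now evaluates the formula on the test rows
    pred_test <- predict(fit, newdata = test)
    stopifnot(length(pred_test) == nrow(test))

    # Root mean squared error on the held-out rows
    rmse_test <- sqrt(mean((test$YearsAtCompany - pred_test)^2))
    print(rmse_test)
    ```

    The `stopifnot()` line is a cheap guard: if the number of predictions ever differs from the number of test rows, the script stops instead of silently recycling values.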

  8. Build CART Model to predict Attrition

    ## Warning: package 'rpart' was built under R version 3.5.1
    ## Warning: package 'caret' was built under R version 3.5.1
    ## Loading required package: lattice
    ## Warning: package 'rattle' was built under R version 3.5.1
    ## Rattle: A free graphical interface for data science with R.
    ## Version 5.2.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
    ## Type 'rattle()' to shake, rattle, and roll your data.
    ## Warning: package 'rpart.plot' was built under R version 3.5.1
    ## 'data.frame':    2058 obs. of  33 variables:
    ##  $ EmployeeNumber          : int  905 758 1623 166 1376 1420 2384 1087 1603 500 ...
    ##  $ Attrition               : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 2 1 2 1 ...
    ##  $ Age                     : int  48 34 53 50 32 42 45 50 31 33 ...
    ##  $ BusinessTravel          : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 3 3 3 2 3 3 2 3 3 ...
    ##  $ DailyRate               : int  715 216 1436 1452 238 557 1449 333 542 1216 ...
    ##  $ Department              : Factor w/ 3 levels "Human Resources",..: 2 3 3 2 2 2 3 2 3 3 ...
    ##  $ DistanceFromHome        : int  1 1 6 11 5 18 2 22 20 8 ...
    ##  $ Education               : Factor w/ 5 levels "1","2","3","4",..: 3 4 2 3 2 4 3 5 3 4 ...
    ##  $ EducationField          : Factor w/ 6 levels "Human Resources",..: 2 3 3 2 2 2 3 4 2 3 ...
    ##  $ EnvironmentSatisfaction : Factor w/ 4 levels "1","2","3","4": 4 2 2 3 1 4 1 3 2 3 ...
    ##  $ Gender                  : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 1 2 1 2 ...
    ##  $ HourlyRate              : int  76 75 34 53 47 35 94 88 71 39 ...
    ##  $ JobInvolvement          : Factor w/ 4 levels "1","2","3","4": 2 4 3 3 4 3 1 1 1 3 ...
    ##  $ JobLevel                : Factor w/ 5 levels "1","2","3","4",..: 5 2 2 5 1 2 5 4 2 2 ...
    ##  $ JobRole                 : Factor w/ 9 levels "Healthcare Representative",..: 6 8 9 4 7 7 4 6 8 8 ...
    ##  $ JobSatisfaction         : Factor w/ 4 levels "1","2","3","4": 4 4 3 2 3 1 2 4 3 3 ...
    ##  $ MaritalStatus           : Factor w/ 3 levels "Divorced","Married",..: 3 1 2 3 3 1 3 3 2 1 ...
    ##  $ MonthlyIncome           : int  18265 9725 2306 19926 2432 5410 18824 14411 4559 7104 ...
    ##  $ MonthlyRate             : int  8733 12278 16047 17053 15318 11189 2493 24450 24788 20431 ...
    ##  $ NumCompaniesWorked      : int  6 0 2 3 3 6 2 1 3 0 ...
    ##  $ OverTime                : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 2 2 2 1 ...
    ##  $ PercentSalaryHike       : int  12 11 20 15 14 17 16 13 11 12 ...
    ##  $ PerformanceRating       : Factor w/ 2 levels "3","4": 1 1 2 1 1 1 1 1 1 1 ...
    ##  $ RelationshipSatisfaction: Factor w/ 4 levels "1","2","3","4": 3 4 4 2 1 3 1 4 3 4 ...
    ##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
    ##  $ StockOptionLevel        : Factor w/ 4 levels "0","1","2","3": 1 2 2 1 1 2 1 1 2 1 ...
    ##  $ TotalWorkingYears       : int  25 16 13 21 8 9 26 32 4 6 ...
    ##  $ TrainingTimesLastYear   : int  3 2 3 5 2 3 2 2 2 3 ...
    ##  $ WorkLifeBalance         : Factor w/ 4 levels "1","2","3","4": 4 2 1 3 3 2 3 3 3 3 ...
    ##  $ YearsAtCompany          : int  1 15 7 5 4 4 24 32 2 5 ...
    ##  $ YearsInCurrentRole      : int  0 1 7 4 1 3 10 6 2 0 ...
    ##  $ YearsSinceLastPromotion : int  0 0 4 4 0 1 1 13 2 1 ...
    ##  $ YearsWithCurrManager    : int  0 9 5 4 3 2 11 9 2 2 ...
    ##  [1] "EmployeeNumber"           "Attrition"               
    ##  [3] "Age"                      "BusinessTravel"          
    ##  [5] "DailyRate"                "Department"              
    ##  [7] "DistanceFromHome"         "Education"               
    ##  [9] "EducationField"           "EnvironmentSatisfaction" 
    ## [11] "Gender"                   "HourlyRate"              
    ## [13] "JobInvolvement"           "JobLevel"                
    ## [15] "JobRole"                  "JobSatisfaction"         
    ## [17] "MaritalStatus"            "MonthlyIncome"           
    ## [19] "MonthlyRate"              "NumCompaniesWorked"      
    ## [21] "OverTime"                 "PercentSalaryHike"       
    ## [23] "PerformanceRating"        "RelationshipSatisfaction"
    ## [25] "StandardHours"            "StockOptionLevel"        
    ## [27] "TotalWorkingYears"        "TrainingTimesLastYear"   
    ## [29] "WorkLifeBalance"          "YearsAtCompany"          
    ## [31] "YearsInCurrentRole"       "YearsSinceLastPromotion" 
    ## [33] "YearsWithCurrManager"
    ## n= 2058 
    ## 
    ## node), split, n, loss, yval, (yprob)
    ##       * denotes terminal node
    ## 
    ##   1) root 2058 330 No (0.839650146 0.160349854)  
    ##     2) OverTime=No 1474 157 No (0.893487110 0.106512890)  
    ##       4) TotalWorkingYears>=2.5 1346 114 No (0.915304606 0.084695394)  
    ##         8) StockOptionLevel=1,2 715  30 No (0.958041958 0.041958042)  
    ##          16) YearsWithCurrManager>=1.5 590  17 No (0.971186441 0.028813559)  
    ##            32) TotalWorkingYears< 33.5 583  15 No (0.974271012 0.025728988)  
    ##              64) PercentSalaryHike>=11.5 503   8 No (0.984095427 0.015904573)  
    ##               128) HourlyRate< 90.5 404   3 No (0.992574257 0.007425743) *
    ##               129) HourlyRate>=90.5 99   5 No (0.949494949 0.050505051)  
    ##                 258) JobRole=Healthcare Representative,Manager,Manufacturing Director,Research Director,Research Scientist,Sales Executive 74   0 No (1.000000000 0.000000000) *
    ##                 259) JobRole=Human Resources,Laboratory Technician,Sales Representative 25   5 No (0.800000000 0.200000000)  
    ##                   518) HourlyRate>=93.5 17   0 No (1.000000000 0.000000000) *
    ##                   519) HourlyRate< 93.5 8   3 Yes (0.375000000 0.625000000) *
    ##              65) PercentSalaryHike< 11.5 80   7 No (0.912500000 0.087500000) *
    ##            33) TotalWorkingYears>=33.5 7   2 No (0.714285714 0.285714286) *
    ##          17) YearsWithCurrManager< 1.5 125  13 No (0.896000000 0.104000000)  
    ##            34) MonthlyIncome>=2375.5 113   8 No (0.929203540 0.070796460)  
    ##              68) BusinessTravel=Non-Travel,Travel_Rarely 93   3 No (0.967741935 0.032258065) *
    ##              69) BusinessTravel=Travel_Frequently 20   5 No (0.750000000 0.250000000)  
    ##               138) Age< 36.5 11   0 No (1.000000000 0.000000000) *
    ##               139) Age>=36.5 9   4 Yes (0.444444444 0.555555556) *
    ##            35) MonthlyIncome< 2375.5 12   5 No (0.583333333 0.416666667) *
    ##         9) StockOptionLevel=0,3 631  84 No (0.866877971 0.133122029)  
    ##          18) Age>=33.5 389  31 No (0.920308483 0.079691517)  
    ##            36) WorkLifeBalance=2,3,4 370  25 No (0.932432432 0.067567568)  
    ##              72) JobRole=Healthcare Representative,Human Resources,Manager,Manufacturing Director 133   1 No (0.992481203 0.007518797) *
    ##              73) JobRole=Laboratory Technician,Research Director,Research Scientist,Sales Executive,Sales Representative 237  24 No (0.898734177 0.101265823)  
    ##               146) HourlyRate< 81.5 185  12 No (0.935135135 0.064864865) *
    ##               147) HourlyRate>=81.5 52  12 No (0.769230769 0.230769231)  
    ##                 294) JobSatisfaction=2,3,4 40   5 No (0.875000000 0.125000000) *
    ##                 295) JobSatisfaction=1 12   5 Yes (0.416666667 0.583333333) *
    ##            37) WorkLifeBalance=1 19   6 No (0.684210526 0.315789474) *
    ##          19) Age< 33.5 242  53 No (0.780991736 0.219008264)  
    ##            38) NumCompaniesWorked< 4.5 188  23 No (0.877659574 0.122340426)  
    ##              76) JobLevel=1,2,4 165  14 No (0.915151515 0.084848485) *
    ##              77) JobLevel=3 23   9 No (0.608695652 0.391304348)  
    ##               154) DailyRate< 1031.5 15   2 No (0.866666667 0.133333333) *
    ##               155) DailyRate>=1031.5 8   1 Yes (0.125000000 0.875000000) *
    ##            39) NumCompaniesWorked>=4.5 54  24 Yes (0.444444444 0.555555556)  
    ##              78) DailyRate>=467 41  17 No (0.585365854 0.414634146)  
    ##               156) HourlyRate>=47.5 31   8 No (0.741935484 0.258064516)  
    ##                 312) TotalWorkingYears>=4.5 23   2 No (0.913043478 0.086956522) *
    ##                 313) TotalWorkingYears< 4.5 8   2 Yes (0.250000000 0.750000000) *
    ##               157) HourlyRate< 47.5 10   1 Yes (0.100000000 0.900000000) *
    ##              79) DailyRate< 467 13   0 Yes (0.000000000 1.000000000) *
    ##       5) TotalWorkingYears< 2.5 128  43 No (0.664062500 0.335937500)  
    ##        10) JobInvolvement=2,3,4 118  33 No (0.720338983 0.279661017)  
    ##          20) MonthlyRate< 24646 105  23 No (0.780952381 0.219047619)  
    ##            40) Department=Research & Development,Sales 98  17 No (0.826530612 0.173469388)  
    ##              80) WorkLifeBalance=1,3 68   6 No (0.911764706 0.088235294) *
    ##              81) WorkLifeBalance=2,4 30  11 No (0.633333333 0.366666667)  
    ##               162) DistanceFromHome< 7.5 20   3 No (0.850000000 0.150000000) *
    ##               163) DistanceFromHome>=7.5 10   2 Yes (0.200000000 0.800000000) *
    ##            41) Department=Human Resources 7   1 Yes (0.142857143 0.857142857) *
    ##          21) MonthlyRate>=24646 13   3 Yes (0.230769231 0.769230769) *
    ##        11) JobInvolvement=1 10   0 Yes (0.000000000 1.000000000) *
    ##     3) OverTime=Yes 584 173 No (0.703767123 0.296232877)  
    ##       6) MonthlyIncome>=2475 487 110 No (0.774127310 0.225872690)  
    ##        12) MaritalStatus=Divorced,Married 337  53 No (0.842729970 0.157270030)  
    ##          24) JobRole=Healthcare Representative,Manager,Manufacturing Director,Research Director,Research Scientist,Sales Executive 271  31 No (0.885608856 0.114391144)  
    ##            48) DistanceFromHome< 12.5 189  11 No (0.941798942 0.058201058) *
    ##            49) DistanceFromHome>=12.5 82  20 No (0.756097561 0.243902439)  
    ##              98) YearsInCurrentRole>=6.5 34   2 No (0.941176471 0.058823529) *
    ##              99) YearsInCurrentRole< 6.5 48  18 No (0.625000000 0.375000000)  
    ##               198) YearsSinceLastPromotion< 1.5 30   5 No (0.833333333 0.166666667) *
    ##               199) YearsSinceLastPromotion>=1.5 18   5 Yes (0.277777778 0.722222222) *
    ##          25) JobRole=Human Resources,Laboratory Technician,Sales Representative 66  22 No (0.666666667 0.333333333)  
    ##            50) JobLevel=2 18   0 No (1.000000000 0.000000000) *
    ##            51) JobLevel=1,3 48  22 No (0.541666667 0.458333333)  
    ##             102) MonthlyRate< 10019 23   4 No (0.826086957 0.173913043)  
    ##               204) EducationField=Life Sciences,Medical,Other 17   0 No (1.000000000 0.000000000) *
    ##               205) EducationField=Marketing,Technical Degree 6   2 Yes (0.333333333 0.666666667) *
    ##             103) MonthlyRate>=10019 25   7 Yes (0.280000000 0.720000000)  
    ##               206) YearsAtCompany< 3.5 13   6 No (0.538461538 0.461538462) *
    ##               207) YearsAtCompany>=3.5 12   0 Yes (0.000000000 1.000000000) *
    ##        13) MaritalStatus=Single 150  57 No (0.620000000 0.380000000)  
    ##          26) JobRole=Healthcare Representative,Human Resources,Manager,Manufacturing Director,Research Director,Research Scientist 81  17 No (0.790123457 0.209876543)  
    ##            52) JobLevel=2,3,4 48   2 No (0.958333333 0.041666667) *
    ##            53) JobLevel=1,5 33  15 No (0.545454545 0.454545455)  
    ##             106) HourlyRate< 86.5 24   7 No (0.708333333 0.291666667)  
    ##               212) YearsSinceLastPromotion< 5 16   0 No (1.000000000 0.000000000) *
    ##               213) YearsSinceLastPromotion>=5 8   1 Yes (0.125000000 0.875000000) *
    ##             107) HourlyRate>=86.5 9   1 Yes (0.111111111 0.888888889) *
    ##          27) JobRole=Laboratory Technician,Sales Executive,Sales Representative 69  29 Yes (0.420289855 0.579710145)  
    ##            54) TrainingTimesLastYear>=2.5 28  10 No (0.642857143 0.357142857)  
    ##             108) EducationField=Life Sciences,Medical 15   1 No (0.933333333 0.066666667) *
    ##             109) EducationField=Marketing,Other,Technical Degree 13   4 Yes (0.307692308 0.692307692) *
    ##            55) TrainingTimesLastYear< 2.5 41  11 Yes (0.268292683 0.731707317)  
    ##             110) MonthlyRate< 8860.5 14   6 No (0.571428571 0.428571429) *
    ##             111) MonthlyRate>=8860.5 27   3 Yes (0.111111111 0.888888889) *
    ##       7) MonthlyIncome< 2475 97  34 Yes (0.350515464 0.649484536)  
    ##        14) DailyRate>=888 46  19 No (0.586956522 0.413043478)  
    ##          28) YearsInCurrentRole>=2.5 16   1 No (0.937500000 0.062500000) *
    ##          29) YearsInCurrentRole< 2.5 30  12 Yes (0.400000000 0.600000000)  
    ##            58) WorkLifeBalance=2 6   0 No (1.000000000 0.000000000) *
    ##            59) WorkLifeBalance=1,3,4 24   6 Yes (0.250000000 0.750000000) *
    ##        15) DailyRate< 888 51   7 Yes (0.137254902 0.862745098) *
    ## [1] "rpart"
    ## Warning: labs do not fit even at cex 0.15, there may be some overplotting

    ## 
    ## Classification tree:
    ## rpart(formula = Attrition ~ ., data = train[, -1], method = "class", 
    ##     model = TRUE, control = control.parameter)
    ## 
    ## Variables actually used in tree construction:
    ##  [1] Age                     BusinessTravel         
    ##  [3] DailyRate               Department             
    ##  [5] DistanceFromHome        EducationField         
    ##  [7] HourlyRate              JobInvolvement         
    ##  [9] JobLevel                JobRole                
    ## [11] JobSatisfaction         MaritalStatus          
    ## [13] MonthlyIncome           MonthlyRate            
    ## [15] NumCompaniesWorked      OverTime               
    ## [17] PercentSalaryHike       StockOptionLevel       
    ## [19] TotalWorkingYears       TrainingTimesLastYear  
    ## [21] WorkLifeBalance         YearsAtCompany         
    ## [23] YearsInCurrentRole      YearsSinceLastPromotion
    ## [25] YearsWithCurrManager   
    ## 
    ## Root node error: 330/2058 = 0.16035
    ## 
    ## n= 2058 
    ## 
    ##           CP nsplit rel error  xerror     xstd
    ## 1  0.0439394      0   1.00000 1.00000 0.050442
    ## 2  0.0242424      2   0.91212 0.98788 0.050193
    ## 3  0.0181818      3   0.88788 0.93030 0.048975
    ## 4  0.0166667      5   0.85152 0.91212 0.048577
    ## 5  0.0151515      8   0.79394 0.91818 0.048711
    ## 6  0.0127273     13   0.71212 0.88788 0.048036
    ## 7  0.0121212     18   0.64848 0.87273 0.047692
    ## 8  0.0111111     19   0.63636 0.86970 0.047623
    ## 9  0.0106061     22   0.60303 0.86970 0.047623
    ## 10 0.0090909     25   0.56364 0.87273 0.047692
    ## 11 0.0080808     29   0.52727 0.86061 0.047413
    ## 12 0.0060606     32   0.50303 0.86364 0.047483
    ## 13 0.0030303     34   0.49091 0.88788 0.048036
    ## 14 0.0015152     35   0.48788 0.90606 0.048443
    ## 15 0.0010101     39   0.48182 0.92727 0.048909
    ## 16 0.0001000     48   0.47273 0.94545 0.049302

    ## n= 2058 
    ## 
    ## node), split, n, loss, yval, (yprob)
    ##       * denotes terminal node
    ## 
    ##   1) root 2058 330 No (0.83965015 0.16034985)  
    ##     2) OverTime=No 1474 157 No (0.89348711 0.10651289)  
    ##       4) TotalWorkingYears>=2.5 1346 114 No (0.91530461 0.08469539)  
    ##         8) StockOptionLevel=1,2 715  30 No (0.95804196 0.04195804) *
    ##         9) StockOptionLevel=0,3 631  84 No (0.86687797 0.13312203)  
    ##          18) Age>=33.5 389  31 No (0.92030848 0.07969152) *
    ##          19) Age< 33.5 242  53 No (0.78099174 0.21900826)  
    ##            38) NumCompaniesWorked< 4.5 188  23 No (0.87765957 0.12234043)  
    ##              76) JobLevel=1,2,4 165  14 No (0.91515152 0.08484848) *
    ##              77) JobLevel=3 23   9 No (0.60869565 0.39130435)  
    ##               154) DailyRate< 1031.5 15   2 No (0.86666667 0.13333333) *
    ##               155) DailyRate>=1031.5 8   1 Yes (0.12500000 0.87500000) *
    ##            39) NumCompaniesWorked>=4.5 54  24 Yes (0.44444444 0.55555556)  
    ##              78) DailyRate>=467 41  17 No (0.58536585 0.41463415)  
    ##               156) HourlyRate>=47.5 31   8 No (0.74193548 0.25806452)  
    ##                 312) TotalWorkingYears>=4.5 23   2 No (0.91304348 0.08695652) *
    ##                 313) TotalWorkingYears< 4.5 8   2 Yes (0.25000000 0.75000000) *
    ##               157) HourlyRate< 47.5 10   1 Yes (0.10000000 0.90000000) *
    ##              79) DailyRate< 467 13   0 Yes (0.00000000 1.00000000) *
    ##       5) TotalWorkingYears< 2.5 128  43 No (0.66406250 0.33593750)  
    ##        10) JobInvolvement=2,3,4 118  33 No (0.72033898 0.27966102)  
    ##          20) MonthlyRate< 24646 105  23 No (0.78095238 0.21904762)  
    ##            40) Department=Research & Development,Sales 98  17 No (0.82653061 0.17346939)  
    ##              80) WorkLifeBalance=1,3 68   6 No (0.91176471 0.08823529) *
    ##              81) WorkLifeBalance=2,4 30  11 No (0.63333333 0.36666667)  
    ##               162) DistanceFromHome< 7.5 20   3 No (0.85000000 0.15000000) *
    ##               163) DistanceFromHome>=7.5 10   2 Yes (0.20000000 0.80000000) *
    ##            41) Department=Human Resources 7   1 Yes (0.14285714 0.85714286) *
    ##          21) MonthlyRate>=24646 13   3 Yes (0.23076923 0.76923077) *
    ##        11) JobInvolvement=1 10   0 Yes (0.00000000 1.00000000) *
    ##     3) OverTime=Yes 584 173 No (0.70376712 0.29623288)  
    ##       6) MonthlyIncome>=2475 487 110 No (0.77412731 0.22587269)  
    ##        12) MaritalStatus=Divorced,Married 337  53 No (0.84272997 0.15727003)  
    ##          24) JobRole=Healthcare Representative,Manager,Manufacturing Director,Research Director,Research Scientist,Sales Executive 271  31 No (0.88560886 0.11439114)  
    ##            48) DistanceFromHome< 12.5 189  11 No (0.94179894 0.05820106) *
    ##            49) DistanceFromHome>=12.5 82  20 No (0.75609756 0.24390244)  
    ##              98) YearsInCurrentRole>=6.5 34   2 No (0.94117647 0.05882353) *
    ##              99) YearsInCurrentRole< 6.5 48  18 No (0.62500000 0.37500000)  
    ##               198) YearsSinceLastPromotion< 1.5 30   5 No (0.83333333 0.16666667) *
    ##               199) YearsSinceLastPromotion>=1.5 18   5 Yes (0.27777778 0.72222222) *
    ##          25) JobRole=Human Resources,Laboratory Technician,Sales Representative 66  22 No (0.66666667 0.33333333)  
    ##            50) JobLevel=2 18   0 No (1.00000000 0.00000000) *
    ##            51) JobLevel=1,3 48  22 No (0.54166667 0.45833333)  
    ##             102) MonthlyRate< 10019 23   4 No (0.82608696 0.17391304)  
    ##               204) EducationField=Life Sciences,Medical,Other 17   0 No (1.00000000 0.00000000) *
    ##               205) EducationField=Marketing,Technical Degree 6   2 Yes (0.33333333 0.66666667) *
    ##             103) MonthlyRate>=10019 25   7 Yes (0.28000000 0.72000000) *
    ##        13) MaritalStatus=Single 150  57 No (0.62000000 0.38000000)  
    ##          26) JobRole=Healthcare Representative,Human Resources,Manager,Manufacturing Director,Research Director,Research Scientist 81  17 No (0.79012346 0.20987654)  
    ##            52) JobLevel=2,3,4 48   2 No (0.95833333 0.04166667) *
    ##            53) JobLevel=1,5 33  15 No (0.54545455 0.45454545)  
    ##             106) HourlyRate< 86.5 24   7 No (0.70833333 0.29166667)  
    ##               212) YearsSinceLastPromotion< 5 16   0 No (1.00000000 0.00000000) *
    ##               213) YearsSinceLastPromotion>=5 8   1 Yes (0.12500000 0.87500000) *
    ##             107) HourlyRate>=86.5 9   1 Yes (0.11111111 0.88888889) *
    ##          27) JobRole=Laboratory Technician,Sales Executive,Sales Representative 69  29 Yes (0.42028986 0.57971014)  
    ##            54) TrainingTimesLastYear>=2.5 28  10 No (0.64285714 0.35714286)  
    ##             108) EducationField=Life Sciences,Medical 15   1 No (0.93333333 0.06666667) *
    ##             109) EducationField=Marketing,Other,Technical Degree 13   4 Yes (0.30769231 0.69230769) *
    ##            55) TrainingTimesLastYear< 2.5 41  11 Yes (0.26829268 0.73170732)  
    ##             110) MonthlyRate< 8860.5 14   6 No (0.57142857 0.42857143) *
    ##             111) MonthlyRate>=8860.5 27   3 Yes (0.11111111 0.88888889) *
    ##       7) MonthlyIncome< 2475 97  34 Yes (0.35051546 0.64948454)  
    ##        14) DailyRate>=888 46  19 No (0.58695652 0.41304348)  
    ##          28) YearsInCurrentRole>=2.5 16   1 No (0.93750000 0.06250000) *
    ##          29) YearsInCurrentRole< 2.5 30  12 Yes (0.40000000 0.60000000)  
    ##            58) WorkLifeBalance=2 6   0 No (1.00000000 0.00000000) *
    ##            59) WorkLifeBalance=1,3,4 24   6 Yes (0.25000000 0.75000000) *
    ##        15) DailyRate< 888 51   7 Yes (0.13725490 0.86274510) *
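
    Each line of the printed tree follows the layout node), split, n, loss, yval, (yprob). The Python sketch below parses two lines copied from the output above and checks how the fields relate: loss counts the observations in the node that disagree with the predicted class yval. The regular expression and the helper name parse_node are illustrative assumptions, written for the two-class No/Yes output shown here.

```python
import re

# Pattern for one line of the printed tree: node), split, n, loss, yval, (yprob)
NODE = re.compile(
    r"^\s*(\d+)\)\s+(.*?)\s+(\d+)\s+(\d+)\s+(No|Yes)\s+"
    r"\(([\d.]+)\s+([\d.]+)\)\s*(\*?)\s*$"
)


def parse_node(line):
    """Split one printed node line into its fields."""
    m = NODE.match(line)
    if m is None:
        raise ValueError(f"not a node line: {line!r}")
    node, split, n, loss, yval, p_no, p_yes, star = m.groups()
    return {
        "node": int(node),
        "split": split,
        "n": int(n),
        "loss": int(loss),
        "yval": yval,
        "p_no": float(p_no),
        "p_yes": float(p_yes),
        "terminal": star == "*",
    }


# Two lines copied from the output above (without the leading "##").
root = parse_node("1) root 2058 330 No (0.83965015 0.16034985)")
leaf = parse_node("15) DailyRate< 888 51   7 Yes (0.13725490 0.86274510) *")

# At the root the predicted class is "No", so loss / n is the share of "Yes".
root_attrition_rate = root["loss"] / root["n"]
# In leaf 15 the predicted class is "Yes", so 1 - loss / n is the share of "Yes".
leaf_attrition_rate = 1 - leaf["loss"] / leaf["n"]
```

    Read this way, leaf 15 (OverTime=Yes, MonthlyIncome< 2475, DailyRate< 888) contains 51 training employees, of whom 44 left the company.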

    ## 
    ## Classification tree:
    ## rpart(formula = Attrition ~ ., data = train[, -c(1)], method = "class", 
    ##     model = TRUE, control = control.parameter1)
    ## 
    ## Variables actually used in tree construction:
    ##  [1] Age                     DailyRate              
    ##  [3] Department              DistanceFromHome       
    ##  [5] EducationField          HourlyRate             
    ##  [7] JobInvolvement          JobLevel               
    ##  [9] JobRole                 MaritalStatus          
    ## [11] MonthlyIncome           MonthlyRate            
    ## [13] NumCompaniesWorked      OverTime               
    ## [15] StockOptionLevel        TotalWorkingYears      
    ## [17] TrainingTimesLastYear   WorkLifeBalance        
    ## [19] YearsInCurrentRole      YearsSinceLastPromotion
    ## 
    ## Root node error: 330/2058 = 0.16035
    ## 
    ## n= 2058 
    ## 
    ##           CP nsplit rel error  xerror     xstd
    ## 1  0.0439394      0   1.00000 1.00000 0.050442
    ## 2  0.0242424      2   0.91212 0.96364 0.049688
    ## 3  0.0181818      3   0.88788 0.96364 0.049688
    ## 4  0.0166667      5   0.85152 0.96364 0.049688
    ## 5  0.0151515      8   0.79394 0.93939 0.049172
    ## 6  0.0127273     13   0.71212 0.92727 0.048909
    ## 7  0.0121212     18   0.64848 0.85152 0.047202
    ## 8  0.0111111     19   0.63636 0.86364 0.047483
    ## 9  0.0106061     22   0.60303 0.86364 0.047483
    ## 10 0.0090909     25   0.56364 0.84848 0.047131
    ## 11 0.0080808     29   0.52727 0.84848 0.047131
    ## 12 0.0060606     32   0.50303 0.84242 0.046989
    ## 13 0.0060606     34   0.49091 0.84545 0.047060
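
    The complexity table above can be used to choose where to prune. The Python sketch below copies the table and picks the row with the lowest cross-validated error (xerror), then applies the one-standard-error rule as an alternative. The one-standard-error rule is a common convention and is shown here as an assumption, not as the method used in the original analysis. Relative errors are converted to counts and rates using the root node error of 330/2058.

```python
# Columns: CP, nsplit, rel error, xerror, xstd (copied from the table above).
cp_table = [
    (0.0439394, 0, 1.00000, 1.00000, 0.050442),
    (0.0242424, 2, 0.91212, 0.96364, 0.049688),
    (0.0181818, 3, 0.88788, 0.96364, 0.049688),
    (0.0166667, 5, 0.85152, 0.96364, 0.049688),
    (0.0151515, 8, 0.79394, 0.93939, 0.049172),
    (0.0127273, 13, 0.71212, 0.92727, 0.048909),
    (0.0121212, 18, 0.64848, 0.85152, 0.047202),
    (0.0111111, 19, 0.63636, 0.86364, 0.047483),
    (0.0106061, 22, 0.60303, 0.86364, 0.047483),
    (0.0090909, 25, 0.56364, 0.84848, 0.047131),
    (0.0080808, 29, 0.52727, 0.84848, 0.047131),
    (0.0060606, 32, 0.50303, 0.84242, 0.046989),
    (0.0060606, 34, 0.49091, 0.84545, 0.047060),
]
ROOT_ERRORS, N = 330, 2058  # from "Root node error: 330/2058"

# Row with the lowest cross-validated error.
best = min(cp_table, key=lambda row: row[3])

# One-standard-error rule: smallest tree whose xerror is within one xstd
# of the best row.
threshold = best[3] + best[4]
one_se = next(row for row in cp_table if row[3] <= threshold)

# Relative errors are fractions of the root node error.
train_errors = round(best[2] * ROOT_ERRORS)   # misclassified training rows
cv_error_rate = best[3] * ROOT_ERRORS / N     # cross-validated error rate
```

    The minimum-xerror row has 32 splits, which matches the 12-row table of the pruned tree, and its 166 training errors agree with the training confusion matrix reported later in this step (114 + 52). The one-standard-error rule would select a smaller tree with 18 splits.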

    ## 
    ## Classification tree:
    ## rpart(formula = Attrition ~ ., data = train[, -c(1)], method = "class", 
    ##     model = TRUE, control = control.parameter1)
    ## 
    ## Variables actually used in tree construction:
    ##  [1] Age                     DailyRate              
    ##  [3] Department              DistanceFromHome       
    ##  [5] EducationField          HourlyRate             
    ##  [7] JobInvolvement          JobLevel               
    ##  [9] JobRole                 MaritalStatus          
    ## [11] MonthlyIncome           MonthlyRate            
    ## [13] NumCompaniesWorked      OverTime               
    ## [15] StockOptionLevel        TotalWorkingYears      
    ## [17] TrainingTimesLastYear   WorkLifeBalance        
    ## [19] YearsInCurrentRole      YearsSinceLastPromotion
    ## 
    ## Root node error: 330/2058 = 0.16035
    ## 
    ## n= 2058 
    ## 
    ##           CP nsplit rel error  xerror     xstd
    ## 1  0.0439394      0   1.00000 1.00000 0.050442
    ## 2  0.0242424      2   0.91212 0.96364 0.049688
    ## 3  0.0181818      3   0.88788 0.96364 0.049688
    ## 4  0.0166667      5   0.85152 0.96364 0.049688
    ## 5  0.0151515      8   0.79394 0.93939 0.049172
    ## 6  0.0127273     13   0.71212 0.92727 0.048909
    ## 7  0.0121212     18   0.64848 0.85152 0.047202
    ## 8  0.0111111     19   0.63636 0.86364 0.047483
    ## 9  0.0106061     22   0.60303 0.86364 0.047483
    ## 10 0.0090909     25   0.56364 0.84848 0.047131
    ## 11 0.0080808     29   0.52727 0.84848 0.047131
    ## 12 0.0060606     32   0.50303 0.84242 0.046989

    ##  905  758 1623  166 1376 1420 
    ##   No   No   No   No  Yes   No 
    ## Levels: No Yes
    ##      predictTrain
    ##         No  Yes
    ##   No  1676   52
    ##   Yes  114  216

    ##          predict.class
    ## Attrition   No  Yes
    ##       No  1676   52
    ##       Yes  114  216
    ## [1] "Precision: 0.806"
    ## [1] "Sensitivity/Recall Rate (True Positive Rate): 0.6545"
    ## [1] "False Positive Rate: 0.0301"
    ## [1] "Accuracy: 0.9193"
    ## [1] "Specificity/TNR: 0.9699"
    ## [1] "Area under ROC Curve: 0.856222818462402"
    ## [1] "KS: 0.626262626262626"
    ## [1] "Gini Index 0.59820508289896"
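
    The ROC-based figures above (area under the curve, KS and Gini) can be illustrated with a small Python sketch. It assumes the common definitions KS = max(TPR - FPR) and Gini = 2 * AUC - 1. The original R code is not shown, so whether it uses these definitions is an assumption. Under this definition the reported training AUC of 0.856 would give a Gini of about 0.712 rather than 0.598, which suggests the report computes Gini differently. The labels and scores below are hypothetical.

```python
def roc_points(labels, scores):
    """(FPR, TPR) at every distinct score threshold, strictest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def ks(points):
    """Kolmogorov-Smirnov statistic: largest gap between TPR and FPR."""
    return max(tpr - fpr for fpr, tpr in points)


# Hypothetical labels (1 = left the company) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
area = auc(points)
gini = 2 * area - 1
ks_stat = ks(points)
```

    All three measures need predicted probabilities, not just the predicted class, and they should be computed separately for training and test predictions.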

    CART Model performance on Test Data

    ##      predictTest
    ##        No Yes
    ##   No  710  28
    ##   Yes  68  76

    ##          predict.class
    ## Attrition  No Yes
    ##       No  710  28
    ##       Yes  68  76
    ## [1] "Precision: 0.7308"
    ## [1] "Sensitivity/Recall Rate (True Positive Rate): 0.5278"
    ## [1] "False Positive Rate: 0.0379"
    ## [1] "Accuracy: 0.8912"
    ## [1] "Specificity/TNR: 0.9621"
    ## [1] "Area under ROC Curve: 0.7742"
    ## [1] "KS: 0.6263"
    ## [1] "Gini Index 0.5862"

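    On the test sample the CART model reaches 89.12% accuracy, compared with 83.67% for always predicting No (738 of the 882 test employees stayed). Its sensitivity is 52.78%, so it identifies only about half of the employees who actually left. The reported KS value is 0.63 and the Gini index is 0.586. The test KS equals the training KS to four decimal places, so it should be re-checked before it is relied on. Let us now evaluate the performance of the Random Forest model.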
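
    The class-based metrics quoted above follow directly from the confusion matrices, with rows as actual Attrition and columns as the predicted class. The Python sketch below recomputes them; the function name classification_metrics is illustrative. Note that precision (share of predicted leavers who really left) and sensitivity/recall (share of actual leavers who were caught) are different quantities.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard metrics from a 2x2 confusion matrix, 'Yes' as positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall": tp / (tp + fn),        # sensitivity, true positive rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),   # true negative rate
        "fpr": fp / (tn + fp),           # false positive rate
    }


# Counts copied from the confusion matrices in this step.
train = classification_metrics(tn=1676, fp=52, fn=114, tp=216)
test = classification_metrics(tn=710, fp=28, fn=68, tp=76)

# Gap between training and test recall, a rough sign of overfitting.
recall_drop = train["recall"] - test["recall"]
```

    On these counts recall falls from about 0.65 on the training data to about 0.53 on the test data, while accuracy barely moves, because accuracy is dominated by the much larger No class.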

  9. Employee Attrition prediction using Random Forest

    ## [1] "No of observations in training dataset: 2058"
    ## [1] "No of observations in test dataset: 882"
    ## 
    ## Call:
    ##  randomForest(formula = Attrition ~ ., data = trainRF[, -1], ntree = 301,      mtry = 3, nodesize = 2, importance = TRUE) 
    ##                Type of random forest: classification
    ##                      Number of trees: 301
    ## No. of variables tried at each split: 3
    ## 
    ##         OOB estimate of  error rate: 4.91%
    ## Confusion matrix:
    ##       No Yes class.error
    ## No  1726   2 0.001157407
    ## Yes   99 231 0.300000000

    ## 
    ## Call:
    ##  randomForest(formula = Attrition ~ ., data = trainRF[, -1], ntree = 50,      mtry = 3, nodesize = 2, importance = TRUE) 
    ##                Type of random forest: classification
    ##                      Number of trees: 50
    ## No. of variables tried at each split: 3
    ## 
    ##         OOB estimate of  error rate: 5.49%
    ## Confusion matrix:
    ##       No Yes class.error
    ## No  1724   4 0.002314815
    ## Yes  109 221 0.330303030
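
    The out-of-bag (OOB) error rates and per-class errors printed above can be reproduced from the two confusion matrices. The Python sketch below does the arithmetic; the helper name oob_summary is illustrative.

```python
def oob_summary(confusion):
    """confusion maps actual class -> (predicted No, predicted Yes)."""
    no_no, no_yes = confusion["No"]
    yes_no, yes_yes = confusion["Yes"]
    total = no_no + no_yes + yes_no + yes_yes
    return {
        "oob_error": (no_yes + yes_no) / total,
        "class_error_no": no_yes / (no_no + no_yes),
        "class_error_yes": yes_no / (yes_no + yes_yes),
    }


# Counts copied from the two randomForest summaries above.
forest_301 = oob_summary({"No": (1726, 2), "Yes": (99, 231)})
forest_50 = oob_summary({"No": (1724, 4), "Yes": (109, 221)})
```

    The overall OOB error is low (4.91% with 301 trees, 5.49% with 50 trees) mainly because the No class is almost always predicted correctly. The class error for Yes, 30% and 33% respectively, is the more relevant figure for attrition.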

    ##                            No   Yes MeanDecreaseAccuracy MeanDecreaseGini
    ## EducationField           8.74 11.58                11.61            18.83
    ## DailyRate                7.55  9.47                11.38            26.24
    ## Age                      9.55  9.05                11.36            33.75
    ## TotalWorkingYears        9.26  8.14                11.33            28.17
    ## OverTime                 9.32  9.86                11.04            21.55
    ## JobInvolvement           7.52  6.91                10.09            14.31
    ## HourlyRate               7.01  9.15                 9.56            26.28
    ## JobRole                  7.85  8.68                 9.51            24.19
    ## JobSatisfaction          7.00  9.05                 9.16            17.33
    ## WorkLifeBalance          6.32  7.56                 8.93            15.52
    ## MonthlyRate              7.15  6.73                 8.87            26.82
    ## StockOptionLevel         7.24  8.69                 8.70            15.55
    ## RelationshipSatisfaction 5.75  9.13                 8.28            15.67
    ## EnvironmentSatisfaction  6.64  8.53                 8.24            19.13
    ## BusinessTravel           6.79  7.56                 8.06            12.60
    ## MonthlyIncome            5.81  9.40                 8.05            32.37
    ## YearsAtCompany           6.33  7.86                 7.96            20.90
    ## PercentSalaryHike        6.27  7.52                 7.78            18.22
    ## YearsWithCurrManager     5.47  7.06                 7.41            18.16
    ## JobLevel                 6.32  6.61                 7.34            12.00
    ## NumCompaniesWorked       6.65  6.16                 7.06            17.51
    ## TrainingTimesLastYear    5.51  6.08                 6.81            12.52
    ## Education                5.62  5.49                 6.61            14.78
    ## DistanceFromHome         4.68  7.63                 6.54            21.20
    ## MaritalStatus            5.93  6.03                 6.36            11.69
    ## YearsInCurrentRole       5.21  4.77                 6.26            13.13
    ## Gender                   3.84  4.67                 5.70             4.86
    ## Department               4.40  5.49                 5.51             6.87
    ## YearsSinceLastPromotion  3.52  5.97                 4.75            11.95
    ## PerformanceRating        3.74  2.43                 4.06             2.17
    ## StandardHours            0.00  0.00                 0.00             0.00
    ##  [1] "EmployeeNumber"           "Attrition"               
    ##  [3] "Age"                      "BusinessTravel"          
    ##  [5] "DailyRate"                "Department"              
    ##  [7] "DistanceFromHome"         "Education"               
    ##  [9] "EducationField"           "EnvironmentSatisfaction" 
    ## [11] "Gender"                   "HourlyRate"              
    ## [13] "JobInvolvement"           "JobLevel"                
    ## [15] "JobRole"                  "JobSatisfaction"         
    ## [17] "MaritalStatus"            "MonthlyIncome"           
    ## [19] "MonthlyRate"              "NumCompaniesWorked"      
    ## [21] "OverTime"                 "PercentSalaryHike"       
    ## [23] "PerformanceRating"        "RelationshipSatisfaction"
    ## [25] "StandardHours"            "StockOptionLevel"        
    ## [27] "TotalWorkingYears"        "TrainingTimesLastYear"   
    ## [29] "WorkLifeBalance"          "YearsAtCompany"          
    ## [31] "YearsInCurrentRole"       "YearsSinceLastPromotion" 
    ## [33] "YearsWithCurrManager"
    ## mtry = 5  OOB error = 6.41% 
    ## Searching left ...
    ## mtry = 3     OOB error = 6.85% 
    ## -0.06818182 0.005 
    ## Searching right ...
    ## mtry = 10    OOB error = 6.22% 
    ## 0.03030303 0.005 
    ## mtry = 20    OOB error = 6.12% 
    ## 0.015625 0.005 
    ## mtry = 29    OOB error = 5.83% 
    ## 0.04761905 0.005
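
The trace above has the format printed by `tuneRF` from the randomForest package: after each candidate `mtry`, the line with two numbers shows the relative improvement in OOB error followed by the `improve` threshold (0.005 here). The sketch below recomputes those improvements in base R as `1 - new_error / old_error`. It uses the rounded percentages from the trace, so it is an approximation; the helper name `rel_improve` is made up for this illustration.

```r
# OOB errors (in percent) copied from the trace above, rounded as printed
oob <- c("5" = 6.41, "3" = 6.85, "10" = 6.22, "20" = 6.12, "29" = 5.83)
improve <- 0.005  # threshold shown as the second number on each trace line

# Relative improvement of a candidate error over the current baseline error
rel_improve <- function(err_new, err_old) 1 - err_new / err_old

# Left search: mtry 5 -> 3 is worse, so the search to the left stops
left <- rel_improve(oob[["3"]], oob[["5"]])

# Right search: each accepted step becomes the baseline for the next one
right <- c(rel_improve(oob[["10"]], oob[["5"]]),
           rel_improve(oob[["20"]], oob[["10"]]),
           rel_improve(oob[["29"]], oob[["20"]]))

print(round(c(left, right), 4))
print(c(left, right) > improve)
```

The recomputed values agree with the trace only to about two decimal places, because the trace was produced from unrounded error rates. Every step to the right clears the 0.005 threshold, and the search ends at `mtry = 29`, which matches the number of predictors listed in the importance table that follows.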

    ##                                    No          Yes MeanDecreaseAccuracy
    ## OverTime                 3.421312e-02 0.1497481075         0.0525983030
    ## MonthlyIncome            2.305700e-02 0.0868095144         0.0332689226
    ## StockOptionLevel         2.053927e-02 0.0883051888         0.0313137840
    ## JobRole                  2.199264e-02 0.0674163280         0.0292430100
    ## Age                      1.776303e-02 0.0701639178         0.0260889968
    ## YearsAtCompany           1.307768e-02 0.0372733649         0.0169107197
    ## YearsWithCurrManager     1.063692e-02 0.0354600267         0.0145660908
    ## JobLevel                 8.565560e-03 0.0451302376         0.0143403514
    ## JobSatisfaction          9.510402e-03 0.0328168496         0.0132370804
    ## DistanceFromHome         7.871728e-03 0.0394132275         0.0129167246
    ## DailyRate                9.211615e-03 0.0296247003         0.0124675638
    ## EnvironmentSatisfaction  7.873530e-03 0.0345239847         0.0121113658
    ## NumCompaniesWorked       7.351471e-03 0.0347614642         0.0117274889
    ## EducationField           7.763148e-03 0.0299764154         0.0113065589
    ## HourlyRate               6.465411e-03 0.0251595161         0.0094442778
    ## RelationshipSatisfaction 5.686965e-03 0.0222657718         0.0083454513
    ## MonthlyRate              5.311772e-03 0.0216995825         0.0079193504
    ## WorkLifeBalance          5.389711e-03 0.0205250929         0.0078118417
    ## PercentSalaryHike        5.049804e-03 0.0153337177         0.0066869957
    ## YearsSinceLastPromotion  4.507368e-03 0.0178477923         0.0066511095
    ## MaritalStatus            4.370844e-03 0.0181945840         0.0065769187
    ## BusinessTravel           4.349745e-03 0.0160812637         0.0062225412
    ## JobInvolvement           4.087245e-03 0.0174524284         0.0062126648
    ## YearsInCurrentRole       3.815498e-03 0.0110609897         0.0049648003
    ## TrainingTimesLastYear    3.199792e-03 0.0126043172         0.0046966303
    ## Education                2.778793e-03 0.0074466212         0.0035200736
    ## Department               7.324137e-04 0.0026808524         0.0010468276
    ## Gender                   2.589403e-04 0.0011203404         0.0003930719
    ## PerformanceRating        8.542784e-05 0.0002876599         0.0001171267
    ##                          MeanDecreaseGini
    ## OverTime                       32.1663063
    ## MonthlyIncome                  48.9927319
    ## StockOptionLevel               20.2533097
    ## JobRole                        30.2818387
    ## Age                            36.0998391
    ## YearsAtCompany                 20.4645003
    ## YearsWithCurrManager           15.4587802
    ## JobLevel                       11.7340973
    ## JobSatisfaction                16.5851081
    ## DistanceFromHome               25.7156570
    ## DailyRate                      31.7446777
    ## EnvironmentSatisfaction        19.6163238
    ## NumCompaniesWorked             19.2439180
    ## EducationField                 18.9344575
    ## HourlyRate                     23.2559919
    ## RelationshipSatisfaction       14.1076408
    ## MonthlyRate                    23.4294266
    ## WorkLifeBalance                14.8967726
    ## PercentSalaryHike              13.8480141
    ## YearsSinceLastPromotion        12.2889447
    ## MaritalStatus                   8.3304856
    ## BusinessTravel                 10.6497942
    ## JobInvolvement                 14.2958563
    ## YearsInCurrentRole              8.6512194
    ## TrainingTimesLastYear          12.7096981
    ## Education                       9.2211762
    ## Department                      1.7209162
    ## Gender                          1.2697352
    ## PerformanceRating               0.2669939

    ##      predictionRf
    ##         No  Yes
    ##   No  1728    0
    ##   Yes    6  324
    ## [1] "True Positive Rate/Precision: 0.9758"
    ## [1] "Sensitivity/Recall Rate: 0.9758"
    ## [1] "False Positive Rate: 0"
    ## [1] "Accuracy: 0.9961"
    ## [1] "Specificity/TNR: 1"

    ## [1] "KSRF: 1"
    ## [1] "Gini Index 0.736"

    ## [1] "KSRF - Test : 0.8855"
    ## [1] "Gini Index 0.6309"
    ##      predictionRf.test
    ##        No Yes
    ##   No  734   4
    ##   Yes  30 114
    ## [1] "True Positive Rate/Precision: 0.7778"
    ## [1] "Sensitivity/Recall Rate: 0.7778"
    ## [1] "False Positive Rate: 0.0054"
    ## [1] "Accuracy: 0.9592"
    ## [1] "Specificity/TNR: 0.9946"
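
Precision and recall (true positive rate) are different quantities, so it can help to compute each one explicitly from the confusion matrix. The sketch below does this in base R for the test-set table above. It assumes that rows are actual classes and columns are predicted classes, as the `predictionRf.test` header suggests, and that "Yes" is the positive class; the names `cm` and `metrics` are illustrative only.

```r
# Test-set confusion matrix copied from the output above
# (rows = actual class, columns = predicted class; "Yes" is the positive class)
cm <- matrix(c(734,   4,
                30, 114),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("No", "Yes"),
                             predicted = c("No", "Yes")))

tp <- cm["Yes", "Yes"]; fn <- cm["Yes", "No"]
fp <- cm["No", "Yes"];  tn <- cm["No", "No"]

metrics <- c(
  recall      = tp / (tp + fn),      # sensitivity / true positive rate
  precision   = tp / (tp + fp),      # share of predicted "Yes" that are correct
  specificity = tn / (tn + fp),      # true negative rate
  fpr         = fp / (fp + tn),      # false positive rate = 1 - specificity
  accuracy    = (tp + tn) / sum(cm)  # share of all cases classified correctly
)
print(round(metrics, 4))
```

Computed this way, specificity and the false positive rate match the reported 0.9946 and 0.0054. Recall and accuracy come out slightly higher than the reported lines (about 0.79 and 0.96), which may mean the reported figures were derived from a slightly different set of predictions, for example a different probability cutoff.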