1. Introduction

With so many technology companies and start-ups emerging, a large wave of employees is “switching” companies. Losing a single employee can cost tens of thousands of dollars, so a wave of turnover can quickly run up costs in the hundreds of thousands. Few companies can afford that kind of hit to finances or productivity. Managers commonly believe that people leave only because of poor bosses or a poor work environment, but such perceptions are often far from reality. So the question arises: why is this trend so persistent? This paper approaches the question from a different perspective. It studies “who” exactly is leaving the company in large numbers, the good or valuable employees or those who do not perform, and how factors such as promotion, satisfaction level, evaluation scores, number of projects handled and time spent at the company relate to that decision. In particular, we evaluate whether promotion is the major push for “valuable” employees to leave the company.

2. Overview of the Data

The data used in this paper come from a secondary source, the Kaggle website. The data set covers 14,999 employees of a single company, as provided by the host website. It has 10 columns: Satisfaction Level, Last evaluation, Number of projects, Average monthly hours, Time spent at the company, Whether they have had a work accident, Whether they have had a promotion in the last 5 years, Departments (column sales), Salary and Whether the employee has left. Satisfaction Level, Last evaluation, Number of projects, Average monthly hours and Time spent at the company are numeric; satisfaction level ranges from 0 to 1, with 1 being the highest level of satisfaction. Work accident, promotion in the last 5 years and whether the employee has left are numeric but binary: for work accident, 0 means no accident and 1 means an accident occurred; for promotion, 0 means no promotion and 1 means the employee was promoted; for leaving, 0 means the employee has not left the job and 1 means the employee has left. The remaining columns, Departments (column sales) and Salary, are strings.
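The coding described above can be checked with a few lines of R. The sketch below uses a small toy data frame with made-up values (not rows from the Kaggle file) that follows the same column names; the ordered `salary` factor is an illustrative assumption, not something the original analysis does.

```r
# Toy rows that mimic the column layout described above (all values are made up).
toy <- data.frame(
  satisfaction_level    = c(0.42, 0.77, 0.15),
  last_evaluation       = c(0.61, 0.90, 0.83),
  number_project        = c(3L, 5L, 6L),
  average_montly_hours  = c(150L, 240L, 290L),
  time_spend_company    = c(3L, 5L, 4L),
  Work_accident         = c(0L, 1L, 0L),
  left                  = c(0L, 0L, 1L),
  promotion_last_5years = c(0L, 1L, 0L),
  sales                 = c("sales", "technical", "support"),
  salary                = c("low", "medium", "high"),
  stringsAsFactors      = FALSE
)

# The three indicator columns should contain only 0 and 1.
binary_cols <- c("Work_accident", "left", "promotion_last_5years")
is_binary <- sapply(toy[binary_cols], function(x) all(x %in% c(0, 1)))

# Satisfaction level should lie in the interval [0, 1].
in_unit_range <- all(toy$satisfaction_level >= 0 & toy$satisfaction_level <= 1)

# Assumption: treat salary as an ordered category (low < medium < high).
toy$salary <- factor(toy$salary, levels = c("low", "medium", "high"), ordered = TRUE)

print(is_binary)
print(in_unit_range)
```

Running the same checks on `mydata` instead of `toy` would confirm the coding on the real file.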

3. Overview of the Study

The study defines good or valuable employees on the basis of their last evaluation score, the time they have spent with the company and the number of projects they have handled. Our analysis therefore concerns valuable employees who have an evaluation score of 0.6 or higher, have spent 4 years or more with the company and have handled 4 or more projects, yet have left the company. We consider whether factors such as satisfaction level, promotion and average number of hours spent at work (on a monthly basis) are related to leaving. Our regression analysis shows that leaving is associated with a low satisfaction level, longer working hours and no promotion received in the last 5 years. In this study, then, we analyse whether satisfaction level, number of working hours and promotion affect employees’ decision to leave, and we construct the following hypothesis.
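The definition of a valuable employee can be written as a logical filter. The following is a minimal base R sketch on a toy data frame with made-up values; the names `toy`, `is_valuable` and `valuable_leavers` are hypothetical and only the column names follow the data set.

```r
# Toy data with made-up values; column names follow the HR data set.
toy <- data.frame(
  last_evaluation    = c(0.90, 0.55, 0.75, 0.80, 0.95),
  time_spend_company = c(5, 6, 3, 4, 4),
  number_project     = c(5, 6, 4, 4, 3),
  left               = c(1, 1, 1, 0, 1)
)

# All three conditions must hold at once, so they are combined with "&".
is_valuable <- toy$last_evaluation >= 0.6 &
  toy$time_spend_company >= 4 &
  toy$number_project >= 4

# Valuable employees who nevertheless left the company.
valuable_leavers <- toy[is_valuable & toy$left == 1, ]

print(sum(is_valuable))
print(valuable_leavers)
```

Only rows that meet every threshold count as valuable here; combining the conditions with `|` instead would admit anyone who meets a single threshold.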

Hypothesis H1: Satisfaction level, number of working hours and promotion affect employees’ decision to leave; that is, these are the reasons people leave the company.

3.1 Model

In order to test Hypothesis H1, we propose the following model:

\[\text{Left} = \beta_0 + \beta_1\,\text{Promotion in last 5 years} + \beta_2\,\text{Average monthly working hours} + \beta_3\,\text{Satisfaction level} + \epsilon\]

Read the data

# Set the working directory and load the HR data set
setwd("~/R")
mydata <- read.csv("HR_comma_sep.csv")
# Open the data in the interactive viewer
View(mydata)

OLS MODEL

m1 <- lm(left~ promotion_last_5years + satisfaction_level + average_montly_hours, data=mydata)
summary(m1)
## 
## Call:
## lm(formula = left ~ promotion_last_5years + satisfaction_level + 
##     average_montly_hours, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61348 -0.25560 -0.12630  0.03161  1.06187 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            5.377e-01  1.554e-02  34.597  < 2e-16 ***
## promotion_last_5years -1.526e-01  2.213e-02  -6.893 5.67e-12 ***
## satisfaction_level    -6.609e-01  1.285e-02 -51.441  < 2e-16 ***
## average_montly_hours   5.404e-04  6.394e-05   8.453  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.391 on 14995 degrees of freedom
## Multiple R-squared:  0.1575, Adjusted R-squared:  0.1574 
## F-statistic: 934.7 on 3 and 14995 DF,  p-value: < 2.2e-16

We estimated the effect of promotion, average monthly hours and satisfaction level on whether an employee left the company using the simplest possible model: we regressed left on promotion, average monthly hours and satisfaction level by ordinary least squares. Because left is a binary variable, the fitted values can be read as the estimated probability of leaving.
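To make the fitted model concrete, the sketch below plugs two hypothetical employee profiles into the coefficients reported in the `summary(m1)` output. The profiles and the helper `predict_left` are illustrative assumptions; only the coefficient values come from the output above.

```r
# Coefficients copied from the summary(m1) output.
b <- c(
  intercept    = 5.377e-01,
  promotion    = -1.526e-01,
  satisfaction = -6.609e-01,
  hours        = 5.404e-04
)

# Hypothetical helper: fitted value of the linear model for one profile.
predict_left <- function(promotion, satisfaction, hours) {
  unname(b["intercept"] +
    b["promotion"] * promotion +
    b["satisfaction"] * satisfaction +
    b["hours"] * hours)
}

# Profile A: no promotion, low satisfaction, long hours.
p_unhappy <- predict_left(promotion = 0, satisfaction = 0.2, hours = 280)

# Profile B: promoted, high satisfaction, moderate hours.
p_happy <- predict_left(promotion = 1, satisfaction = 0.9, hours = 160)

print(p_unhappy)  # about 0.557
print(p_happy)    # about -0.123
```

The second value is negative, which shows a known limitation of fitting a straight line to a 0/1 outcome: fitted values are not restricted to the interval from 0 to 1.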

3.2 Results

We found empirical support for H1. Promotion, satisfaction level and average monthly hours all affect an employee’s decision to “switch” or leave the company. On average, employees with a low satisfaction level, who work more hours and who did not get a promotion are the most likely to leave. The regression analysis using Ordinary Least Squares yielded \(\beta_1<0\) with \(p<0.05\) for promotion in the last 5 years, \(\beta_2>0\) with \(p<0.05\) for average monthly hours, and \(\beta_3<0\) with \(p<0.05\) for satisfaction level.
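The signs and p-values discussed above can be read directly from the coefficient table of a fitted model. The sketch below does this on simulated data, since the Kaggle file is not bundled here; the simulation settings (sample size, effect sizes, seed) are assumptions chosen only to mimic the direction of the reported effects.

```r
set.seed(1)
n <- 5000

# Simulated predictors with the same column names as the HR data set.
sim <- data.frame(
  promotion_last_5years = rbinom(n, 1, 0.2),
  satisfaction_level    = runif(n, 0.1, 1),
  average_montly_hours  = round(runif(n, 100, 300))
)

# Assumed probability of leaving: lower with promotion and satisfaction,
# higher with longer hours. Clamped to [0, 1] before drawing the outcome.
p_leave <- with(sim,
  0.45 - 0.15 * promotion_last_5years -
    0.50 * (satisfaction_level - 0.55) +
    0.002 * (average_montly_hours - 200))
sim$left <- rbinom(n, 1, pmin(pmax(p_leave, 0), 1))

# Same model formula as in the paper, fitted to the simulated data.
fit <- lm(left ~ promotion_last_5years + satisfaction_level + average_montly_hours,
          data = sim)

# Coefficient table: one row per term, with estimate and p-value columns.
coefs <- coef(summary(fit))
print(coefs[, c("Estimate", "Pr(>|t|)")])
```

Replacing `sim` with `mydata` gives the same table for the real data, where the estimates should match the output shown in Section 3.1.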

4. Conclusion

This study helps managers see what exactly makes valuable employees leave. Promotion, satisfaction level and average monthly hours worked significantly affect an employee’s decision to leave or stay.

5. References

www.kaggle.com. HR Data Analysis.
www.wikipedia.com
www.investopedia.com
www.firstround.com

Appendix 1

Descriptive Statistics

# Set the working directory and load the HR data set
setwd("~/R")
mydata <- read.csv("HR_comma_sep.csv")
View(mydata)
dim(mydata)
## [1] 14999    10
library(psych)
describe(mydata)
##                       vars     n   mean    sd median trimmed   mad   min
## satisfaction_level       1 14999   0.61  0.25   0.64    0.63  0.28  0.09
## last_evaluation          2 14999   0.72  0.17   0.72    0.72  0.22  0.36
## number_project           3 14999   3.80  1.23   4.00    3.74  1.48  2.00
## average_montly_hours     4 14999 201.05 49.94 200.00  200.64 65.23 96.00
## time_spend_company       5 14999   3.50  1.46   3.00    3.28  1.48  2.00
## Work_accident            6 14999   0.14  0.35   0.00    0.06  0.00  0.00
## left                     7 14999   0.24  0.43   0.00    0.17  0.00  0.00
## promotion_last_5years    8 14999   0.02  0.14   0.00    0.00  0.00  0.00
## sales*                   9 14999   6.94  2.75   8.00    7.23  2.97  1.00
## salary*                 10 14999   2.35  0.63   2.00    2.41  1.48  1.00
##                       max  range  skew kurtosis   se
## satisfaction_level      1   0.91 -0.48    -0.67 0.00
## last_evaluation         1   0.64 -0.03    -1.24 0.00
## number_project          7   5.00  0.34    -0.50 0.01
## average_montly_hours  310 214.00  0.05    -1.14 0.41
## time_spend_company     10   8.00  1.85     4.77 0.01
## Work_accident           1   1.00  2.02     2.08 0.00
## left                    1   1.00  1.23    -0.49 0.00
## promotion_last_5years   1   1.00  6.64    42.03 0.00
## sales*                 10   9.00 -0.79    -0.62 0.02
## salary*                 3   2.00 -0.42    -0.67 0.01
summary(mydata)
##  satisfaction_level last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900     Min.   :0.3600   Min.   :2.000   Min.   : 96.0       
##  1st Qu.:0.4400     1st Qu.:0.5600   1st Qu.:3.000   1st Qu.:156.0       
##  Median :0.6400     Median :0.7200   Median :4.000   Median :200.0       
##  Mean   :0.6128     Mean   :0.7161   Mean   :3.803   Mean   :201.1       
##  3rd Qu.:0.8200     3rd Qu.:0.8700   3rd Qu.:5.000   3rd Qu.:245.0       
##  Max.   :1.0000     Max.   :1.0000   Max.   :7.000   Max.   :310.0       
##                                                                          
##  time_spend_company Work_accident         left       
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 3.000     Median :0.0000   Median :0.0000  
##  Mean   : 3.498     Mean   :0.1446   Mean   :0.2381  
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :10.000     Max.   :1.0000   Max.   :1.0000  
##                                                      
##  promotion_last_5years         sales         salary    
##  Min.   :0.00000       sales      :4140   high  :1237  
##  1st Qu.:0.00000       technical  :2720   low   :7316  
##  Median :0.00000       support    :2229   medium:6446  
##  Mean   :0.02127       IT         :1227                
##  3rd Qu.:0.00000       product_mng: 902                
##  Max.   :1.00000       marketing  : 858                
##                        (Other)    :2923

One way contingency table

table(mydata$satisfaction_level)
## 
## 0.09  0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 
##  195  358  335   30   54   73   76   79   72   63   74   69   67   60   54 
## 0.24 0.25 0.26 0.27 0.28 0.29  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 
##   80   34   30   30   31   38   39   59   50   36   48   37  139  241  189 
## 0.39  0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 
##  175  209  171  155  224  211  203   95   42  149  209  229  187  196  179 
## 0.54 0.55 0.56 0.57 0.58 0.59  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 
##  185  179  187  210  182  219  193  208  188  209  187  199  228  177  162 
## 0.69  0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79  0.8 0.81 0.82 0.83 
##  209  205  171  230  246  257  226  234  252  241  217  222  220  241  234 
## 0.84 0.85 0.86 0.87 0.88 0.89  0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 
##  247  207  200  225  187  237  220  224  198  169  167  181  203  176  183 
## 0.99    1 
##  172  111
table(mydata$left)
## 
##     0     1 
## 11428  3571
table(mydata$promotion_last_5years)
## 
##     0     1 
## 14680   319
table(mydata$average_montly_hours)
## 
##  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 
##   6  14  23  11  19  16  17  17  28  17  19  10  18  18  12  26  10  29 
## 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 
##  15  14  10  18  12  10  10  24  11  20  13  19  25  72  65  63  59  69 
## 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 
## 100  87 114 153 104 122  88 120 129 115 112 127 102 134 110 118 123 148 
## 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 
## 108 147 112 122 121 125 153 126 124 121 136  87  96  73  78  78  73  94 
## 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 
##  92  86  76  83  70  96  78  76  81  81  85  73  88  78  75  84  80  93 
## 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 
##  76  68  73  85  75  80  96  67  71  67  79  70  86  79  58  86  80  72 
## 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 
##  68  73  83  71  72  72  72  79  72  71  78  68  76  87  79  85  64  81 
## 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 
##  84  93 112  95  93  77  76  93  59  77  97 102  74  76  83  90 108  96 
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 
##  93  85  98 112  98 124 102 108  86  93 100  98  86 101 113 115  87 126 
## 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 
## 110  98 124 102  86 110 111  91 105  88  93 102  93 104  86  88  94  82 
## 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 
##  30  21  35  32  29  34  36  25  24  33  50  30   6  19  15  17  15  13 
## 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 
##  16  12  21   7  13   6  11  24   8   6  17  18  18  14  20  16  18

Boxplots

par(mar = rep(2, 4))
# With y ~ group, the grouping variable (left) is on the x axis
boxplot(satisfaction_level ~ left, data = mydata, main = "Satisfaction level by whether the employee left",
xlab = "Left", ylab = "Satisfaction Level")

boxplot(average_montly_hours ~ left, data = mydata, main = "Average monthly hours by whether the employee left",
xlab = "Left", ylab = "Monthly Hours Spent")

Histograms

# Keep only the employees who left (left == 1)
left <- mydata[which(mydata$left == 1), ]
View(left)

hist(left$satisfaction_level, col = "#3090C7", main = "Satisfaction level")

hist(left$average_montly_hours, col = "#3090C7", main = "Average monthly hours")

Appendix 2

Two Way Contingency Tables

table(mydata$satisfaction_level, mydata$left)
table(mydata$average_montly_hours, mydata$left)
table(mydata$promotion_last_5years, mydata$left)

Plotting

library(dplyr)
# Valuable employees who left: evaluation >= 0.6, 4+ years at the company and 4+ projects
goodpeopleleft <- left %>% filter(last_evaluation >= 0.60 & time_spend_company >= 4 & number_project >= 4)
View(goodpeopleleft)
# Same definition applied to all employees, keeping the numeric columns for the correlation matrix
goodpeopleleft2 <- mydata %>% filter(last_evaluation >= 0.60 & time_spend_company >= 4 & number_project >= 4)
goodpeople <- goodpeopleleft2 %>% select(satisfaction_level, number_project:promotion_last_5years)
M <- cor(goodpeople)
library(corrplot)
corrplot(M, method="number")