With many technology companies and start-ups emerging, there is a large wave of employees “switching” companies. Losing a single employee can cost tens of thousands of dollars, so a wave of employee turnover can quickly run up costs in the hundreds of thousands. Few companies can afford that kind of hit, either financially or in productivity. Managers commonly attribute turnover to poor bosses or a poor work environment, but such perceptions are often far from reality. So the question arises: why is this trend so persistent? This paper approaches the question from a different perspective. It studies “who” exactly is leaving the company in large numbers, the good or valuable employees or those who contribute little, and how factors such as promotion, satisfaction level, evaluation scores, number of projects handled and time spent at the company relate to that decision. In particular, we evaluate whether promotion is the major push for “valuable” employees to leave the company.
The data used in this paper were obtained from a secondary source (the Kaggle website). The data set, which the host website obtained from a company, covers about 15,000 employees (14,999 rows). It has 10 columns: Satisfaction Level, Last evaluation, Number of projects, Average monthly hours, Time spent at the company, Whether they have had a work accident, Whether they have had a promotion in the last 5 years, Department (stored in the column named sales), Salary and Whether the employee has left. Satisfaction Level, Last evaluation, Number of projects, Average monthly hours and Time spent at the company are numeric; satisfaction level ranges from 0 to 1, with 1 being the highest level of satisfaction. Work accident, promotion in the last 5 years and whether the employee has left are numeric but binary: for work accident, 0 means no accident and 1 means an accident occurred; for promotion, 0 means no promotion and 1 means the employee was promoted; and for leaving the company, 0 means the employee has not left the job and 1 means the employee has left. Department (column sales) and Salary are strings.
The study defines good or valuable employees on the basis of their last evaluation score, the time they have spent with the company and the number of projects they have handled. Our analysis therefore concerns employees who have an evaluation score of 0.6 or higher, have spent 4 years or more with the company and have handled 4 or more projects, yet have left the company. We consider whether factors such as satisfaction level, promotion and the average number of hours worked per month influence their departure. We expect the regression analysis to show that leaving is associated with a low satisfaction level, longer working hours and no promotion received in the last 5 years. Accordingly, we construct the following hypothesis.
Hypothesis H1: Satisfaction level, number of working hours and promotion affect employees' decision to leave; that is, these are the reasons people leave the company.
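The selection of valuable employees described earlier can be sketched in R as follows. This is only an illustration on a small made-up data frame: the `toy` rows are hypothetical and not taken from the data set, and only the column names follow HR_comma_sep.csv. It assumes that all three criteria must hold at the same time, as the definition states.

```r
# Hypothetical toy rows; only the column names match HR_comma_sep.csv
toy <- data.frame(
  last_evaluation    = c(0.85, 0.55, 0.90, 0.70, 0.95),
  time_spend_company = c(5,    6,    3,    4,    4),
  number_project     = c(5,    4,    6,    4,    3),
  left               = c(1,    1,    1,    0,    1)
)

# An employee counts as "valuable" only if all three thresholds are met
is_valuable <- toy$last_evaluation >= 0.6 &
  toy$time_spend_company >= 4 &
  toy$number_project >= 4

# Valuable employees who nevertheless left the company
valuable_left <- toy[is_valuable & toy$left == 1, ]
print(valuable_left)
```

Combining the conditions with `&` keeps only employees who satisfy every criterion; using `|` instead would keep anyone who satisfies at least one of them, which is a much wider group.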
In order to test Hypothesis H1, we proposed the following model:
\[\text{Left} = \beta_0 + \beta_1\,\text{Promotion in last 5 years} + \beta_2\,\text{Average monthly working hours} + \beta_3\,\text{Satisfaction level} + \epsilon\]
# Read the data
setwd("~/R")
mydata <- read.csv("HR_comma_sep.csv")
View(mydata)
# Linear model: regress the binary 'left' indicator on the three predictors
m1 <- lm(left ~ promotion_last_5years + satisfaction_level + average_montly_hours, data = mydata)
summary(m1)
##
## Call:
## lm(formula = left ~ promotion_last_5years + satisfaction_level +
##     average_montly_hours, data = mydata)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.61348 -0.25560 -0.12630  0.03161  1.06187
##
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)            5.377e-01  1.554e-02  34.597  < 2e-16 ***
## promotion_last_5years -1.526e-01  2.213e-02  -6.893 5.67e-12 ***
## satisfaction_level    -6.609e-01  1.285e-02 -51.441  < 2e-16 ***
## average_montly_hours   5.404e-04  6.394e-05   8.453  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.391 on 14995 degrees of freedom
## Multiple R-squared: 0.1575, Adjusted R-squared: 0.1574
## F-statistic: 934.7 on 3 and 14995 DF, p-value: < 2.2e-16
We examined the effect of promotion, average monthly hours and satisfaction level on whether an employee left the company, using the simplest model: we regressed left on promotion, average monthly hours and satisfaction level and estimated the model by ordinary least squares on the full sample of 14,999 employees.
We found empirical support for H1. Promotion, satisfaction level and average monthly hours all affect an employee’s decision to “switch” or leave the company. On average, employees with a low satisfaction level, who work more hours and who did not get a promotion are the most likely to leave. The regression analysis using Ordinary Least Squares yielded \(\beta_1<0\) with \(p<0.05\) for promotion in the last 5 years, \(\beta_2>0\) with \(p<0.05\) for average monthly hours, and \(\beta_3<0\) with \(p<0.05\) for satisfaction level.
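To make the signs of the coefficients concrete, the short sketch below plugs the reported estimates into the model equation for two employee profiles. The coefficient values are copied from the `summary(m1)` output and indexed as in the model equation; the helper `predict_left()` and the two profiles are hypothetical and chosen only for illustration.

```r
# Coefficient estimates copied from the summary(m1) output
b0 <- 5.377e-01   # intercept
b1 <- -1.526e-01  # promotion_last_5years
b2 <- 5.404e-04   # average_montly_hours
b3 <- -6.609e-01  # satisfaction_level

# Fitted value of the linear model for one employee profile
predict_left <- function(promotion, hours, satisfaction) {
  b0 + b1 * promotion + b2 * hours + b3 * satisfaction
}

# Two hypothetical profiles (illustrative values, not rows from the data)
overworked <- predict_left(promotion = 0, hours = 250, satisfaction = 0.40)
content    <- predict_left(promotion = 1, hours = 160, satisfaction = 0.80)

print(round(c(overworked = overworked, content = content), 3))
```

The unpromoted, dissatisfied employee working 250 hours a month gets a fitted value of about 0.41, while the promoted, satisfied employee working 160 hours gets a value slightly below zero. The second result also shows that a linear model for a 0/1 outcome can produce fitted values outside the 0 to 1 range, so they are best read as rough indications rather than exact probabilities.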
This study is important for managers who want to understand what exactly makes valuable employees leave. Factors such as promotion, satisfaction level and average monthly hours worked significantly affect an employee's decision to leave or stay.
References
www.kaggle.com. HR Data Analysis.
www.wikipedia.com
www.investopedia.com
www.firstround.com
Descriptive Statistics
setwd("~/R")
mydata <- read.csv("HR_comma_sep.csv")
View(mydata)
dim(mydata)
## [1] 14999 10
library(psych)
describe(mydata)
##                       vars     n   mean    sd median trimmed   mad   min
## satisfaction_level       1 14999   0.61  0.25   0.64    0.63  0.28  0.09
## last_evaluation          2 14999   0.72  0.17   0.72    0.72  0.22  0.36
## number_project           3 14999   3.80  1.23   4.00    3.74  1.48  2.00
## average_montly_hours     4 14999 201.05 49.94 200.00  200.64 65.23 96.00
## time_spend_company       5 14999   3.50  1.46   3.00    3.28  1.48  2.00
## Work_accident            6 14999   0.14  0.35   0.00    0.06  0.00  0.00
## left                     7 14999   0.24  0.43   0.00    0.17  0.00  0.00
## promotion_last_5years    8 14999   0.02  0.14   0.00    0.00  0.00  0.00
## sales*                   9 14999   6.94  2.75   8.00    7.23  2.97  1.00
## salary*                 10 14999   2.35  0.63   2.00    2.41  1.48  1.00
##                       max  range  skew kurtosis   se
## satisfaction_level      1   0.91 -0.48    -0.67 0.00
## last_evaluation         1   0.64 -0.03    -1.24 0.00
## number_project          7   5.00  0.34    -0.50 0.01
## average_montly_hours  310 214.00  0.05    -1.14 0.41
## time_spend_company     10   8.00  1.85     4.77 0.01
## Work_accident           1   1.00  2.02     2.08 0.00
## left                    1   1.00  1.23    -0.49 0.00
## promotion_last_5years   1   1.00  6.64    42.03 0.00
## sales*                 10   9.00 -0.79    -0.62 0.02
## salary*                 3   2.00 -0.42    -0.67 0.01
summary(mydata)
##  satisfaction_level last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900     Min.   :0.3600   Min.   :2.000   Min.   : 96.0
##  1st Qu.:0.4400     1st Qu.:0.5600   1st Qu.:3.000   1st Qu.:156.0
##  Median :0.6400     Median :0.7200   Median :4.000   Median :200.0
##  Mean   :0.6128     Mean   :0.7161   Mean   :3.803   Mean   :201.1
##  3rd Qu.:0.8200     3rd Qu.:0.8700   3rd Qu.:5.000   3rd Qu.:245.0
##  Max.   :1.0000     Max.   :1.0000   Max.   :7.000   Max.   :310.0
##
##  time_spend_company Work_accident         left
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.0000
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.0000
##  Median : 3.000     Median :0.0000   Median :0.0000
##  Mean   : 3.498     Mean   :0.1446   Mean   :0.2381
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.0000
##  Max.   :10.000     Max.   :1.0000   Max.   :1.0000
##
##  promotion_last_5years         sales         salary
##  Min.   :0.00000       sales      :4140   high  :1237
##  1st Qu.:0.00000       technical  :2720   low   :7316
##  Median :0.00000       support    :2229   medium:6446
##  Mean   :0.02127       IT         :1227
##  3rd Qu.:0.00000       product_mng: 902
##  Max.   :1.00000       marketing  : 858
##                        (Other)    :2923
table(mydata$satisfaction_level)
##
## 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23
## 195 358 335 30 54 73 76 79 72 63 74 69 67 60 54
## 0.24 0.25 0.26 0.27 0.28 0.29 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38
## 80 34 30 30 31 38 39 59 50 36 48 37 139 241 189
## 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53
## 175 209 171 155 224 211 203 95 42 149 209 229 187 196 179
## 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68
## 185 179 187 210 182 219 193 208 188 209 187 199 228 177 162
## 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83
## 209 205 171 230 246 257 226 234 252 241 217 222 220 241 234
## 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98
## 247 207 200 225 187 237 220 224 198 169 167 181 203 176 183
## 0.99 1
## 172 111
table(mydata$left)
##
## 0 1
## 11428 3571
table(mydata$promotion_last_5years)
##
## 0 1
## 14680 319
table(mydata$average_montly_hours)
##
## 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
## 6 14 23 11 19 16 17 17 28 17 19 10 18 18 12 26 10 29
## 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
## 15 14 10 18 12 10 10 24 11 20 13 19 25 72 65 63 59 69
## 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
## 100 87 114 153 104 122 88 120 129 115 112 127 102 134 110 118 123 148
## 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
## 108 147 112 122 121 125 153 126 124 121 136 87 96 73 78 78 73 94
## 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185
## 92 86 76 83 70 96 78 76 81 81 85 73 88 78 75 84 80 93
## 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203
## 76 68 73 85 75 80 96 67 71 67 79 70 86 79 58 86 80 72
## 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
## 68 73 83 71 72 72 72 79 72 71 78 68 76 87 79 85 64 81
## 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
## 84 93 112 95 93 77 76 93 59 77 97 102 74 76 83 90 108 96
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257
## 93 85 98 112 98 124 102 108 86 93 100 98 86 101 113 115 87 126
## 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275
## 110 98 124 102 86 110 111 91 105 88 93 102 93 104 86 88 94 82
## 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293
## 30 21 35 32 29 34 36 25 24 33 50 30 6 19 15 17 15 13
## 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310
## 16 12 21 7 13 6 11 24 8 6 17 18 18 14 20 16 18
par(mar = rep(2, 4))
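# In boxplot(y ~ x), the grouping variable 'left' is on the x-axis and the measured variable on the y-axis
boxplot(satisfaction_level ~ left, data = mydata, main = "Satisfaction level by whether the employee left",
        xlab = "Left (0 = stayed, 1 = left)", ylab = "Satisfaction Level")
boxplot(average_montly_hours ~ left, data = mydata, main = "Average monthly hours by whether the employee left",
        xlab = "Left (0 = stayed, 1 = left)", ylab = "Monthly Hours Spent")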
# Keep only the employees who left ('left' is numeric, so compare with 1 rather than '1')
left <- mydata[which(mydata$left == 1), ]
View(left)
hist(left$satisfaction_level, col = "#3090C7", main = "Satisfaction level")
hist(left$average_montly_hours, col = "#3090C7", main = "Average monthly hours")
table(mydata$satisfaction_level, mydata$left)
table(mydata$average_montly_hours, mydata$left)
table(mydata$promotion_last_5years, mydata$left)
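Cross-tabulations like these give raw counts; converting them to row proportions makes the leaving rate comparable across groups of different sizes. A minimal sketch using `prop.table()` on a small made-up data frame (the `toy` values are hypothetical, only the column names follow the data set):

```r
# Hypothetical toy data; column names follow HR_comma_sep.csv
toy <- data.frame(
  promotion_last_5years = c(0, 0, 0, 0, 1, 1, 0, 0),
  left                  = c(1, 0, 1, 0, 0, 0, 0, 1)
)

# Counts of stayed (0) / left (1) within each promotion group
counts <- table(promotion = toy$promotion_last_5years, left = toy$left)

# margin = 1 normalises each row, giving the leaving rate per group
rates <- prop.table(counts, margin = 1)
print(round(rates, 2))
```

Each row of `rates` sums to 1, so the column for `left = 1` can be read directly as the share of each promotion group that left.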
library(dplyr)
# Valuable employees who left: all three criteria from the definition must hold
goodpeopleleft <- left %>% filter(last_evaluation >= 0.60 & time_spend_company >= 4 & number_project >= 4)
View(goodpeopleleft)
# Same criteria applied to all employees, whether or not they left
goodpeopleleft2 <- mydata %>% filter(last_evaluation >= 0.60 & time_spend_company >= 4 & number_project >= 4)
# Keep the numeric columns used for the correlation matrix
goodpeople <- goodpeopleleft2 %>% select(satisfaction_level, number_project:promotion_last_5years)
M <- cor(goodpeople)
library(corrplot)
corrplot(M, method="number")