Human resource analytics (HR analytics) is an area in the field of analytics that refers to applying analytic processes to the human resource department of an organization in the hope of improving employee performance and therefore getting a better return on investment. HR analytics does not just deal with gathering data on employee efficiency. Instead, it aims to provide insight into each process by gathering data and then using it to make relevant decisions about how to improve these processes.
HR analytics correlates business data with people data, which can help establish important connections later on. Its key aspect is to provide data on the impact the HR department has on the organization as a whole. Establishing a relationship between what HR does and business outcomes, and then creating strategies based on that information, is what HR analytics is all about. HR has core functions that can be enhanced by applying analytic processes: acquiring, optimizing, paying and developing the workforce of the organization. HR analytics can help to dig into the problems and issues surrounding these functions and, using an analytical workflow, guide managers to answer questions and gain insights from the information at hand, then make relevant decisions and take appropriate actions.
Human resource management (HRM) is the strategic and coherent approach to the management of an organization’s most valued assets: the people working there, who individually and collectively contribute to the achievement of the objectives of the business. The terms “human resource management” and “human resources” (HR) have largely replaced the term “personnel management” as a description of the processes involved in managing people in organizations.
Human resources are critical resources of any organization. Organizations spend huge amounts of time and money to hire and nurture their employees. It is a huge loss for a company when employees leave, especially key people. If HR can predict whether employees are at risk of leaving the company, it can identify attrition risks and either provide the support needed to retain those employees or hire preventively to minimize the impact on the organization.
In an intensely competitive environment, where HR managers are poaching from each other, organisations can either hold on tight to their employees or lose them to the competition. Gone are the days when employees would stick with an employer for years for want of a better choice. Now, opportunities abound. Retention of key employees is critical to the long-term health and success of any organisation. The performance of employees is often linked directly to quality of work, customer satisfaction, increased product sales and even the image of a company. It is also indirectly linked to satisfied colleagues and reporting staff, effective succession planning, and deeply embedded organisational knowledge and learning.
Effective employee retention is a systematic effort by employers to create and foster an environment that encourages current employees to remain employed, by having policies and practices in place that address their diverse needs. A strong retention strategy becomes a powerful recruitment tool. Retention of key employees is critical to the long-term health and success of any organization. Retaining your best employees supports customer satisfaction, increased product sales, satisfied colleagues and reporting staff, effective succession planning, and deeply embedded organizational knowledge and learning. Employee retention matters because training time and investment, lost knowledge, insecure employees and a costly candidate search are all involved. Hence failing to retain a key employee is a costly proposition for an organization. Various estimates suggest that losing a middle manager costs most organizations up to five times that manager's salary. Intelligent employers always realise the importance of retaining the best talent. Retaining talent was once less of a concern in the Indian scenario; however, things have changed in recent years. In prominent Indian metros at least, there is no dearth of opportunities for the best in the business, or even for the second or the third best. Retaining key employees and addressing attrition have never been so important to companies.
High turnover often leaves customers and employees in the lurch; departing employees take a great deal of knowledge with them. This lack of continuity makes it hard to meet your organization’s goals and serve customers well.
Replacing employees costs money. The cost of replacing an employee is estimated as up to twice the individual’s annual salary (or higher for some positions, such as middle management), and this doesn’t even include the cost of lost knowledge.
Recruiting employees consumes a great deal of time and effort, much of it futile. You’re not the only one out there vying for qualified employees, and job searchers make decisions based on more than the sum of salary and benefits.
Bringing employees up to speed takes even more time. And when you’re short-staffed, you often need to put in extra time to get the work done.
. sat_l: Level of satisfaction (0-1)
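. last_evaluation: Last performance evaluation. The dataset description gives this as time since the last evaluation (in years), but the values range from 0.36 to 1 and the analysis below treats it as an evaluation score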
. number_project: Number of projects completed while at work
. average_montly_hours: Average monthly hours at workplace
. time_spend_company: Number of years spent in the company
. Work_accident: Whether the employee had a workplace accident
. left: Whether the employee left the workplace or not (1 or 0) Factor
. promotion_last_5years: Whether the employee was promoted in the last five years
. sales: Department in which the employee works
. salary: Relative level of salary
hr<-read.csv("hr.csv")
View(hr)
summary(hr)
## sat_l last_evaluation number_project average_montly_hours
## Min. :0.0900 Min. :0.3600 Min. :2.000 Min. : 96.0
## 1st Qu.:0.4400 1st Qu.:0.5600 1st Qu.:3.000 1st Qu.:156.0
## Median :0.6400 Median :0.7200 Median :4.000 Median :200.0
## Mean :0.6128 Mean :0.7161 Mean :3.803 Mean :201.1
## 3rd Qu.:0.8200 3rd Qu.:0.8700 3rd Qu.:5.000 3rd Qu.:245.0
## Max. :1.0000 Max. :1.0000 Max. :7.000 Max. :310.0
##
## time_spend_company Work_accident left
## Min. : 2.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 3.000 Median :0.0000 Median :0.0000
## Mean : 3.498 Mean :0.1446 Mean :0.2381
## 3rd Qu.: 4.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :10.000 Max. :1.0000 Max. :1.0000
##
## promotion_last_5years sales salary
## Min. :0.00000 sales :4140 high :1237
## 1st Qu.:0.00000 technical :2720 low :7316
## Median :0.00000 support :2229 medium:6446
## Mean :0.02127 IT :1227
## 3rd Qu.:0.00000 product_mng: 902
## Max. :1.00000 marketing : 858
## (Other) :2923
library(psych)
## Warning: package 'psych' was built under R version 3.3.3
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(lattice)
## Warning: package 'lattice' was built under R version 3.3.3
library(car)
## Warning: package 'car' was built under R version 3.3.3
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.3.3
## corrplot 0.84 loaded
From here we can see that the dimensions of the dataset are 14999 x 10; that is, it has 14999 rows and 10 columns, which we will use in our analysis.
dim(hr)
## [1] 14999 10
This gives us the various data types used in the dataset.
str(hr)
## 'data.frame': 14999 obs. of 10 variables:
## $ sat_l : num 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
## $ last_evaluation : num 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
## $ number_project : int 2 5 7 5 2 2 6 5 5 2 ...
## $ average_montly_hours : int 157 262 272 223 159 153 247 259 224 142 ...
## $ time_spend_company : int 3 6 4 5 3 3 4 5 5 3 ...
## $ Work_accident : int 0 0 0 0 0 0 0 0 0 0 ...
## $ left : int 1 1 1 1 1 1 1 1 1 1 ...
## $ promotion_last_5years: int 0 0 0 0 0 0 0 0 0 0 ...
## $ sales : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
## $ salary : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...
An alternative method of viewing the dataset summary.
describe(hr)
## vars n mean sd median trimmed mad min
## sat_l 1 14999 0.61 0.25 0.64 0.63 0.28 0.09
## last_evaluation 2 14999 0.72 0.17 0.72 0.72 0.22 0.36
## number_project 3 14999 3.80 1.23 4.00 3.74 1.48 2.00
## average_montly_hours 4 14999 201.05 49.94 200.00 200.64 65.23 96.00
## time_spend_company 5 14999 3.50 1.46 3.00 3.28 1.48 2.00
## Work_accident 6 14999 0.14 0.35 0.00 0.06 0.00 0.00
## left 7 14999 0.24 0.43 0.00 0.17 0.00 0.00
## promotion_last_5years 8 14999 0.02 0.14 0.00 0.00 0.00 0.00
## sales* 9 14999 6.94 2.75 8.00 7.23 2.97 1.00
## salary* 10 14999 2.35 0.63 2.00 2.41 1.48 1.00
## max range skew kurtosis se
## sat_l 1 0.91 -0.48 -0.67 0.00
## last_evaluation 1 0.64 -0.03 -1.24 0.00
## number_project 7 5.00 0.34 -0.50 0.01
## average_montly_hours 310 214.00 0.05 -1.14 0.41
## time_spend_company 10 8.00 1.85 4.77 0.01
## Work_accident 1 1.00 2.02 2.08 0.00
## left 1 1.00 1.23 -0.49 0.00
## promotion_last_5years 1 1.00 6.64 42.03 0.00
## sales* 10 9.00 -0.79 -0.62 0.02
## salary* 3 2.00 -0.42 -0.67 0.01
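The summary statistics for Work_accident, left and promotion_last_5years do not make sense here, as these are categorical (0/1) variables. The same applies to the starred rows sales* and salary*, which are factors that describe() summarises by their integer codes.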
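One way to avoid misleading means for the 0/1 columns is to convert them to factors before summarising. The following is a minimal sketch on a made-up data frame (`hr_toy` and its values are hypothetical stand-ins, not rows from hr.csv); the same conversion could be applied to `hr` itself.

```r
# Toy data frame standing in for hr (values are invented for illustration)
hr_toy <- data.frame(
  sat_l = c(0.38, 0.80, 0.11, 0.72),
  Work_accident = c(0L, 0L, 1L, 0L),
  left = c(1L, 0L, 1L, 0L),
  promotion_last_5years = c(0L, 1L, 0L, 0L)
)

# Columns that hold 0/1 codes rather than measurements
flag_cols <- c("Work_accident", "left", "promotion_last_5years")

# Convert each 0/1 column to a labelled factor
hr_toy[flag_cols] <- lapply(hr_toy[flag_cols], factor,
                            levels = c(0, 1), labels = c("no", "yes"))

str(hr_toy)
summary(hr_toy)  # factor columns are now summarised as counts, not means
```

Note that later calls in this report which rely on `left` being numeric would need the original 0/1 coding, so the conversion is best done on a copy.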
sum(is.na.data.frame(hr))
## [1] 0
There is no missing data, so the dataset can be analysed without imputation or dropping rows.
left<-with(hr,table(left))
View(left)
left
## left
## 0 1
## 11428 3571
barplot(left,main="No.of Employees who left", ylab="No of employees",xlab = "left",col = c("lightblue", "mistyrose"))
From here we can see that the employees who stayed (11428) outnumber those who left (3571) by more than three to one; about 24% of employees left the company.
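The share of leavers can be checked directly from the counts printed above. This is a small sketch that re-enters the two counts by hand (the names `left_counts`, `left_share` and `ratio` are illustrative, not part of the original analysis); with the data loaded, `prop.table(table(hr$left))` would give the same shares.

```r
# Counts copied from the table of `left` shown above
left_counts <- c(stayed = 11428, left = 3571)

# Share of each group among all 14999 employees
left_share <- prop.table(left_counts)
round(left_share, 3)  # about 0.762 stayed, 0.238 left

# How many employees stayed for each one who left
ratio <- left_counts[["stayed"]] / left_counts[["left"]]
round(ratio, 2)  # a little over 3
```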
wa<-with(hr,table(Work_accident))
wa
## Work_accident
## 0 1
## 12830 2169
barplot(wa,main="Work Accidents",xlab="Work Accident",ylab="Number(s)",col = c("lightblue", "mistyrose"))
m<-xtabs(~left+Work_accident, data=hr)
barplot(m,beside = TRUE)
n<-xtabs(~left+sat_l, data=hr)
barplot(n,beside = TRUE)
boxplot(hr$sat_l~hr$left)
A large number of employees with low satisfaction levels (the left-hand part of the graph) have left the company, especially those with a satisfaction level below 0.5. This makes sense. But there is also a surge in departures at higher satisfaction levels. These employees need to be understood and handled with a different strategy.
o<-xtabs(~left+last_evaluation, data=hr)
barplot(o,beside = TRUE)
boxplot(hr$last_evaluation~hr$left)
From the plots we can see that employees with low evaluations and those with very high evaluations are leaving, whereas employees with average evaluation scores tend to stay.
r<-table(hr$left,hr$time_spend_company)
barplot(r, beside=TRUE, legend = rownames(r))
Employees who have spent 2 years at the company tend not to leave. As tenure grows, more people start leaving, with the highest number leaving after 5 years in the company. Once they cross the 'golden' 7-year mark, they are no longer leaving.
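Raw counts per tenure depend on how many employees have each tenure, so a leaving rate per tenure group can be easier to compare. The sketch below uses a small invented data frame (`toy` and its values are hypothetical, not taken from hr.csv) to show the idea; with the real data the same call would use `hr$left` and `hr$time_spend_company`.

```r
# Invented example: tenure in years and a 0/1 flag for having left
toy <- data.frame(
  time_spend_company = c(2, 2, 3, 3, 3, 5, 5, 5, 7, 7),
  left               = c(0, 0, 0, 1, 0, 1, 1, 0, 0, 0)
)

# Mean of a 0/1 flag within each tenure group = share of that group who left
rate_by_tenure <- tapply(toy$left, toy$time_spend_company, mean)
round(rate_by_tenure, 2)

# Plot rates instead of raw counts
barplot(rate_by_tenure, xlab = "Years in company", ylab = "Share who left")
```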
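depa<-table(hr$left,hr$sales)
# margin = 2 normalises within each department, so the bars show each
# department's own stay/leave shares rather than shares of the whole company
depee<-prop.table(depa, margin = 2)
barplot(depee, beside = TRUE)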
The percentage of people leaving the company is broadly similar across departments. Surprisingly, it is relatively high in HR itself, and it is lowest in management.
prom<-table(hr$left,hr$promotion_last_5years)
prom
##
## 0 1
## 0 11128 300
## 1 3552 19
barplot(prom,beside = TRUE)
Very few of the employees who were promoted in the last 5 years left the company (19 of 319, about 6%), compared with those who were not promoted (3552 of 14680, about 24%).
sal<-table(hr$left,hr$salary)
sal
##
## high low medium
## 0 1155 5144 5129
## 1 82 2172 1317
barplot(sal, beside = TRUE)
So, we can see that employees with lower salaries are the ones leaving the company: about 30% of the low-salary group left (2172 of 7316) and about 20% of the medium-salary group (1317 of 6446), whereas only about 7% of the high-salary group left (82 of 1237). This is unsurprising, as better pay is a strong reason to stay.
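The leaving rate within each salary band can be computed from the table above with `prop.table()` and `margin = 2`, which divides each column by its own total. This sketch re-enters the printed counts by hand (the object names `sal_counts` and `sal_rate` are illustrative); with the data loaded, `prop.table(sal, margin = 2)` would do the same on the original table.

```r
# Counts copied from the left-by-salary table shown above
# (matrix() fills column by column: high, then low, then medium)
sal_counts <- matrix(
  c(1155, 82, 5144, 2172, 5129, 1317),
  nrow = 2,
  dimnames = list(left = c("0", "1"),
                  salary = c("high", "low", "medium"))
)

# Column-wise proportions: each salary band sums to 1
sal_rate <- prop.table(sal_counts, margin = 2)
round(sal_rate["1", ], 3)  # high 0.066, low 0.297, medium 0.204

# Leaving rate per band, ordered from low to high salary
barplot(sal_rate["1", c("low", "medium", "high")],
        ylab = "Share who left", xlab = "Salary band")
```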
sat<-table(hr$sat_l,hr$salary)
sat
##
## high low medium
## 0.09 4 113 78
## 0.1 9 205 144
## 0.11 2 210 123
## 0.12 1 11 18
## 0.13 4 27 23
## 0.14 11 35 27
## 0.15 14 36 26
## 0.16 9 29 41
## 0.17 8 31 33
## 0.18 6 34 23
## 0.19 5 39 30
## 0.2 7 38 24
## 0.21 12 25 30
## 0.22 10 27 23
## 0.23 3 24 27
## 0.24 8 37 35
## 0.25 2 16 16
## 0.26 3 16 11
## 0.27 1 18 11
## 0.28 5 16 10
## 0.29 2 16 20
## 0.3 2 20 17
## 0.31 2 36 21
## 0.32 4 26 20
## 0.33 8 14 14
## 0.34 2 25 21
## 0.35 2 18 17
## 0.36 10 74 55
## 0.37 12 141 88
## 0.38 7 115 67
## 0.39 11 98 66
## 0.4 6 124 79
## 0.41 3 119 49
## 0.42 9 80 66
## 0.43 11 117 96
## 0.44 12 108 91
## 0.45 18 112 73
## 0.46 4 64 27
## 0.47 3 23 16
## 0.48 9 70 70
## 0.49 29 79 101
## 0.5 32 94 103
## 0.51 11 90 86
## 0.52 16 86 94
## 0.53 27 86 66
## 0.54 18 81 86
## 0.55 20 94 65
## 0.56 20 73 94
## 0.57 19 88 103
## 0.58 22 78 82
## 0.59 17 91 111
## 0.6 15 94 84
## 0.61 21 92 95
## 0.62 19 79 90
## 0.63 26 90 93
## 0.64 26 92 69
## 0.65 19 106 74
## 0.66 18 103 107
## 0.67 11 76 90
## 0.68 20 71 71
## 0.69 24 86 99
## 0.7 15 97 93
## 0.71 15 74 82
## 0.72 20 112 98
## 0.73 20 119 107
## 0.74 28 118 111
## 0.75 19 111 96
## 0.76 21 113 100
## 0.77 30 102 120
## 0.78 24 117 100
## 0.79 14 109 94
## 0.8 20 103 99
## 0.81 14 103 103
## 0.82 28 113 100
## 0.83 11 127 96
## 0.84 13 132 102
## 0.85 13 96 98
## 0.86 15 96 89
## 0.87 19 120 86
## 0.88 16 84 87
## 0.89 24 110 103
## 0.9 14 96 110
## 0.91 10 120 94
## 0.92 14 102 82
## 0.93 21 76 72
## 0.94 13 82 72
## 0.95 13 86 82
## 0.96 26 88 89
## 0.97 22 90 64
## 0.98 16 74 93
## 0.99 15 75 82
## 1 3 55 53
satt<-prop.table(sat)
barplot(satt, beside = TRUE)
scatterplot(hr$salary,hr$sat_l)
corrgram(hr,order=FALSE,lower.panel=panel.shade,upper.panel=panel.pie)
Satisfaction level decreases as people spend more time in the company and, interestingly, also when they work on a larger number of projects.
Evaluation score is positively correlated with average monthly hours and with number of projects.
As satisfaction level decreases, people tend to leave the company.
x<- xtabs(~sat_l+left,hr)
x
## left
## sat_l 0 1
## 0.09 0 195
## 0.1 0 358
## 0.11 0 335
## 0.12 26 4
## 0.13 51 3
## 0.14 63 10
## 0.15 73 3
## 0.16 78 1
## 0.17 67 5
## 0.18 63 0
## 0.19 68 6
## 0.2 65 4
## 0.21 66 1
## 0.22 59 1
## 0.23 52 2
## 0.24 77 3
## 0.25 31 3
## 0.26 29 1
## 0.27 24 6
## 0.28 28 3
## 0.29 37 1
## 0.3 37 2
## 0.31 42 17
## 0.32 42 8
## 0.33 33 3
## 0.34 44 4
## 0.35 34 3
## 0.36 43 96
## 0.37 47 194
## 0.38 35 154
## 0.39 35 140
## 0.4 40 169
## 0.41 39 132
## 0.42 47 108
## 0.43 46 178
## 0.44 58 153
## 0.45 46 157
## 0.46 27 68
## 0.47 38 4
## 0.48 139 10
## 0.49 207 2
## 0.5 226 3
## 0.51 182 5
## 0.52 196 0
## 0.53 171 8
## 0.54 179 6
## 0.55 175 4
## 0.56 182 5
## 0.57 202 8
## 0.58 179 3
## 0.59 212 7
## 0.6 189 4
## 0.61 202 6
## 0.62 186 2
## 0.63 205 4
## 0.64 185 2
## 0.65 198 1
## 0.66 217 11
## 0.67 176 1
## 0.68 161 1
## 0.69 209 0
## 0.7 195 10
## 0.71 167 4
## 0.72 200 30
## 0.73 204 42
## 0.74 206 51
## 0.75 188 38
## 0.76 189 45
## 0.77 201 51
## 0.78 191 50
## 0.79 172 45
## 0.8 194 28
## 0.81 169 51
## 0.82 183 58
## 0.83 187 47
## 0.84 185 62
## 0.85 171 36
## 0.86 159 41
## 0.87 167 58
## 0.88 162 25
## 0.89 181 56
## 0.9 168 52
## 0.91 181 43
## 0.92 178 20
## 0.93 169 0
## 0.94 167 0
## 0.95 181 0
## 0.96 203 0
## 0.97 176 0
## 0.98 183 0
## 0.99 172 0
## 1 111 0
chisq.test(x)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 7937.7, df = 91, p-value < 2.2e-16
t.test(sat_l~left,data=hr)
##
## Welch Two Sample t-test
##
## data: sat_l by left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1
## 0.6668096 0.4400980
scatterplotMatrix(~sat_l+last_evaluation+number_project+average_montly_hours+time_spend_company+Work_accident+left+promotion_last_5years,data=hr,main="Overall Analysis")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth