Human Resource Analytics

1. Introduction

Human resource analytics refers to analysing employees based on their behaviour during working hours. The aim is to maintain data on how employees handle their projects, how much time they take to complete a project, whether they are satisfied with their jobs, what their requirements are, and so on. By analysing these things we can draw conclusions about what is happening in the company and how satisfied the employees are with their work. The entire analysis is made using the dataset and is described in the following sections.

2. Data

For this study, I collected the data from https://www.kaggle.com/datasets. The dataset was chosen from this website and the analysis was carried out in R.

3. Overview of the Study

The study covers the following fields:

a. Satisfaction level of the employee
b. Last evaluation of the employee
c. Number of projects done
d. Average hours worked by the employee per month
e. Time spent by the employee in the company
f. Whether the employee had a work accident
g. Whether the employee left the company
h. Whether the employee was promoted in the last 5 years
i. Department of the employee (stored in the sales column)
j. Salary level of the employee (low, medium or high)

Using these fields, we can easily compare things such as the satisfaction level of the employees with their salary, the number of projects done by them, the average hours worked in a month, etc. Several statistical tests are also carried out in this project to identify the dependence between fields.

4. Result

After analysing the entire data, the tests show that several variables depend on particular other variables. Of the three regression models tested in this study, the model of the employee's satisfaction level against salary has the lowest p-value (5.967e-09) and the highest Multiple R-squared value (0.002522). Thus the satisfaction level of the employee can be related to their salary, although an R-squared this small means that salary explains only about 0.25% of the variation in satisfaction level.

5. Conclusion

This paper was motivated by the need for research that could improve our understanding of how the salary of employees influences their satisfaction level at work. The distinctive feature of this paper is that satisfaction level is examined alongside how involved the employees are in their jobs: their average working hours in the month, their department, the number of projects done and their work accidents. Thus this study concludes that, of the relationships modelled, satisfaction level is the one most clearly related to salary.

6. Model

Appendix 1:

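Reading the CSV file and finding the dimensions (rows and columns) of the dataset

setwd("C:/Users/simbu/Desktop/csvfiles/csvfiles")
# stringsAsFactors = TRUE keeps sales and salary as factors (no longer the default from R 4.0.0 onward)
hres <- read.csv("hresource.csv", stringsAsFactors = TRUE)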
dim(hres)

## [1] 14999    10

Appendix 2:

Inspecting the structure of the dataset and identifying the factor variables.

str(hres)

## 'data.frame':    14999 obs. of  10 variables:
##  $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
##  $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

Appendix 3:

Boxplot of the satisfaction level of the employees by salary.

boxplot(satisfaction_level ~ salary, data = hres, main = "Satisfaction level vs Salary", xlab = "salary", ylab = "satisfaction level", col = c("purple", "orange", "yellow"))

Boxplots of the satisfaction level of the employees and their last evaluation, side by side.

boxplot(hres$satisfaction_level, hres$last_evaluation, names = c("satisfaction level", "last evaluation"), main = "Satisfaction level vs last evaluation", col = c("red", "yellow"))

Appendix 4:

Visualization of individual variables.

Histogram of the satisfaction level

library(lattice)
hist(hres$satisfaction_level, main = "Histogram of satisfaction level" ,col = "lightgreen")

Histogram for the number of projects done

hist(hres$number_project,main="Histogram of Number of Projects",col = "brown" )

Histogram of the time spent in the company.

hist(hres$time_spend_company, main = "Histogram of time spent in the company", col = "green")

Histogram of the employees who left the company (0 = stayed, 1 = left).

hist(hres$left, main = "Histogram of members leaving company", col = "blue")

Appendix 5:

Relation between the satisfaction level and the number of projects done, faceted by salary level.

library(ggplot2)
# left is converted to a factor so that it gets a discrete colour legend
ggplot(hres, aes(satisfaction_level, number_project)) + geom_point(aes(color = factor(left))) + scale_x_continuous("satisfaction level") + scale_y_continuous("number of projects") + labs(title = "Satisfaction level vs number of projects, by salary", color = "left") + facet_wrap(~salary)

Appendix 6:

Corrgram for the entire data in the dataset

library(corrgram)
corrgram(hres,order=TRUE,lower.panel = panel.shade,upper.panel = panel.pie,text.panel = panel.txt,main="Human Resource Analysis")

Appendix 7:

Scatter plot of the number of projects done vs department (the sales column).

pairs(formula = ~ number_project + sales, data = hres, cex = 0.8, col = "red")

Scatter plot of the number of projects done vs work accident.

pairs(formula = ~ number_project + Work_accident, data = hres, cex = 1.5, col = "green")

Appendix 8:

Correlation test for the satisfaction level of the employees vs whether they left the company.

cor.test(hres$satisfaction_level,hres$left)

## 
##  Pearson's product-moment correlation
## 
## data:  hres$satisfaction_level and hres$left
## t = -51.613, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4018809 -0.3747001
## sample estimates:
##       cor 
## -0.388375

Correlation test for the Satisfaction level of the employee vs their work accident.

cor.test(hres$satisfaction_level,hres$Work_accident)

## 
##  Pearson's product-moment correlation
## 
## data:  hres$satisfaction_level and hres$Work_accident
## t = 7.2006, df = 14997, p-value = 6.279e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04273358 0.07463094
## sample estimates:
##        cor 
## 0.05869724

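Since the p-values of both correlation tests are less than 0.05, both correlations are statistically significant. Their strengths differ, however: satisfaction level has a moderate negative correlation with leaving the company (r = -0.388), while its correlation with work accident is very weak (r = 0.059).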
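A p-value and the size of a correlation are separate things, so it can help to look at them side by side. The following is a minimal sketch that recovers each correlation coefficient from the t statistic and degrees of freedom printed above; the data frame name tests and its column names are illustrative and not part of the original analysis.

```r
# Reported values copied from the two cor.test() outputs above.
tests <- data.frame(
  pair = c("satisfaction_level vs left", "satisfaction_level vs Work_accident"),
  t    = c(-51.613, 7.2006),
  df   = c(14997, 14997)
)

# For a Pearson correlation test, t = r * sqrt(df) / sqrt(1 - r^2),
# which rearranges to r = t / sqrt(t^2 + df).
tests$r <- tests$t / sqrt(tests$t^2 + tests$df)

# r^2 is the share of variance the two variables have in common.
tests$r_squared <- tests$r^2

print(tests)
```

With roughly 15,000 observations even a very small correlation gives a tiny p-value, which is why the r column is the better guide to how strong each relationship is.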

Appendix 9:

Chi-square test for the number of projects done by the employee vs their work accident.

table<-xtabs(~hres$number_project+hres$Work_accident)
chisq.test(table)

## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 130.98, df = 5, p-value < 2.2e-16

The above test has a p-value of < 2.2e-16. Hence the number of projects done by an employee is dependent on (associated with) their work accident status.

Chi-square test for the work accident vs their promotion in last five years.

table1<-xtabs(~hres$Work_accident+hres$promotion_last_5years)
chisq.test(table1)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table1
## X-squared = 22.335, df = 1, p-value = 2.29e-06

The above test has a p-value of 2.29e-06. Hence work accident is dependent on promotion in the last 5 years, that is, the two variables are associated.

Chi-square test for the employees left the company vs their salary.

table2<-xtabs(~hres$left+hres$salary)
chisq.test(table2)

## 
##  Pearson's Chi-squared test
## 
## data:  table2
## X-squared = 381.23, df = 2, p-value < 2.2e-16

The above test has a p-value of < 2.2e-16, which is below 0.05. Hence whether an employee left the company is dependent on their salary, not independent of it.
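The chi-square tests above show whether an association can be detected, not how strong it is. As a rough supplement, the sketch below computes Cramér's V from the reported X-squared values. Cramér's V is not used in the original analysis and is introduced here only as an illustration; the names n, chi, k and cramers_v are likewise illustrative.

```r
# Reported values copied from the three chisq.test() outputs above.
# n is the number of rows reported by dim(hres).
n <- 14999

chi <- data.frame(
  test      = c("number_project vs Work_accident",
                "Work_accident vs promotion_last_5years",
                "left vs salary"),
  # The second value is Yates-corrected, so its V is only approximate.
  x_squared = c(130.98, 22.335, 381.23),
  # Smaller of (rows - 1) and (columns - 1) in each contingency table;
  # every table here has one binary variable, so this is always 1.
  k         = c(1, 1, 1)
)

# Cramer's V = sqrt(X^2 / (n * k)), ranging from 0 (no association) to 1.
chi$cramers_v <- sqrt(chi$x_squared / (n * chi$k))

print(chi)
```

On these figures V comes out at roughly 0.09, 0.04 and 0.16, which suggests that all three associations are detectable in a sample of this size but none of them is strong.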

Appendix 10:

Model 1:Satisfaction level vs salary.

fit<-lm(hres$satisfaction_level~hres$salary)
summary(fit)

## 
## Call:
## lm(formula = hres$satisfaction_level ~ hres$salary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.54747 -0.17182  0.02925  0.19925  0.39925 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.637470   0.007061  90.284  < 2e-16 ***
## hres$salarylow    -0.036717   0.007634  -4.809 1.53e-06 ***
## hres$salarymedium -0.015653   0.007709  -2.031   0.0423 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2483 on 14996 degrees of freedom
## Multiple R-squared:  0.002522,   Adjusted R-squared:  0.002389 
## F-statistic: 18.96 on 2 and 14996 DF,  p-value: 5.967e-09

From the above model the p-value is 5.967e-09 and the multiple R-squared value is 0.002522

Model 2:Number of projects vs work accident

fit1<-lm(hres$number_project~hres$Work_accident)
summary(fit1)

## 
## Call:
## lm(formula = hres$number_project ~ hres$Work_accident)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8055 -0.8055  0.1945  1.1945  3.2112 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.80546    0.01088 349.696   <2e-16 ***
## hres$Work_accident -0.01661    0.02862  -0.581    0.562    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.233 on 14997 degrees of freedom
## Multiple R-squared:  2.247e-05,  Adjusted R-squared:  -4.421e-05 
## F-statistic: 0.337 on 1 and 14997 DF,  p-value: 0.5616

From the above model the p-value is 0.5616 and the multiple R-squared value is 2.247e-05

Model 3:Average monthly hours vs salary.

fit2<-lm(hres$average_montly_hours~hres$salary)
summary(fit2)

## 
## Call:
## lm(formula = hres$average_montly_hours ~ hres$salary)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -105.338  -45.338   -0.997   44.003  109.003 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        199.867      1.420 140.746   <2e-16 ***
## hres$salarylow       1.129      1.535   0.735    0.462    
## hres$salarymedium    1.471      1.550   0.949    0.343    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 49.94 on 14996 degrees of freedom
## Multiple R-squared:  6.113e-05,  Adjusted R-squared:  -7.223e-05 
## F-statistic: 0.4584 on 2 and 14996 DF,  p-value: 0.6323

From the above model the p-value is 0.6323 and the multiple R-squared value is 6.113e-05

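Comparing the three models, the model of satisfaction level against salary is the best since it has the lowest p-value and the highest multiple R-squared value. Even so, its multiple R-squared of 0.002522 means that salary explains only about 0.25% of the variance in satisfaction level.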
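The comparison of the three models can be laid out in one table. The sketch below is only an illustration built from the F statistics, degrees of freedom and R-squared values printed above; it recomputes the overall p-values with pf() and expresses R-squared as a percentage. The data frame name models and its column names are assumptions made for this example.

```r
# Reported values copied from the three summary() outputs above.
models <- data.frame(
  model     = c("satisfaction_level ~ salary",
                "number_project ~ Work_accident",
                "average_montly_hours ~ salary"),
  f_stat    = c(18.96, 0.337, 0.4584),
  df1       = c(2, 1, 2),
  df2       = c(14996, 14997, 14996),
  r_squared = c(0.002522, 2.247e-05, 6.113e-05)
)

# Upper-tail probability of the F distribution gives the overall p-value.
models$p_value <- pf(models$f_stat, models$df1, models$df2, lower.tail = FALSE)

# R-squared expressed as a percentage of variance explained.
models$pct_explained <- 100 * models$r_squared

# Order by p-value so the strongest evidence comes first.
print(models[order(models$p_value), ])
```

Because the F statistics are rounded in the printed output, the recomputed p-values should agree with the reported ones only to about two or three significant figures.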

Appendix 11:

To find the beta-coefficients

Beta coefficients between the satisfaction level and the salary.

fit<-lm(hres$satisfaction_level~hres$salary)
fit$coefficients

##       (Intercept)    hres$salarylow hres$salarymedium 
##        0.63746968       -0.03671654       -0.01565305

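satisfaction level = b0 + b1(low)*[salary = low] + b1(medium)*[salary = medium], where b0 = 0.63747, b1(low) = -0.03672 and b1(medium) = -0.01565. Here b0 is the intercept, which is the fitted value for the baseline level salary = high, and each bracketed term is 1 when the condition holds and 0 otherwise.

satisfaction level = 0.63747 - 0.03672 = 0.60075 for salary = low
satisfaction level = 0.63747 - 0.01565 = 0.62182 for salary = medium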

Beta coefficient between number of projects and the work accident.

fit1<-lm(hres$number_project~hres$Work_accident)
fit1$coefficients

##        (Intercept) hres$Work_accident 
##         3.80545596        -0.01661318

number of projects = b0 + b1*(work accident), where b0 = 3.80546 and b1 = -0.01661

number of projects = 3.80546 - 0.01661*(work accident), with work accident equal to 0 or 1

Beta coefficient between the average monthly hours and the salary

fit2<-lm(hres$average_montly_hours~hres$salary)
fit2$coefficients

##       (Intercept)    hres$salarylow hres$salarymedium 
##        199.867421          1.129162          1.470928

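average monthly hours = b0 + b1(low)*[salary = low] + b1(medium)*[salary = medium], where b0 = 199.8674 (the fitted value for the baseline level salary = high), b1(low) = 1.1292 and b1(medium) = 1.4709. Each bracketed term is 1 when the condition holds and 0 otherwise.

average monthly hours = 199.8674 + 1.1292 = 200.9966 for salary = low
average monthly hours = 199.8674 + 1.4709 = 201.3383 for salary = medium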
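When the only predictor is a factor, the intercept is the fitted value for the baseline level and each remaining coefficient is an offset from it. The following sketch works through that arithmetic using the coefficients printed above for the two salary models. The helper predict_by_salary and the vector names are hypothetical, written for this illustration only.

```r
# Coefficients copied from the fit$coefficients and fit2$coefficients output above.
# The baseline salary level is "high", the first factor level alphabetically.
sat_coef   <- c(intercept = 0.63746968, low = -0.03671654, medium = -0.01565305)
hours_coef <- c(intercept = 199.867421, low = 1.129162, medium = 1.470928)

# Fitted value for each salary level: the intercept alone for the baseline,
# the intercept plus that level's coefficient otherwise.
predict_by_salary <- function(coef) {
  c(high   = unname(coef["intercept"]),
    low    = unname(coef["intercept"] + coef["low"]),
    medium = unname(coef["intercept"] + coef["medium"]))
}

sat_pred   <- predict_by_salary(sat_coef)
hours_pred <- predict_by_salary(hours_coef)

print(round(sat_pred, 4))
print(round(hours_pred, 2))
```

With the dataset loaded, the same numbers should also be obtainable as the group means of each response by salary level, since a regression on a single factor reproduces those means.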