Human Resource (HR) analytics refers to analysing employees based on their behaviour during working hours. The basic need is to maintain data on how employees handle their projects: how much time they take to complete a project, whether they are satisfied with their jobs, what their requirements are, and so on. Analysing these aspects shows what is actually happening in the company and how satisfied employees are with their work. The analysis of the dataset is described in the following sections.
For this study, I collected the data from https://www.kaggle.com/datasets, and the analysis was carried out in R.
The study includes the following fields:
a. Satisfaction level of the employee
b. Last evaluation of the employee
c. Number of projects done
d. Average hours worked by the employee per month
e. Time spent by the employee in the company
f. Whether the employee had a work accident
g. Whether the employee left the company
h. Whether the employee was promoted in the last 5 years
i. Department of the employee (stored in the column named sales)
j. Salary level of each employee (low, medium or high)
Using these fields, we can compare, for example, the satisfaction level of employees with their salary, the number of projects they have done and their average monthly hours. Several statistical tests are also carried out to identify dependencies between the fields.
After analysing the data, the tests show that several variables are related to one another. In particular, among the linear models fitted in this study, the model of satisfaction level on salary has the lowest p-value and the highest multiple R-squared. Satisfaction level is therefore significantly related to salary, although salary alone explains only a small share (about 0.25%) of the variation in satisfaction.
This paper was motivated by the need for research that improves our understanding of how an employee's salary influences their satisfaction at work. A distinctive aspect of this paper is that the satisfaction level is used to understand how employees are involved in their job: their average monthly working hours, department, number of projects done and work accidents. Thus, this study concludes
Reading the .csv file and finding the dimensions (rows and columns) of the dataset
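# Set the working directory to the folder that contains hresource.csv (adjust the path for your machine)
setwd("C:/Users/simbu/Desktop/csvfiles/csvfiles")
# stringsAsFactors = TRUE reproduces the factor columns shown by str() below (the default became FALSE in R 4.0.0)
hres <- read.csv("hresource.csv", stringsAsFactors = TRUE)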
dim(hres)
## [1] 14999 10
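As a minimal sketch of what these loading steps return, the snippet below reads a few made-up rows (the string csv_text and the data frame demo are hypothetical stand-ins for hresource.csv, not real records) and checks the dimensions, the factor levels and the number of missing values per column. With stringsAsFactors = TRUE the factor levels are sorted alphabetically, which is why "high" ends up as the baseline salary level in the regression models later on.

```r
# Hypothetical rows using the same column names as hresource.csv;
# the values are illustrative only.
csv_text <- "satisfaction_level,number_project,sales,salary
0.38,2,sales,low
0.80,5,sales,medium
0.11,7,hr,medium
0.72,5,accounting,high"

# text= lets read.csv() parse a string instead of a file;
# stringsAsFactors = TRUE turns the text columns into factors
demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

dims <- dim(demo)                      # c(number of rows, number of columns)
salary_levels <- levels(demo$salary)   # factor levels, sorted alphabetically
missing_per_column <- colSums(is.na(demo))

print(dims)
print(salary_levels)
print(missing_per_column)
```

On the real data the same checks would be run on hres instead of demo.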
Inspecting the structure of the dataset (variable types and factor levels).
str(hres)
## 'data.frame': 14999 obs. of 10 variables:
## $ satisfaction_level : num 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
## $ last_evaluation : num 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
## $ number_project : int 2 5 7 5 2 2 6 5 5 2 ...
## $ average_montly_hours : int 157 262 272 223 159 153 247 259 224 142 ...
## $ time_spend_company : int 3 6 4 5 3 3 4 5 5 3 ...
## $ Work_accident : int 0 0 0 0 0 0 0 0 0 0 ...
## $ left : int 1 1 1 1 1 1 1 1 1 1 ...
## $ promotion_last_5years: int 0 0 0 0 0 0 0 0 0 0 ...
## $ sales : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
## $ salary : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...
Boxplot of the employees' satisfaction level by salary.
boxplot(satisfaction_level ~ salary, data = hres, main = "Satisfaction level vs Salary", xlab = "salary", ylab = "satisfaction level", col = c("purple", "orange", "yellow"))
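The numbers behind each box can be inspected without drawing anything by passing plot = FALSE to boxplot(). The sketch below uses simulated data (the data frame demo is a hypothetical stand-in for hres, not the real dataset) and returns, per salary group, the lower whisker, lower hinge, median, upper hinge and upper whisker.

```r
set.seed(1)
# Simulated data standing in for hres (not the real dataset)
demo <- data.frame(
  salary = factor(sample(c("low", "medium", "high"), 300, replace = TRUE)),
  satisfaction_level = runif(300)
)

# plot = FALSE returns the statistics boxplot() would draw, without plotting
b <- boxplot(satisfaction_level ~ salary, data = demo, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker;
# one column per salary level
print(b$stats)
print(b$names)   # group labels, in factor-level order
print(b$n)       # number of observations in each group
```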
Side-by-side boxplots comparing the distributions of satisfaction level and last evaluation.
boxplot(hres$satisfaction_level, hres$last_evaluation, names = c("satisfaction level", "last evaluation"), main = "Satisfaction level vs last evaluation", col = c("red", "yellow"))
Visualization of each variable.
Histogram of the satisfaction level
# hist() is part of base R graphics, so no extra package (such as lattice) is needed
hist(hres$satisfaction_level, main = "Histogram of satisfaction level", col = "lightgreen")
Histogram for the number of projects done
hist(hres$number_project,main="Histogram of Number of Projects",col = "brown" )
Histogram of the time spent in the company.
hist(hres$time_spend_company, main = "Histogram of time spent in the company", col = "green")
Histogram of whether employees left the company (0 = stayed, 1 = left)
hist(hres$left, main = "Histogram of employees leaving the company", col = "blue")
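Because number_project and left take only a few integer values, the default histogram bins can merge neighbouring values. A minimal sketch, using hypothetical vectors projects and left rather than the real columns: breaks placed at half-integers give exactly one bin per integer, and for a 0/1 variable a table of counts (which barplot() can draw) is usually clearer than a histogram.

```r
# Hypothetical project counts (integers 2 to 7), not taken from the dataset
projects <- c(2, 2, 3, 3, 3, 4, 5, 5, 6, 7)

# Breaks at half-integers give one bin per integer value
h <- hist(projects, breaks = seq(1.5, 7.5, by = 1), plot = FALSE)
print(h$counts)   # 2 3 1 2 1 1

# Hypothetical 0/1 "left" flags: a count table is clearer than a histogram
left <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
left_counts <- table(factor(left, levels = c(0, 1), labels = c("stayed", "left")))
print(left_counts)
# barplot(left_counts) would draw the corresponding bar chart
```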
##Appendix 5: Relation between satisfaction level and the number of projects, faceted by salary
library(ggplot2)
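# Colour points by whether the employee left (factor() gives a discrete legend) and facet by salary level
ggplot(hres, aes(satisfaction_level, number_project)) + geom_point(aes(color = factor(left))) + scale_x_continuous("satisfaction level") + scale_y_continuous("number of projects") + labs(title = "Satisfaction level vs number of projects, by salary", color = "left") + facet_wrap(~salary)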
##Appendix 6:
Corrgram of the numeric variables in the dataset
library(corrgram)
corrgram(hres,order=TRUE,lower.panel = panel.shade,upper.panel = panel.pie,text.panel = panel.txt,main="Human Resource Analysis")
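The shaded and pie panels of a corrgram summarise pairwise correlations between numeric columns (assumption: corrgram's default Pearson correlation). The sketch below computes such a correlation matrix directly with cor() on simulated data; demo is a hypothetical stand-in for hres, and the factor column is dropped first because cor() only accepts numeric input.

```r
set.seed(42)
# Simulated stand-in for hres (not the real dataset)
demo <- data.frame(
  satisfaction_level = runif(50),
  number_project = sample(2:7, 50, replace = TRUE),
  average_montly_hours = sample(96:310, 50, replace = TRUE),
  salary = factor(sample(c("low", "medium", "high"), 50, replace = TRUE))
)

# Keep only numeric (including integer) columns; factors are excluded
numeric_cols <- vapply(demo, is.numeric, logical(1))
cmat <- cor(demo[, numeric_cols])

# Symmetric matrix with 1s on the diagonal
print(round(cmat, 2))
```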
##Appendix 7:
Scatter plot of the number of projects done vs the department (sales)
# sales is a factor, so pairs() plots its integer codes
pairs(~ number_project + sales, data = hres, cex = 0.8, col = "red")
Scatter plot of the number of projects done vs work accidents
pairs(~ number_project + Work_accident, data = hres, cex = 1.5, col = "green")
Correlation test between the employees' satisfaction level and whether they left the company.
cor.test(hres$satisfaction_level,hres$left)
##
## Pearson's product-moment correlation
##
## data: hres$satisfaction_level and hres$left
## t = -51.613, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4018809 -0.3747001
## sample estimates:
## cor
## -0.388375
Correlation test between the employees' satisfaction level and work accidents.
cor.test(hres$satisfaction_level,hres$Work_accident)
##
## Pearson's product-moment correlation
##
## data: hres$satisfaction_level and hres$Work_accident
## t = 7.2006, df = 14997, p-value = 6.279e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04273358 0.07463094
## sample estimates:
## cor
## 0.05869724
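Since the p-values of both correlation tests are below 0.05, both correlations are statistically significant, i.e. significantly different from zero. Significance is not the same as strength: the correlation with leaving the company (r ≈ -0.39) is moderate, while the correlation with work accidents (r ≈ 0.06) is very weak.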
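Why a very weak correlation can still have a tiny p-value: Pearson's test uses t = r·sqrt(n − 2) / sqrt(1 − r²), so with n = 14999 even r ≈ 0.06 gives a large t. The sketch below plugs in the r values and sample size from the cor.test() outputs above and also computes r², the share of variance the two variables have in common (about 15% for leaving, but only about 0.3% for work accidents).

```r
# Pearson correlations and sample size taken from the cor.test() outputs above
r <- c(left = -0.388375, Work_accident = 0.05869724)
n <- 14999
dof <- n - 2

# Test statistic for H0: true correlation = 0
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)
# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), dof)
# Proportion of variance shared with satisfaction_level
r_squared <- r^2

print(round(t_stat, 3))
print(p_value)
print(round(r_squared, 4))
```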
Chi-square test of the number of projects done by the employee vs work accidents.
table<-xtabs(~hres$number_project+hres$Work_accident)
chisq.test(table)
##
## Pearson's Chi-squared test
##
## data: table
## X-squared = 130.98, df = 5, p-value < 2.2e-16
The test gives a p-value below 2.2e-16, so the hypothesis of independence is rejected: the number of projects and the occurrence of work accidents are associated.
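Pearson's chi-square statistic compares each observed count with the count expected under independence (row total × column total / grand total). The sketch below does this by hand on a small made-up 2 × 3 table (tab is hypothetical, not the real cross-tabulation) and checks the result against chisq.test(); for tables larger than 2 × 2, chisq.test() applies no continuity correction, so the two agree exactly.

```r
# Small hypothetical 2 x 3 contingency table (counts are made up)
tab <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2,
              dimnames = list(accident = c("0", "1"), projects = c("2", "3", "4")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
x2_manual <- sum((tab - expected)^2 / expected)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_manual <- pchisq(x2_manual, dof, lower.tail = FALSE)

res <- chisq.test(tab)
print(c(manual = x2_manual, chisq.test = unname(res$statistic)))
print(c(manual = p_manual, chisq.test = res$p.value))
```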
Chi-square test of work accidents vs promotion in the last five years.
table1<-xtabs(~hres$Work_accident+hres$promotion_last_5years)
chisq.test(table1)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table1
## X-squared = 22.335, df = 1, p-value = 2.29e-06
The test has a p-value of 2.29e-06, so the hypothesis of independence is rejected: work accidents and promotion in the last 5 years are associated.
Chi-square test of whether employees left the company vs their salary.
table2<-xtabs(~hres$left+hres$salary)
chisq.test(table2)
##
## Pearson's Chi-squared test
##
## data: table2
## X-squared = 381.23, df = 2, p-value < 2.2e-16
The test has a p-value below 2.2e-16, so the hypothesis of independence is rejected: whether an employee left the company is associated with their salary.
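A p-value says nothing about how strong an association is. One common effect-size measure for chi-square tests, not used in the original analysis, is Cramér's V = sqrt(X² / (n · (min(rows, columns) − 1))), which ranges from 0 (no association) to 1. The sketch below plugs in the statistic and sample size reported for table2 above; the result, about 0.16, suggests a modest association despite the very small p-value.

```r
# Chi-square statistic and sample size reported for table2 (left x salary)
x2 <- 381.23
n <- 14999

# table2 has 2 rows (left = 0/1) and 3 columns (salary levels)
k <- min(2, 3) - 1

# Cramér's V: chi-square effect size scaled to the range 0 to 1
cramers_v <- sqrt(x2 / (n * k))
print(round(cramers_v, 3))
```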
Model 1: Satisfaction level vs salary.
fit<-lm(hres$satisfaction_level~hres$salary)
summary(fit)
##
## Call:
## lm(formula = hres$satisfaction_level ~ hres$salary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.54747 -0.17182 0.02925 0.19925 0.39925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.637470 0.007061 90.284 < 2e-16 ***
## hres$salarylow -0.036717 0.007634 -4.809 1.53e-06 ***
## hres$salarymedium -0.015653 0.007709 -2.031 0.0423 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2483 on 14996 degrees of freedom
## Multiple R-squared: 0.002522, Adjusted R-squared: 0.002389
## F-statistic: 18.96 on 2 and 14996 DF, p-value: 5.967e-09
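In this model, the overall F-test p-value is 5.967e-09 and the multiple R-squared is 0.002522. The coefficients hres$salarylow and hres$salarymedium compare the low and medium salary groups with the baseline level "high".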
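With a single factor predictor, lm() estimates one mean per group: under R's default treatment coding the intercept is the mean of the baseline level and each other coefficient is that group's mean minus the baseline mean. The sketch below shows this on a handful of made-up scores (demo and fit_demo are hypothetical, not the real data).

```r
# Made-up satisfaction scores for three salary groups (not from the dataset)
demo <- data.frame(
  salary = factor(c("high", "high", "low", "low", "low", "medium", "medium")),
  satisfaction_level = c(0.70, 0.60, 0.50, 0.60, 0.70, 0.55, 0.65)
)

fit_demo <- lm(satisfaction_level ~ salary, data = demo)

# Group means: high = 0.65, low = 0.60, medium = 0.60
group_means <- tapply(demo$satisfaction_level, demo$salary, mean)

# Intercept = mean of "high"; other coefficients = group mean - mean of "high"
print(coef(fit_demo))
print(group_means)
```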
Model 2: Number of projects vs work accident
fit1<-lm(hres$number_project~hres$Work_accident)
summary(fit1)
##
## Call:
## lm(formula = hres$number_project ~ hres$Work_accident)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8055 -0.8055 0.1945 1.1945 3.2112
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.80546 0.01088 349.696 <2e-16 ***
## hres$Work_accident -0.01661 0.02862 -0.581 0.562
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.233 on 14997 degrees of freedom
## Multiple R-squared: 2.247e-05, Adjusted R-squared: -4.421e-05
## F-statistic: 0.337 on 1 and 14997 DF, p-value: 0.5616
In this model, the p-value is 0.5616 and the multiple R-squared is 2.247e-05, so there is no significant linear relationship between work accidents and the number of projects.
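Because Work_accident is a 0/1 variable, Model 2 is equivalent to comparing the mean number of projects between employees with and without an accident: the slope equals the difference in group means, and its t statistic matches a pooled-variance two-sample t-test up to sign. The sketch below checks this on simulated vectors (accident and projects are hypothetical stand-ins for the real columns).

```r
set.seed(7)
# Simulated stand-ins: 0/1 accident flag and project counts (not the real data)
accident <- rbinom(200, 1, 0.15)
projects <- sample(2:7, 200, replace = TRUE)

fit_demo <- lm(projects ~ accident)
coef_table <- summary(fit_demo)$coefficients
slope_t <- coef_table["accident", "t value"]
slope_p <- coef_table["accident", "Pr(>|t|)"]

# Pooled-variance two-sample t-test comparing the two accident groups
tt <- t.test(projects ~ factor(accident), var.equal = TRUE)

# The slope is the difference in group means (accident = 1 minus accident = 0)
mean_diff <- mean(projects[accident == 1]) - mean(projects[accident == 0])

print(c(slope = unname(coef(fit_demo)["accident"]), mean_diff = mean_diff))
print(c(lm_t = slope_t, t_test = unname(tt$statistic)))
```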
Model 3: Average monthly hours vs salary.
fit2<-lm(hres$average_montly_hours~hres$salary)
summary(fit2)
##
## Call:
## lm(formula = hres$average_montly_hours ~ hres$salary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -105.338 -45.338 -0.997 44.003 109.003
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 199.867 1.420 140.746 <2e-16 ***
## hres$salarylow 1.129 1.535 0.735 0.462
## hres$salarymedium 1.471 1.550 0.949 0.343
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49.94 on 14996 degrees of freedom
## Multiple R-squared: 6.113e-05, Adjusted R-squared: -7.223e-05
## F-statistic: 0.4584 on 2 and 14996 DF, p-value: 0.6323
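In this model, the p-value is 0.6323 and the multiple R-squared is 6.113e-05, so salary shows no significant linear relationship with average monthly hours.
Comparing the three models, the model of satisfaction level on salary performs best: it is the only one with a significant F-test (lowest p-value), and it has the highest multiple R-squared. Even so, an R-squared of 0.002522 means salary explains only about 0.25% of the variation in satisfaction level, and because the three models have different response variables, their R-squared values are not directly comparable.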
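The multiple R-squared reported by summary() is 1 − SS_res / SS_tot: the share of the response's variation around its mean that the model accounts for. The sketch below recomputes it by hand on simulated data (demo and fit_demo are hypothetical, not the real dataset) and checks it against summary().

```r
set.seed(3)
# Simulated stand-in for hres (not the real dataset)
demo <- data.frame(
  salary = factor(sample(c("low", "medium", "high"), 120, replace = TRUE)),
  satisfaction_level = runif(120)
)
fit_demo <- lm(satisfaction_level ~ salary, data = demo)

y <- demo$satisfaction_level
ss_res <- sum(residuals(fit_demo)^2)   # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)         # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot

print(c(manual = r2_manual, summary = summary(fit_demo)$r.squared))
```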
To find the beta-coefficients
Beta coefficients for satisfaction level vs salary.
fit<-lm(hres$satisfaction_level~hres$salary)
fit$coefficients
## (Intercept) hres$salarylow hres$salarymedium
## 0.63746968 -0.03671654 -0.01565305
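With "high" as the baseline salary level, the fitted model is
$$\widehat{\text{satisfaction level}} = b_0 + b_{\text{low}} D_{\text{low}} + b_{\text{medium}} D_{\text{medium}},$$
where $D_{\text{low}}$ and $D_{\text{medium}}$ are 0/1 indicators for the salary level, $b_0 = 0.6375$, $b_{\text{low}} = -0.0367$ and $b_{\text{medium}} = -0.0157$. The predicted satisfaction level is therefore 0.6375 for salary = high, 0.6375 - 0.0367 = 0.6008 for salary = low, and 0.6375 - 0.0157 = 0.6218 for salary = medium.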
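The indicator (dummy) coding behind these coefficients can be made explicit with model.matrix(). In the sketch below, salary is a hypothetical three-element factor with one entry per level; multiplying its design matrix by the coefficients printed by fit$coefficients above gives the predicted satisfaction level for each salary group.

```r
# One entry per salary level; factor() sorts them, so "high" is the baseline
salary <- factor(c("high", "low", "medium"))

# Design matrix: an intercept column plus 0/1 indicators for "low" and "medium"
X <- model.matrix(~ salary)
print(X)

# Coefficients reported by fit$coefficients for satisfaction_level ~ salary
b <- c(0.63746968, -0.03671654, -0.01565305)

# Predicted satisfaction level for high, low and medium salary
pred <- as.vector(X %*% b)
print(round(pred, 4))
```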
Beta coefficients for the number of projects vs work accident.
fit1<-lm(hres$number_project~hres$Work_accident)
fit1$coefficients
## (Intercept) hres$Work_accident
## 3.80545596 -0.01661318
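$$\widehat{\text{number of projects}} = b_0 + b_1 \cdot \text{work accident},$$
where work accident is 0 or 1, $b_0 = 3.8055$ and $b_1 = -0.0166$. The predicted number of projects is therefore about 3.81 for employees without a work accident and about 3.79 for employees with one.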
Beta coefficients for average monthly hours vs salary
fit2<-lm(hres$average_montly_hours~hres$salary)
fit2$coefficients
## (Intercept) hres$salarylow hres$salarymedium
## 199.867421 1.129162 1.470928
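With "high" as the baseline salary level,
$$\widehat{\text{average monthly hours}} = b_0 + b_{\text{low}} D_{\text{low}} + b_{\text{medium}} D_{\text{medium}},$$
where $D_{\text{low}}$ and $D_{\text{medium}}$ are 0/1 indicators for the salary level, $b_0 = 199.8674$, $b_{\text{low}} = 1.1292$ and $b_{\text{medium}} = 1.4709$. The predicted average monthly hours are therefore about 199.87 for salary = high, 201.00 for salary = low and 201.34 for salary = medium.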
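The choice of baseline only changes how the coefficients are expressed, not the fitted values. As a sketch on made-up hours (demo, fit_default and fit_low are hypothetical, not the real data), relevel() makes "low" the reference level; the intercept then becomes the mean of the low-salary group, while the predictions for every group stay the same.

```r
# Made-up monthly hours for three salary groups (not from the dataset)
demo <- data.frame(
  salary = factor(c("high", "high", "low", "low", "medium", "medium")),
  average_montly_hours = c(190, 210, 200, 202, 199, 203)
)

# Default coding: "high" (first alphabetically) is the baseline
fit_default <- lm(average_montly_hours ~ salary, data = demo)

# relevel() moves "low" to the front, making it the baseline instead
demo$salary_low_base <- relevel(demo$salary, ref = "low")
fit_low <- lm(average_montly_hours ~ salary_low_base, data = demo)

print(coef(fit_default))   # intercept = mean of "high" group
print(coef(fit_low))       # intercept = mean of "low" group

# Fitted values per observation are identical under both codings
print(all.equal(fitted(fit_default), fitted(fit_low)))
```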