HR Analytics

Project Title: HR Analytics
NAME: Ram Sankar
EMAIL: ramshankar797@gmail.com
COLLEGE : NIT Trichy

A company is trying to figure out why its best and most experienced employees are leaving prematurely. The following are some of the important columns of the dataset:
-> satisfaction level (ranges from 0 to 1)
-> Number of projects
-> Average monthly hours
-> Time spent at the company (in years)
-> work accident (0 or 1)
-> Promotion (0 or 1)
-> sales (Departments)
-> salary(low, medium, high)
-> left (0 or 1)

Following is the analysis related to the dataset

Reading the CSV file into a data frame and a basic description of the dataset

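# stringsAsFactors = TRUE keeps sales and salary as factors (this was the default before R 4.0)
hr.df <- read.csv("HR_comma_sep.csv", stringsAsFactors = TRUE)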
str(hr.df)

## 'data.frame':    14999 obs. of  10 variables:
##  $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
##  $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

summary(hr.df)

##  satisfaction_level last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900     Min.   :0.3600   Min.   :2.000   Min.   : 96.0       
##  1st Qu.:0.4400     1st Qu.:0.5600   1st Qu.:3.000   1st Qu.:156.0       
##  Median :0.6400     Median :0.7200   Median :4.000   Median :200.0       
##  Mean   :0.6128     Mean   :0.7161   Mean   :3.803   Mean   :201.1       
##  3rd Qu.:0.8200     3rd Qu.:0.8700   3rd Qu.:5.000   3rd Qu.:245.0       
##  Max.   :1.0000     Max.   :1.0000   Max.   :7.000   Max.   :310.0       
##                                                                          
##  time_spend_company Work_accident         left       
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 3.000     Median :0.0000   Median :0.0000  
##  Mean   : 3.498     Mean   :0.1446   Mean   :0.2381  
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :10.000     Max.   :1.0000   Max.   :1.0000  
##                                                      
##  promotion_last_5years         sales         salary    
##  Min.   :0.00000       sales      :4140   high  :1237  
##  1st Qu.:0.00000       technical  :2720   low   :7316  
##  Median :0.00000       support    :2229   medium:6446  
##  Mean   :0.02127       IT         :1227                
##  3rd Qu.:0.00000       product_mng: 902                
##  Max.   :1.00000       marketing  : 858                
##                        (Other)    :2923

library(psych)
describe(hr.df)

##                       vars     n   mean    sd median trimmed   mad   min
## satisfaction_level       1 14999   0.61  0.25   0.64    0.63  0.28  0.09
## last_evaluation          2 14999   0.72  0.17   0.72    0.72  0.22  0.36
## number_project           3 14999   3.80  1.23   4.00    3.74  1.48  2.00
## average_montly_hours     4 14999 201.05 49.94 200.00  200.64 65.23 96.00
## time_spend_company       5 14999   3.50  1.46   3.00    3.28  1.48  2.00
## Work_accident            6 14999   0.14  0.35   0.00    0.06  0.00  0.00
## left                     7 14999   0.24  0.43   0.00    0.17  0.00  0.00
## promotion_last_5years    8 14999   0.02  0.14   0.00    0.00  0.00  0.00
## sales*                   9 14999   6.94  2.75   8.00    7.23  2.97  1.00
## salary*                 10 14999   2.35  0.63   2.00    2.41  1.48  1.00
##                       max  range  skew kurtosis   se
## satisfaction_level      1   0.91 -0.48    -0.67 0.00
## last_evaluation         1   0.64 -0.03    -1.24 0.00
## number_project          7   5.00  0.34    -0.50 0.01
## average_montly_hours  310 214.00  0.05    -1.14 0.41
## time_spend_company     10   8.00  1.85     4.77 0.01
## Work_accident           1   1.00  2.02     2.08 0.00
## left                    1   1.00  1.23    -0.49 0.00
## promotion_last_5years   1   1.00  6.64    42.03 0.00
## sales*                 10   9.00 -0.79    -0.62 0.02
## salary*                 3   2.00 -0.42    -0.67 0.01

hr.df$left[hr.df$left==1]<-'Yes'
hr.df$left[hr.df$left==0]<-'No'
pie(table(hr.df$left),main="Left", col=c("green4","Red2"))

This pie chart shows that approximately a quarter of the employees left the company (the mean of left is 0.2381).
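
The share read off the pie chart can also be computed directly. The following is a minimal sketch on a small made-up vector, `left_toy`, which is an assumption for illustration and not the real column:

```r
# Toy stand-in for hr.df$left (hypothetical values, not the real data)
left_toy <- c("Yes", "No", "No", "No", "Yes", "No", "No", "No")

# Counts per class, then each class as a share of the total
left_counts <- table(left_toy)
left_share  <- prop.table(left_counts)

print(left_counts)
print(round(left_share, 3))
```

On the real data the equivalent call would be prop.table(table(hr.df$left)), which should agree with the mean of left reported by summary().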

library(RColorBrewer)
par(mfrow=c(1,2))
boxplot(hr.df$satisfaction_level,horizontal = TRUE,main="satisfaction level of employees",xlab="Satisfaction level")
hist(hr.df$satisfaction_level,col=brewer.pal(8,"Greens"),main="",xlab="satisfaction level",breaks=5)

#heatmap(as.matrix(hr.df$satisfaction_level))

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

ggplot(hr.df, aes(satisfaction_level)) + geom_area( stat = "bin", bins = 30,fill = "steelblue")+ scale_x_continuous(breaks = seq(0,1,0.1))

library(RColorBrewer)
par(mfrow=c(1,2))
boxplot(hr.df$last_evaluation,horizontal = TRUE,main="last evaluation score",xlab="evaluation score")
hist(hr.df$last_evaluation,col=brewer.pal(8,"Greys"),main="",xlab="evaluation score",breaks=7)

-> last_evaluation takes values from 0.36 to 1 (median 0.72), so it reads as a score from the employee's most recent evaluation rather than a time span in years.

library(ggplot2)
ggplot(hr.df, aes(number_project)) + scale_x_continuous("Projects",seq(2,7,by=1))+
  geom_bar(fill="red")

-> Most of the employees did 3 or 4 projects

library(ggplot2)
ggplot(hr.df, aes(average_montly_hours)) +
  geom_bar(fill="blue")+scale_x_continuous("hours", breaks = seq(90,320, by = 50))+labs(title="average working hours per month")

ggplot(hr.df, aes(time_spend_company)) + geom_bar(fill = "darkgreen")+
  coord_flip()+ labs(title = "Years in the company")

par(mfrow=c(1,2))
hr.df$promotion_last_5years[hr.df$promotion_last_5years==1]<-'Yes'
hr.df$promotion_last_5years[hr.df$promotion_last_5years==0]<-'No'
pie(table(hr.df$promotion_last_5years),main="promotion", col=c("orangered","yellowgreen"))

hr.df$Work_accident[hr.df$Work_accident==1]<-'Yes'
hr.df$Work_accident[hr.df$Work_accident==0]<-'No'
pie(table(hr.df$Work_accident),main="Accident in work place", col=c("antiquewhite","darkslategray"))

library(ggplot2)
ggplot(hr.df,  aes( sales,fill = sales) ) +
  geom_bar()

library(ggplot2)
ggplot(hr.df,aes(x = factor(""), fill = salary) ) +
  geom_bar() +
  coord_polar(theta = "y") +
  scale_x_discrete("")

library(ggplot2)        
ggplot(hr.df, aes(satisfaction_level, number_project))+ 
  scale_x_continuous("satisfaction level", breaks = seq(0,1,0.1))+scale_y_continuous("Projects", breaks = seq(2,7,1))+
  labs(title="satisfaction level vs number of projects")+geom_jitter(colour="greenyellow")

-> Employees with higher satisfaction levels mostly did 3 to 5 projects

library(ggplot2)
ggplot(hr.df, aes(time_spend_company, fill = sales) ) +
  geom_bar(position = "stack")+labs(title="time spent in company department wise")

ggplot(hr.df, aes(left, fill = sales) ) +
  geom_bar(position = "stack")

-> The largest number of people who left the company were from the sales department, which is also the largest department (4140 of 14999 employees)
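
Because departments differ a lot in size, raw counts of leavers and leaving rates can tell different stories. The sketch below uses a made-up data frame (`toy`, with hypothetical `dept` and `left` values) to show how row-wise proportions separate the two:

```r
# Toy data: department and whether the employee left (hypothetical values)
toy <- data.frame(
  dept = c("sales", "sales", "sales", "sales", "sales", "sales",
           "hr", "hr", "IT", "IT"),
  left = c("Yes", "Yes", "No", "No", "No", "No",
           "Yes", "No", "No", "No")
)

# Counts of leavers per department (what a stacked bar chart shows)
counts <- table(toy$dept, toy$left)

# Row-wise proportions: the share of each department that left
rates <- prop.table(counts, margin = 1)

print(counts)
print(round(rates, 2))
```

In this toy example sales has the most leavers by count, while hr has the higher leaving rate. The same two tables on hr.df$sales and hr.df$left would show whether that distinction matters in the real data.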

hr.df$left[hr.df$left=="Yes"]<-1
hr.df$left[hr.df$left=="No"]<-2
library(lattice)
par(mfrow=c(2,1))
histogram(~left|promotion_last_5years+Work_accident,data=hr.df,main="left vs promotion and work accident histogram")

histogram(~left|salary,data=hr.df,main="left and salary histogram")

# x-axis: 1 (shown from about 0.96) = people who left, 2 (shown up to about 2.04) = people still in the company

library(ggplot2)
ggplot(hr.df, aes(satisfaction_level,average_montly_hours)) + geom_point(aes(color = sales)) + 
  scale_x_continuous("satisfaction level", breaks = seq(0,1,0.1))+
  scale_y_continuous("hours", breaks = seq(90,310,50))+
  theme_bw() + labs(title="Scatterplot")

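# Re-read the data so that left, promotion and accident are numeric 0/1 again
hr.df <- read.csv("HR_comma_sep.csv", stringsAsFactors = TRUE)
# Convert the factor columns to their integer level codes so that cor() accepts them
hr.df$sales <- as.numeric(hr.df$sales)
hr.df$salary <- as.numeric(hr.df$salary)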
cor(hr.df)

##                       satisfaction_level last_evaluation number_project
## satisfaction_level            1.00000000     0.105021214   -0.142969586
## last_evaluation               0.10502121     1.000000000    0.349332589
## number_project               -0.14296959     0.349332589    1.000000000
## average_montly_hours         -0.02004811     0.339741800    0.417210634
## time_spend_company           -0.10086607     0.131590722    0.196785891
## Work_accident                 0.05869724    -0.007104289   -0.004740548
## left                         -0.38837498     0.006567120    0.023787185
## promotion_last_5years         0.02560519    -0.008683768   -0.006063958
## sales                         0.01226082     0.006809655    0.019077888
## salary                        0.01175416     0.013964915    0.009671757
##                       average_montly_hours time_spend_company
## satisfaction_level            -0.020048113       -0.100866073
## last_evaluation                0.339741800        0.131590722
## number_project                 0.417210634        0.196785891
## average_montly_hours           1.000000000        0.127754910
## time_spend_company             0.127754910        1.000000000
## Work_accident                 -0.010142888        0.002120418
## left                           0.071287179        0.144822175
## promotion_last_5years         -0.003544414        0.067432925
## sales                          0.007722204       -0.034825154
## salary                         0.007081960       -0.003086256
##                       Work_accident         left promotion_last_5years
## satisfaction_level      0.058697241 -0.388374983           0.025605186
## last_evaluation        -0.007104289  0.006567120          -0.008683768
## number_project         -0.004740548  0.023787185          -0.006063958
## average_montly_hours   -0.010142888  0.071287179          -0.003544414
## time_spend_company      0.002120418  0.144822175           0.067432925
## Work_accident           1.000000000 -0.154621634           0.039245435
## left                   -0.154621634  1.000000000          -0.061788107
## promotion_last_5years   0.039245435 -0.061788107           1.000000000
## sales                   0.011323553  0.009935740          -0.036953987
## salary                 -0.002505606 -0.001293717          -0.001318425
##                              sales       salary
## satisfaction_level     0.012260815  0.011754160
## last_evaluation        0.006809655  0.013964915
## number_project         0.019077888  0.009671757
## average_montly_hours   0.007722204  0.007081960
## time_spend_company    -0.034825154 -0.003086256
## Work_accident          0.011323553 -0.002505606
## left                   0.009935740 -0.001293717
## promotion_last_5years -0.036953987 -0.001318425
## sales                  1.000000000  0.016429857
## salary                 0.016429857  1.000000000
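
The as.numeric() conversion applied to sales and salary before cor() replaces each factor level with its integer code. The short sketch below, on a made-up `salary` vector, shows what those codes look like and one possible alternative that keeps the pay order. It is an illustration, not a change to the analysis above:

```r
# Salary as a factor: levels are sorted alphabetically by default
salary <- factor(c("low", "medium", "high", "low"))
print(levels(salary))       # high, low, medium
print(as.numeric(salary))   # codes follow the level order, not the pay scale

# An ordered factor with explicit levels gives codes that follow the pay scale
salary_ord <- factor(salary, levels = c("low", "medium", "high"), ordered = TRUE)
print(as.numeric(salary_ord))
```

With the default levels ("high", "low", "medium", as in the str() output), a correlation against the salary codes does not measure a low-to-high trend. The department codes in sales have no natural order at all, so their correlations are best read with caution.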

library(corrgram)
corrgram(hr.df, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of hr.df intercorrelations")

t1<-xtabs(~hr.df$left+hr.df$promotion_last_5years)
t2<-xtabs(~hr.df$left+hr.df$satisfaction_level)
t3<-xtabs(~hr.df$left+hr.df$salary)
t4<-xtabs(~hr.df$left+hr.df$salary)
t5<-xtabs(~hr.df$left+hr.df$time_spend_company)
t6<-xtabs(~hr.df$left+hr.df$average_montly_hours)
t7<-xtabs(~hr.df$left+hr.df$sales)
t8<-xtabs(~hr.df$left+hr.df$number_project)

chisq.test(t1)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t1
## X-squared = 56.262, df = 1, p-value = 6.344e-14

chisq.test(t2)

## 
##  Pearson's Chi-squared test
## 
## data:  t2
## X-squared = 7937.7, df = 91, p-value < 2.2e-16

chisq.test(t3)

## 
##  Pearson's Chi-squared test
## 
## data:  t3
## X-squared = 381.23, df = 2, p-value < 2.2e-16

chisq.test(t4)

## 
##  Pearson's Chi-squared test
## 
## data:  t4
## X-squared = 381.23, df = 2, p-value < 2.2e-16

chisq.test(t5)

## 
##  Pearson's Chi-squared test
## 
## data:  t5
## X-squared = 2110.1, df = 7, p-value < 2.2e-16

chisq.test(t6)

## Warning in chisq.test(t6): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  t6
## X-squared = 3623.1, df = 214, p-value < 2.2e-16

chisq.test(t7)

## 
##  Pearson's Chi-squared test
## 
## data:  t7
## X-squared = 86.825, df = 9, p-value = 7.042e-15

chisq.test(t8)

## 
##  Pearson's Chi-squared test
## 
## data:  t8
## X-squared = 5373.6, df = 5, p-value < 2.2e-16

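Pearson's chi-squared tests give p < 0.01 for every variable tested, so in each case we reject the null hypothesis that the variable is independent of left. Two caveats: t3 and t4 are the same table (salary), so their results are identical, and chisq.test(t6) warns that the approximation may be incorrect, because tabulating every distinct value of average_montly_hours (df = 214) leaves some cells with very small expected counts.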
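
The "approximation may be incorrect" warning can be investigated by looking at the expected counts that chisq.test() computes. This is a minimal sketch on a made-up 2x2 table (`tab`, with hypothetical counts), not on the real cross-tabulations:

```r
# Hypothetical 2x2 table of counts: left (rows) by promoted (columns)
tab <- matrix(c(300, 10,
                100,  2),
              nrow = 2, byrow = TRUE,
              dimnames = list(left = c("0", "1"), promoted = c("0", "1")))

# The warning is suppressed here so that the explicit check below does the reporting
res <- suppressWarnings(chisq.test(tab))

# Expected counts under independence; the chi-squared approximation is
# commonly considered unreliable when some of these fall below about 5
print(round(res$expected, 2))
print(any(res$expected < 5))
print(res$p.value)
```

For a column with many distinct values, such as average_montly_hours, grouping the values into a few bins with cut() before tabulating is one way to raise the expected counts.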

t.test(hr.df$left - hr.df$satisfaction_level)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$satisfaction_level
## t = -80.447, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.383882 -0.365620
## sample estimates:
## mean of x 
## -0.374751

t.test(hr.df$left - hr.df$number_project)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$number_project
## t = -337.28, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -3.585689 -3.544253
## sample estimates:
## mean of x 
## -3.564971

t.test(hr.df$left - hr.df$average_montly_hours)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$average_montly_hours
## t = -492.71, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -201.6111 -200.0134
## sample estimates:
## mean of x 
## -200.8123

t.test(hr.df$left - hr.df$time_spend_company)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$time_spend_company
## t = -273.37, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -3.283527 -3.236774
## sample estimates:
## mean of x 
## -3.260151

t.test(hr.df$left - hr.df$Work_accident)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$Work_accident
## t = 19.31, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.08398479 0.10296101
## sample estimates:
## mean of x 
## 0.0934729

t.test(hr.df$left - hr.df$promotion_last_5years)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$promotion_last_5years
## t = 57.969, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.2094832 0.2241457
## sample estimates:
## mean of x 
## 0.2168145

t.test(hr.df$left - hr.df$sales)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$sales
## t = -295.52, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -6.742675 -6.653818
## sample estimates:
## mean of x 
## -6.698247

t.test(hr.df$left - hr.df$salary)

## 
##  One Sample t-test
## 
## data:  hr.df$left - hr.df$salary
## t = -341.03, df = 14998, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -2.121330 -2.097084
## sample estimates:
## mean of x 
## -2.109207

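All of these t-tests also give p < 0.01, but note what they test. Each one is a one-sample t-test on the difference left - x, so the null hypothesis being rejected is that the mean of left equals the mean of the other column, not that the two are independent. Because the columns are on different scales (for example, average monthly hours is around 200 while left is 0 or 1), these rejections are expected and say little about why employees leave. A two-sample t-test comparing each column between employees who left and those who stayed would address that question.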
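
A two-sample comparison between leavers and stayers could be sketched as follows. The data here are simulated (the group sizes, means and standard deviations are assumptions chosen for illustration), so the numbers it prints are not results for the real dataset:

```r
set.seed(1)

# Simulated data (hypothetical, not the real dataset):
# stayers (left = 0) are given a higher mean satisfaction than leavers (left = 1)
toy <- data.frame(
  left = rep(c(0, 1), times = c(300, 100)),
  satisfaction_level = c(rnorm(300, mean = 0.67, sd = 0.2),
                         rnorm(100, mean = 0.44, sd = 0.2))
)

# Welch two-sample t-test: does mean satisfaction differ between the two groups?
res <- t.test(satisfaction_level ~ left, data = toy)

print(res$estimate)  # mean satisfaction in each group
print(res$p.value)
```

On the real data the call would take the same form, for example t.test(satisfaction_level ~ left, data = hr.df), with left kept as 0/1.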

Linear regression of satisfaction level

fit <- lm(satisfaction_level ~ average_montly_hours+Work_accident+promotion_last_5years+sales+salary , data = hr.df)
summary(fit)

## 
## Call:
## lm(formula = satisfaction_level ~ average_montly_hours + Work_accident + 
##     promotion_last_5years + sales + salary, data = hr.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.55470 -0.17486  0.02876  0.20363  0.40719 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            6.068e-01  1.236e-02  49.093  < 2e-16 ***
## average_montly_hours  -9.738e-05  4.057e-05  -2.401   0.0164 *  
## Work_accident          4.062e-02  5.765e-03   7.045 1.93e-12 ***
## promotion_last_5years  4.094e-02  1.406e-02   2.911   0.0036 ** 
## sales                  1.126e-03  7.381e-04   1.526   0.1271    
## salary                 4.713e-03  3.238e-03   1.456   0.1455    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2481 on 14993 degrees of freedom
## Multiple R-squared:  0.004665,   Adjusted R-squared:  0.004333 
## F-statistic: 14.05 on 5 and 14993 DF,  p-value: 9.711e-14

Linear regression on number of projects done

# Named proj so that it is not overwritten by the promotion model fitted later
proj <- lm(number_project ~ satisfaction_level+average_montly_hours+Work_accident+promotion_last_5years+sales+salary , data = hr.df)
summary(proj)

## 
## Call:
## lm(formula = number_project ~ satisfaction_level + average_montly_hours + 
##     Work_accident + promotion_last_5years + sales + salary, data = hr.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9320 -0.9334 -0.0846  0.8333  3.7053 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            2.0636094  0.0594599  34.706   <2e-16 ***
## satisfaction_level    -0.6711281  0.0364662 -18.404   <2e-16 ***
## average_montly_hours   0.0102268  0.0001812  56.449   <2e-16 ***
## Work_accident          0.0254529  0.0257838   0.987   0.3236    
## promotion_last_5years -0.0065286  0.0628034  -0.104   0.9172    
## sales                  0.0077595  0.0032958   2.354   0.0186 *  
## salary                 0.0158775  0.0144571   1.098   0.2721    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.108 on 14992 degrees of freedom
## Multiple R-squared:  0.1926, Adjusted R-squared:  0.1923 
## F-statistic: 596.1 on 6 and 14992 DF,  p-value: < 2.2e-16

Linear regression on promotion variable

prom <- lm(promotion_last_5years ~ satisfaction_level+average_montly_hours+Work_accident+number_project+sales+salary , data = hr.df)
summary(prom)

## 
## Call:
## lm(formula = promotion_last_5years ~ satisfaction_level + average_montly_hours + 
##     Work_accident + number_project + sales + salary, data = hr.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.05179 -0.02681 -0.01916 -0.01483  0.99466 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.634e-02  8.034e-03   3.278  0.00105 ** 
## satisfaction_level    1.373e-02  4.794e-03   2.863  0.00420 ** 
## average_montly_hours -5.755e-06  2.594e-05  -0.222  0.82446    
## Work_accident         1.569e-02  3.351e-03   4.684 2.84e-06 ***
## number_project       -1.104e-04  1.062e-03  -0.104  0.91721    
## sales                -1.976e-03  4.284e-04  -4.613 4.00e-06 ***
## salary               -1.981e-04  1.880e-03  -0.105  0.91610    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1441 on 14992 degrees of freedom
## Multiple R-squared:  0.003512,   Adjusted R-squared:  0.003113 
## F-statistic: 8.805 on 6 and 14992 DF,  p-value: 1.32e-09

Logistic regression on variable left

model <- glm(left ~.,family=binomial(link='logit'),data=hr.df)
summary(model)

## 
## Call:
## glm(formula = left ~ ., family = binomial(link = "logit"), data = hr.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3568  -0.6819  -0.4343  -0.1533   3.1068  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            0.054122   0.151993   0.356  0.72178    
## satisfaction_level    -4.129254   0.096584 -42.753  < 2e-16 ***
## last_evaluation        0.762165   0.145708   5.231 1.69e-07 ***
## number_project        -0.310068   0.020850 -14.872  < 2e-16 ***
## average_montly_hours   0.004346   0.000504   8.624  < 2e-16 ***
## time_spend_company     0.228638   0.014855  15.391  < 2e-16 ***
## Work_accident         -1.498575   0.088254 -16.980  < 2e-16 ***
## promotion_last_5years -1.768024   0.255495  -6.920 4.52e-12 ***
## sales                  0.020587   0.007854   2.621  0.00876 ** 
## salary                 0.011953   0.035040   0.341  0.73300    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 16465  on 14998  degrees of freedom
## Residual deviance: 13323  on 14989  degrees of freedom
## AIC: 13343
## 
## Number of Fisher Scoring iterations: 5
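
The logistic regression coefficients above are on the log-odds scale, which is hard to read directly. Exponentiating them gives odds ratios. The sketch below copies three estimates from the printed summary into a hypothetical vector `coefs` rather than refitting the model:

```r
# Coefficients copied from the summary(model) output above (log-odds scale)
coefs <- c(satisfaction_level = -4.129254,
           time_spend_company =  0.228638,
           Work_accident      = -1.498575)

# exp() turns a log-odds coefficient into an odds ratio per one-unit increase
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))
```

Read this way, each additional year at the company multiplies the odds of leaving by about 1.26, holding the other columns fixed. For satisfaction_level a one-unit increase spans the whole 0 to 1 range, which is why its odds ratio is so small. With the fitted object available, exp(coef(model)) gives the full set.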

anova(model, test="Chisq")

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: left
## 
## Terms added sequentially (first to last)
## 
## 
##                       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                  14998      16465              
## satisfaction_level     1  2266.90     14997      14198 < 2.2e-16 ***
## last_evaluation        1    17.89     14996      14180 2.337e-05 ***
## number_project         1   115.05     14995      14065 < 2.2e-16 ***
## average_montly_hours   1    83.35     14994      13982 < 2.2e-16 ***
## time_spend_company     1   187.24     14993      13794 < 2.2e-16 ***
## Work_accident          1   390.95     14992      13403 < 2.2e-16 ***
## promotion_last_5years  1    73.32     14991      13330 < 2.2e-16 ***
## sales                  1     6.91     14990      13323  0.008558 ** 
## salary                 1     0.12     14989      13323  0.732948    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

test <- hr.df
fitted.results <- predict(model,newdata=subset(test,select=c(1,2,3,4,5,6,8,9,10)),type='response')
fitted.results <- ifelse(fitted.results > 0.5,1,0)
misClasificError <- mean(fitted.results != test$left)
print(paste('Accuracy',1-misClasificError))

## [1] "Accuracy 0.765584372291486"
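
The accuracy above is measured on the same rows the model was fitted to. A hold-out split is one common way to estimate how the model behaves on unseen rows. The sketch below uses simulated data (the sample size, the 30% test share and the coefficients inside plogis() are all assumptions for illustration):

```r
set.seed(42)

# Simulated data (hypothetical): leaving is more likely at low satisfaction
n <- 500
sim <- data.frame(satisfaction_level = runif(n))
sim$left <- rbinom(n, size = 1, prob = plogis(2 - 5 * sim$satisfaction_level))

# Hold out 30% of the rows for testing
test_idx  <- sample(n, size = round(0.3 * n))
train_set <- sim[-test_idx, ]
test_set  <- sim[test_idx, ]

# Fit on the training rows only, then score the unseen test rows
fit  <- glm(left ~ satisfaction_level, family = binomial(link = "logit"),
            data = train_set)
pred <- ifelse(predict(fit, newdata = test_set, type = "response") > 0.5, 1, 0)

print(mean(pred == test_set$left))  # hold-out accuracy
```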

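-> The linear regression on satisfaction level showed that Work_accident and promotion_last_5years are the most significant factors (p < 0.01), followed by average monthly hours (p < 0.05); department (sales) and salary are not significant, and the model explains under 0.5% of the variance (R-squared = 0.0047)
-> The linear regression on number of projects done showed that satisfaction level and average monthly hours are the most significant factors (R-squared = 0.19)
-> The linear regression on promotion showed that Work_accident, department (sales) and satisfaction level are the most significant factors; salary is not significant (p = 0.92), and R-squared is only 0.0035
-> When logistic regression was applied to left, every column except salary (p = 0.73) was significant at the 1% level, and when the model was tested against the same dataset the accuracy was 76.6%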
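
An accuracy figure is easier to judge next to a baseline. Since the mean of left is 0.2381, always predicting "stayed" would already be right for about 76.2% of employees, which is close to the accuracy the script prints (about 0.766). The sketch below shows how a confusion matrix and a majority-class baseline can be computed; `actual` and `prob` are made-up vectors, not model output:

```r
# Hypothetical actual labels and predicted probabilities (not from the real model)
actual <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0)
prob   <- c(0.1, 0.2, 0.6, 0.3, 0.1, 0.4, 0.7, 0.3, 0.8, 0.2)

predicted <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: rows are actual classes, columns are predicted classes
conf_mat <- table(actual = actual, predicted = predicted)
print(conf_mat)

# Accuracy of the predictions versus always predicting the majority class
accuracy <- mean(predicted == actual)
baseline <- max(prop.table(table(actual)))
print(c(accuracy = accuracy, baseline = baseline))
```

The confusion matrix also shows how many leavers were actually identified, which overall accuracy hides when one class dominates.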

30 December 2017