1. Introduction

Attrition in business can mean the reduction in staff and employees in a company through normal means, such as retirement and resignation, the loss of customers or clients to old age or growing out of the company’s target demographic.Changes in management style, company structure, or other aspects of the company might cause employees to leave the company voluntarily, resulting in a higher attrition rate. Another possible cause of attrition is when a company eliminates a job completely. There are different turnover rates across industries, with hospitality and retail having higher rates compared to other industries. But a high turnover rate can be costly. When you think about your investment in recruiting and training employees and only having them stay on for a short period of time, you are not getting back a return on your investment.Customer attrition generally has a negative effect on the company’s profits and growth. This paper addresses the following issues concerning the attrition of an employee with respect to several paramters. In this paper, we investigate how the general parameters like Education, Department, Monthly Income, OverTime and others impact the attrition of an employee.

2. Literature Review

The specific objective of this literature is to predict if an employee is going to resign or not. The ultimate goal is reduction in attrition using the analysis collected. A proper idea or knowledge about the reasons concerning the attrition of employees helps the human resource team eliminate the skepticism. It majorly helps in cutting down the costs, that a company incurs when its employees resign. It’s rightly said that “It takes a lot of resources to build the Human Resource”. The output variable “Attriton” is a dichotomous variable with values “Yes/No”. The analysis uncovers the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. Methodology: 1. We shall be looking at all variables through some plots and infer about it in our exploratory analysis.2. Through our analysis we intend to build a model which can predict if an employee is about to quit. We will see the implelementaion of logistic regression model which is part of a larger class of algorithms known as Generalized Linear Model (glm).

3. Data Description

For this study, we collected data from IBM Watson Analytics website (https://www.ibm.com/communities/analytics/watson-analytics-blog/hr-employee-attrition/). This is a fictional data set created by IBM data scientists. The original datset is in csv form. It consists of 1470 rows/data points and 35 columns/attributes which tend to impact employee attrition. Some attributes such as EmployeeNumber, EmployeeCount, Over18 and StandardHours,being same for each employee are not related to our study.Therefore among the remaining relevant attributes, there are 16 categorical variables in the dataset viz. (Attrition,BusinessTravel,Department,Education,EducationField,EnvironmentSatisfaction,Gender,JobInvolvement,JobLevel,JobRole,JobSatisfaction,‘’ MaritalStatus,OverTime,PerformanceRating,RelationshipSatisfaction,WorkLifeBalance); and 11 numeric variables namely (Age,DistanceFromHome,MonthlyIncome,NumCompaniesWorked,PercentSalaryHike,TotalWorkingYears,TrainingTimesLastYear,YearsAtCompany,‘’ YearsInCurrentRole,YearsSinceLastPromotion, YearsWithCurrManager). Some categorical variables are coded with numeric values. Such variables are called dummy variables. For example

Education 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’

EnvironmentSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

JobInvolvement 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

JobSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

PerformanceRating 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’

RelationshipSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

WorkLifeBalance 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’

#Reading the dataset
setwd("C:/Users/CJ With HP/Desktop/IIM Lucknow/Datasets")
att.df <- read.csv(paste("WA_Fn-UseC_-HR-Employee-Attrition.csv",sep = ""))
names(att.df)[1]<-"Age"
attach(att.df)
#Description of the attributes
str(att.df)
## 'data.frame':    1470 obs. of  35 variables:
##  $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
##  $ Attrition               : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
##  $ BusinessTravel          : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
##  $ DailyRate               : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
##  $ Department              : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
##  $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
##  $ Education               : int  2 1 2 4 1 2 3 1 3 3 ...
##  $ EducationField          : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  1 2 4 5 7 8 10 11 12 13 ...
##  $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
##  $ Gender                  : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
##  $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
##  $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
##  $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
##  $ JobRole                 : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
##  $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
##  $ MaritalStatus           : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
##  $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
##  $ MonthlyRate             : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
##  $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
##  $ Over18                  : Factor w/ 1 level "Y": 1 1 1 1 1 1 1 1 1 1 ...
##  $ OverTime                : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
##  $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
##  $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
##  $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
##  $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
##  $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
##  $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
##  $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
##  $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
##  $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...

The attributes are self explanatory.

Hypothesis: The independent variable strongly impact the attrition of an employee, and they can be collectively grouped together to decide upon attrition.

4. Model Analysis

Logistic regression is a method for fitting a regression curve, y = f(x), when y is a categorical variable. The typical use of this model is predicting y given a set of predictors x. The predictors can be continuous, categorical or a mix of both.

The categorical variable y, in general, can assume different values. In the simplest case scenario y is binary meaning that it can assume either the value 1 or 0. A classical example used in machine learning is email classification: given a set of attributes for each email such as number of words, links and pictures, the algorithm should decide whether the email is spam (1) or not (0). In this post we call the model “binomial logistic regression”, since the variable to predict is binary, however, logistic regression can also be used to predict a dependent variable which can assume more than 2 values. In this second case we call the model “multinomial logistic regression”. A typical example for instance, would be classifying films between “Entertaining”, “borderline” or “boring”.

Logistic regression implementation in R

R makes it very easy to fit a logistic regression model. The function to be called is glm() and the fitting process is not so different from the one used in linear regression.

In order to test Hypothesis, we proposed the following model:

#We split the data into two chunks: training and testing set. The training set will be used to fit our model.
train <- att.df[1:799,]
test<-att.df[800:1400,]
model <- glm(Attrition ~ Age+BusinessTravel+Department+DistanceFromHome+
               Education+EducationField+EnvironmentSatisfaction+Gender+JobInvolvement+
               JobLevel+JobRole+JobSatisfaction+MaritalStatus+MonthlyIncome+NumCompaniesWorked+
               OverTime+PercentSalaryHike+PerformanceRating+RelationshipSatisfaction+
               TotalWorkingYears+TrainingTimesLastYear+WorkLifeBalance+ 
               YearsAtCompany+YearsInCurrentRole+YearsSinceLastPromotion+
               YearsWithCurrManager,family=binomial(link='logit'),data=train)
summary(model)
## 
## Call:
## glm(formula = Attrition ~ Age + BusinessTravel + Department + 
##     DistanceFromHome + Education + EducationField + EnvironmentSatisfaction + 
##     Gender + JobInvolvement + JobLevel + JobRole + JobSatisfaction + 
##     MaritalStatus + MonthlyIncome + NumCompaniesWorked + OverTime + 
##     PercentSalaryHike + PerformanceRating + RelationshipSatisfaction + 
##     TotalWorkingYears + TrainingTimesLastYear + WorkLifeBalance + 
##     YearsAtCompany + YearsInCurrentRole + YearsSinceLastPromotion + 
##     YearsWithCurrManager, family = binomial(link = "logit"), 
##     data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0043  -0.4501  -0.2306  -0.0739   3.1932  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                      -1.150e+01  5.322e+02  -0.022 0.982766
## Age                              -6.303e-02  1.870e-02  -3.371 0.000749
## BusinessTravelTravel_Frequently   1.974e+00  5.809e-01   3.397 0.000680
## BusinessTravelTravel_Rarely       1.242e+00  5.368e-01   2.315 0.020630
## DepartmentResearch & Development  1.343e+01  5.322e+02   0.025 0.979876
## DepartmentSales                   1.241e+01  5.322e+02   0.023 0.981394
## DistanceFromHome                  5.480e-02  1.518e-02   3.610 0.000307
## Education                        -1.416e-02  1.225e-01  -0.116 0.907997
## EducationFieldLife Sciences       6.837e-02  1.322e+00   0.052 0.958766
## EducationFieldMarketing           7.414e-01  1.385e+00   0.535 0.592426
## EducationFieldMedical             7.745e-02  1.318e+00   0.059 0.953129
## EducationFieldOther               4.266e-01  1.416e+00   0.301 0.763285
## EducationFieldTechnical Degree    1.253e+00  1.343e+00   0.933 0.351036
## EnvironmentSatisfaction          -5.098e-01  1.224e-01  -4.164 3.12e-05
## GenderMale                        4.900e-01  2.633e-01   1.861 0.062752
## JobInvolvement                   -6.840e-01  1.773e-01  -3.858 0.000114
## JobLevel                         -1.189e-01  4.370e-01  -0.272 0.785478
## JobRoleHuman Resources            1.552e+01  5.322e+02   0.029 0.976744
## JobRoleLaboratory Technician      2.067e+00  7.027e-01   2.942 0.003260
## JobRoleManager                    1.022e+00  1.085e+00   0.942 0.346325
## JobRoleManufacturing Director     1.970e-01  8.023e-01   0.246 0.806014
## JobRoleResearch Director         -7.318e-02  1.163e+00  -0.063 0.949819
## JobRoleResearch Scientist         1.190e+00  7.076e-01   1.682 0.092531
## JobRoleSales Executive            2.355e+00  1.492e+00   1.579 0.114428
## JobRoleSales Representative       3.407e+00  1.560e+00   2.184 0.028947
## JobSatisfaction                  -4.023e-01  1.119e-01  -3.596 0.000323
## MaritalStatusMarried              6.020e-01  3.579e-01   1.682 0.092589
## MaritalStatusSingle               1.823e+00  3.700e-01   4.928 8.30e-07
## MonthlyIncome                     3.624e-05  1.134e-04   0.320 0.749292
## NumCompaniesWorked                2.039e-01  5.323e-02   3.830 0.000128
## OverTimeYes                       2.251e+00  2.805e-01   8.026 1.01e-15
## PercentSalaryHike                 2.623e-02  5.463e-02   0.480 0.631104
## PerformanceRating                -2.494e-01  5.370e-01  -0.464 0.642297
## RelationshipSatisfaction         -3.057e-01  1.157e-01  -2.643 0.008220
## TotalWorkingYears                -1.451e-02  3.963e-02  -0.366 0.714155
## TrainingTimesLastYear            -1.129e-01  9.801e-02  -1.152 0.249230
## WorkLifeBalance                  -5.250e-01  1.827e-01  -2.874 0.004059
## YearsAtCompany                    9.193e-02  5.055e-02   1.819 0.068960
## YearsInCurrentRole               -1.616e-01  6.191e-02  -2.611 0.009039
## YearsSinceLastPromotion           1.841e-01  5.753e-02   3.200 0.001376
## YearsWithCurrManager             -1.465e-01  6.478e-02  -2.261 0.023754
##                                     
## (Intercept)                         
## Age                              ***
## BusinessTravelTravel_Frequently  ***
## BusinessTravelTravel_Rarely      *  
## DepartmentResearch & Development    
## DepartmentSales                     
## DistanceFromHome                 ***
## Education                           
## EducationFieldLife Sciences         
## EducationFieldMarketing             
## EducationFieldMedical               
## EducationFieldOther                 
## EducationFieldTechnical Degree      
## EnvironmentSatisfaction          ***
## GenderMale                       .  
## JobInvolvement                   ***
## JobLevel                            
## JobRoleHuman Resources              
## JobRoleLaboratory Technician     ** 
## JobRoleManager                      
## JobRoleManufacturing Director       
## JobRoleResearch Director            
## JobRoleResearch Scientist        .  
## JobRoleSales Executive              
## JobRoleSales Representative      *  
## JobSatisfaction                  ***
## MaritalStatusMarried             .  
## MaritalStatusSingle              ***
## MonthlyIncome                       
## NumCompaniesWorked               ***
## OverTimeYes                      ***
## PercentSalaryHike                   
## PerformanceRating                   
## RelationshipSatisfaction         ** 
## TotalWorkingYears                   
## TrainingTimesLastYear               
## WorkLifeBalance                  ** 
## YearsAtCompany                   .  
## YearsInCurrentRole               ** 
## YearsSinceLastPromotion          ** 
## YearsWithCurrManager             *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 725.87  on 798  degrees of freedom
## Residual deviance: 448.26  on 758  degrees of freedom
## AIC: 530.26
## 
## Number of Fisher Scoring iterations: 14

Now we can run the anova() function on the model to analyze the table of deviance

anova(model, test="Chisq")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: Attrition
## 
## Terms added sequentially (first to last)
## 
## 
##                          Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                       798     725.87              
## Age                       1   31.065       797     694.80 2.496e-08 ***
## BusinessTravel            2    8.372       795     686.43  0.015210 *  
## Department                2    2.298       793     684.13  0.316889    
## DistanceFromHome          1    6.101       792     678.03  0.013510 *  
## Education                 1    0.023       791     678.01  0.878955    
## EducationField            5    7.552       786     670.46  0.182731    
## EnvironmentSatisfaction   1   10.023       785     660.43  0.001546 ** 
## Gender                    1    1.472       784     658.96  0.224988    
## JobInvolvement            1   20.252       783     638.71 6.789e-06 ***
## JobLevel                  1   16.670       782     622.04 4.448e-05 ***
## JobRole                   8   15.308       774     606.73  0.053435 .  
## JobSatisfaction           1    9.577       773     597.16  0.001970 ** 
## MaritalStatus             2   19.656       771     577.50 5.392e-05 ***
## MonthlyIncome             1    0.287       770     577.21  0.591974    
## NumCompaniesWorked        1   15.189       769     562.02 9.725e-05 ***
## OverTime                  1   74.021       768     488.00 < 2.2e-16 ***
## PercentSalaryHike         1    0.104       767     487.90  0.746720    
## PerformanceRating         1    0.254       766     487.64  0.614543    
## RelationshipSatisfaction  1    4.847       765     482.80  0.027696 *  
## TotalWorkingYears         1    0.436       764     482.36  0.508824    
## TrainingTimesLastYear     1    1.705       763     480.66  0.191613    
## WorkLifeBalance           1    8.887       762     471.77  0.002873 ** 
## YearsAtCompany            1    0.001       761     471.77  0.975419    
## YearsInCurrentRole        1    8.053       760     463.71  0.004543 ** 
## YearsSinceLastPromotion   1   10.375       759     453.34  0.001277 ** 
## YearsWithCurrManager      1    5.075       758     448.26  0.024273 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5. Disussion

Interpreting the results of our logistic regression model

Now we can analyze the fitting and interpret what the model is telling us. First of all, we can see that (Age, BusinessTravel, DistanceFromHome, EnvironmentSatisfaction, JobInvolvement, JobRole, JobSatisfaction,MaritalStatus, NumCompaniesWorked, OverTime, RelationshipSatisfaction, WorkLifeBalance, YearsInCurrentRole, YearsSinceLastPromotion, YearsWithCurrManager) are statistically significant.

As for the statistically significant variables, Overtime and Marital Staus has the lowest p-value suggesting a strong association.In the logit model the response variable is log odds: ln(odds) = ln(p/(1-p)) = a/x1 + b/x2 + . + z*xn.

The difference between the null deviance and the residual deviance shows how our model is doing against the null model (a model with only the intercept). The wider this gap, the better. Analyzing the anova table we can see the drop in deviance when adding each variable one at a time. A large p-value here indicates that the model without the variable explains more or less the same amount of variation. Ultimately what you would like to see is a significant drop in deviance and the AIC.

6. Conclusion

This paper was motivated by the need for research that could reduce the attrition rate in an industry, therby incrementing the productivity. The unique contribution of this paper is that we investigated the attributes which majorly contribute in the attrition. We found that parameters such as Overtime, Marital Status, Previous Work Experience, Environmental Satisfaction are some of the parameters, which show strong association with attrition, positively or negatively.

7. References

An introduction to logistic regression analysis and reporting CYJ Peng, KL Lee, GM Ingersoll - . journal of educational research, 2002 - Taylor & Francis . & Kravitz, 1994; Tolman & Weisz, 1995)

Goodness-of-fit test for a logistic regression model fitted using survey sample data KJ Archer, S Lemeshow - Stata Journal, 2006

Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and .I Yilmaz - Environmental Earth Sciences, 2010 - Springer

Calculation of polychotomous logistic regression parameters using individualized regressions

7. Appendix : Online Resources

https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/

https://rpubs.com/bpr1989/HRAnalysis

http://rpubs.com/SameerMathur/HTNF

http://recruitloop.com/blog/7-ways-reduce-employee-attrition/

https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset/data

8. R Code supporting the study

1. Creating a descriptive analysis for each variable

str(att.df)
## 'data.frame':    1470 obs. of  35 variables:
##  $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
##  $ Attrition               : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
##  $ BusinessTravel          : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
##  $ DailyRate               : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
##  $ Department              : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
##  $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
##  $ Education               : int  2 1 2 4 1 2 3 1 3 3 ...
##  $ EducationField          : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  1 2 4 5 7 8 10 11 12 13 ...
##  $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
##  $ Gender                  : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
##  $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
##  $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
##  $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
##  $ JobRole                 : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
##  $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
##  $ MaritalStatus           : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
##  $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
##  $ MonthlyRate             : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
##  $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
##  $ Over18                  : Factor w/ 1 level "Y": 1 1 1 1 1 1 1 1 1 1 ...
##  $ OverTime                : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
##  $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
##  $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
##  $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
##  $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
##  $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
##  $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
##  $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
##  $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
##  $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...
library(psych)
describe(att.df)
##                          vars    n     mean      sd  median  trimmed
## Age                         1 1470    36.92    9.14    36.0    36.47
## Attrition*                  2 1470     1.16    0.37     1.0     1.08
## BusinessTravel*             3 1470     2.61    0.67     3.0     2.76
## DailyRate                   4 1470   802.49  403.51   802.0   803.83
## Department*                 5 1470     2.26    0.53     2.0     2.25
## DistanceFromHome            6 1470     9.19    8.11     7.0     8.08
## Education                   7 1470     2.91    1.02     3.0     2.98
## EducationField*             8 1470     3.25    1.33     3.0     3.10
## EmployeeCount               9 1470     1.00    0.00     1.0     1.00
## EmployeeNumber             10 1470  1024.87  602.02  1020.5  1023.40
## EnvironmentSatisfaction    11 1470     2.72    1.09     3.0     2.78
## Gender*                    12 1470     1.60    0.49     2.0     1.62
## HourlyRate                 13 1470    65.89   20.33    66.0    66.02
## JobInvolvement             14 1470     2.73    0.71     3.0     2.74
## JobLevel                   15 1470     2.06    1.11     2.0     1.90
## JobRole*                   16 1470     5.46    2.46     6.0     5.61
## JobSatisfaction            17 1470     2.73    1.10     3.0     2.79
## MaritalStatus*             18 1470     2.10    0.73     2.0     2.12
## MonthlyIncome              19 1470  6502.93 4707.96  4919.0  5667.24
## MonthlyRate                20 1470 14313.10 7117.79 14235.5 14286.48
## NumCompaniesWorked         21 1470     2.69    2.50     2.0     2.36
## Over18*                    22 1470     1.00    0.00     1.0     1.00
## OverTime*                  23 1470     1.28    0.45     1.0     1.23
## PercentSalaryHike          24 1470    15.21    3.66    14.0    14.80
## PerformanceRating          25 1470     3.15    0.36     3.0     3.07
## RelationshipSatisfaction   26 1470     2.71    1.08     3.0     2.77
## StandardHours              27 1470    80.00    0.00    80.0    80.00
## StockOptionLevel           28 1470     0.79    0.85     1.0     0.67
## TotalWorkingYears          29 1470    11.28    7.78    10.0    10.37
## TrainingTimesLastYear      30 1470     2.80    1.29     3.0     2.72
## WorkLifeBalance            31 1470     2.76    0.71     3.0     2.77
## YearsAtCompany             32 1470     7.01    6.13     5.0     5.99
## YearsInCurrentRole         33 1470     4.23    3.62     3.0     3.85
## YearsSinceLastPromotion    34 1470     2.19    3.22     1.0     1.48
## YearsWithCurrManager       35 1470     4.12    3.57     3.0     3.77
##                              mad  min   max range  skew kurtosis     se
## Age                         8.90   18    60    42  0.41    -0.41   0.24
## Attrition*                  0.00    1     2     1  1.84     1.39   0.01
## BusinessTravel*             0.00    1     3     2 -1.44     0.69   0.02
## DailyRate                 510.01  102  1499  1397  0.00    -1.21  10.52
## Department*                 0.00    1     3     2  0.17    -0.40   0.01
## DistanceFromHome            7.41    1    29    28  0.96    -0.23   0.21
## Education                   1.48    1     5     4 -0.29    -0.56   0.03
## EducationField*             1.48    1     6     5  0.55    -0.69   0.03
## EmployeeCount               0.00    1     1     0   NaN      NaN   0.00
## EmployeeNumber            790.97    1  2068  2067  0.02    -1.23  15.70
## EnvironmentSatisfaction     1.48    1     4     3 -0.32    -1.20   0.03
## Gender*                     0.00    1     2     1 -0.41    -1.83   0.01
## HourlyRate                 26.69   30   100    70 -0.03    -1.20   0.53
## JobInvolvement              0.00    1     4     3 -0.50     0.26   0.02
## JobLevel                    1.48    1     5     4  1.02     0.39   0.03
## JobRole*                    2.97    1     9     8 -0.36    -1.20   0.06
## JobSatisfaction             1.48    1     4     3 -0.33    -1.22   0.03
## MaritalStatus*              1.48    1     3     2 -0.15    -1.12   0.02
## MonthlyIncome            3260.24 1009 19999 18990  1.37     0.99 122.79
## MonthlyRate              9201.76 2094 26999 24905  0.02    -1.22 185.65
## NumCompaniesWorked          1.48    0     9     9  1.02     0.00   0.07
## Over18*                     0.00    1     1     0   NaN      NaN   0.00
## OverTime*                   0.00    1     2     1  0.96    -1.07   0.01
## PercentSalaryHike           2.97   11    25    14  0.82    -0.31   0.10
## PerformanceRating           0.00    3     4     1  1.92     1.68   0.01
## RelationshipSatisfaction    1.48    1     4     3 -0.30    -1.19   0.03
## StandardHours               0.00   80    80     0   NaN      NaN   0.00
## StockOptionLevel            1.48    0     3     3  0.97     0.35   0.02
## TotalWorkingYears           5.93    0    40    40  1.11     0.91   0.20
## TrainingTimesLastYear       1.48    0     6     6  0.55     0.48   0.03
## WorkLifeBalance             0.00    1     4     3 -0.55     0.41   0.02
## YearsAtCompany              4.45    0    40    40  1.76     3.91   0.16
## YearsInCurrentRole          4.45    0    18    18  0.92     0.47   0.09
## YearsSinceLastPromotion     1.48    0    15    15  1.98     3.59   0.08
## YearsWithCurrManager        4.45    0    17    17  0.83     0.16   0.09

2. Creating a one way contingency table for all the categorical variables in the dataset

table(Attrition)
## Attrition
##   No  Yes 
## 1233  237
table(BusinessTravel)
## BusinessTravel
##        Non-Travel Travel_Frequently     Travel_Rarely 
##               150               277              1043
table(Department)
## Department
##        Human Resources Research & Development                  Sales 
##                     63                    961                    446
table(EducationField)
## EducationField
##  Human Resources    Life Sciences        Marketing          Medical 
##               27              606              159              464 
##            Other Technical Degree 
##               82              132
table(Education)
## Education
##   1   2   3   4   5 
## 170 282 572 398  48
table(EnvironmentSatisfaction)
## EnvironmentSatisfaction
##   1   2   3   4 
## 284 287 453 446
table(Gender)
## Gender
## Female   Male 
##    588    882
table(JobInvolvement)
## JobInvolvement
##   1   2   3   4 
##  83 375 868 144
table(JobRole)
## JobRole
## Healthcare Representative           Human Resources 
##                       131                        52 
##     Laboratory Technician                   Manager 
##                       259                       102 
##    Manufacturing Director         Research Director 
##                       145                        80 
##        Research Scientist           Sales Executive 
##                       292                       326 
##      Sales Representative 
##                        83
table(JobLevel)
## JobLevel
##   1   2   3   4   5 
## 543 534 218 106  69
table(JobSatisfaction)
## JobSatisfaction
##   1   2   3   4 
## 289 280 442 459
table(MaritalStatus)
## MaritalStatus
## Divorced  Married   Single 
##      327      673      470
table(OverTime)
## OverTime
##   No  Yes 
## 1054  416

3. Creating a two way contingency table for all the categorical variables in the dataset

mytable<- xtabs(~PerformanceRating+Attrition,data=att.df)
round(prop.table(mytable,1)*100,2)
##                  Attrition
## PerformanceRating    No   Yes
##                 3 83.92 16.08
##                 4 83.63 16.37
mytable<- xtabs(~OverTime+Attrition,data=att.df)
round(prop.table(mytable,1)*100,2)
##         Attrition
## OverTime    No   Yes
##      No  89.56 10.44
##      Yes 69.47 30.53
mytable<- xtabs(~WorkLifeBalance+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##                Attrition
## WorkLifeBalance    No   Yes
##               1 68.75 31.25
##               2 83.14 16.86
##               3 85.78 14.22
##               4 82.35 17.65
mytable<- xtabs(~JobRole+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##                            Attrition
## JobRole                        No   Yes
##   Healthcare Representative 93.13  6.87
##   Human Resources           76.92 23.08
##   Laboratory Technician     76.06 23.94
##   Manager                   95.10  4.90
##   Manufacturing Director    93.10  6.90
##   Research Director         97.50  2.50
##   Research Scientist        83.90 16.10
##   Sales Executive           82.52 17.48
##   Sales Representative      60.24 39.76
mytable<- xtabs(~NumCompaniesWorked+Attrition,data=att.df)
round(prop.table(mytable,1)*100,2)
##                   Attrition
## NumCompaniesWorked    No   Yes
##                  0 88.32 11.68
##                  1 81.19 18.81
##                  2 89.04 10.96
##                  3 89.94 10.06
##                  4 87.77 12.23
##                  5 74.60 25.40
##                  6 77.14 22.86
##                  7 77.03 22.97
##                  8 87.76 12.24
##                  9 76.92 23.08
mytable<- xtabs(~MaritalStatus+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2) 
##              Attrition
## MaritalStatus    No   Yes
##      Divorced 89.91 10.09
##      Married  87.52 12.48
##      Single   74.47 25.53
mytable<- xtabs(~Gender+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##         Attrition
## Gender      No   Yes
##   Female 85.20 14.80
##   Male   82.99 17.01
mytable<- xtabs(~EnvironmentSatisfaction+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##                        Attrition
## EnvironmentSatisfaction    No   Yes
##                       1 74.65 25.35
##                       2 85.02 14.98
##                       3 86.31 13.69
##                       4 86.55 13.45
mytable<- xtabs(~BusinessTravel+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##                    Attrition
## BusinessTravel         No   Yes
##   Non-Travel        92.00  8.00
##   Travel_Frequently 75.09 24.91
##   Travel_Rarely     85.04 14.96
mytable<- xtabs(~EducationField+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##                   Attrition
## EducationField        No   Yes
##   Human Resources  74.07 25.93
##   Life Sciences    85.31 14.69
##   Marketing        77.99 22.01
##   Medical          86.42 13.58
##   Other            86.59 13.41
##   Technical Degree 75.76 24.24
mytable<- xtabs(~JobSatisfaction+Attrition,data=att.df)
round(prop.table(mytable,1)*100,2)
##                Attrition
## JobSatisfaction    No   Yes
##               1 77.16 22.84
##               2 83.57 16.43
##               3 83.48 16.52
##               4 88.67 11.33
mytable<- xtabs(~Education+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##          Attrition
## Education    No   Yes
##         1 81.76 18.24
##         2 84.40 15.60
##         3 82.69 17.31
##         4 85.43 14.57
##         5 89.58 10.42

4. Boxplots

boxplot(Age~Attrition,main="Boxplot",xlab="Age",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

boxplot(DistanceFromHome~Attrition,main="Boxplot",xlab="DistanceFromHome",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

boxplot(MonthlyIncome~Attrition,main="Boxplot",xlab="MonthlyIncome",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

boxplot(YearsWithCurrManager~Attrition,main="Boxplot",xlab="YearsWithCurManager",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

boxplot(MonthlyRate~Attrition,main="Boxplot",xlab="MonthlyRate",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

boxplot(DailyRate~Attrition,main="Boxplot",xlab="DailyRate",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

boxplot(HourlyRate~Attrition,main="Boxplot",xlab="HourlyRate",ylab = "Attrition(Yes/No)",horizontal=TRUE,col=c("pink","lightblue"))

5.Histograms

hist(Age,xlab="age",ylab="count",breaks=20,main="Age variability in the company",col="lightblue",freq=FALSE)

hist(MonthlyIncome,xlab="MonthlyIncome",ylab="count",breaks=20,main="MonthlyIncome",col="lightblue",ylim=c(0,400))

hist(YearsAtCompany,xlab="YearsAtCompany",ylab="count",breaks=20,main="YearsAtcompany",col="lightblue",ylim=c(0,400))

hist(YearsWithCurrManager,xlab="YearswithCurManager",ylab="count",breaks=20,main="YearsWithCurManager",col="lightblue",ylim=c(0,400))

hist(PercentSalaryHike,xlab="PercentSalaryHike",ylab="count",breaks=20,main="PercentSalaryHike",col="lightblue")

library(lattice)
histogram(~Attrition|JobRole) 

histogram(~Attrition|Department,layout=c(4,1),col=c("lightblue","pink")) 

histogram(~PercentSalaryHike|Attrition) 

histogram(~Education|Attrition) 

6. CorrelationMatrix

round(cor(att.df[,c(1,4,6,7,11,13,14,15,17,19,20,21,24,25,26,29:35)]),2)
##                            Age DailyRate DistanceFromHome Education
## Age                       1.00      0.01             0.00      0.21
## DailyRate                 0.01      1.00             0.00     -0.02
## DistanceFromHome          0.00      0.00             1.00      0.02
## Education                 0.21     -0.02             0.02      1.00
## EnvironmentSatisfaction   0.01      0.02            -0.02     -0.03
## HourlyRate                0.02      0.02             0.03      0.02
## JobInvolvement            0.03      0.05             0.01      0.04
## JobLevel                  0.51      0.00             0.01      0.10
## JobSatisfaction           0.00      0.03             0.00     -0.01
## MonthlyIncome             0.50      0.01            -0.02      0.09
## MonthlyRate               0.03     -0.03             0.03     -0.03
## NumCompaniesWorked        0.30      0.04            -0.03      0.13
## PercentSalaryHike         0.00      0.02             0.04     -0.01
## PerformanceRating         0.00      0.00             0.03     -0.02
## RelationshipSatisfaction  0.05      0.01             0.01     -0.01
## TotalWorkingYears         0.68      0.01             0.00      0.15
## TrainingTimesLastYear    -0.02      0.00            -0.04     -0.03
## WorkLifeBalance          -0.02     -0.04            -0.03      0.01
## YearsAtCompany            0.31     -0.03             0.01      0.07
## YearsInCurrentRole        0.21      0.01             0.02      0.06
## YearsSinceLastPromotion   0.22     -0.03             0.01      0.05
## YearsWithCurrManager      0.20     -0.03             0.01      0.07
##                          EnvironmentSatisfaction HourlyRate JobInvolvement
## Age                                         0.01       0.02           0.03
## DailyRate                                   0.02       0.02           0.05
## DistanceFromHome                           -0.02       0.03           0.01
## Education                                  -0.03       0.02           0.04
## EnvironmentSatisfaction                     1.00      -0.05          -0.01
## HourlyRate                                 -0.05       1.00           0.04
## JobInvolvement                             -0.01       0.04           1.00
## JobLevel                                    0.00      -0.03          -0.01
## JobSatisfaction                            -0.01      -0.07          -0.02
## MonthlyIncome                              -0.01      -0.02          -0.02
## MonthlyRate                                 0.04      -0.02          -0.02
## NumCompaniesWorked                          0.01       0.02           0.02
## PercentSalaryHike                          -0.03      -0.01          -0.02
## PerformanceRating                          -0.03       0.00          -0.03
## RelationshipSatisfaction                    0.01       0.00           0.03
## TotalWorkingYears                           0.00       0.00          -0.01
## TrainingTimesLastYear                      -0.02      -0.01          -0.02
## WorkLifeBalance                             0.03       0.00          -0.01
## YearsAtCompany                              0.00      -0.02          -0.02
## YearsInCurrentRole                          0.02      -0.02           0.01
## YearsSinceLastPromotion                     0.02      -0.03          -0.02
## YearsWithCurrManager                        0.00      -0.02           0.03
##                          JobLevel JobSatisfaction MonthlyIncome
## Age                          0.51            0.00          0.50
## DailyRate                    0.00            0.03          0.01
## DistanceFromHome             0.01            0.00         -0.02
## Education                    0.10           -0.01          0.09
## EnvironmentSatisfaction      0.00           -0.01         -0.01
## HourlyRate                  -0.03           -0.07         -0.02
## JobInvolvement              -0.01           -0.02         -0.02
## JobLevel                     1.00            0.00          0.95
## JobSatisfaction              0.00            1.00         -0.01
## MonthlyIncome                0.95           -0.01          1.00
## MonthlyRate                  0.04            0.00          0.03
## NumCompaniesWorked           0.14           -0.06          0.15
## PercentSalaryHike           -0.03            0.02         -0.03
## PerformanceRating           -0.02            0.00         -0.02
## RelationshipSatisfaction     0.02           -0.01          0.03
## TotalWorkingYears            0.78           -0.02          0.77
## TrainingTimesLastYear       -0.02           -0.01         -0.02
## WorkLifeBalance              0.04           -0.02          0.03
## YearsAtCompany               0.53            0.00          0.51
## YearsInCurrentRole           0.39            0.00          0.36
## YearsSinceLastPromotion      0.35           -0.02          0.34
## YearsWithCurrManager         0.38           -0.03          0.34
##                          MonthlyRate NumCompaniesWorked PercentSalaryHike
## Age                             0.03               0.30              0.00
## DailyRate                      -0.03               0.04              0.02
## DistanceFromHome                0.03              -0.03              0.04
## Education                      -0.03               0.13             -0.01
## EnvironmentSatisfaction         0.04               0.01             -0.03
## HourlyRate                     -0.02               0.02             -0.01
## JobInvolvement                 -0.02               0.02             -0.02
## JobLevel                        0.04               0.14             -0.03
## JobSatisfaction                 0.00              -0.06              0.02
## MonthlyIncome                   0.03               0.15             -0.03
## MonthlyRate                     1.00               0.02             -0.01
## NumCompaniesWorked              0.02               1.00             -0.01
## PercentSalaryHike              -0.01              -0.01              1.00
## PerformanceRating              -0.01              -0.01              0.77
## RelationshipSatisfaction        0.00               0.05             -0.04
## TotalWorkingYears               0.03               0.24             -0.02
## TrainingTimesLastYear           0.00              -0.07             -0.01
## WorkLifeBalance                 0.01              -0.01              0.00
## YearsAtCompany                 -0.02              -0.12             -0.04
## YearsInCurrentRole             -0.01              -0.09              0.00
## YearsSinceLastPromotion         0.00              -0.04             -0.02
## YearsWithCurrManager           -0.04              -0.11             -0.01
##                          PerformanceRating RelationshipSatisfaction
## Age                                   0.00                     0.05
## DailyRate                             0.00                     0.01
## DistanceFromHome                      0.03                     0.01
## Education                            -0.02                    -0.01
## EnvironmentSatisfaction              -0.03                     0.01
## HourlyRate                            0.00                     0.00
## JobInvolvement                       -0.03                     0.03
## JobLevel                             -0.02                     0.02
## JobSatisfaction                       0.00                    -0.01
## MonthlyIncome                        -0.02                     0.03
## MonthlyRate                          -0.01                     0.00
## NumCompaniesWorked                   -0.01                     0.05
## PercentSalaryHike                     0.77                    -0.04
## PerformanceRating                     1.00                    -0.03
## RelationshipSatisfaction             -0.03                     1.00
## TotalWorkingYears                     0.01                     0.02
## TrainingTimesLastYear                -0.02                     0.00
## WorkLifeBalance                       0.00                     0.02
## YearsAtCompany                        0.00                     0.02
## YearsInCurrentRole                    0.03                    -0.02
## YearsSinceLastPromotion               0.02                     0.03
## YearsWithCurrManager                  0.02                     0.00
##                          TotalWorkingYears TrainingTimesLastYear
## Age                                   0.68                 -0.02
## DailyRate                             0.01                  0.00
## DistanceFromHome                      0.00                 -0.04
## Education                             0.15                 -0.03
## EnvironmentSatisfaction               0.00                 -0.02
## HourlyRate                            0.00                 -0.01
## JobInvolvement                       -0.01                 -0.02
## JobLevel                              0.78                 -0.02
## JobSatisfaction                      -0.02                 -0.01
## MonthlyIncome                         0.77                 -0.02
## MonthlyRate                           0.03                  0.00
## NumCompaniesWorked                    0.24                 -0.07
## PercentSalaryHike                    -0.02                 -0.01
## PerformanceRating                     0.01                 -0.02
## RelationshipSatisfaction              0.02                  0.00
## TotalWorkingYears                     1.00                 -0.04
## TrainingTimesLastYear                -0.04                  1.00
## WorkLifeBalance                       0.00                  0.03
## YearsAtCompany                        0.63                  0.00
## YearsInCurrentRole                    0.46                 -0.01
## YearsSinceLastPromotion               0.40                  0.00
## YearsWithCurrManager                  0.46                  0.00
##                          WorkLifeBalance YearsAtCompany YearsInCurrentRole
## Age                                -0.02           0.31               0.21
## DailyRate                          -0.04          -0.03               0.01
## DistanceFromHome                   -0.03           0.01               0.02
## Education                           0.01           0.07               0.06
## EnvironmentSatisfaction             0.03           0.00               0.02
## HourlyRate                          0.00          -0.02              -0.02
## JobInvolvement                     -0.01          -0.02               0.01
## JobLevel                            0.04           0.53               0.39
## JobSatisfaction                    -0.02           0.00               0.00
## MonthlyIncome                       0.03           0.51               0.36
## MonthlyRate                         0.01          -0.02              -0.01
## NumCompaniesWorked                 -0.01          -0.12              -0.09
## PercentSalaryHike                   0.00          -0.04               0.00
## PerformanceRating                   0.00           0.00               0.03
## RelationshipSatisfaction            0.02           0.02              -0.02
## TotalWorkingYears                   0.00           0.63               0.46
## TrainingTimesLastYear               0.03           0.00              -0.01
## WorkLifeBalance                     1.00           0.01               0.05
## YearsAtCompany                      0.01           1.00               0.76
## YearsInCurrentRole                  0.05           0.76               1.00
## YearsSinceLastPromotion             0.01           0.62               0.55
## YearsWithCurrManager                0.00           0.77               0.71
##                          YearsSinceLastPromotion YearsWithCurrManager
## Age                                         0.22                 0.20
## DailyRate                                  -0.03                -0.03
## DistanceFromHome                            0.01                 0.01
## Education                                   0.05                 0.07
## EnvironmentSatisfaction                     0.02                 0.00
## HourlyRate                                 -0.03                -0.02
## JobInvolvement                             -0.02                 0.03
## JobLevel                                    0.35                 0.38
## JobSatisfaction                            -0.02                -0.03
## MonthlyIncome                               0.34                 0.34
## MonthlyRate                                 0.00                -0.04
## NumCompaniesWorked                         -0.04                -0.11
## PercentSalaryHike                          -0.02                -0.01
## PerformanceRating                           0.02                 0.02
## RelationshipSatisfaction                    0.03                 0.00
## TotalWorkingYears                           0.40                 0.46
## TrainingTimesLastYear                       0.00                 0.00
## WorkLifeBalance                             0.01                 0.00
## YearsAtCompany                              0.62                 0.77
## YearsInCurrentRole                          0.55                 0.71
## YearsSinceLastPromotion                     1.00                 0.51
## YearsWithCurrManager                        0.51                 1.00

7. Creating a corrgram

library(corrplot)
## corrplot 0.84 loaded
corrplot(corr=cor(att.df[,c(1,4,6,7,11,13,14,15,17,19,20,21,24,25,26,29:35)],use="complete.obs"),method="ellipse")

8. Checking hypothesis using t-test

H1. To check whether there is a significant difference in the means of the monthly income of the employees who leave and those who don’t

log.trans.Income = log(MonthlyIncome)
t.test(log.trans.Income~Attrition,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  log.trans.Income by Attrition
## t = 7.7481, df = 1468, p-value = 1.73e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2673770 0.4486555
## sample estimates:
##  mean in group No mean in group Yes 
##          8.610236          8.252220

H2. To check whether there is a significant difference in the means of the monthly rate of the employees who leave and those who don’t

log.trans.Rate = log(MonthlyRate)
t.test(log.trans.Rate~Attrition,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  log.trans.Rate by Attrition
## t = -0.47592, df = 1468, p-value = 0.6342
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.10954526  0.06676779
## sample estimates:
##  mean in group No mean in group Yes 
##          9.398882          9.420271

H3. To check whether there is a significant difference in the means of the daily rate of the employees who leave and those who don’t

log.trans.DailyRate = log(DailyRate)
t.test(log.trans.DailyRate~Attrition,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  log.trans.DailyRate by Attrition
## t = 1.9013, df = 1468, p-value = 0.05746
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.002824123  0.180830501
## sample estimates:
##  mean in group No mean in group Yes 
##          6.525604          6.436601

9. Chi-square test

mytable<- xtabs(~BusinessTravel+Attrition,data=att.df) 
round(prop.table(mytable,1)*100,2)
##                    Attrition
## BusinessTravel         No   Yes
##   Non-Travel        92.00  8.00
##   Travel_Frequently 75.09 24.91
##   Travel_Rarely     85.04 14.96
chisq.test(mytable)
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 24.182, df = 2, p-value = 5.609e-06
mytable<- xtabs(~JobSatisfaction+Attrition,data=att.df)
round(prop.table(mytable,1)*100,2)
##                Attrition
## JobSatisfaction    No   Yes
##               1 77.16 22.84
##               2 83.57 16.43
##               3 83.48 16.52
##               4 88.67 11.33
chisq.test(mytable) 
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 17.505, df = 3, p-value = 0.0005563
library(vcd)
## Loading required package: grid
## 
## Attaching package: 'vcd'
## The following object is masked from 'att.df':
## 
##     JobSatisfaction
fisher.test(mytable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.0005767
## alternative hypothesis: two.sided
assocstats(mytable)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 17.356  3 0.00059691
## Pearson          17.505  3 0.00055630
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.108 
## Cramer's V        : 0.109