IBM HR Analytics Employee Attrition & Performance
Project title: IBM Attrition Analysis
Author: Himesh Mendiratta (himeshmendiratta1998@gmail.com), SRM University
Date: 28 December 2017

ATTRITION OF EMPLOYEES (INTRODUCTION)

Employee attrition refers to the loss of employees through a number of circumstances, such as resignation and retirement. The cause of attrition may be either voluntary or involuntary, though employer-initiated events such as layoffs are not typically included in the definition. Each industry has its own standards for acceptable attrition rates, and these rates can also differ between skilled and unskilled positions. Due to the expenses associated with training new employees, any type of employee attrition is typically seen to have a monetary cost. It is also possible for a company to use employee attrition to its benefit in some circumstances, such as relying on it to control labor costs without issuing mass layoffs.

OVERVIEW OF ANALYSIS

This analysis aims to uncover the factors that lead to employee attrition and to explore questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. The data set was created by IBM data scientists, and this report analyses it in detail.

IBM COMPANY OVERVIEW

IBM (International Business Machines Corporation) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries. The company originated in 1911 as the Computing-Tabulating-Recording Company (CTR) and was renamed “International Business Machines” in 1924. IBM manufactures and markets computer hardware, middleware and software, and provides hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization and, as of 2017, had held the record for the most patents generated by a business for 24 consecutive years.

DATA SOURCE

The data was collected from the Kaggle website. Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the data sets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task, and it is impossible to know beforehand which technique or analyst will be most effective. The data set is available at https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset

DATA SET DESCRIPTION

Education 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’

EnvironmentSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

JobInvolvement 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

JobSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

PerformanceRating 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’

RelationshipSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

WorkLifeBalance 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’
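The coded variables listed above are stored as integers. When readable labels are wanted, for example in tables, one option is to convert them to ordered factors. A minimal sketch follows; the helper `label_codes()` and the new column names in the comments are hypothetical, and the label vectors are copied from the description above. Because later steps treat these columns as numbers, adding labelled copies is safer than overwriting the originals.

```r
# Hypothetical helper: map integer codes 1..k onto ordered, labelled factor levels
label_codes <- function(x, labels) {
  factor(x, levels = seq_along(labels), labels = labels, ordered = TRUE)
}

# Label vectors taken from the data set description
education_labels    <- c("Below College", "College", "Bachelor", "Master", "Doctor")
satisfaction_labels <- c("Low", "Medium", "High", "Very High")
performance_labels  <- c("Low", "Good", "Excellent", "Outstanding")
worklife_labels     <- c("Bad", "Good", "Better", "Best")

# Illustrative codes (not rows from the data set)
edu <- label_codes(c(3, 1, 5), education_labels)
print(edu)

# On the real data one might add labelled copies (hypothetical column names):
# attrition.df$EducationLabel <- label_codes(attrition.df$Education, education_labels)
# attrition.df$JobSatisfactionLabel <-
#   label_codes(attrition.df$JobSatisfaction, satisfaction_labels)
```

Ordered factors keep the scale order, so sorting and comparisons follow Low < Medium < High rather than alphabetical order.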

DETAILED ANALYSIS

Reading Data into R

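attrition.df <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv", stringsAsFactors = TRUE)  # keep text columns as factors (the default before R 4.0.0)
View(attrition.df)  # opens a spreadsheet-style viewer (interactive sessions only)
dim(attrition.df)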
## [1] 1470   35
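The `ï..Age` column name in the outputs is a symptom of the CSV starting with a UTF-8 byte-order mark (BOM): when the file is read with a Latin-1 style encoding, the three BOM bytes become `ï»¿`, and `make.names()` turns the two non-letter characters into dots. A minimal sketch, assuming R 3.0.0 or later (where the `"UTF-8-BOM"` encoding is accepted for reading), shows how the `fileEncoding` argument of `read.csv()` avoids this on a small throwaway file. The rest of this report keeps the original `ï..Age` name.

```r
# Build a tiny CSV whose first bytes are a UTF-8 byte-order mark (BOM),
# mimicking the start of the Kaggle file (illustrative content only)
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)               # the BOM bytes
writeBin(charToRaw("Age,Attrition\n41,Yes\n49,No\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" strips the BOM before the header is parsed
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(clean))
unlink(tmp)
```

Applied to the real file, the same argument would give a plain `Age` column, at the cost of updating every later reference to `ï..Age`.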

Analysing the data set

car::some(attrition.df)  # print a random sample of 10 rows
##      ï..Age Attrition    BusinessTravel DailyRate             Department
## 206      29       Yes     Travel_Rarely       121                  Sales
## 445      48        No     Travel_Rarely       163                  Sales
## 730      35        No     Travel_Rarely       583 Research & Development
## 858      44       Yes     Travel_Rarely      1097 Research & Development
## 917      46        No     Travel_Rarely       168                  Sales
## 935      25        No     Travel_Rarely       266 Research & Development
## 1055     49        No     Travel_Rarely      1490 Research & Development
## 1056     34        No Travel_Frequently       829 Research & Development
## 1281     37        No     Travel_Rarely      1239        Human Resources
## 1295     41        No     Travel_Rarely       447 Research & Development
##      DistanceFromHome Education EducationField EmployeeCount
## 206                27         3      Marketing             1
## 445                 2         5      Marketing             1
## 730                25         4        Medical             1
## 858                10         4  Life Sciences             1
## 917                 4         2      Marketing             1
## 935                 1         3        Medical             1
## 1055                7         4  Life Sciences             1
## 1056               15         3        Medical             1
## 1281                8         2          Other             1
## 1295                5         3  Life Sciences             1
##      EmployeeNumber EnvironmentSatisfaction Gender HourlyRate
## 206             283                       2 Female         35
## 445             595                       2 Female         37
## 730            1014                       3 Female         57
## 858            1200                       3   Male         96
## 917            1280                       4 Female         33
## 935            1303                       4 Female         40
## 1055           1484                       3   Male         35
## 1056           1485                       2   Male         71
## 1281           1794                       3   Male         89
## 1295           1814                       2   Male         85
##      JobInvolvement JobLevel                   JobRole JobSatisfaction
## 206               3        3           Sales Executive               4
## 445               3        2           Sales Executive               4
## 730               3        3 Healthcare Representative               3
## 858               3        1        Research Scientist               3
## 917               2        5                   Manager               2
## 935               3        1        Research Scientist               2
## 1055              3        3 Healthcare Representative               2
## 1056              3        4         Research Director               1
## 1281              3        2           Human Resources               2
## 1295              4        2 Healthcare Representative               2
##      MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked Over18
## 206        Married          7639       24525                  1      Y
## 445        Married          4051       19658                  2      Y
## 730       Divorced         10388        6975                  1      Y
## 858         Single          2936       10826                  1      Y
## 917        Married         18789        9946                  2      Y
## 935         Single          2096       18830                  1      Y
## 1055      Divorced         10466       20948                  3      Y
## 1056      Divorced         17007       11929                  7      Y
## 1281      Divorced          4071       12832                  2      Y
## 1295        Single          6870       15530                  3      Y
##      OverTime PercentSalaryHike PerformanceRating RelationshipSatisfaction
## 206        No                22                 4                        4
## 445        No                14                 3                        1
## 730       Yes                11                 3                        3
## 858       Yes                11                 3                        3
## 917        No                14                 3                        3
## 935        No                18                 3                        4
## 1055       No                14                 3                        2
## 1056       No                14                 3                        4
## 1281       No                13                 3                        3
## 1295       No                12                 3                        1
##      StandardHours StockOptionLevel TotalWorkingYears
## 206             80                3                10
## 445             80                1                14
## 730             80                1                16
## 858             80                0                 6
## 917             80                1                26
## 935             80                0                 2
## 1055            80                2                29
## 1056            80                2                16
## 1281            80                0                19
## 1295            80                0                11
##      TrainingTimesLastYear WorkLifeBalance YearsAtCompany
## 206                      3               2             10
## 445                      2               3              9
## 730                      3               2             16
## 858                      4               3              6
## 917                      2               3             11
## 935                      3               2              2
## 1055                     3               3              8
## 1056                     3               2             14
## 1281                     4               2             10
## 1295                     3               1              3
##      YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
## 206                   4                       1                    9
## 445                   7                       6                    7
## 730                  10                      10                    1
## 858                   4                       0                    2
## 917                   4                       0                    8
## 935                   2                       2                    1
## 1055                  7                       0                    7
## 1056                  8                       6                    9
## 1281                  0                       4                    7
## 1295                  2                       1                    2

Converting the Attrition column to numeric

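# Recode No/Yes as 1/2; explicit levels make this work whether Attrition was read as a factor or as text
attrition.df$Attrition <- as.numeric(factor(attrition.df$Attrition, levels = c("No", "Yes")))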

After this conversion, 1 stands for No and 2 stands for Yes.

Inspecting the first six rows

head(attrition.df)
##   ï..Age Attrition    BusinessTravel DailyRate             Department
## 1     41         2     Travel_Rarely      1102                  Sales
## 2     49         1 Travel_Frequently       279 Research & Development
## 3     37         2     Travel_Rarely      1373 Research & Development
## 4     33         1 Travel_Frequently      1392 Research & Development
## 5     27         1     Travel_Rarely       591 Research & Development
## 6     32         1 Travel_Frequently      1005 Research & Development
##   DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1                1         2  Life Sciences             1              1
## 2                8         1  Life Sciences             1              2
## 3                2         2          Other             1              4
## 4                3         4  Life Sciences             1              5
## 5                2         1        Medical             1              7
## 6                2         2  Life Sciences             1              8
##   EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1                       2 Female         94              3        2
## 2                       3   Male         61              2        2
## 3                       4   Male         92              2        1
## 4                       4 Female         56              3        1
## 5                       1   Male         40              3        1
## 6                       4   Male         79              3        1
##                 JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1       Sales Executive               4        Single          5993
## 2    Research Scientist               2       Married          5130
## 3 Laboratory Technician               3        Single          2090
## 4    Research Scientist               3       Married          2909
## 5 Laboratory Technician               2       Married          3468
## 6 Laboratory Technician               4        Single          3068
##   MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1       19479                  8      Y      Yes                11
## 2       24907                  1      Y       No                23
## 3        2396                  6      Y      Yes                15
## 4       23159                  1      Y      Yes                11
## 5       16632                  9      Y       No                12
## 6       11864                  0      Y       No                13
##   PerformanceRating RelationshipSatisfaction StandardHours
## 1                 3                        1            80
## 2                 4                        4            80
## 3                 3                        2            80
## 4                 3                        3            80
## 5                 3                        4            80
## 6                 3                        3            80
##   StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## 1                0                 8                     0               1
## 2                1                10                     3               3
## 3                0                 7                     3               3
## 4                0                 8                     3               3
## 5                1                 6                     3               3
## 6                0                 8                     2               2
##   YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
## 1              6                  4                       0
## 2             10                  7                       1
## 3              0                  0                       0
## 4              8                  7                       3
## 5              2                  2                       2
## 6              7                  7                       3
##   YearsWithCurrManager
## 1                    5
## 2                    7
## 3                    0
## 4                    0
## 5                    2
## 6                    6

Describing the data set

psych::describe(attrition.df)[,3:9]
##                              mean      sd  median  trimmed     mad  min
## ï..Age                      36.92    9.14    36.0    36.47    8.90   18
## Attrition                    1.16    0.37     1.0     1.08    0.00    1
## BusinessTravel*              2.61    0.67     3.0     2.76    0.00    1
## DailyRate                  802.49  403.51   802.0   803.83  510.01  102
## Department*                  2.26    0.53     2.0     2.25    0.00    1
## DistanceFromHome             9.19    8.11     7.0     8.08    7.41    1
## Education                    2.91    1.02     3.0     2.98    1.48    1
## EducationField*              3.25    1.33     3.0     3.10    1.48    1
## EmployeeCount                1.00    0.00     1.0     1.00    0.00    1
## EmployeeNumber            1024.87  602.02  1020.5  1023.40  790.97    1
## EnvironmentSatisfaction      2.72    1.09     3.0     2.78    1.48    1
## Gender*                      1.60    0.49     2.0     1.62    0.00    1
## HourlyRate                  65.89   20.33    66.0    66.02   26.69   30
## JobInvolvement               2.73    0.71     3.0     2.74    0.00    1
## JobLevel                     2.06    1.11     2.0     1.90    1.48    1
## JobRole*                     5.46    2.46     6.0     5.61    2.97    1
## JobSatisfaction              2.73    1.10     3.0     2.79    1.48    1
## MaritalStatus*               2.10    0.73     2.0     2.12    1.48    1
## MonthlyIncome             6502.93 4707.96  4919.0  5667.24 3260.24 1009
## MonthlyRate              14313.10 7117.79 14235.5 14286.48 9201.76 2094
## NumCompaniesWorked           2.69    2.50     2.0     2.36    1.48    0
## Over18*                      1.00    0.00     1.0     1.00    0.00    1
## OverTime*                    1.28    0.45     1.0     1.23    0.00    1
## PercentSalaryHike           15.21    3.66    14.0    14.80    2.97   11
## PerformanceRating            3.15    0.36     3.0     3.07    0.00    3
## RelationshipSatisfaction     2.71    1.08     3.0     2.77    1.48    1
## StandardHours               80.00    0.00    80.0    80.00    0.00   80
## StockOptionLevel             0.79    0.85     1.0     0.67    1.48    0
## TotalWorkingYears           11.28    7.78    10.0    10.37    5.93    0
## TrainingTimesLastYear        2.80    1.29     3.0     2.72    1.48    0
## WorkLifeBalance              2.76    0.71     3.0     2.77    0.00    1
## YearsAtCompany               7.01    6.13     5.0     5.99    4.45    0
## YearsInCurrentRole           4.23    3.62     3.0     3.85    4.45    0
## YearsSinceLastPromotion      2.19    3.22     1.0     1.48    1.48    0
## YearsWithCurrManager         4.12    3.57     3.0     3.77    4.45    0
##                            max
## ï..Age                      60
## Attrition                    2
## BusinessTravel*              3
## DailyRate                 1499
## Department*                  3
## DistanceFromHome            29
## Education                    5
## EducationField*              6
## EmployeeCount                1
## EmployeeNumber            2068
## EnvironmentSatisfaction      4
## Gender*                      2
## HourlyRate                 100
## JobInvolvement               4
## JobLevel                     5
## JobRole*                     9
## JobSatisfaction              4
## MaritalStatus*               3
## MonthlyIncome            19999
## MonthlyRate              26999
## NumCompaniesWorked           9
## Over18*                      1
## OverTime*                    2
## PercentSalaryHike           25
## PerformanceRating            4
## RelationshipSatisfaction     4
## StandardHours               80
## StockOptionLevel             3
## TotalWorkingYears           40
## TrainingTimesLastYear        6
## WorkLifeBalance              4
## YearsAtCompany              40
## YearsInCurrentRole          18
## YearsSinceLastPromotion     15
## YearsWithCurrManager        17

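This summary gives the mean, standard deviation, median, trimmed mean, median absolute deviation (mad), minimum and maximum of each variable. Variables marked with * are categorical, so their statistics describe level codes rather than real quantities.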
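In the summary, EmployeeCount, Over18 and StandardHours have a standard deviation of 0, i.e. the same value for every employee, so they cannot help explain attrition. A minimal sketch for finding such columns, using a hypothetical helper `constant_columns()` and a small illustrative data frame:

```r
# Hypothetical helper: names of columns that hold a single value in every row
constant_columns <- function(df) {
  names(df)[vapply(df, function(col) length(unique(col)) == 1, logical(1))]
}

# Illustrative data frame (not the real data)
toy <- data.frame(
  EmployeeCount = c(1, 1, 1),
  StandardHours = c(80, 80, 80),
  MonthlyIncome = c(5993, 5130, 2090)
)

const   <- constant_columns(toy)
reduced <- toy[, setdiff(names(toy), const), drop = FALSE]
print(const)
```

On the real data, `setdiff(names(attrition.df), constant_columns(attrition.df))` would list the columns worth keeping.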

VISUALISING EACH COLUMN

Gender

mytable<-with(attrition.df,table(Gender))
mytable
## Gender
## Female   Male 
##    588    882

Most of the employees in this data set are male (882 of 1,470, or 60%).

Age

hist(attrition.df$ï..Age,main="Distribution of age",xlab="Age",ylab="Count",col=blues9)

The age distribution peaks between 30 and 40 years; ages range from 18 to 60, with a median of 36.

MonthlyIncome

hist(attrition.df$MonthlyIncome,main="Distribution of Monthly Income",xlab="Monthly Income",ylab="Count",col="orange")

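Monthly income is right-skewed: the median (4,919) is well below the mean (6,503), so most employees earn towards the lower end of the 1,009 to 19,999 range.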

Attrition

barplot(table(attrition.df$Attrition), names.arg=c("No","Yes"), main="Distribution of Attrition", xlab="Attrition", ylab="Count", col="yellowgreen")

Attrition is coded No = 1 and Yes = 2. Relatively few employees leave: 237 of 1,470, or about 16%.
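Because Attrition is coded 1 = No and 2 = Yes, the mean of the column minus 1 is the share of employees who left; this is why the summary shows a mean of 1.16. A quick check using the totals from the cross-tables later in this report (1,233 stayers, 237 leavers):

```r
# Totals of stayers (code 1) and leavers (code 2), taken from the report's cross-tables
counts <- c(No = 1233, Yes = 237)

# Rebuild the 1/2-coded column and compare its mean with the share of leavers
coded <- rep(c(1, 2), times = counts)
attrition_rate <- counts[["Yes"]] / sum(counts)

print(mean(coded) - 1)                 # share of leavers from the 1/2 coding
print(round(100 * attrition_rate, 1))  # attrition rate in percent
```

On the full data, `mean(attrition.df$Attrition) - 1` or `prop.table(table(attrition.df$Attrition))` gives the same figure directly.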

Distance from home

hist(attrition.df$DistanceFromHome,main="Distance from home distribution",xlab="Distance from Home",ylab="Count",col="Pink")

Most of the employees in this data set live in close proximity to the office.

Number of companies worked with

hist(attrition.df$NumCompaniesWorked,main="Number of companies Worked with",xlab="Number of companies",ylab="Count",col="orange")

At least half of the employees have worked for two or fewer companies (median 2).

Salary hike percent

hist(attrition.df$PercentSalaryHike,main="Distribution of salary hike",xlab="Salary hike(%)",ylab="Count",col="red")

Salary hikes of 12-14% are the most common; hikes range from 11% to 25%, with a median of 14%.

Number of trainings

hist(attrition.df$TrainingTimesLastYear,main="Number of times the employee has undergone training",xlab="Training count",ylab="Count",col="brown")

Years at company

hist(attrition.df$YearsAtCompany,main="Years at company",xlab="Number of years",ylab="Count",col="yellow")

Most employees have been with the company for less than 10 years (median 5 years).

Years in current role

hist(attrition.df$YearsInCurrentRole,main="Years in current role",xlab="Years count",ylab="Count",col="yellow")

At least half of the employees have been in their current role for 3 years or less (median 3 years, maximum 18).

Years with current manager

hist(attrition.df$YearsWithCurrManager,main="Years with current manager",xlab="Years count",ylab="Count",col="yellow")

Most employees have been with the same manager for less than 5 years (median 3 years).

Years since last promotion

hist(attrition.df$YearsSinceLastPromotion,main="Years since last promotion",xlab="Years count",ylab="Count",col="yellow")

For most employees the last promotion was less than five years ago (median 1 year).

BusinessTravel

mytable<-with(attrition.df,table(BusinessTravel))
mytable
## BusinessTravel
##        Non-Travel Travel_Frequently     Travel_Rarely 
##               150               277              1043

Most of the employees travel rarely

Department

mytable<-with(attrition.df,table(Department))
mytable
## Department
##        Human Resources Research & Development                  Sales 
##                     63                    961                    446

Research & Development is by far the largest department, with 961 of the 1,470 employees.

Level of Education

mytable<-with(attrition.df,table(Education))
mytable
## Education
##   1   2   3   4   5 
## 170 282 572 398  48

The most common education level is Bachelor (level 3, 572 employees).

Environment Satisfaction

mytable<-with(attrition.df,table(EnvironmentSatisfaction))
mytable
## EnvironmentSatisfaction
##   1   2   3   4 
## 284 287 453 446

Most employees report high or very high environment satisfaction (899 of 1,470).

Job Involvement

mytable<-with(attrition.df,table(JobInvolvement))
mytable
## JobInvolvement
##   1   2   3   4 
##  83 375 868 144

Most of the employees have high job involvement

Job Level

mytable<-with(attrition.df,table(JobLevel))
mytable
## JobLevel
##   1   2   3   4   5 
## 543 534 218 106  69

Most employees are at the two lowest job levels (1,077 of 1,470 at levels 1 and 2).

Job Role

mytable<-with(attrition.df,table(JobRole))
mytable
## JobRole
## Healthcare Representative           Human Resources 
##                       131                        52 
##     Laboratory Technician                   Manager 
##                       259                       102 
##    Manufacturing Director         Research Director 
##                       145                        80 
##        Research Scientist           Sales Executive 
##                       292                       326 
##      Sales Representative 
##                        83

Sales Executive is the most common job role in the company (326 employees).

Job Satisfaction

mytable<-with(attrition.df,table(JobSatisfaction))
mytable
## JobSatisfaction
##   1   2   3   4 
## 289 280 442 459

Very High is the most common job satisfaction level (459 employees), and 901 of 1,470 employees report high or very high satisfaction.

Marital Status

mytable<-with(attrition.df,table(MaritalStatus))
mytable
## MaritalStatus
## Divorced  Married   Single 
##      327      673      470

Married is the most common marital status (673 of 1,470 employees).

Overtime

mytable<-with(attrition.df,table(OverTime))
mytable
## OverTime
##   No  Yes 
## 1054  416

Most employees (1,054 of 1,470) do not work overtime.

Performance Rating

mytable<-with(attrition.df,table(PerformanceRating))
mytable
## PerformanceRating
##    3    4 
## 1244  226

Most employees (1,244) have a performance rating of 3 (Excellent); the other 226 are rated 4 (Outstanding), and nobody is rated 1 or 2.

Relationship Satisfaction

mytable<-with(attrition.df,table(RelationshipSatisfaction))
mytable
## RelationshipSatisfaction
##   1   2   3   4 
## 276 303 459 432

Most employees report high or very high relationship satisfaction (891 of 1,470).

Work Life Balance

mytable<-with(attrition.df,table(WorkLifeBalance))
mytable
## WorkLifeBalance
##   1   2   3   4 
##  80 344 893 153

Most employees rate their work-life balance as ‘Better’ (level 3, 893 employees).

The plots and tables above show the distribution of each major variable in the data set.

We now move to the next level of analysis, whose main aim is to identify the factors that affect attrition.

RELATIONSHIP BETWEEN ATTRITION AND OTHER VARIABLES

Job role vs distance from home vs attrition

mytable<-xtabs(~JobRole+DistanceFromHome+Attrition,data=attrition.df)
mytable
## , , Attrition = 1
## 
##                            DistanceFromHome
## JobRole                      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
##   Healthcare Representative 23 14  4  3  6  4  8  8  5  7  4  3  2  1  2
##   Human Resources            8  9  3  2  1  2  0  5  1  2  1  0  1  0  0
##   Laboratory Technician     27 25 12  9  7 12 10 10 18  6  5  3  2  1  3
##   Manager                   13 22  6  8  6  3  5  3  4  5  3  0  1  0  0
##   Manufacturing Director    23 22  4  6  4  5 10  5  5  6  3  3  1  0  1
##   Research Director         13 10  6  1  3  4  7  5  2  6  1  1  0  3  5
##   Research Scientist        40 32 15  9 14 14 13 10 19 12  2  1  2  5  5
##   Sales Executive           33 37 19 14 11  7 16 21 10 26  5  3  2  6  5
##   Sales Representative       2 12  1  3  3  1  4  3  3  5  1  0  2  1  0
##                            DistanceFromHome
## JobRole                     16 17 18 19 20 21 22 23 24 25 26 27 28 29
##   Healthcare Representative  4  1  4  0  4  1  0  2  3  4  1  2  1  1
##   Human Resources            0  1  0  1  0  0  0  0  1  1  1  0  0  0
##   Laboratory Technician      5  3  3  6  4  3  1  5  5  2  0  1  3  6
##   Manager                    1  2  1  0  0  1  3  0  0  1  6  0  0  3
##   Manufacturing Director     3  2  6  3  2  2  1  2  3  4  0  3  2  4
##   Research Director          0  1  1  0  0  0  2  2  0  1  1  0  3  0
##   Research Scientist         4  3  3  6  4  2  3  7  2  2  6  2  5  3
##   Sales Executive            8  2  4  3  5  4  2  3  1  4  7  1  6  4
##   Sales Representative       0  0  0  0  2  2  1  1  1  0  0  0  1  1
## 
## , , Attrition = 2
## 
##                            DistanceFromHome
## JobRole                      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
##   Healthcare Representative  0  1  0  0  0  0  0  0  0  0  1  0  0  1  1
##   Human Resources            1  1  0  0  0  1  0  1  1  0  0  0  1  0  0
##   Laboratory Technician      4 11  3  3  2  3  6  4  4  3  0  1  0  2  2
##   Manager                    0  3  0  0  0  0  0  0  0  0  0  0  0  0  1
##   Manufacturing Director     1  2  1  0  0  0  1  1  0  2  0  0  0  0  0
##   Research Director          0  1  0  0  0  0  0  0  0  0  0  1  0  0  0
##   Research Scientist         7  4  5  2  3  1  1  2  2  4  1  0  0  1  0
##   Sales Executive            6  2  3  4  2  1  0  2  5  2  2  2  5  0  1
##   Sales Representative       7  3  2  0  3  1  3  0  6  0  0  2  0  0  0
##                            DistanceFromHome
## JobRole                     16 17 18 19 20 21 22 23 24 25 26 27 28 29
##   Healthcare Representative  0  0  0  0  1  1  0  1  1  0  0  0  0  1
##   Human Resources            0  1  1  0  1  0  2  1  0  0  0  0  0  0
##   Laboratory Technician      2  2  0  0  0  1  0  0  5  2  0  0  1  1
##   Manager                    0  0  0  0  0  0  0  0  0  0  0  0  0  1
##   Manufacturing Director     0  0  0  0  0  0  1  1  0  0  0  0  0  0
##   Research Director          0  0  0  0  0  0  0  0  0  0  0  0  0  0
##   Research Scientist         2  2  2  1  0  0  1  1  1  3  0  0  0  1
##   Sales Executive            3  0  1  1  1  0  1  1  3  1  3  3  1  1
##   Sales Representative       0  0  0  1  1  1  1  0  2  0  0  0  0  0
margin.table(mytable,1)
## JobRole
## Healthcare Representative           Human Resources 
##                       131                        52 
##     Laboratory Technician                   Manager 
##                       259                       102 
##    Manufacturing Director         Research Director 
##                       145                        80 
##        Research Scientist           Sales Executive 
##                       292                       326 
##      Sales Representative 
##                        83

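Sales Executives are the largest group and live at almost every distance from the office. The table hints that distance may play a role in attrition, but it is too sparse to show this on its own; comparing attrition rates across distance bands would test it more directly.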
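One way to examine the distance question more directly is to group DistanceFromHome into bands and compute the share of leavers in each band. A minimal sketch with a hypothetical helper `rate_by_band()`; the band edges are an arbitrary assumption and the example values are illustrative, not taken from the data set:

```r
# Hypothetical helper: share of leavers (Attrition coded 2 = Yes) within each band
# of a numeric variable. Empty bands give NaN.
rate_by_band <- function(x, attrition, breaks) {
  band   <- cut(x, breaks = breaks, include.lowest = TRUE)
  status <- factor(attrition, levels = c(1, 2))   # keep both columns even if one code is absent
  tab    <- table(band, status)
  prop.table(tab, margin = 1)[, "2"]
}

# Illustrative values (not from the data set)
distance  <- c(1, 2, 5, 12, 20, 25)
attrition <- c(1, 1, 2, 1, 2, 2)
rates <- rate_by_band(distance, attrition, breaks = c(1, 10, 20, 29))
print(round(rates, 2))
```

On the real data the call would be `rate_by_band(attrition.df$DistanceFromHome, attrition.df$Attrition, breaks = c(1, 10, 20, 29))`; other band edges may tell a different story, so it is worth trying a few.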

Effect of monthly income, age and gender on attrition

aggregate(cbind(Attrition,ï..Age,MonthlyIncome) ~ Gender,
data = attrition.df, mean)
##   Gender Attrition   ï..Age MonthlyIncome
## 1 Female  1.147959 37.32993      6686.566
## 2   Male  1.170068 36.65306      6380.508
boxplot(MonthlyIncome~Attrition ,data=attrition.df, main="Attrition based on Monthly Income",ylab="monthly income",xlab="Attrition")

boxplot(ï..Age~Attrition ,data=attrition.df, main="Attrition based on age",ylab="Age",xlab="Attrition")

car::scatterplot(Attrition~MonthlyIncome|Gender,data=attrition.df,
xlab="MonthlyIncome", ylab="Attrition",
main="How Monthly income affects attrition",
labels=row.names(attrition.df))

car::scatterplot(Attrition~ï..Age|Gender,data=attrition.df,
xlab="Age", ylab="Attrition",
main="Effect of age on attrition",
labels=row.names(attrition.df))

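Men make up most of the leavers (about 150 of 237), but that largely reflects their larger share of the workforce: subtracting 1 from the Attrition means above gives an attrition rate of about 17.0% for men and 14.8% for women. The boxplots suggest that leavers tend to have a lower monthly income and to be younger.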
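The boxplots compare the two groups visually. One way to check whether the income (or age) gap is larger than chance would be a Wilcoxon rank-sum test, which suits skewed variables such as MonthlyIncome; this test is a swapped-in technique, not part of the original analysis. A minimal sketch on illustrative numbers:

```r
# Illustrative incomes and 1/2 attrition codes (not from the data set)
toy <- data.frame(
  MonthlyIncome = c(2000, 2500, 3000, 3500, 4000, 6000, 8000, 10000, 12000, 15000),
  Attrition     = c(2, 2, 1, 2, 1, 1, 2, 1, 1, 1)
)

# Rank-based comparison of the two groups defined by the Attrition code
res <- wilcox.test(MonthlyIncome ~ Attrition, data = toy)
print(res$p.value)

# On the real data (not run here):
# wilcox.test(MonthlyIncome ~ Attrition, data = attrition.df)
# wilcox.test(ï..Age ~ Attrition, data = attrition.df)
```

A rank test compares the groups without assuming normally distributed incomes, which matters here because MonthlyIncome is strongly right-skewed.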

Effect of number of companies worked for and distance from home on attrition

aggregate(cbind(Attrition,NumCompaniesWorked,DistanceFromHome) ~ Gender,
data = attrition.df, mean)
##   Gender Attrition NumCompaniesWorked DistanceFromHome
## 1 Female  1.147959           2.812925         9.210884
## 2   Male  1.170068           2.613379         9.180272
car::scatterplot(Attrition~NumCompaniesWorked|Gender,data=attrition.df,
xlab="Number of companies worked in", ylab="Attrition",
main="Effect of number of companies on attrition",
labels=row.names(attrition.df))

car::scatterplot(Attrition~DistanceFromHome|Gender,data=attrition.df,
xlab="DistanceFromHome", ylab="Attrition",
main="Distance from home",
labels=row.names(attrition.df))

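These plots show no clear relationship between attrition and either the number of companies worked for or the distance from home, and the group means differ little between men and women.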

Attrition vs other variables

car::scatterplotMatrix(~Attrition+PercentSalaryHike+TrainingTimesLastYear,data=attrition.df,
main="Attrition versus other variables")

The matrix suggests a relationship between attrition and both percent salary hike and training count; attrition appears highest among employees with few trainings in the last year.

Attrition vs more variables

car::scatterplotMatrix(~Attrition+YearsAtCompany+YearsInCurrentRole+YearsWithCurrManager+YearsSinceLastPromotion,data=attrition.df,
main="Attrition versus other variables")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit negative part of the spread

From the scatter plot matrix we can see how each variable trends with attrition: some show a positive relationship and others a negative one.
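To put numbers on these positive and negative trends, one option is the correlation of each variable with the 1/2 attrition code. For a two-valued variable the Pearson correlation is the point-biserial correlation, so its sign gives the direction of a linear relationship. A minimal sketch with a hypothetical helper `cor_with_attrition()` and illustrative data:

```r
# Hypothetical helper: Pearson (point-biserial) correlation of each column with Attrition
cor_with_attrition <- function(df, vars) {
  vapply(vars, function(v) cor(df[[v]], df$Attrition), numeric(1))
}

# Illustrative data (not from the data set)
toy <- data.frame(
  Attrition             = c(1, 2, 1, 2, 1),
  TrainingTimesLastYear = c(3, 1, 4, 2, 5),
  PercentSalaryHike     = c(14, 11, 15, 12, 13)
)

r <- cor_with_attrition(toy, c("TrainingTimesLastYear", "PercentSalaryHike"))
print(round(r, 2))
```

On the real data, `cor_with_attrition(attrition.df, c("YearsAtCompany", "YearsInCurrentRole", "YearsWithCurrManager", "YearsSinceLastPromotion"))` would give one signed number per variable. Correlation only captures linear association, so it complements rather than replaces the plots.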

Effect of categorical variables on attrition

Business Travel

mytable<-xtabs(~Attrition+BusinessTravel,data=attrition.df)
mytable
##          BusinessTravel
## Attrition Non-Travel Travel_Frequently Travel_Rarely
##         1        138               208           887
##         2         12                69           156

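In absolute numbers most leavers travel rarely (156), but that is because most employees travel rarely. The attrition rate is highest among frequent travellers (69 of 277, about 25%), against about 15% for those who travel rarely and 8% for non-travellers.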
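Counts favour large groups, so comparing attrition rates within each category is more informative. A sketch using the counts from the table above and `prop.table()` with `margin = 2`, so that each column sums to 1:

```r
# Counts from the Attrition x BusinessTravel table above (1 = No, 2 = Yes)
travel <- matrix(
  c(138, 12, 208, 69, 887, 156), nrow = 2,
  dimnames = list(Attrition = c("1", "2"),
                  BusinessTravel = c("Non-Travel", "Travel_Frequently", "Travel_Rarely"))
)

# Share of leavers within each travel category
rates <- prop.table(travel, margin = 2)["2", ]
print(round(100 * rates, 1))
```

On the data frame itself, `prop.table(xtabs(~Attrition+BusinessTravel, data = attrition.df), margin = 2)` gives the same proportions, and the same call works for the other cross-tables in this section.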

Department

mytable<-xtabs(~Attrition+Department,data=attrition.df)
mytable
##          Department
## Attrition Human Resources Research & Development Sales
##         1              51                    828   354
##         2              12                    133    92

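Research & Development has the most leavers (133) because it is the largest department; the attrition rate is actually highest in Sales (92 of 446, about 21%) and Human Resources (12 of 63, about 19%), against about 14% in Research & Development.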

Education

mytable<-xtabs(~Attrition+Education,data=attrition.df)
mytable
##          Education
## Attrition   1   2   3   4   5
##         1 139 238 473 340  43
##         2  31  44  99  58   5

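Most leavers have education level 3 (99 of 237), which is also the most common level. The attrition rate is highest at level 1 (31 of 170, about 18%) and lowest at level 5 (5 of 48, about 10%).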

Environment satisfaction

mytable<-xtabs(~Attrition+EnvironmentSatisfaction,data=attrition.df)
mytable
##          EnvironmentSatisfaction
## Attrition   1   2   3   4
##         1 212 244 391 386
##         2  72  43  62  60

Attrition is associated with low environment satisfaction. About 25% of employees at level 1 left (72 of 284), compared with 13 to 15% at levels 2 to 4.

Job Involvement

mytable<-xtabs(~Attrition+JobInvolvement,data=attrition.df)
mytable
##          JobInvolvement
## Attrition   1   2   3   4
##         1  55 304 743 131
##         2  28  71 125  13

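Most leavers have job involvement level 3 (125 of 237). However, the attrition rate falls steadily as involvement rises. It goes from about 34% at level 1 (28 of 83) to about 9% at level 4 (13 of 144).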
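Job involvement is ordinal, so a test for a trend in the attrition rate uses the level ordering, which a plain chi-squared test ignores. The sketch below applies base R's `prop.trend.test()` to the counts printed above. It assumes that row 2 counts leavers and uses the default equally spaced scores 1 to 4.

```r
# Job involvement counts from the xtabs() output above
left  <- c(28, 71, 125, 13)            # row 2 (assumed: left)
total <- c(55, 304, 743, 131) + left   # all employees at each level

# Attrition rate per involvement level, in percent
round(100 * left / total, 1)

# Chi-squared test for a linear trend in proportions across the
# ordered levels (scores default to 1, 2, 3, 4)
trend <- prop.trend.test(left, total)
trend$p.value
```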

Job level

mytable<-xtabs(~Attrition+JobLevel,data=attrition.df)
mytable
##          JobLevel
## Attrition   1   2   3   4   5
##         1 400 482 186 101  64
##         2 143  52  32   5   5

Attrition is highest at the lowest job level, both in count (143 of 237 leavers) and in rate (143 of 543, about 26%).

Job role

mytable<-xtabs(~Attrition+JobRole,data=attrition.df)
mytable
##          JobRole
## Attrition Healthcare Representative Human Resources Laboratory Technician
##         1                       122              40                   197
##         2                         9              12                    62
##          JobRole
## Attrition Manager Manufacturing Director Research Director
##         1      97                    135                78
##         2       5                     10                 2
##          JobRole
## Attrition Research Scientist Sales Executive Sales Representative
##         1                245             269                   50
##         2                 47              57                   33

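The largest number of leavers are Laboratory Technicians (62), followed by Sales Executives (57). Relative to the size of each role, attrition is highest among Sales Representatives (33 of 83, about 40%).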
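With nine roles, it helps to sort the table twice: once by the number of leavers and once by the attrition rate. This sketch re-enters the counts printed above and assumes that row 2 counts leavers. It shows that the two rankings pick out different roles.

```r
# Job role counts from the xtabs() output above (assumed: row 1 = stayed, row 2 = left)
stayed <- c("Healthcare Representative" = 122, "Human Resources" = 40,
            "Laboratory Technician" = 197, "Manager" = 97,
            "Manufacturing Director" = 135, "Research Director" = 78,
            "Research Scientist" = 245, "Sales Executive" = 269,
            "Sales Representative" = 50)
left <- c(9, 12, 62, 5, 10, 2, 47, 57, 33)
names(left) <- names(stayed)

# One row per role: number of leavers and attrition rate in percent
roles <- data.frame(left = left,
                    rate = round(100 * left / (stayed + left), 1))

roles[order(-roles$left), ]   # ranked by number of leavers
roles[order(-roles$rate), ]   # ranked by attrition rate
```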

Job satisfaction

mytable<-xtabs(~Attrition+JobSatisfaction,data=attrition.df)
mytable
##          JobSatisfaction
## Attrition   1   2   3   4
##         1 223 234 369 407
##         2  66  46  73  52

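Job satisfaction level 3 has the most leavers (73). However, the attrition rate is highest at level 1 (66 of 289, about 23%) and lowest at level 4 (52 of 459, about 11%).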

Marital status

mytable<-xtabs(~Attrition+MaritalStatus,data=attrition.df)
mytable
##          MaritalStatus
## Attrition Divorced Married Single
##         1      294     589    350
##         2       33      84    120

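Single employees are the most likely to leave: 120 of 470 left, about 26%. The rate is about 12% for married employees and 10% for divorced employees.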

Overtime

mytable<-xtabs(~Attrition+OverTime,data=attrition.df)
mytable
##          OverTime
## Attrition  No Yes
##         1 944 289
##         2 110 127

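Employees who work overtime are much more likely to leave. Of those who work overtime, 127 of 416 left (about 31%), compared with 110 of 1054 (about 10%) of those who do not.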
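The overtime effect can be summarised as a risk ratio and a risk difference. This sketch uses the counts printed above and assumes that row 2 counts leavers. The unadjusted risk difference of about 0.20 is similar in size to the `OverTimeYes` coefficients (about 0.20 to 0.21) in the regressions later in this report. This is expected, because a linear model on a 1/2-coded outcome estimates differences in the probability of leaving.

```r
# Overtime counts from the xtabs() output above (assumed: row 2 = left)
left  <- c(No = 110, Yes = 127)
total <- c(No = 944 + 110, Yes = 289 + 127)

risk       <- left / total                     # probability of leaving
risk_ratio <- risk[["Yes"]] / risk[["No"]]     # relative risk, Yes vs No
risk_diff  <- risk[["Yes"]] - risk[["No"]]     # absolute difference

round(100 * risk, 1)   # percent leaving, by OverTime
round(risk_ratio, 2)   # how many times more likely to leave
round(risk_diff, 3)    # difference in probability of leaving
```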

Performance rating

mytable<-xtabs(~Attrition+PerformanceRating,data=attrition.df)
mytable
##          PerformanceRating
## Attrition    3    4
##         1 1044  189
##         2  200   37

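Performance rating shows no clear link with attrition. About 16% of employees left in both rating groups: 200 of 1244 at rating 3 and 37 of 226 at rating 4. Only ratings 3 and 4 occur in the data.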
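Fisher's exact test on the same counts is one way to check that the gap between the two ratings is within chance variation. This sketch re-enters the 2 x 2 table by hand, with row 2 taken as leavers. With the data loaded, `fisher.test(mytable)` on the `xtabs()` result is equivalent.

```r
# Performance rating counts from the xtabs() output above
# (assumed: Attrition level 2 = left)
perf <- matrix(c(1044, 200,   # rating 3: stayed, left
                 189, 37),    # rating 4: stayed, left
               nrow = 2,
               dimnames = list(Attrition = c("1", "2"),
                               PerformanceRating = c("3", "4")))

# Percent of each rating group that left
round(100 * prop.table(perf, margin = 2)["2", ], 1)

# Fisher's exact test of independence for the 2 x 2 table
ft <- fisher.test(perf)
ft$p.value
ft$conf.int   # 95% confidence interval for the odds ratio
```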

Relationship satisfaction

mytable<-xtabs(~Attrition+RelationshipSatisfaction,data=attrition.df)
mytable
##          RelationshipSatisfaction
## Attrition   1   2   3   4
##         1 219 258 388 368
##         2  57  45  71  64

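Relationship satisfaction level 3 has the most leavers (71). However, the attrition rate is highest at level 1 (57 of 276, about 21%); levels 2 to 4 are all close to 15%.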

Work life balance

mytable<-xtabs(~Attrition+WorkLifeBalance,data=attrition.df)
mytable
##          WorkLifeBalance
## Attrition   1   2   3   4
##         1  55 286 766 126
##         2  25  58 127  27

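Work-life balance level 3 has the most leavers (127). However, the attrition rate is highest at level 1 (25 of 80, about 31%), compared with 14 to 18% at the other levels.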

CORRELATION

cor.vars <- attrition.df[, c(1, 2, 4, 6, 7, 10, 11, 13:15, 17, 19, 20, 21)]
corrplot::corrplot.mixed(corr = cor(cor.vars, use = "complete.obs"),
                         upper = "pie", tl.pos = "lt")

cor(cor.vars)
##                               ï..Age   Attrition    DailyRate
## ï..Age                   1.000000000 -0.15920501  0.010660943
## Attrition               -0.159205007  1.00000000 -0.056651992
## DailyRate                0.010660943 -0.05665199  1.000000000
## DistanceFromHome        -0.001686120  0.07792358 -0.004985337
## Education                0.208033731 -0.03137282 -0.016806433
## EmployeeNumber          -0.010145467 -0.01057724 -0.050990434
## EnvironmentSatisfaction  0.010146428 -0.10336898  0.018354854
## HourlyRate               0.024286543 -0.00684555  0.023381422
## JobInvolvement           0.029819959 -0.13001596  0.046134874
## JobLevel                 0.509604228 -0.16910475  0.002966335
## JobSatisfaction         -0.004891877 -0.10348113  0.030571008
## MonthlyIncome            0.497854567 -0.15983958  0.007707059
## MonthlyRate              0.028051167  0.01517021 -0.032181602
## NumCompaniesWorked       0.299634758  0.04349374  0.038153434
##                         DistanceFromHome   Education EmployeeNumber
## ï..Age                      -0.001686120  0.20803373   -0.010145467
## Attrition                    0.077923583 -0.03137282   -0.010577243
## DailyRate                   -0.004985337 -0.01680643   -0.050990434
## DistanceFromHome             1.000000000  0.02104183    0.032916407
## Education                    0.021041826  1.00000000    0.042070093
## EmployeeNumber               0.032916407  0.04207009    1.000000000
## EnvironmentSatisfaction     -0.016075327 -0.02712831    0.017620802
## HourlyRate                   0.031130586  0.01677483    0.035179212
## JobInvolvement               0.008783280  0.04243763   -0.006887923
## JobLevel                     0.005302731  0.10158889   -0.018519194
## JobSatisfaction             -0.003668839 -0.01129612   -0.046246735
## MonthlyIncome               -0.017014445  0.09496068   -0.014828516
## MonthlyRate                  0.027472864 -0.02608420    0.012648229
## NumCompaniesWorked          -0.029250804  0.12631656   -0.001251032
##                         EnvironmentSatisfaction  HourlyRate JobInvolvement
## ï..Age                              0.010146428  0.02428654    0.029819959
## Attrition                          -0.103368978 -0.00684555   -0.130015957
## DailyRate                           0.018354854  0.02338142    0.046134874
## DistanceFromHome                   -0.016075327  0.03113059    0.008783280
## Education                          -0.027128313  0.01677483    0.042437634
## EmployeeNumber                      0.017620802  0.03517921   -0.006887923
## EnvironmentSatisfaction             1.000000000 -0.04985696   -0.008277598
## HourlyRate                         -0.049856956  1.00000000    0.042860641
## JobInvolvement                     -0.008277598  0.04286064    1.000000000
## JobLevel                            0.001211699 -0.02785349   -0.012629883
## JobSatisfaction                    -0.006784353 -0.07133462   -0.021475910
## MonthlyIncome                      -0.006259088 -0.01579430   -0.015271491
## MonthlyRate                         0.037599623 -0.01529675   -0.016322079
## NumCompaniesWorked                  0.012594323  0.02215688    0.015012413
##                             JobLevel JobSatisfaction MonthlyIncome
## ï..Age                   0.509604228   -0.0048918771   0.497854567
## Attrition               -0.169104751   -0.1034811261  -0.159839582
## DailyRate                0.002966335    0.0305710078   0.007707059
## DistanceFromHome         0.005302731   -0.0036688392  -0.017014445
## Education                0.101588886   -0.0112961167   0.094960677
## EmployeeNumber          -0.018519194   -0.0462467349  -0.014828516
## EnvironmentSatisfaction  0.001211699   -0.0067843526  -0.006259088
## HourlyRate              -0.027853486   -0.0713346244  -0.015794304
## JobInvolvement          -0.012629883   -0.0214759103  -0.015271491
## JobLevel                 1.000000000   -0.0019437080   0.950299913
## JobSatisfaction         -0.001943708    1.0000000000  -0.007156742
## MonthlyIncome            0.950299913   -0.0071567424   1.000000000
## MonthlyRate              0.039562951    0.0006439169   0.034813626
## NumCompaniesWorked       0.142501124   -0.0556994260   0.149515216
##                           MonthlyRate NumCompaniesWorked
## ï..Age                   0.0280511671        0.299634758
## Attrition                0.0151702125        0.043493739
## DailyRate               -0.0321816015        0.038153434
## DistanceFromHome         0.0274728635       -0.029250804
## Education               -0.0260841972        0.126316560
## EmployeeNumber           0.0126482292       -0.001251032
## EnvironmentSatisfaction  0.0375996229        0.012594323
## HourlyRate              -0.0152967496        0.022156883
## JobInvolvement          -0.0163220791        0.015012413
## JobLevel                 0.0395629510        0.142501124
## JobSatisfaction          0.0006439169       -0.055699426
## MonthlyIncome            0.0348136261        0.149515216
## MonthlyRate              1.0000000000        0.017521353
## NumCompaniesWorked       0.0175213534        1.000000000

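Job level is very strongly correlated with monthly income (r = 0.95) and moderately correlated with age (r = 0.51). Monthly income and age are also moderately correlated (r = 0.50). All three correlations are positive. By contrast, Attrition is only weakly correlated with every variable in this set; the largest is with job level (r = -0.17).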
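The strongest pairs can be read off a correlation matrix without scanning it by eye: list each pair in the upper triangle and sort by absolute value. This minimal sketch uses the three correlations quoted above. `Age` is used here as a stand-in name for the `ï..Age` column. With the data loaded, the same lines work on the full matrix returned by `cor()`.

```r
# Correlations among age, job level and monthly income, copied from
# the cor() output above ("Age" stands in for the ï..Age column)
vars <- c("Age", "JobLevel", "MonthlyIncome")
r <- matrix(c(1.000000000, 0.509604228, 0.497854567,
              0.509604228, 1.000000000, 0.950299913,
              0.497854567, 0.950299913, 1.000000000),
            nrow = 3, dimnames = list(vars, vars))

# Index each pair once via the upper triangle
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(r)[idx[, "row"]],
                    var2 = colnames(r)[idx[, "col"]],
                    r    = r[idx])

# Strongest correlations first
pairs[order(-abs(pairs$r)), ]
```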

Next, we compute the correlations among the remaining variables.

cor.vars2 <- attrition.df[, c(24:26, 28:35)]
corrplot::corrplot.mixed(corr = cor(cor.vars2, use = "complete.obs"),
                         upper = "pie", tl.pos = "lt")

cor(cor.vars2)
##                          PercentSalaryHike PerformanceRating
## PercentSalaryHike              1.000000000       0.773549996
## PerformanceRating              0.773549996       1.000000000
## RelationshipSatisfaction      -0.040490081      -0.031351455
## StockOptionLevel               0.007527748       0.003506472
## TotalWorkingYears             -0.020608488       0.006743668
## TrainingTimesLastYear         -0.005221012      -0.015578882
## WorkLifeBalance               -0.003279636       0.002572361
## YearsAtCompany                -0.035991262       0.003435126
## YearsInCurrentRole            -0.001520027       0.034986260
## YearsSinceLastPromotion       -0.022154313       0.017896066
## YearsWithCurrManager          -0.011985248       0.022827169
##                          RelationshipSatisfaction StockOptionLevel
## PercentSalaryHike                   -0.0404900811      0.007527748
## PerformanceRating                   -0.0313514554      0.003506472
## RelationshipSatisfaction             1.0000000000     -0.045952491
## StockOptionLevel                    -0.0459524907      1.000000000
## TotalWorkingYears                    0.0240542918      0.010135969
## TrainingTimesLastYear                0.0024965264      0.011274070
## WorkLifeBalance                      0.0196044057      0.004128730
## YearsAtCompany                       0.0193667869      0.015058008
## YearsInCurrentRole                  -0.0151229149      0.050817873
## YearsSinceLastPromotion              0.0334925021      0.014352185
## YearsWithCurrManager                -0.0008674968      0.024698227
##                          TotalWorkingYears TrainingTimesLastYear
## PercentSalaryHike             -0.020608488          -0.005221012
## PerformanceRating              0.006743668          -0.015578882
## RelationshipSatisfaction       0.024054292           0.002496526
## StockOptionLevel               0.010135969           0.011274070
## TotalWorkingYears              1.000000000          -0.035661571
## TrainingTimesLastYear         -0.035661571           1.000000000
## WorkLifeBalance                0.001007646           0.028072207
## YearsAtCompany                 0.628133155           0.003568666
## YearsInCurrentRole             0.460364638          -0.005737504
## YearsSinceLastPromotion        0.404857759          -0.002066536
## YearsWithCurrManager           0.459188397          -0.004095526
##                          WorkLifeBalance YearsAtCompany YearsInCurrentRole
## PercentSalaryHike           -0.003279636   -0.035991262       -0.001520027
## PerformanceRating            0.002572361    0.003435126        0.034986260
## RelationshipSatisfaction     0.019604406    0.019366787       -0.015122915
## StockOptionLevel             0.004128730    0.015058008        0.050817873
## TotalWorkingYears            0.001007646    0.628133155        0.460364638
## TrainingTimesLastYear        0.028072207    0.003568666       -0.005737504
## WorkLifeBalance              1.000000000    0.012089185        0.049856498
## YearsAtCompany               0.012089185    1.000000000        0.758753737
## YearsInCurrentRole           0.049856498    0.758753737        1.000000000
## YearsSinceLastPromotion      0.008941249    0.618408865        0.548056248
## YearsWithCurrManager         0.002759440    0.769212425        0.714364762
##                          YearsSinceLastPromotion YearsWithCurrManager
## PercentSalaryHike                   -0.022154313        -0.0119852485
## PerformanceRating                    0.017896066         0.0228271689
## RelationshipSatisfaction             0.033492502        -0.0008674968
## StockOptionLevel                     0.014352185         0.0246982266
## TotalWorkingYears                    0.404857759         0.4591883971
## TrainingTimesLastYear               -0.002066536        -0.0040955260
## WorkLifeBalance                      0.008941249         0.0027594402
## YearsAtCompany                       0.618408865         0.7692124251
## YearsInCurrentRole                   0.548056248         0.7143647616
## YearsSinceLastPromotion              1.000000000         0.5102236358
## YearsWithCurrManager                 0.510223636         1.0000000000

Performance rating is strongly correlated with percent salary hike (r = 0.77). Total working years, years at company, years in current role, years since last promotion and years with current manager are also inter-correlated, with pairwise r between 0.40 and 0.77.
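Inter-correlated predictors inflate the standard errors of regression coefficients. For standardized predictors, the variance inflation factors (VIFs) are the diagonal of the inverse correlation matrix, so they can be computed from the correlations printed above. This sketch covers only the five tenure variables among themselves, not the full set of predictors used in the regressions below. For a fitted model, `car::vif(fit1)` reports (generalized) VIFs directly.

```r
# Correlations among the five tenure variables, copied from the cor() output above
v <- c("TotalWorkingYears", "YearsAtCompany", "YearsInCurrentRole",
       "YearsSinceLastPromotion", "YearsWithCurrManager")
R <- matrix(c(1.000000000, 0.628133155, 0.460364638, 0.404857759, 0.459188397,
              0.628133155, 1.000000000, 0.758753737, 0.618408865, 0.769212425,
              0.460364638, 0.758753737, 1.000000000, 0.548056248, 0.714364762,
              0.404857759, 0.618408865, 0.548056248, 1.000000000, 0.510223636,
              0.459188397, 0.769212425, 0.714364762, 0.510223636, 1.000000000),
            nrow = 5, dimnames = list(v, v))

# VIF_j = 1 / (1 - R^2_j), which equals the j-th diagonal element of solve(R)
vif <- diag(solve(R))
round(vif, 2)
```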

T-test (percent salary hike by performance rating)

t.test(PercentSalaryHike~PerformanceRating,data=attrition.df)
## 
##  Welch Two Sample t-test
## 
## data:  PercentSalaryHike by PerformanceRating
## t = -63.389, df = 456.99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.089594 -7.603091
## sample estimates:
## mean in group 3 mean in group 4 
##        14.00322        21.84956

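With p < 2.2e-16, we can reject the null hypothesis of equal means. Employees rated 4 received an average salary hike of about 21.8%, against about 14.0% for those rated 3. The tenure variables are inter-correlated, as shown above, so they were not tested separately.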

REGRESSION

Model1

# attach() is unnecessary (data = is passed); EnvironmentSatisfaction appeared twice
Model1 <- Attrition ~ ï..Age + BusinessTravel + DailyRate + Department + DistanceFromHome +
  Education + EducationField + EnvironmentSatisfaction + Gender + JobLevel + JobRole +
  JobSatisfaction + MaritalStatus + MonthlyIncome + NumCompaniesWorked + OverTime +
  PercentSalaryHike + JobInvolvement + PerformanceRating + TotalWorkingYears +
  TrainingTimesLastYear + WorkLifeBalance + YearsAtCompany + YearsInCurrentRole +
  YearsSinceLastPromotion + YearsWithCurrManager
fit1 <- lm(Model1, data = attrition.df)
summary(fit1)
## 
## Call:
## lm(formula = Model1, data = attrition.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58046 -0.20418 -0.08046  0.08004  1.11625 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                       1.475e+00  1.722e-01   8.564  < 2e-16
## ï..Age                           -3.673e-03  1.327e-03  -2.767 0.005722
## BusinessTravelTravel_Frequently   1.532e-01  3.310e-02   4.629 4.01e-06
## BusinessTravelTravel_Rarely       6.826e-02  2.855e-02   2.391 0.016948
## DailyRate                        -2.742e-05  2.119e-05  -1.294 0.195949
## DepartmentResearch & Development  1.242e-01  1.172e-01   1.060 0.289123
## DepartmentSales                   9.838e-02  1.211e-01   0.812 0.416868
## DistanceFromHome                  3.518e-03  1.048e-03   3.357 0.000809
## Education                         1.767e-03  8.540e-03   0.207 0.836125
## EducationFieldLife Sciences      -1.170e-01  8.378e-02  -1.396 0.162927
## EducationFieldMarketing          -7.742e-02  8.924e-02  -0.868 0.385809
## EducationFieldMedical            -1.310e-01  8.414e-02  -1.556 0.119812
## EducationFieldOther              -1.352e-01  8.993e-02  -1.503 0.133073
## EducationFieldTechnical Degree   -2.050e-02  8.750e-02  -0.234 0.814836
## EnvironmentSatisfaction          -4.039e-02  7.798e-03  -5.180 2.53e-07
## GenderMale                        3.394e-02  1.742e-02   1.948 0.051626
## JobLevel                         -4.253e-03  2.854e-02  -0.149 0.881559
## JobRoleHuman Resources            2.083e-01  1.224e-01   1.702 0.088977
## JobRoleLaboratory Technician      1.357e-01  4.005e-02   3.388 0.000723
## JobRoleManager                    5.380e-02  6.794e-02   0.792 0.428598
## JobRoleManufacturing Director     1.418e-02  3.926e-02   0.361 0.718054
## JobRoleResearch Director          9.994e-04  6.058e-02   0.016 0.986840
## JobRoleResearch Scientist         3.702e-02  3.964e-02   0.934 0.350466
## JobRoleSales Executive            1.030e-01  7.762e-02   1.327 0.184692
## JobRoleSales Representative       2.579e-01  8.622e-02   2.992 0.002822
## JobSatisfaction                  -3.709e-02  7.697e-03  -4.818 1.60e-06
## MaritalStatusMarried              2.186e-02  2.192e-02   0.997 0.318839
## MaritalStatusSingle               1.335e-01  2.361e-02   5.652 1.91e-08
## MonthlyIncome                     9.389e-07  7.600e-06   0.124 0.901691
## NumCompaniesWorked                1.653e-02  3.808e-03   4.340 1.53e-05
## OverTimeYes                       2.083e-01  1.897e-02  10.981  < 2e-16
## PercentSalaryHike                -1.949e-03  3.680e-03  -0.530 0.596403
## JobInvolvement                   -5.936e-02  1.200e-02  -4.947 8.42e-07
## PerformanceRating                 1.845e-02  3.724e-02   0.496 0.620306
## TotalWorkingYears                -3.401e-03  2.419e-03  -1.406 0.159899
## TrainingTimesLastYear            -1.390e-02  6.642e-03  -2.092 0.036578
## WorkLifeBalance                  -3.263e-02  1.208e-02  -2.702 0.006980
## YearsAtCompany                    5.250e-03  2.990e-03   1.756 0.079286
## YearsInCurrentRole               -8.868e-03  3.878e-03  -2.287 0.022341
## YearsSinceLastPromotion           1.055e-02  3.420e-03   3.085 0.002072
## YearsWithCurrManager             -9.634e-03  3.976e-03  -2.423 0.015518
##                                     
## (Intercept)                      ***
## ï..Age                           ** 
## BusinessTravelTravel_Frequently  ***
## BusinessTravelTravel_Rarely      *  
## DailyRate                           
## DepartmentResearch & Development    
## DepartmentSales                     
## DistanceFromHome                 ***
## Education                           
## EducationFieldLife Sciences         
## EducationFieldMarketing             
## EducationFieldMedical               
## EducationFieldOther                 
## EducationFieldTechnical Degree      
## EnvironmentSatisfaction          ***
## GenderMale                       .  
## JobLevel                            
## JobRoleHuman Resources           .  
## JobRoleLaboratory Technician     ***
## JobRoleManager                      
## JobRoleManufacturing Director       
## JobRoleResearch Director            
## JobRoleResearch Scientist           
## JobRoleSales Executive              
## JobRoleSales Representative      ** 
## JobSatisfaction                  ***
## MaritalStatusMarried                
## MaritalStatusSingle              ***
## MonthlyIncome                       
## NumCompaniesWorked               ***
## OverTimeYes                      ***
## PercentSalaryHike                   
## JobInvolvement                   ***
## PerformanceRating                   
## TotalWorkingYears                   
## TrainingTimesLastYear            *  
## WorkLifeBalance                  ** 
## YearsAtCompany                   .  
## YearsInCurrentRole               *  
## YearsSinceLastPromotion          ** 
## YearsWithCurrManager             *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3225 on 1429 degrees of freedom
## Multiple R-squared:  0.2523, Adjusted R-squared:  0.2313 
## F-statistic: 12.05 on 40 and 1429 DF,  p-value: < 2.2e-16
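Part of the reason the R-squared is modest is that `Attrition` takes only two values (1 and 2), so `fit1` is a linear probability model. Logistic regression is the usual alternative for a binary outcome. The following is a minimal sketch, not the method used in this report. It fits a logistic model only to the aggregated OverTime counts from the earlier table, assuming level 2 means the employee left. On the full data, the response would first be recoded to 0/1 before calling `glm(..., family = binomial)`.

```r
# Aggregated OverTime counts from the xtabs() output earlier
# (assumed: Attrition level 2 = left)
ot <- data.frame(OverTime = factor(c("No", "Yes")),
                 left     = c(110, 127),
                 stayed   = c(944, 289))

# Logistic regression on grouped binomial data: cbind(successes, failures)
fit_logit <- glm(cbind(left, stayed) ~ OverTime, family = binomial, data = ot)
summary(fit_logit)

# exp(coefficient) is the odds ratio of leaving, overtime vs no overtime
odds_ratio <- exp(coef(fit_logit)[["OverTimeYes"]])
round(odds_ratio, 2)
```

With a single binary predictor, the fitted odds ratio equals the odds ratio computed directly from the 2 x 2 table, (127/289) / (110/944).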

This regression model fits poorly. Its R-squared is only 0.2523 (adjusted 0.2313), so it explains about a quarter of the variation in attrition. We therefore run a best-subset search to determine which variables to keep in the regression model.

Test1

library(leaps)
## Warning: package 'leaps' was built under R version 3.4.3
leap1 <- regsubsets(Model1, data = attrition.df, nbest=1)
# summary(leap1)
plot(leap1, scale="adjr2")

For the new regression model, we will use total working years, education, distance from home, age, monthly income and overtime as variables.
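The `adjr2` plot from `regsubsets()` has to be read by eye. A different, base-R route to a reduced model is backward stepwise selection by AIC using `step()`. This is a swapped-in technique, not the best-subset search that `regsubsets()` performs. The sketch uses the built-in `mtcars` data purely as a stand-in, because `attrition.df` is not reproduced here.

```r
# Start from the full model: mpg regressed on every other column of mtcars
full <- lm(mpg ~ ., data = mtcars)

# Backward elimination: drop one term at a time while AIC keeps improving
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)   # terms that survived the elimination
AIC(full)
AIC(reduced)
```

On the attrition data, the same call would start from `fit1`. Because stepwise search is greedy, it can stop at a different subset than an exhaustive best-subset search.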

Model2

Model2 <- Attrition ~ EnvironmentSatisfaction + JobSatisfaction + OverTime +
  JobInvolvement + BusinessTravel + JobRole + MaritalStatus + TotalWorkingYears
fit2 <- lm(Model2, data = attrition.df)
summary(fit2)
## 
## Call:
## lm(formula = Model2, data = attrition.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64053 -0.20175 -0.08617  0.05029  1.25881 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      1.350651   0.064724  20.868  < 2e-16 ***
## EnvironmentSatisfaction         -0.040500   0.007935  -5.104 3.76e-07 ***
## JobSatisfaction                 -0.039566   0.007836  -5.049 5.00e-07 ***
## OverTimeYes                      0.211422   0.019255  10.980  < 2e-16 ***
## JobInvolvement                  -0.063648   0.012170  -5.230 1.94e-07 ***
## BusinessTravelTravel_Frequently  0.146786   0.033715   4.354 1.43e-05 ***
## BusinessTravelTravel_Rarely      0.063610   0.029031   2.191 0.028603 *  
## JobRoleHuman Resources           0.132329   0.054945   2.408 0.016147 *  
## JobRoleLaboratory Technician     0.133775   0.036704   3.645 0.000277 ***
## JobRoleManager                   0.041907   0.046354   0.904 0.366115    
## JobRoleManufacturing Director   -0.007684   0.039906  -0.193 0.847339    
## JobRoleResearch Director        -0.016877   0.048190  -0.350 0.726231    
## JobRoleResearch Scientist        0.042960   0.036053   1.192 0.233620    
## JobRoleSales Executive           0.080124   0.034482   2.324 0.020281 *  
## JobRoleSales Representative      0.242834   0.048592   4.997 6.52e-07 ***
## MaritalStatusMarried             0.019822   0.022355   0.887 0.375392    
## MaritalStatusSingle              0.133104   0.024002   5.546 3.48e-08 ***
## TotalWorkingYears               -0.004542   0.001479  -3.071 0.002173 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3303 on 1452 degrees of freedom
## Multiple R-squared:  0.2029, Adjusted R-squared:  0.1936 
## F-statistic: 21.75 on 17 and 1452 DF,  p-value: < 2.2e-16

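This model's R-squared (0.2029, adjusted 0.1936) is lower than that of Model1 (0.2523, adjusted 0.2313). Several variables still have high p-values, namely the Manager, Manufacturing Director, Research Director and Research Scientist job roles and married marital status. We therefore need to change our set of variables.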
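Model2's predictors are a subset of Model1's, and Model3's are a subset of Model2's. The models are therefore nested, and a partial F-test asks whether the dropped variables jointly add explanatory power. With the data loaded, `anova(fit2, fit1)` runs this test directly. The sketch below reproduces it from the R-squared values and degrees of freedom printed in the summaries. This works because all three models use the same 1470 rows and the same response.

```r
# Partial F-test for nested linear models from printed summaries:
# F = ((R2_big - R2_small) / q) / ((1 - R2_big) / df_big)
# q = number of extra terms in the bigger model, df_big = its residual df
partial_f <- function(r2_small, r2_big, q, df_big) {
  f <- ((r2_big - r2_small) / q) / ((1 - r2_big) / df_big)
  c(F = f, p = pf(f, q, df_big, lower.tail = FALSE))
}

# Model2 (17 terms) vs Model1 (40 terms): q = 23
m2_vs_m1 <- partial_f(r2_small = 0.2029, r2_big = 0.2523, q = 23, df_big = 1429)

# Model3 (2 terms) vs Model2 (17 terms): q = 15
m3_vs_m2 <- partial_f(r2_small = 0.09093, r2_big = 0.2029, q = 15, df_big = 1452)

m2_vs_m1
m3_vs_m2
```

Small p-values here would suggest that the variables dropped at each step still carry information about attrition. This is consistent with the R-squared and AIC values reported for the three models.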

Test2

library(leaps)
leap2 <- regsubsets(Model2, data = attrition.df, nbest=1)

plot(leap2, scale="adjr2")

Model3

Model3 <- Attrition ~ OverTime + TotalWorkingYears
fit3 <- lm(Model3, data = attrition.df)
summary(fit3)
## 
## Call:
## lm(formula = Model3, data = attrition.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39127 -0.16382 -0.11439 -0.02378  1.13273 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.196765   0.017064  70.132  < 2e-16 ***
## OverTimeYes        0.202738   0.020324   9.975  < 2e-16 ***
## TotalWorkingYears -0.008237   0.001177  -6.998 3.92e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.351 on 1467 degrees of freedom
## Multiple R-squared:  0.09093,    Adjusted R-squared:  0.08969 
## F-statistic: 73.36 on 2 and 1467 DF,  p-value: < 2.2e-16

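This seems to be the best model in terms of significance, since both predictors have p < 0.001. However, it explains the least variance of the three models (R-squared 0.0909, adjusted 0.0897).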

library(leaps)
leap3 <- regsubsets(Model3, data = attrition.df, nbest=1)

plot(leap3, scale="adjr2")

AIC

AIC(fit1)
## [1] 887.2047
AIC(fit2)
## [1] 935.1092
AIC(fit3)
## [1] 1098.415
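Lower AIC is better, so this ranking favours fit1 despite its 40 predictors. AIC adds a penalty of 2 for each extra parameter, but fit1's smaller residual error more than offsets that penalty. For a Gaussian linear model, R's `AIC()` equals n(log(2π) + log(RSS/n) + 1) + 2(k + 1), where k is the number of coefficients and the extra 1 counts the error variance. The sketch below approximately recovers the three values above from each summary's residual standard error and degrees of freedom. The small differences come from rounding in the printed residual standard errors.

```r
# Approximate AIC of a Gaussian linear model from its printed summary:
# RSS = sigma^2 * residual df, n = residual df + number of coefficients
aic_from_summary <- function(sigma, df_resid, n_coef) {
  n   <- df_resid + n_coef
  rss <- sigma^2 * df_resid
  n * (log(2 * pi) + log(rss / n) + 1) + 2 * (n_coef + 1)  # +1 for sigma
}

approx_aic <- c(
  fit1 = aic_from_summary(0.3225, 1429, 41),  # 40 terms + intercept
  fit2 = aic_from_summary(0.3303, 1452, 18),  # 17 terms + intercept
  fit3 = aic_from_summary(0.3510, 1467, 3)    #  2 terms + intercept
)
round(approx_aic, 1)
```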

CONCLUSION

We have visualised all the important variables in this data set.

After several attempts, I have concluded that the best-fitting model is the one containing all the variables (fit1). Of the three models, it has the highest R-squared (0.2523) and adjusted R-squared (0.2313), and the lowest AIC (887.2).

REFERENCE

http://www.wisegeek.com/what-is-employee-attrition.html
https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
https://en.wikipedia.org/wiki/Kaggle