Project Title: IBM Attrition Analysis
Name: Himesh Mendiratta
Email: himeshmendiratta1998@gmail.com
College: SRM University

---
title: "IBM HR Analytics Employee Attrition & Performance"
author: "Himesh Mendiratta"
date: "28 December 2017"
output:
  pdf_document: default
  word_document: default
  html_document: default
---
Employee attrition refers to the loss of employees through a number of circumstances, such as resignation and retirement. The cause of attrition may be either voluntary or involuntary, though employer-initiated events such as layoffs are not typically included in the definition. Each industry has its own standards for acceptable attrition rates, and these rates can also differ between skilled and unskilled positions. Due to the expenses associated with training new employees, any type of employee attrition is typically seen to have a monetary cost. It is also possible for a company to use employee attrition to its benefit in some circumstances, such as relying on it to control labor costs without issuing mass layoffs.
Uncover the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. This is a data set created by IBM data scientists. What follows is a detailed analysis of it.
IBM (International Business Machines Corporation) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries. The company originated in 1911 as the Computing-Tabulating-Recording Company (CTR) and was renamed “International Business Machines” in 1924. IBM manufactures and markets computer hardware, middleware and software, and provides hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization, holding the record for most patents generated by a business (as of 2017) for 24 consecutive years.
The data was collected from the Kaggle website. Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective. The data set is available at https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
Education 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’
EnvironmentSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
JobInvolvement 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
JobSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
PerformanceRating 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’
RelationshipSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
WorkLifeBalance 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’
attrition.df<-read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
View(attrition.df)
dim(attrition.df)
## [1] 1470 35
car::some(attrition.df)
## ï..Age Attrition BusinessTravel DailyRate Department
## 206 29 Yes Travel_Rarely 121 Sales
## 445 48 No Travel_Rarely 163 Sales
## 730 35 No Travel_Rarely 583 Research & Development
## 858 44 Yes Travel_Rarely 1097 Research & Development
## 917 46 No Travel_Rarely 168 Sales
## 935 25 No Travel_Rarely 266 Research & Development
## 1055 49 No Travel_Rarely 1490 Research & Development
## 1056 34 No Travel_Frequently 829 Research & Development
## 1281 37 No Travel_Rarely 1239 Human Resources
## 1295 41 No Travel_Rarely 447 Research & Development
## DistanceFromHome Education EducationField EmployeeCount
## 206 27 3 Marketing 1
## 445 2 5 Marketing 1
## 730 25 4 Medical 1
## 858 10 4 Life Sciences 1
## 917 4 2 Marketing 1
## 935 1 3 Medical 1
## 1055 7 4 Life Sciences 1
## 1056 15 3 Medical 1
## 1281 8 2 Other 1
## 1295 5 3 Life Sciences 1
## EmployeeNumber EnvironmentSatisfaction Gender HourlyRate
## 206 283 2 Female 35
## 445 595 2 Female 37
## 730 1014 3 Female 57
## 858 1200 3 Male 96
## 917 1280 4 Female 33
## 935 1303 4 Female 40
## 1055 1484 3 Male 35
## 1056 1485 2 Male 71
## 1281 1794 3 Male 89
## 1295 1814 2 Male 85
## JobInvolvement JobLevel JobRole JobSatisfaction
## 206 3 3 Sales Executive 4
## 445 3 2 Sales Executive 4
## 730 3 3 Healthcare Representative 3
## 858 3 1 Research Scientist 3
## 917 2 5 Manager 2
## 935 3 1 Research Scientist 2
## 1055 3 3 Healthcare Representative 2
## 1056 3 4 Research Director 1
## 1281 3 2 Human Resources 2
## 1295 4 2 Healthcare Representative 2
## MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked Over18
## 206 Married 7639 24525 1 Y
## 445 Married 4051 19658 2 Y
## 730 Divorced 10388 6975 1 Y
## 858 Single 2936 10826 1 Y
## 917 Married 18789 9946 2 Y
## 935 Single 2096 18830 1 Y
## 1055 Divorced 10466 20948 3 Y
## 1056 Divorced 17007 11929 7 Y
## 1281 Divorced 4071 12832 2 Y
## 1295 Single 6870 15530 3 Y
## OverTime PercentSalaryHike PerformanceRating RelationshipSatisfaction
## 206 No 22 4 4
## 445 No 14 3 1
## 730 Yes 11 3 3
## 858 Yes 11 3 3
## 917 No 14 3 3
## 935 No 18 3 4
## 1055 No 14 3 2
## 1056 No 14 3 4
## 1281 No 13 3 3
## 1295 No 12 3 1
## StandardHours StockOptionLevel TotalWorkingYears
## 206 80 3 10
## 445 80 1 14
## 730 80 1 16
## 858 80 0 6
## 917 80 1 26
## 935 80 0 2
## 1055 80 2 29
## 1056 80 2 16
## 1281 80 0 19
## 1295 80 0 11
## TrainingTimesLastYear WorkLifeBalance YearsAtCompany
## 206 3 2 10
## 445 2 3 9
## 730 3 2 16
## 858 4 3 6
## 917 2 3 11
## 935 3 2 2
## 1055 3 3 8
## 1056 3 2 14
## 1281 4 2 10
## 1295 3 1 3
## YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
## 206 4 1 9
## 445 7 6 7
## 730 10 10 1
## 858 4 0 2
## 917 4 0 8
## 935 2 2 1
## 1055 7 0 7
## 1056 8 6 9
## 1281 0 4 7
## 1295 2 1 2
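The first column prints as `ï..Age`, most likely because the CSV file starts with a UTF-8 byte-order mark that `read.csv` folded into the header. The sketch below shows one way to avoid this, assuming the file really is UTF-8 with a BOM: pass `fileEncoding = "UTF-8-BOM"`. The temporary file and its two columns are made up for illustration. The rest of this report keeps the `ï..Age` name so that the code matches the printed output.

```r
# Write a tiny CSV that starts with the UTF-8 byte-order mark (EF BB BF).
# The file and its contents are illustrative, not the real data set.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Age,Attrition\n41,Yes\n49,No\n")), tmp)

# Reading with fileEncoding = "UTF-8-BOM" strips the mark from the header.
demo.df <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
demo_names <- names(demo.df)
print(demo_names)

# Clean up the temporary file.
unlink(tmp)
```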
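# Recode Attrition explicitly ("No" -> 1, "Yes" -> 2); works for factor and character columns
attrition.df$Attrition <- as.numeric(factor(attrition.df$Attrition, levels = c("No", "Yes")))
After this conversion, 1 stands for No and 2 stands for Yes.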
head(attrition.df)
## ï..Age Attrition BusinessTravel DailyRate Department
## 1 41 2 Travel_Rarely 1102 Sales
## 2 49 1 Travel_Frequently 279 Research & Development
## 3 37 2 Travel_Rarely 1373 Research & Development
## 4 33 1 Travel_Frequently 1392 Research & Development
## 5 27 1 Travel_Rarely 591 Research & Development
## 6 32 1 Travel_Frequently 1005 Research & Development
## DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1 1 2 Life Sciences 1 1
## 2 8 1 Life Sciences 1 2
## 3 2 2 Other 1 4
## 4 3 4 Life Sciences 1 5
## 5 2 1 Medical 1 7
## 6 2 2 Life Sciences 1 8
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1 2 Female 94 3 2
## 2 3 Male 61 2 2
## 3 4 Male 92 2 1
## 4 4 Female 56 3 1
## 5 1 Male 40 3 1
## 6 4 Male 79 3 1
## JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1 Sales Executive 4 Single 5993
## 2 Research Scientist 2 Married 5130
## 3 Laboratory Technician 3 Single 2090
## 4 Research Scientist 3 Married 2909
## 5 Laboratory Technician 2 Married 3468
## 6 Laboratory Technician 4 Single 3068
## MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1 19479 8 Y Yes 11
## 2 24907 1 Y No 23
## 3 2396 6 Y Yes 15
## 4 23159 1 Y Yes 11
## 5 16632 9 Y No 12
## 6 11864 0 Y No 13
## PerformanceRating RelationshipSatisfaction StandardHours
## 1 3 1 80
## 2 4 4 80
## 3 3 2 80
## 4 3 3 80
## 5 3 4 80
## 6 3 3 80
## StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## 1 0 8 0 1
## 2 1 10 3 3
## 3 0 7 3 3
## 4 0 8 3 3
## 5 1 6 3 3
## 6 0 8 2 2
## YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
## 1 6 4 0
## 2 10 7 1
## 3 0 0 0
## 4 8 7 3
## 5 2 2 2
## 6 7 7 3
## YearsWithCurrManager
## 1 5
## 2 7
## 3 0
## 4 0
## 5 2
## 6 6
psych::describe(attrition.df)[,3:9]
## mean sd median trimmed mad min
## ï..Age 36.92 9.14 36.0 36.47 8.90 18
## Attrition 1.16 0.37 1.0 1.08 0.00 1
## BusinessTravel* 2.61 0.67 3.0 2.76 0.00 1
## DailyRate 802.49 403.51 802.0 803.83 510.01 102
## Department* 2.26 0.53 2.0 2.25 0.00 1
## DistanceFromHome 9.19 8.11 7.0 8.08 7.41 1
## Education 2.91 1.02 3.0 2.98 1.48 1
## EducationField* 3.25 1.33 3.0 3.10 1.48 1
## EmployeeCount 1.00 0.00 1.0 1.00 0.00 1
## EmployeeNumber 1024.87 602.02 1020.5 1023.40 790.97 1
## EnvironmentSatisfaction 2.72 1.09 3.0 2.78 1.48 1
## Gender* 1.60 0.49 2.0 1.62 0.00 1
## HourlyRate 65.89 20.33 66.0 66.02 26.69 30
## JobInvolvement 2.73 0.71 3.0 2.74 0.00 1
## JobLevel 2.06 1.11 2.0 1.90 1.48 1
## JobRole* 5.46 2.46 6.0 5.61 2.97 1
## JobSatisfaction 2.73 1.10 3.0 2.79 1.48 1
## MaritalStatus* 2.10 0.73 2.0 2.12 1.48 1
## MonthlyIncome 6502.93 4707.96 4919.0 5667.24 3260.24 1009
## MonthlyRate 14313.10 7117.79 14235.5 14286.48 9201.76 2094
## NumCompaniesWorked 2.69 2.50 2.0 2.36 1.48 0
## Over18* 1.00 0.00 1.0 1.00 0.00 1
## OverTime* 1.28 0.45 1.0 1.23 0.00 1
## PercentSalaryHike 15.21 3.66 14.0 14.80 2.97 11
## PerformanceRating 3.15 0.36 3.0 3.07 0.00 3
## RelationshipSatisfaction 2.71 1.08 3.0 2.77 1.48 1
## StandardHours 80.00 0.00 80.0 80.00 0.00 80
## StockOptionLevel 0.79 0.85 1.0 0.67 1.48 0
## TotalWorkingYears 11.28 7.78 10.0 10.37 5.93 0
## TrainingTimesLastYear 2.80 1.29 3.0 2.72 1.48 0
## WorkLifeBalance 2.76 0.71 3.0 2.77 0.00 1
## YearsAtCompany 7.01 6.13 5.0 5.99 4.45 0
## YearsInCurrentRole 4.23 3.62 3.0 3.85 4.45 0
## YearsSinceLastPromotion 2.19 3.22 1.0 1.48 1.48 0
## YearsWithCurrManager 4.12 3.57 3.0 3.77 4.45 0
## max
## ï..Age 60
## Attrition 2
## BusinessTravel* 3
## DailyRate 1499
## Department* 3
## DistanceFromHome 29
## Education 5
## EducationField* 6
## EmployeeCount 1
## EmployeeNumber 2068
## EnvironmentSatisfaction 4
## Gender* 2
## HourlyRate 100
## JobInvolvement 4
## JobLevel 5
## JobRole* 9
## JobSatisfaction 4
## MaritalStatus* 3
## MonthlyIncome 19999
## MonthlyRate 26999
## NumCompaniesWorked 9
## Over18* 1
## OverTime* 2
## PercentSalaryHike 25
## PerformanceRating 4
## RelationshipSatisfaction 4
## StandardHours 80
## StockOptionLevel 3
## TotalWorkingYears 40
## TrainingTimesLastYear 6
## WorkLifeBalance 4
## YearsAtCompany 40
## YearsInCurrentRole 18
## YearsSinceLastPromotion 15
## YearsWithCurrManager 17
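This summary gives the mean, standard deviation, median, trimmed mean, median absolute deviation, minimum and maximum of each variable (the mode is not reported). Variables marked with * are categorical and were converted to integer codes, so their means are not meaningful. EmployeeCount, Over18 and StandardHours are constant and carry no information.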
mytable<-with(attrition.df,table(Gender))
mytable
## Gender
## Female Male
## 588 882
Most of the employees in this data set are male (882 of 1,470, or 60%).
hist(attrition.df$ï..Age,main="Distribution of age",xlab="Age",ylab="Count",col=blues9)
Ages are concentrated between roughly 30 and 40 years (median 36).
hist(attrition.df$MonthlyIncome,main="Distribution of Monthly Income",xlab="Monthly Income",ylab="Count",col="orange")
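The income distribution is right-skewed: most employees earn toward the lower end of the range (median 4,919 against a mean of 6,503).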
hist(attrition.df$Attrition,main=" Distribution of Attrition",xlab="Attrition ",ylab="Count",col="yellowgreen" )
Attrition is coded No = 1, Yes = 2. Relatively few employees leave: 237 of 1,470, about 16%.
hist(attrition.df$DistanceFromHome,main="Distance from home distribution",xlab="Distance from Home",ylab="Count",col="Pink")
Most of the employees in this data set live in close proximity to the office.
hist(attrition.df$NumCompaniesWorked,main="Number of companies Worked with",xlab="Number of companies",ylab="Count",col="orange")
Most of the employees have changed their company fewer than 2 times.
hist(attrition.df$PercentSalaryHike,main="Distribution of salary hike",xlab="Salary hike(%)",ylab="Count",col="red")
Hike of 12-14 percent is common
hist(attrition.df$TrainingTimesLastYear,main="Number of times the employee has gone under training",xlab="Training count",ylab="Count",col="brown")
hist(attrition.df$YearsAtCompany,main="Years at company",xlab="Number of years",ylab="Count",col="yellow")
Most of the employees of IBM are new and have served the company for less than 10 years
hist(attrition.df$YearsInCurrentRole,main="Years in current role",xlab="Years count",ylab="Count",col="yellow")
Most employees have been in their current role for only a few years (median 3, mean 4.2); the distribution has a long tail reaching 18 years.
hist(attrition.df$YearsWithCurrManager,main="Years with current manager",xlab="Years count",ylab="Count",col="yellow")
Most of the employees have been with the same manager for less than 5 years.
hist(attrition.df$YearsSinceLastPromotion,main="years since last promotion",xlab="Years count",ylab="Count",col="yellow")
Most employees were last promoted within the past five years (median 1 year since the last promotion).
mytable<-with(attrition.df,table(BusinessTravel))
mytable
## BusinessTravel
## Non-Travel Travel_Frequently Travel_Rarely
## 150 277 1043
Most of the employees travel rarely
mytable<-with(attrition.df,table(Department))
mytable
## Department
## Human Resources Research & Development Sales
## 63 961 446
Research & Development is the largest department (961 of 1,470 employees), which suggests it is the company's main focus.
mytable<-with(attrition.df,table(Education))
mytable
## Education
## 1 2 3 4 5
## 170 282 572 398 48
The most common education level is 3, a bachelor's degree (572 of 1,470 employees).
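The integer codes are easier to read once they carry the labels from the data dictionary at the top of the report. A minimal sketch, assuming the codes 1 to 5 map to the labels listed there; the vector `education_codes` is rebuilt from the counts printed above rather than read from the CSV, and the object names are illustrative.

```r
# Labels come from the data dictionary at the top of the report;
# counts are copied from the Education table above.
education_labels <- c("Below College", "College", "Bachelor", "Master", "Doctor")
education_codes <- rep(1:5, times = c(170, 282, 572, 398, 48))

# Attach the labels as an ordered factor so tables and plots show names.
education <- factor(education_codes, levels = 1:5,
                    labels = education_labels, ordered = TRUE)
education_table <- table(education)
print(education_table)

# Name of the most frequent level.
most_common <- names(which.max(education_table))
print(most_common)
```

On the real data frame the same `factor()` call could be applied to `attrition.df$Education`, and likewise to the satisfaction and rating columns with their own label sets.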
mytable<-with(attrition.df,table(EnvironmentSatisfaction))
mytable
## EnvironmentSatisfaction
## 1 2 3 4
## 284 287 453 446
Most of the employees report high or very high environment satisfaction (levels 3 and 4: 899 of 1,470).
mytable<-with(attrition.df,table(JobInvolvement))
mytable
## JobInvolvement
## 1 2 3 4
## 83 375 868 144
Most of the employees have high job involvement
mytable<-with(attrition.df,table(JobLevel))
mytable
## JobLevel
## 1 2 3 4 5
## 543 534 218 106 69
Most of the employees have low job level
mytable<-with(attrition.df,table(JobRole))
mytable
## JobRole
## Healthcare Representative Human Resources
## 131 52
## Laboratory Technician Manager
## 259 102
## Manufacturing Director Research Director
## 145 80
## Research Scientist Sales Executive
## 292 326
## Sales Representative
## 83
Sales Executive is the most common job role (326 employees), followed by Research Scientist (292) and Laboratory Technician (259).
mytable<-with(attrition.df,table(JobSatisfaction))
mytable
## JobSatisfaction
## 1 2 3 4
## 289 280 442 459
The most common job satisfaction level is 4, Very High (459 employees); levels 3 and 4 together account for 901 of 1,470.
mytable<-with(attrition.df,table(MaritalStatus))
mytable
## MaritalStatus
## Divorced Married Single
## 327 673 470
Married employees form the largest group (673 of 1,470), followed by single (470) and divorced (327) employees.
mytable<-with(attrition.df,table(OverTime))
mytable
## OverTime
## No Yes
## 1054 416
Most of the employees do not work overtime (1,054 of 1,470).
mytable<-with(attrition.df,table(PerformanceRating))
mytable
## PerformanceRating
## 3 4
## 1244 226
Most employees have a performance rating of 3 (Excellent). Only ratings 3 and 4 occur in the data.
mytable<-with(attrition.df,table(RelationshipSatisfaction))
mytable
## RelationshipSatisfaction
## 1 2 3 4
## 276 303 459 432
Relationship satisfaction is high in this company
mytable<-with(attrition.df,table(WorkLifeBalance))
mytable
## WorkLifeBalance
## 1 2 3 4
## 80 344 893 153
Most employees rate their work-life balance as 3, Better (893 of 1,470).
Job role vs distance from home vs attrition
mytable<-xtabs(~JobRole+DistanceFromHome+Attrition,data=attrition.df)
mytable
## , , Attrition = 1
##
## DistanceFromHome
## JobRole 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## Healthcare Representative 23 14 4 3 6 4 8 8 5 7 4 3 2 1 2
## Human Resources 8 9 3 2 1 2 0 5 1 2 1 0 1 0 0
## Laboratory Technician 27 25 12 9 7 12 10 10 18 6 5 3 2 1 3
## Manager 13 22 6 8 6 3 5 3 4 5 3 0 1 0 0
## Manufacturing Director 23 22 4 6 4 5 10 5 5 6 3 3 1 0 1
## Research Director 13 10 6 1 3 4 7 5 2 6 1 1 0 3 5
## Research Scientist 40 32 15 9 14 14 13 10 19 12 2 1 2 5 5
## Sales Executive 33 37 19 14 11 7 16 21 10 26 5 3 2 6 5
## Sales Representative 2 12 1 3 3 1 4 3 3 5 1 0 2 1 0
## DistanceFromHome
## JobRole 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## Healthcare Representative 4 1 4 0 4 1 0 2 3 4 1 2 1 1
## Human Resources 0 1 0 1 0 0 0 0 1 1 1 0 0 0
## Laboratory Technician 5 3 3 6 4 3 1 5 5 2 0 1 3 6
## Manager 1 2 1 0 0 1 3 0 0 1 6 0 0 3
## Manufacturing Director 3 2 6 3 2 2 1 2 3 4 0 3 2 4
## Research Director 0 1 1 0 0 0 2 2 0 1 1 0 3 0
## Research Scientist 4 3 3 6 4 2 3 7 2 2 6 2 5 3
## Sales Executive 8 2 4 3 5 4 2 3 1 4 7 1 6 4
## Sales Representative 0 0 0 0 2 2 1 1 1 0 0 0 1 1
##
## , , Attrition = 2
##
## DistanceFromHome
## JobRole 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## Healthcare Representative 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1
## Human Resources 1 1 0 0 0 1 0 1 1 0 0 0 1 0 0
## Laboratory Technician 4 11 3 3 2 3 6 4 4 3 0 1 0 2 2
## Manager 0 3 0 0 0 0 0 0 0 0 0 0 0 0 1
## Manufacturing Director 1 2 1 0 0 0 1 1 0 2 0 0 0 0 0
## Research Director 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
## Research Scientist 7 4 5 2 3 1 1 2 2 4 1 0 0 1 0
## Sales Executive 6 2 3 4 2 1 0 2 5 2 2 2 5 0 1
## Sales Representative 7 3 2 0 3 1 3 0 6 0 0 2 0 0 0
## DistanceFromHome
## JobRole 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## Healthcare Representative 0 0 0 0 1 1 0 1 1 0 0 0 0 1
## Human Resources 0 1 1 0 1 0 2 1 0 0 0 0 0 0
## Laboratory Technician 2 2 0 0 0 1 0 0 5 2 0 0 1 1
## Manager 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## Manufacturing Director 0 0 0 0 0 0 1 1 0 0 0 0 0 0
## Research Director 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Research Scientist 2 2 2 1 0 0 1 1 1 3 0 0 0 1
## Sales Executive 3 0 1 1 1 0 1 1 3 1 3 3 1 1
## Sales Representative 0 0 0 1 1 1 1 0 2 0 0 0 0 0
margin.table(mytable,1)
## JobRole
## Healthcare Representative Human Resources
## 131 52
## Laboratory Technician Manager
## 259 102
## Manufacturing Director Research Director
## 145 80
## Research Scientist Sales Executive
## 292 326
## Sales Representative
## 83
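Sales executives are the largest group at most distances and account for many of the leavers, but the table is too sparse to read a distance effect from raw counts. The correlation matrix and regression later in this report show a small positive association between distance from home and attrition.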
aggregate(cbind(Attrition,ï..Age,MonthlyIncome) ~ Gender,
data = attrition.df, mean)
## Gender Attrition ï..Age MonthlyIncome
## 1 Female 1.147959 37.32993 6686.566
## 2 Male 1.170068 36.65306 6380.508
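Because attrition is coded 1 and 2, its group mean minus 1 is the share of employees who left. The sketch below works this out from the printed means and the gender counts shown earlier, as a consistency check rather than a new result; the object names are illustrative.

```r
# Values copied from the aggregate() output and the Gender table above.
gender <- c("Female", "Male")
mean_code <- c(1.147959, 1.170068)   # mean of the 1/2 Attrition code
n <- c(588, 882)                     # employees per gender

# With No = 1 and Yes = 2, (mean - 1) is the share of employees who left.
attrition_rate <- mean_code - 1
leavers <- round(attrition_rate * n)

gender_summary <- data.frame(gender, n, leavers,
                             rate_pct = round(100 * attrition_rate, 1))
print(gender_summary)
```

The two leaver counts add up to the 237 employees with Attrition = 2 found in the cross-tabulations.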
boxplot(MonthlyIncome~Attrition ,data=attrition.df, main="Attrition based on Monthly Income",ylab="monthly income",xlab="Attrition")
boxplot(ï..Age~Attrition ,data=attrition.df, main="Attrition based on age",ylab="Age",xlab="Attrition")
car::scatterplot(Attrition~MonthlyIncome|Gender,data=attrition.df,
xlab="MonthlyIncome", ylab="Attrition",
main="How Monthly income affects attrition",
labels=row.names(attrition.df))
car::scatterplot(Attrition~ï..Age|Gender,data=attrition.df,
xlab="Age", ylab="Attrition",
main="Effect of age on attrition",
labels=row.names(attrition.df))
Employees who leave are somewhat more often male (attrition rate about 17.0% for men versus 14.8% for women), tend to have a lower monthly income, and tend to be younger.
aggregate(cbind(Attrition,NumCompaniesWorked,DistanceFromHome) ~ Gender,
data = attrition.df, mean)
## Gender Attrition NumCompaniesWorked DistanceFromHome
## 1 Female 1.147959 2.812925 9.210884
## 2 Male 1.170068 2.613379 9.180272
car::scatterplot(Attrition~NumCompaniesWorked|Gender,data=attrition.df,
xlab="Number of companies worked in", ylab="Attrition",
main="Effect of number of companies on attrition",
labels=row.names(attrition.df))
car::scatterplot(Attrition~DistanceFromHome|Gender,data=attrition.df,
xlab="DistanceFromHome", ylab="Attrition",
main="Distance from home",
labels=row.names(attrition.df))
These summaries and plots do not reveal a clear pattern.
car::scatterplotMatrix(~Attrition+PercentSalaryHike+TrainingTimesLastYear,data=attrition.df,
main="Attrition versus other variables")
The scatterplot matrix shows the relationship between attrition, percent salary hike and the number of trainings in the last year. Attrition appears highest when the number of trainings is low.
car::scatterplotMatrix(~Attrition+YearsAtCompany+YearsInCurrentRole+YearsWithCurrManager+YearsSinceLastPromotion,data=attrition.df,
main="Attrition versus other variables")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit negative part of the spread
From the scatter plot we can see how each variable trends with attrition, which helps identify the variables that affect the attrition of valuable employees. Some show a positive relationship while others show a negative one.
mytable<-xtabs(~Attrition+BusinessTravel,data=attrition.df)
mytable
## BusinessTravel
## Attrition Non-Travel Travel_Frequently Travel_Rarely
## 1 138 208 887
## 2 12 69 156
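In absolute numbers most leavers travel rarely (156), but that is also by far the largest group. The attrition rate is highest among frequent travellers (69 of 277, about 25%), compared with about 15% for those who travel rarely and 8% for non-travellers.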
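Raw counts favour whichever group is largest, so it helps to convert each column of the table to within-group shares. A minimal sketch using `prop.table`, with the counts typed in from the output above so that it runs without the CSV; `travel_counts` and `travel_rates` are illustrative names.

```r
# Counts copied from the xtabs() output above (rows: 1 = stayed, 2 = left).
travel_counts <- matrix(
  c(138, 12, 208, 69, 887, 156), nrow = 2,
  dimnames = list(Attrition = c("No", "Yes"),
                  BusinessTravel = c("Non-Travel", "Travel_Frequently",
                                     "Travel_Rarely")))

# margin = 2 divides each column by its total, giving within-group shares.
travel_rates <- prop.table(travel_counts, margin = 2)

# Attrition rate (in percent) for each travel group.
print(round(100 * travel_rates["Yes", ], 1))
```

With the data frame loaded, `prop.table(mytable, margin = 2)` applied to any of the two-way `xtabs` results in this section gives the same kind of shares directly.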
mytable<-xtabs(~Attrition+Department,data=attrition.df)
mytable
## Department
## Attrition Human Resources Research & Development Sales
## 1 51 828 354
## 2 12 133 92
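Research & Development has the most leavers (133) because it is the largest department, but its attrition rate is the lowest (about 14%). Sales (92 of 446, about 21%) and Human Resources (12 of 63, about 19%) lose a larger share of their staff.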
mytable<-xtabs(~Attrition+Education,data=attrition.df)
mytable
## Education
## Attrition 1 2 3 4 5
## 1 139 238 473 340 43
## 2 31 44 99 58 5
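Education level 3 has the most leavers (99) because it is the largest group. Attrition rates are similar across levels 1 to 4 (roughly 15% to 18%) and lowest at level 5 (5 of 48, about 10%).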
mytable<-xtabs(~Attrition+EnvironmentSatisfaction,data=attrition.df)
mytable
## EnvironmentSatisfaction
## Attrition 1 2 3 4
## 1 212 244 391 386
## 2 72 43 62 60
Attrition is markedly higher among employees with low environment satisfaction (72 of 284, about 25%) than at the other levels (about 14% to 15%).
mytable<-xtabs(~Attrition+JobInvolvement,data=attrition.df)
mytable
## JobInvolvement
## Attrition 1 2 3 4
## 1 55 304 743 131
## 2 28 71 125 13
The attrition rate falls steadily as job involvement rises: about 34% at level 1, 19% at level 2, 14% at level 3 and 9% at level 4.
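Since job involvement is an ordered scale, a test for trend in proportions is one way to check whether the attrition rate changes systematically across its levels. The sketch below uses `prop.trend.test` from base R with the counts from the table above; the equally spaced scores 1 to 4 are an assumption, not something the data set prescribes.

```r
# Counts copied from the JobInvolvement table above.
left  <- c(28, 71, 125, 13)    # Attrition = 2 at involvement levels 1..4
total <- c(83, 375, 868, 144)  # all employees at each level

# Attrition rate per level, in percent.
rate_pct <- round(100 * left / total, 1)
print(rate_pct)

# Chi-squared test for a linear trend in proportions across ordered levels.
trend <- prop.trend.test(left, total, score = 1:4)
print(trend)
```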
mytable<-xtabs(~Attrition+JobLevel,data=attrition.df)
mytable
## JobLevel
## Attrition 1 2 3 4 5
## 1 400 482 186 101 64
## 2 143 52 32 5 5
Attrition is highest at the lowest job level (143 of 543, about 26%) and below 15% at every higher level.
mytable<-xtabs(~Attrition+JobRole,data=attrition.df)
mytable
## JobRole
## Attrition Healthcare Representative Human Resources Laboratory Technician
## 1 122 40 197
## 2 9 12 62
## JobRole
## Attrition Manager Manufacturing Director Research Director
## 1 97 135 78
## 2 5 10 2
## JobRole
## Attrition Research Scientist Sales Executive Sales Representative
## 1 245 269 50
## 2 47 57 33
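Laboratory technicians account for the most leavers (62), followed by sales executives (57). The attrition rate is highest among sales representatives (33 of 83, about 40%) and lowest among research directors (2 of 80).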
mytable<-xtabs(~Attrition+JobSatisfaction,data=attrition.df)
mytable
## JobSatisfaction
## Attrition 1 2 3 4
## 1 223 234 369 407
## 2 66 46 73 52
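Job satisfaction level 3 has the most leavers (73), but the attrition rate is highest at level 1 (66 of 289, about 23%) and lowest at level 4 (52 of 459, about 11%).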
mytable<-xtabs(~Attrition+MaritalStatus,data=attrition.df)
mytable
## MaritalStatus
## Attrition Divorced Married Single
## 1 294 589 350
## 2 33 84 120
Single employees leave far more often (120 of 470, about 26%) than married (about 12%) or divorced (about 10%) employees.
mytable<-xtabs(~Attrition+OverTime,data=attrition.df)
mytable
## OverTime
## Attrition No Yes
## 1 944 289
## 2 110 127
Employees who work overtime leave about three times as often (127 of 416, about 31%) as those who do not (110 of 1,054, about 10%).
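For a yes/no outcome such as attrition, logistic regression is a common alternative to reading rates off a table, and its coefficient can be reported as an odds ratio. The following is a sketch under stated assumptions: the data frame `overtime.df` is rebuilt from the four counts in the table above, the outcome is recoded to 0/1 in a column named `Left`, and overtime is the only predictor, so this is not the model fitted elsewhere in this report.

```r
# Rebuild one row per employee from the OverTime table above.
cell_n <- c(944, 289, 110, 127)
overtime.df <- data.frame(
  OverTime = rep(c("No", "Yes", "No", "Yes"), times = cell_n),
  Left     = rep(c(0, 0, 1, 1), times = cell_n)   # 0 = stayed, 1 = left
)

# Logistic regression of the 0/1 outcome on overtime status.
overtime_fit <- glm(Left ~ OverTime, data = overtime.df, family = binomial)

# exp(coefficient) is the odds ratio of leaving, overtime vs. no overtime.
odds_ratio <- unname(exp(coef(overtime_fit)["OverTimeYes"]))
print(round(odds_ratio, 2))
```

With a single binary predictor the fitted odds ratio equals the cross-product ratio of the table, (127 × 944) / (289 × 110), which gives a quick way to check the fit by hand.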
mytable<-xtabs(~Attrition+PerformanceRating,data=attrition.df)
mytable
## PerformanceRating
## Attrition 3 4
## 1 1044 189
## 2 200 37
Performance rating shows no clear relationship with attrition: the rate is about 16% for both rating 3 (200 of 1,244) and rating 4 (37 of 226).
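A chi-squared test of independence is one way to check whether a difference between two attrition rates is larger than chance would produce. The sketch below applies `chisq.test` to the performance rating counts from the table above; the object names are illustrative, and the test is offered as a check on the reading of the table rather than as part of the original analysis.

```r
# Counts copied from the PerformanceRating table above
# (rows: stayed / left, columns: rating 3 / rating 4).
perf_counts <- matrix(
  c(1044, 200, 189, 37), nrow = 2,
  dimnames = list(Attrition = c("No", "Yes"),
                  PerformanceRating = c("3", "4")))

# Attrition rate within each rating.
perf_rates <- prop.table(perf_counts, margin = 2)["Yes", ]
print(round(100 * perf_rates, 1))

# Test whether attrition and rating are independent.
perf_test <- chisq.test(perf_counts)
print(perf_test)
```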
mytable<-xtabs(~Attrition+RelationshipSatisfaction,data=attrition.df)
mytable
## RelationshipSatisfaction
## Attrition 1 2 3 4
## 1 219 258 388 368
## 2 57 45 71 64
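Relationship satisfaction level 3 has the most leavers (71), but the attrition rate is highest at level 1 (57 of 276, about 21%) and close to 15% at the other levels.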
mytable<-xtabs(~Attrition+WorkLifeBalance,data=attrition.df)
mytable
## WorkLifeBalance
## Attrition 1 2 3 4
## 1 55 286 766 126
## 2 25 58 127 27
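Work-life balance level 3 has the most leavers (127) because it is the largest group. The attrition rate is highest at level 1 (25 of 80, about 31%) and between 14% and 18% at the other levels.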
corrplot::corrplot.mixed(corr=cor(attrition.df[,c(1,2,4,6,7,10,11,13:15,17,19,20,21
)],use="complete.obs"),upper="pie",tl.pos="lt")
cor((attrition.df[,c(1,2,4,6,7,10,11,13:15,17,19,20,21)]))
## ï..Age Attrition DailyRate
## ï..Age 1.000000000 -0.15920501 0.010660943
## Attrition -0.159205007 1.00000000 -0.056651992
## DailyRate 0.010660943 -0.05665199 1.000000000
## DistanceFromHome -0.001686120 0.07792358 -0.004985337
## Education 0.208033731 -0.03137282 -0.016806433
## EmployeeNumber -0.010145467 -0.01057724 -0.050990434
## EnvironmentSatisfaction 0.010146428 -0.10336898 0.018354854
## HourlyRate 0.024286543 -0.00684555 0.023381422
## JobInvolvement 0.029819959 -0.13001596 0.046134874
## JobLevel 0.509604228 -0.16910475 0.002966335
## JobSatisfaction -0.004891877 -0.10348113 0.030571008
## MonthlyIncome 0.497854567 -0.15983958 0.007707059
## MonthlyRate 0.028051167 0.01517021 -0.032181602
## NumCompaniesWorked 0.299634758 0.04349374 0.038153434
## DistanceFromHome Education EmployeeNumber
## ï..Age -0.001686120 0.20803373 -0.010145467
## Attrition 0.077923583 -0.03137282 -0.010577243
## DailyRate -0.004985337 -0.01680643 -0.050990434
## DistanceFromHome 1.000000000 0.02104183 0.032916407
## Education 0.021041826 1.00000000 0.042070093
## EmployeeNumber 0.032916407 0.04207009 1.000000000
## EnvironmentSatisfaction -0.016075327 -0.02712831 0.017620802
## HourlyRate 0.031130586 0.01677483 0.035179212
## JobInvolvement 0.008783280 0.04243763 -0.006887923
## JobLevel 0.005302731 0.10158889 -0.018519194
## JobSatisfaction -0.003668839 -0.01129612 -0.046246735
## MonthlyIncome -0.017014445 0.09496068 -0.014828516
## MonthlyRate 0.027472864 -0.02608420 0.012648229
## NumCompaniesWorked -0.029250804 0.12631656 -0.001251032
## EnvironmentSatisfaction HourlyRate JobInvolvement
## ï..Age 0.010146428 0.02428654 0.029819959
## Attrition -0.103368978 -0.00684555 -0.130015957
## DailyRate 0.018354854 0.02338142 0.046134874
## DistanceFromHome -0.016075327 0.03113059 0.008783280
## Education -0.027128313 0.01677483 0.042437634
## EmployeeNumber 0.017620802 0.03517921 -0.006887923
## EnvironmentSatisfaction 1.000000000 -0.04985696 -0.008277598
## HourlyRate -0.049856956 1.00000000 0.042860641
## JobInvolvement -0.008277598 0.04286064 1.000000000
## JobLevel 0.001211699 -0.02785349 -0.012629883
## JobSatisfaction -0.006784353 -0.07133462 -0.021475910
## MonthlyIncome -0.006259088 -0.01579430 -0.015271491
## MonthlyRate 0.037599623 -0.01529675 -0.016322079
## NumCompaniesWorked 0.012594323 0.02215688 0.015012413
## JobLevel JobSatisfaction MonthlyIncome
## ï..Age 0.509604228 -0.0048918771 0.497854567
## Attrition -0.169104751 -0.1034811261 -0.159839582
## DailyRate 0.002966335 0.0305710078 0.007707059
## DistanceFromHome 0.005302731 -0.0036688392 -0.017014445
## Education 0.101588886 -0.0112961167 0.094960677
## EmployeeNumber -0.018519194 -0.0462467349 -0.014828516
## EnvironmentSatisfaction 0.001211699 -0.0067843526 -0.006259088
## HourlyRate -0.027853486 -0.0713346244 -0.015794304
## JobInvolvement -0.012629883 -0.0214759103 -0.015271491
## JobLevel 1.000000000 -0.0019437080 0.950299913
## JobSatisfaction -0.001943708 1.0000000000 -0.007156742
## MonthlyIncome 0.950299913 -0.0071567424 1.000000000
## MonthlyRate 0.039562951 0.0006439169 0.034813626
## NumCompaniesWorked 0.142501124 -0.0556994260 0.149515216
## MonthlyRate NumCompaniesWorked
## ï..Age 0.0280511671 0.299634758
## Attrition 0.0151702125 0.043493739
## DailyRate -0.0321816015 0.038153434
## DistanceFromHome 0.0274728635 -0.029250804
## Education -0.0260841972 0.126316560
## EmployeeNumber 0.0126482292 -0.001251032
## EnvironmentSatisfaction 0.0375996229 0.012594323
## HourlyRate -0.0152967496 0.022156883
## JobInvolvement -0.0163220791 0.015012413
## JobLevel 0.0395629510 0.142501124
## JobSatisfaction 0.0006439169 -0.055699426
## MonthlyIncome 0.0348136261 0.149515216
## MonthlyRate 1.0000000000 0.017521353
## NumCompaniesWorked 0.0175213534 1.000000000
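The strongest relationship is between job level and monthly income (r = 0.95). Age is moderately correlated with both job level (0.51) and monthly income (0.50). All three correlations are positive. The correlations between attrition and the other variables are weak (at most 0.17 in absolute value).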
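Strongly correlated pairs are easy to miss in a printed matrix that wraps across several blocks. A minimal sketch of pulling them out programmatically, using rounded values for three variables from the output above; the 0.9 cutoff and the object names are illustrative assumptions.

```r
# Correlations copied (rounded) from the matrix above for three variables.
vars <- c("Age", "JobLevel", "MonthlyIncome")
r <- matrix(c(1.000, 0.510, 0.498,
              0.510, 1.000, 0.950,
              0.498, 0.950, 1.000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and flag those above the cutoff.
cutoff <- 0.9   # illustrative threshold, not taken from the report
idx <- which(upper.tri(r) & abs(r) > cutoff, arr.ind = TRUE)

high_pairs <- data.frame(var1 = rownames(r)[idx[, "row"]],
                         var2 = colnames(r)[idx[, "col"]],
                         r = r[idx])
print(high_pairs)
```

On the full data the same steps could be applied to the result of `cor()` on the numeric columns. Pairs flagged this way, such as job level and monthly income, are candidates for dropping one variable before regression.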
Now we will do correlation with other variables
corrplot::corrplot.mixed(corr=cor(attrition.df[,c(24:26,28:35 )],use="complete.obs"),upper="pie",tl.pos="lt")
cor((attrition.df[,c(24:26,28:35)]))
## PercentSalaryHike PerformanceRating
## PercentSalaryHike 1.000000000 0.773549996
## PerformanceRating 0.773549996 1.000000000
## RelationshipSatisfaction -0.040490081 -0.031351455
## StockOptionLevel 0.007527748 0.003506472
## TotalWorkingYears -0.020608488 0.006743668
## TrainingTimesLastYear -0.005221012 -0.015578882
## WorkLifeBalance -0.003279636 0.002572361
## YearsAtCompany -0.035991262 0.003435126
## YearsInCurrentRole -0.001520027 0.034986260
## YearsSinceLastPromotion -0.022154313 0.017896066
## YearsWithCurrManager -0.011985248 0.022827169
## RelationshipSatisfaction StockOptionLevel
## PercentSalaryHike -0.0404900811 0.007527748
## PerformanceRating -0.0313514554 0.003506472
## RelationshipSatisfaction 1.0000000000 -0.045952491
## StockOptionLevel -0.0459524907 1.000000000
## TotalWorkingYears 0.0240542918 0.010135969
## TrainingTimesLastYear 0.0024965264 0.011274070
## WorkLifeBalance 0.0196044057 0.004128730
## YearsAtCompany 0.0193667869 0.015058008
## YearsInCurrentRole -0.0151229149 0.050817873
## YearsSinceLastPromotion 0.0334925021 0.014352185
## YearsWithCurrManager -0.0008674968 0.024698227
## TotalWorkingYears TrainingTimesLastYear
## PercentSalaryHike -0.020608488 -0.005221012
## PerformanceRating 0.006743668 -0.015578882
## RelationshipSatisfaction 0.024054292 0.002496526
## StockOptionLevel 0.010135969 0.011274070
## TotalWorkingYears 1.000000000 -0.035661571
## TrainingTimesLastYear -0.035661571 1.000000000
## WorkLifeBalance 0.001007646 0.028072207
## YearsAtCompany 0.628133155 0.003568666
## YearsInCurrentRole 0.460364638 -0.005737504
## YearsSinceLastPromotion 0.404857759 -0.002066536
## YearsWithCurrManager 0.459188397 -0.004095526
## WorkLifeBalance YearsAtCompany YearsInCurrentRole
## PercentSalaryHike -0.003279636 -0.035991262 -0.001520027
## PerformanceRating 0.002572361 0.003435126 0.034986260
## RelationshipSatisfaction 0.019604406 0.019366787 -0.015122915
## StockOptionLevel 0.004128730 0.015058008 0.050817873
## TotalWorkingYears 0.001007646 0.628133155 0.460364638
## TrainingTimesLastYear 0.028072207 0.003568666 -0.005737504
## WorkLifeBalance 1.000000000 0.012089185 0.049856498
## YearsAtCompany 0.012089185 1.000000000 0.758753737
## YearsInCurrentRole 0.049856498 0.758753737 1.000000000
## YearsSinceLastPromotion 0.008941249 0.618408865 0.548056248
## YearsWithCurrManager 0.002759440 0.769212425 0.714364762
## YearsSinceLastPromotion YearsWithCurrManager
## PercentSalaryHike -0.022154313 -0.0119852485
## PerformanceRating 0.017896066 0.0228271689
## RelationshipSatisfaction 0.033492502 -0.0008674968
## StockOptionLevel 0.014352185 0.0246982266
## TotalWorkingYears 0.404857759 0.4591883971
## TrainingTimesLastYear -0.002066536 -0.0040955260
## WorkLifeBalance 0.008941249 0.0027594402
## YearsAtCompany 0.618408865 0.7692124251
## YearsInCurrentRole 0.548056248 0.7143647616
## YearsSinceLastPromotion 1.000000000 0.5102236358
## YearsWithCurrManager 0.510223636 1.0000000000
We see that performance rating is highly correlated with percent salary hike (r = 0.77). Total working years, years at company, years in current role, years since last promotion and years with current manager are also inter-correlated, with coefficients between 0.40 and 0.77.
t.test(PercentSalaryHike~PerformanceRating,data=attrition.df)
##
## Welch Two Sample t-test
##
## data: PercentSalaryHike by PerformanceRating
## t = -63.389, df = 456.99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.089594 -7.603091
## sample estimates:
## mean in group 3 mean in group 4
## 14.00322 21.84956
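The p-value is below 2.2e-16, so we reject the null hypothesis that the mean salary hike is the same for performance ratings 3 and 4 (14.0% versus 21.8%). The tenure variables are strongly inter-correlated with one another, so they are not tested separately.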
# data = attrition.df is passed to lm() below, so attach() is not needed
Model1 <-Attrition~ï..Age+BusinessTravel+DailyRate+Department+DistanceFromHome+Education+EducationField+EnvironmentSatisfaction+Gender+JobLevel+JobRole+JobSatisfaction+MaritalStatus+MonthlyIncome+NumCompaniesWorked+OverTime+PercentSalaryHike+JobInvolvement+PerformanceRating+TotalWorkingYears+TrainingTimesLastYear+WorkLifeBalance+YearsAtCompany+YearsInCurrentRole+YearsSinceLastPromotion+YearsWithCurrManager
fit1 <- lm(Model1, data = attrition.df)
summary(fit1)
##
## Call:
## lm(formula = Model1, data = attrition.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58046 -0.20418 -0.08046 0.08004 1.11625
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.475e+00 1.722e-01 8.564 < 2e-16
## ï..Age -3.673e-03 1.327e-03 -2.767 0.005722
## BusinessTravelTravel_Frequently 1.532e-01 3.310e-02 4.629 4.01e-06
## BusinessTravelTravel_Rarely 6.826e-02 2.855e-02 2.391 0.016948
## DailyRate -2.742e-05 2.119e-05 -1.294 0.195949
## DepartmentResearch & Development 1.242e-01 1.172e-01 1.060 0.289123
## DepartmentSales 9.838e-02 1.211e-01 0.812 0.416868
## DistanceFromHome 3.518e-03 1.048e-03 3.357 0.000809
## Education 1.767e-03 8.540e-03 0.207 0.836125
## EducationFieldLife Sciences -1.170e-01 8.378e-02 -1.396 0.162927
## EducationFieldMarketing -7.742e-02 8.924e-02 -0.868 0.385809
## EducationFieldMedical -1.310e-01 8.414e-02 -1.556 0.119812
## EducationFieldOther -1.352e-01 8.993e-02 -1.503 0.133073
## EducationFieldTechnical Degree -2.050e-02 8.750e-02 -0.234 0.814836
## EnvironmentSatisfaction -4.039e-02 7.798e-03 -5.180 2.53e-07
## GenderMale 3.394e-02 1.742e-02 1.948 0.051626
## JobLevel -4.253e-03 2.854e-02 -0.149 0.881559
## JobRoleHuman Resources 2.083e-01 1.224e-01 1.702 0.088977
## JobRoleLaboratory Technician 1.357e-01 4.005e-02 3.388 0.000723
## JobRoleManager 5.380e-02 6.794e-02 0.792 0.428598
## JobRoleManufacturing Director 1.418e-02 3.926e-02 0.361 0.718054
## JobRoleResearch Director 9.994e-04 6.058e-02 0.016 0.986840
## JobRoleResearch Scientist 3.702e-02 3.964e-02 0.934 0.350466
## JobRoleSales Executive 1.030e-01 7.762e-02 1.327 0.184692
## JobRoleSales Representative 2.579e-01 8.622e-02 2.992 0.002822
## JobSatisfaction -3.709e-02 7.697e-03 -4.818 1.60e-06
## MaritalStatusMarried 2.186e-02 2.192e-02 0.997 0.318839
## MaritalStatusSingle 1.335e-01 2.361e-02 5.652 1.91e-08
## MonthlyIncome 9.389e-07 7.600e-06 0.124 0.901691
## NumCompaniesWorked 1.653e-02 3.808e-03 4.340 1.53e-05
## OverTimeYes 2.083e-01 1.897e-02 10.981 < 2e-16
## PercentSalaryHike -1.949e-03 3.680e-03 -0.530 0.596403
## JobInvolvement -5.936e-02 1.200e-02 -4.947 8.42e-07
## PerformanceRating 1.845e-02 3.724e-02 0.496 0.620306
## TotalWorkingYears -3.401e-03 2.419e-03 -1.406 0.159899
## TrainingTimesLastYear -1.390e-02 6.642e-03 -2.092 0.036578
## WorkLifeBalance -3.263e-02 1.208e-02 -2.702 0.006980
## YearsAtCompany 5.250e-03 2.990e-03 1.756 0.079286
## YearsInCurrentRole -8.868e-03 3.878e-03 -2.287 0.022341
## YearsSinceLastPromotion 1.055e-02 3.420e-03 3.085 0.002072
## YearsWithCurrManager -9.634e-03 3.976e-03 -2.423 0.015518
##
## (Intercept) ***
## ï..Age **
## BusinessTravelTravel_Frequently ***
## BusinessTravelTravel_Rarely *
## DailyRate
## DepartmentResearch & Development
## DepartmentSales
## DistanceFromHome ***
## Education
## EducationFieldLife Sciences
## EducationFieldMarketing
## EducationFieldMedical
## EducationFieldOther
## EducationFieldTechnical Degree
## EnvironmentSatisfaction ***
## GenderMale .
## JobLevel
## JobRoleHuman Resources .
## JobRoleLaboratory Technician ***
## JobRoleManager
## JobRoleManufacturing Director
## JobRoleResearch Director
## JobRoleResearch Scientist
## JobRoleSales Executive
## JobRoleSales Representative **
## JobSatisfaction ***
## MaritalStatusMarried
## MaritalStatusSingle ***
## MonthlyIncome
## NumCompaniesWorked ***
## OverTimeYes ***
## PercentSalaryHike
## JobInvolvement ***
## PerformanceRating
## TotalWorkingYears
## TrainingTimesLastYear *
## WorkLifeBalance **
## YearsAtCompany .
## YearsInCurrentRole *
## YearsSinceLastPromotion **
## YearsWithCurrManager *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3225 on 1429 degrees of freedom
## Multiple R-squared: 0.2523, Adjusted R-squared: 0.2313
## F-statistic: 12.05 on 40 and 1429 DF, p-value: < 2.2e-16
This regression model fits poorly: it has a very low R-squared value (0.2523). Hence we will run a best-subset search to determine the best variables to include in the regression model.
## Test1
library(leaps)
## Warning: package 'leaps' was built under R version 3.4.3
leap1 <- regsubsets(Model1, data = attrition.df, nbest=1)
# summary(leap1)
plot(leap1, scale="adjr2")
In the new regression model we will use environment satisfaction, job satisfaction, overtime, job involvement, business travel, job role, marital status, and total working years as variables.
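The subset suggested by `regsubsets` can also be extracted programmatically instead of being read from the plot. The sketch below uses simulated data, and the columns `x1` to `x5` and `y` are hypothetical. It shows one possible way to pick the subset with the highest adjusted R-squared and is not necessarily how the variables above were chosen.

```r
library(leaps)

# Simulated data; only x1 and x3 truly influence y (names are hypothetical).
set.seed(7)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 1.5 * toy$x3 + rnorm(n)

# Best subset of each size (nbest = 1), as in the report.
subsets <- regsubsets(y ~ ., data = toy, nbest = 1)
sub_sum <- summary(subsets)

best_size <- which.max(sub_sum$adjr2)   # subset size with the highest adjusted R-squared
sub_sum$which[best_size, ]              # logical row: which terms are in that subset
coef(subsets, id = best_size)           # coefficients of the selected subset
```

With `nbest = 1` there is one candidate per subset size, so the row index returned by `which.max` can be passed straight to `coef()` as the model id.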
attach(attrition.df)
## The following objects are masked from attrition.df (pos = 4):
##
## Attrition, BusinessTravel, DailyRate, Department,
## DistanceFromHome, Education, EducationField, EmployeeCount,
## EmployeeNumber, EnvironmentSatisfaction, Gender, HourlyRate,
## ï..Age, JobInvolvement, JobLevel, JobRole, JobSatisfaction,
## MaritalStatus, MonthlyIncome, MonthlyRate, NumCompaniesWorked,
## Over18, OverTime, PercentSalaryHike, PerformanceRating,
## RelationshipSatisfaction, StandardHours, StockOptionLevel,
## TotalWorkingYears, TrainingTimesLastYear, WorkLifeBalance,
## YearsAtCompany, YearsInCurrentRole, YearsSinceLastPromotion,
## YearsWithCurrManager
Model2 <-Attrition~EnvironmentSatisfaction+JobSatisfaction+OverTime+JobInvolvement+BusinessTravel+JobRole+MaritalStatus+TotalWorkingYears
fit2<- lm(Model2, data = attrition.df)
summary(fit2)
##
## Call:
## lm(formula = Model2, data = attrition.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.64053 -0.20175 -0.08617 0.05029 1.25881
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.350651 0.064724 20.868 < 2e-16 ***
## EnvironmentSatisfaction -0.040500 0.007935 -5.104 3.76e-07 ***
## JobSatisfaction -0.039566 0.007836 -5.049 5.00e-07 ***
## OverTimeYes 0.211422 0.019255 10.980 < 2e-16 ***
## JobInvolvement -0.063648 0.012170 -5.230 1.94e-07 ***
## BusinessTravelTravel_Frequently 0.146786 0.033715 4.354 1.43e-05 ***
## BusinessTravelTravel_Rarely 0.063610 0.029031 2.191 0.028603 *
## JobRoleHuman Resources 0.132329 0.054945 2.408 0.016147 *
## JobRoleLaboratory Technician 0.133775 0.036704 3.645 0.000277 ***
## JobRoleManager 0.041907 0.046354 0.904 0.366115
## JobRoleManufacturing Director -0.007684 0.039906 -0.193 0.847339
## JobRoleResearch Director -0.016877 0.048190 -0.350 0.726231
## JobRoleResearch Scientist 0.042960 0.036053 1.192 0.233620
## JobRoleSales Executive 0.080124 0.034482 2.324 0.020281 *
## JobRoleSales Representative 0.242834 0.048592 4.997 6.52e-07 ***
## MaritalStatusMarried 0.019822 0.022355 0.887 0.375392
## MaritalStatusSingle 0.133104 0.024002 5.546 3.48e-08 ***
## TotalWorkingYears -0.004542 0.001479 -3.071 0.002173 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3303 on 1452 degrees of freedom
## Multiple R-squared: 0.2029, Adjusted R-squared: 0.1936
## F-statistic: 21.75 on 17 and 1452 DF, p-value: < 2.2e-16
This model has a lower R-squared value (0.2029) than the first model (0.2523), and some of the variables still have high p-values, so we need to change our set of variables.
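The terms with high p-values can be pulled out of the model summary directly. This is a minimal sketch on simulated data: the columns `x_signal` and `x_noise` and the 0.05 cutoff are assumptions for illustration.

```r
# Simulated data; x_signal drives y, x_noise does not (names are hypothetical).
set.seed(3)
n <- 200
toy <- data.frame(x_signal = rnorm(n), x_noise = rnorm(n))
toy$y <- 0.5 + 1.2 * toy$x_signal + rnorm(n)

fit <- lm(y ~ x_signal + x_noise, data = toy)

# The coefficient table is a matrix with columns
# Estimate, Std. Error, t value and Pr(>|t|).
coef_tab <- coef(summary(fit))
p_vals   <- coef_tab[, "Pr(>|t|)"]

alpha <- 0.05                                  # assumed cutoff
weak_terms <- names(p_vals)[p_vals > alpha]    # terms not significant at alpha
weak_terms
```

For factor predictors such as job role, each level gets its own row in the table, so a factor can have both significant and non-significant levels, as seen in the output above.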
library(leaps)
leap2 <- regsubsets(Model2, data = attrition.df, nbest=1)
plot(leap2, scale="adjr2")
Model3 <-Attrition~OverTime+TotalWorkingYears
fit3<- lm(Model3, data = attrition.df)
summary(fit3)
##
## Call:
## lm(formula = Model3, data = attrition.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39127 -0.16382 -0.11439 -0.02378 1.13273
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.196765 0.017064 70.132 < 2e-16 ***
## OverTimeYes 0.202738 0.020324 9.975 < 2e-16 ***
## TotalWorkingYears -0.008237 0.001177 -6.998 3.92e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.351 on 1467 degrees of freedom
## Multiple R-squared: 0.09093, Adjusted R-squared: 0.08969
## F-statistic: 73.36 on 2 and 1467 DF, p-value: < 2.2e-16
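This seems to be the best model in terms of variable significance: both predictors have p-values below 0.001. However, its R-squared value (0.09093) is the lowest of the three models.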
library(leaps)
leap3 <- regsubsets(Model3, data = attrition.df, nbest=1)
plot(leap3, scale="adjr2")
AIC(fit1)
## [1] 887.2047
AIC(fit2)
## [1] 935.1092
AIC(fit3)
## [1] 1098.415
We have visualised all the important variables in this data set.
After many attempts, I have come to the conclusion that the best-fitting model is the one containing all the variables (fit1). It has the highest R-squared value (0.2523) and the lowest AIC (887.2047) among the three models.
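The three criteria used in this comparison can be collected into one table. The sketch below uses simulated data and three nested models that mirror fit1, fit2 and fit3 only in structure. The columns `a`, `b`, `c` and `y` are hypothetical.

```r
# Simulated data with three nested models (illustrative only).
set.seed(11)
n <- 400
toy <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
toy$y <- 1 + 0.8 * toy$a + 0.5 * toy$b + 0.3 * toy$c + rnorm(n)

fits <- list(
  full   = lm(y ~ a + b + c, data = toy),
  medium = lm(y ~ a + b,     data = toy),
  small  = lm(y ~ a,         data = toy)
)

# One row per model: R-squared, adjusted R-squared and AIC.
comparison <- data.frame(
  model  = names(fits),
  r2     = sapply(fits, function(f) summary(f)$r.squared),
  adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared),
  aic    = sapply(fits, AIC)
)
comparison[order(comparison$aic), ]   # lowest AIC first
```

Plain R-squared cannot decrease when predictors are added to a least-squares model, so adjusted R-squared and AIC, which both penalise extra terms, are the safer basis for choosing between models of different size.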