ABOUT IBM :
IBM (International Business Machines Corporation) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries. The company originated in 1911 as the Computing-Tabulating-Recording Company (CTR) and was renamed “International Business Machines” in 1924. IBM manufactures and markets computer hardware, middleware and software, and provides hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization and, as of 2017, had held the record for most patents generated by a business for 24 consecutive years.
ABOUT ATTRITION :
Attrition in business can mean the reduction in staff and employees in a company through normal means, such as retirement and resignation, or the loss of customers or clients to old age or to growing out of the company’s target demographic. The cause of attrition may be either voluntary or involuntary. Each industry has its own standards for acceptable attrition rates, and these rates can also differ between skilled and unskilled positions.
However, there is more to attrition than a shrinking workforce. As employees leave an organization, they take with them much-needed skills and qualifications that they developed during their tenure. On the other hand, junior professionals with promising qualifications can then move up into higher-level positions, or business owners can introduce more diversity in experience or expertise. Because training new employees is expensive, any type of employee attrition is typically seen as having a monetary cost.
Although it is possible for a company to use employee attrition to its benefit in some circumstances, such as relying on it to control labor costs without issuing mass layoffs, in general, relatively high attrition is problematic for companies. HR professionals often assume a leadership role in designing company compensation programs, work culture and motivation systems that help the organization retain top employees.
In this project, we try to analyse which factors lead to employee attrition in companies, and which of them have the most influence. We use a dataset that is published by the Human Resource department of IBM.
The data was collected from the Kaggle website. Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective. The dataset link is: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset/data
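# The CSV begins with a UTF-8 byte-order mark; read this way on Windows, the first column is named ï..Age rather than Age.
attrition.df<-read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")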
View(attrition.df)
dim(attrition.df)
Attrition is coded numerically as follows:
1 = No
2 = Yes
car::some(attrition.df)
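# Recode Attrition as numeric (No = 1, Yes = 2). Fixing the levels also works when read.csv() returns character columns (the default since R 4.0).
attrition.df$Attrition <- as.numeric(factor(attrition.df$Attrition, levels = c("No", "Yes")))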
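The 1 = No, 2 = Yes coding depends on how R orders the levels of the column. The following is a minimal sketch on a toy vector (the values are hypothetical, not taken from the dataset) showing how fixing the level order makes the mapping explicit.

```r
# Toy vector standing in for the Attrition column (hypothetical values).
attrition_raw <- c("Yes", "No", "No", "Yes", "No")

# Set the level order explicitly so that "No" maps to 1 and "Yes" to 2,
# whether the input is a character vector or a factor.
attrition_num <- as.numeric(factor(attrition_raw, levels = c("No", "Yes")))

# Cross-tabulate the labels against the codes to check the mapping.
print(attrition_num)
print(table(attrition_raw, attrition_num))
```

The cross-tabulation is a quick way to confirm that every "Yes" received the code 2 before the numeric column is used in plots or correlations.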
head(attrition.df)
psych::describe(attrition.df)[,3:9]
mytable<-with(attrition.df,table(Gender))
mytable
There are more male employees than female employees in this data set.
hist(attrition.df$ï..Age,main="Distribution of age",xlab="Age",ylab="Count",col=blues9)
Most employees are between 30 and 40 years old.
hist(attrition.df$Attrition,main="Distribution of Attrition",xlab="Attrition",ylab="Count",col="yellowgreen")
Most of the employees did not leave the company.
mytable<-with(attrition.df,table(BusinessTravel))
mytable
Most of the employees rarely travel.
mytable<-with(attrition.df,table(Department))
mytable
Research and development is the largest department in this dataset.
hist(attrition.df$DistanceFromHome,main="Distance from home distribution",xlab="Distance from Home",ylab="Count",col="Pink")
Most of the people in this dataset live in close proximity to the office.
Education : 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’
mytable<-with(attrition.df,table(Education))
mytable
A bachelor’s degree is the most common education level.
EnvironmentSatisfaction : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
mytable<-with(attrition.df,table(EnvironmentSatisfaction))
mytable
Most of the employees have high environment satisfaction.
JobInvolvement : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
mytable<-with(attrition.df,table(JobInvolvement))
mytable
Most of the employees have high job involvement.
mytable<-with(attrition.df,table(JobLevel))
mytable
Most of the employees are at the lower job levels.
mytable<-with(attrition.df,table(JobRole))
mytable
Sales executive is the most common job role in this dataset.
JobSatisfaction : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
mytable<-with(attrition.df,table(JobSatisfaction))
mytable
The most common job satisfaction level is 4 (‘Very High’).
mytable<-with(attrition.df,table(MaritalStatus))
mytable
Married employees form the largest group.
hist(attrition.df$MonthlyIncome,main="Distribution of Monthly Income",xlab="Monthly Income",ylab="Count",col="orange")
The distribution is right-skewed: most of the employees are at the lower end of the monthly income range.
hist(attrition.df$NumCompaniesWorked,main="Number of companies Worked with",xlab="Number of companies",ylab="Count",col="orange")
Most of the employees have changed companies fewer than two times.
mytable<-with(attrition.df,table(OverTime))
mytable
Most of the employees do not work overtime in this firm.
hist(attrition.df$PercentSalaryHike,main="Distribution of salary hike",xlab="Salary hike(%)",ylab="Count",col="red")
A salary hike of 12-14% is the most common.
PerformanceRating : 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’
mytable<-with(attrition.df,table(PerformanceRating))
mytable
Most employees have a performance rating of 3 (‘Excellent’).
RelationshipSatisfaction : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
mytable<-with(attrition.df,table(RelationshipSatisfaction))
mytable
Relationship satisfaction is high in this company.
WorkLifeBalance : 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’
mytable<-with(attrition.df,table(WorkLifeBalance))
mytable
Most employees rate their work-life balance as 3 (‘Better’).
hist(attrition.df$TrainingTimesLastYear,main="Number of times the employee has undergone training",xlab="Training count",ylab="Count",col="brown")
hist(attrition.df$YearsAtCompany,main="Years at company",xlab="Number of years",ylab="Count",col="yellow")
Most of the employees have been with the company for less than 10 years.
hist(attrition.df$YearsInCurrentRole,main="Years in current role",xlab="Years count",ylab="Count",col="yellow")
Most of the employees have been in the same role for a long period.
hist(attrition.df$YearsSinceLastPromotion,main="Years since last promotion",xlab="Years count",ylab="Count",col="yellow")
Most of the employees have been promoted in the last five years.
hist(attrition.df$YearsWithCurrManager,main="Years with current manager",xlab="Years count",ylab="Count",col="yellow")
Most of the employees have been with the same manager for less than 5 years.
mytable<-xtabs(~JobRole+DistanceFromHome+Attrition,data=attrition.df)
mytable
margin.table(mytable,1)
Sales executives tend to travel long distances to work, and they are also the group with the most attrition.
This suggests that distance from home may affect the attrition of employees, although the table alone does not establish it.
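One simple way to check a claim like this is to compare the average distance for employees who stayed and employees who left. The following is a minimal sketch on a hypothetical toy data frame (the numbers are invented for illustration and use the same 1 = No, 2 = Yes coding); on the real data the same call could be applied to attrition.df.

```r
# Hypothetical toy data: Attrition coded 1 = No, 2 = Yes.
toy <- data.frame(
  Attrition = c(1, 1, 1, 2, 2, 2),
  DistanceFromHome = c(2, 4, 6, 8, 10, 24)
)

# Mean distance from home within each attrition group.
by_group <- aggregate(DistanceFromHome ~ Attrition, data = toy, FUN = mean)
print(by_group)
```

A clearly larger mean in the Attrition = 2 row would support the idea that distance matters; similar means would not.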
aggregate(cbind(Attrition,ï..Age,MonthlyIncome) ~ Gender,
data = attrition.df, mean)
boxplot(MonthlyIncome~Attrition ,data=attrition.df, main="Attrition based on Monthly Income",ylab="monthly income",xlab="Attrition")
boxplot(ï..Age~Attrition ,data=attrition.df, main="Attrition based on age",ylab="Age",xlab="Attrition")
car::scatterplot(Attrition~MonthlyIncome|Gender,data=attrition.df,
xlab="MonthlyIncome", ylab="Attrition",
main="How Monthly income affects attrition",
labels=row.names(attrition.df))
car::scatterplot(Attrition~ï..Age|Gender,data=attrition.df,
xlab="Age", ylab="Attrition",
main="Effect of age on attrition",
labels=row.names(attrition.df))
Employees who leave are mostly male and have a low monthly income. They also belong to a younger age group.
aggregate(cbind(Attrition,NumCompaniesWorked,DistanceFromHome) ~ Gender,
data = attrition.df, mean)
car::scatterplot(Attrition~NumCompaniesWorked|Gender,data=attrition.df,
xlab="Number of companies worked in", ylab="Attrition",
main="Effect of number of companies on attrition",
labels=row.names(attrition.df))
car::scatterplot(Attrition~DistanceFromHome|Gender,data=attrition.df,
xlab="DistanceFromHome", ylab="Attrition",
main="Distance from home",
labels=row.names(attrition.df))
car::scatterplotMatrix(~Attrition+PercentSalaryHike+TrainingTimesLastYear,data=attrition.df,
main="Attrition versus PercentSalaryHike and TrainingTime")
We can see that attrition appears highest when the number of trainings is low.
car::scatterplotMatrix(~Attrition+YearsAtCompany+YearsInCurrentRole+YearsWithCurrManager+YearsSinceLastPromotion,data=attrition.df,
main="Attrition versus other variables")
From the scatter plots we can see how each variable trends with attrition.
They indicate which variables are associated with the attrition of valuable employees: some show a positive relationship, others a negative one.
mytable<-xtabs(~Attrition+BusinessTravel,data=attrition.df)
mytable
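In absolute numbers, most attrition occurs among employees who travel rarely. This is also the largest group, so the counts alone do not show that rare travel raises the risk of attrition.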
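Raw counts in a cross-table favour the largest group, so it can help to look at the share of leavers within each group. The following is a minimal sketch using prop.table() on a hypothetical toy data frame (invented values, Attrition coded 1 = No, 2 = Yes); it is an illustration, not a result from the dataset.

```r
# Hypothetical toy data: 8 employees travel rarely, 2 travel frequently.
toy <- data.frame(
  Attrition = c(1, 1, 1, 1, 1, 1, 2, 2, 1, 2),
  BusinessTravel = c(rep("Travel_Rarely", 8), rep("Travel_Frequently", 2))
)

# Counts of stayers (1) and leavers (2) per travel category.
counts <- xtabs(~ Attrition + BusinessTravel, data = toy)

# margin = 2 divides each column by its total, giving the share of
# stayers and leavers within each travel category.
rates <- prop.table(counts, margin = 2)

print(counts)
print(round(rates, 2))
```

In this toy example the rarely travelling group has more leavers in absolute terms, but the frequently travelling group has the higher attrition rate. The same prop.table() call can be applied to any of the xtabs tables in this section.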
mytable<-xtabs(~Attrition+Department,data=attrition.df)
mytable
In absolute numbers, most attrition is in the research and development department, which is also the largest department.
mytable<-xtabs(~Attrition+Education,data=attrition.df)
mytable
In absolute numbers, most attrition is at education level 3 (‘Bachelor’), which is also the most common level.
mytable<-xtabs(~Attrition+EnvironmentSatisfaction,data=attrition.df)
mytable
The table suggests that attrition is associated with low environment satisfaction.
mytable<-xtabs(~Attrition+JobInvolvement,data=attrition.df)
mytable
Attrition is higher among employees with medium job involvement.
mytable<-xtabs(~Attrition+JobLevel,data=attrition.df)
mytable
Attrition is highest at the lowest job level.
mytable<-xtabs(~Attrition+JobRole,data=attrition.df)
mytable
Attrition is highest in the sales executive category.
mytable<-xtabs(~Attrition+JobSatisfaction,data=attrition.df)
mytable
Attrition is highest at job satisfaction level 3.
mytable<-xtabs(~Attrition+MaritalStatus,data=attrition.df)
mytable
Single employees account for the most attrition.
mytable<-xtabs(~Attrition+OverTime,data=attrition.df)
mytable
Employees who work overtime are more likely to leave.
mytable<-xtabs(~Attrition+PerformanceRating,data=attrition.df)
mytable
Most employees who leave have a performance rating of 3, the lower of the ratings observed here and also the most common one.
mytable<-xtabs(~Attrition+RelationshipSatisfaction,data=attrition.df)
mytable
Most attrition occurs at relationship satisfaction level 3.
mytable<-xtabs(~Attrition+WorkLifeBalance,data=attrition.df)
mytable
Most attrition occurs at work-life balance level 3.
corrplot::corrplot.mixed(corr=cor(attrition.df[,c(1,2,4,6,7,10,11,13:15,17,19,20,21)],use="complete.obs"),upper="pie",tl.pos="lt")
cor(attrition.df[,c(1,2,4,6,7,10,11,13:15,17,19,20,21)])
We can see that job level is highly correlated with both age and monthly income, and monthly income is in turn highly correlated with age. All three are positively correlated with one another.
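Reading strong correlations off a large matrix by eye is error-prone, so it can help to list the pairs above a threshold. The following is a minimal sketch on a hypothetical toy data frame; the values and the 0.8 cut-off are assumptions chosen for illustration, not figures from the analysis.

```r
# Hypothetical toy data with one strongly related pair (Age, JobLevel).
toy <- data.frame(
  Age = c(25, 30, 35, 40, 45, 50),
  JobLevel = c(1, 1, 2, 3, 3, 4),
  DistanceFromHome = c(5, 20, 2, 18, 9, 7)
)

# Pearson correlation matrix, as in the cor() calls above.
cm <- cor(toy, use = "complete.obs")
print(round(cm, 2))

# Keep each pair once (upper triangle) and select |r| > 0.8.
high <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r = cm[high]
)
print(strong_pairs)
```

Applied to the real correlation matrix, the same filter would list pairs such as job level and monthly income without scanning the full table.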
Correlation with other variables :
corrplot::corrplot.mixed(corr=cor(attrition.df[,c(24:26,28:35 )],use="complete.obs"),upper="pie",tl.pos="lt")
cor((attrition.df[,c(24:26,28:35)]))
We see that performance rating is highly correlated with percent salary hike. Also, total working years, years at company, years in current role, years since last promotion and years with current manager are intercorrelated.
Testing the null hypothesis :
t.test(PercentSalaryHike~PerformanceRating,data=attrition.df)
In this case we can reject the null hypothesis that the mean percent salary hike is the same in both performance rating groups.
The tenure-related variables are intercorrelated, so we do not test them separately.
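The decision to reject can be read directly from the object returned by t.test(). The following is a minimal sketch on a hypothetical toy data frame; the values and the 0.05 significance level are assumptions for illustration only.

```r
# Hypothetical toy data: two performance rating groups with different hikes.
toy <- data.frame(
  PerformanceRating = c(3, 3, 3, 3, 3, 4, 4, 4, 4, 4),
  PercentSalaryHike = c(11, 12, 13, 14, 15, 20, 21, 22, 23, 24)
)

# Welch two-sample t-test; the null hypothesis is equal group means.
res <- t.test(PercentSalaryHike ~ PerformanceRating, data = toy)

print(res$estimate)  # mean hike in each rating group
print(res$p.value)

# Reject the null hypothesis when the p-value is below the chosen level.
alpha <- 0.05
reject_null <- res$p.value < alpha
print(reject_null)
```

The formula interface needs a grouping variable with exactly two levels, which is why the test works for PerformanceRating here.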