INTRODUCTION

ABOUT IBM :

IBM (International Business Machines Corporation) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries. The company originated in 1911 as the Computing-Tabulating-Recording Company (CTR) and was renamed “International Business Machines” in 1924. IBM manufactures and markets computer hardware, middleware and software, and provides hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization: as of 2017, it had held the record for most patents generated by a business for 24 consecutive years.

ABOUT ATTRITION :

Attrition in business can mean the reduction of a company’s staff through normal means, such as retirement and resignation, or the loss of customers or clients to old age or to growing out of the company’s target demographic. The cause of attrition may be either voluntary or involuntary. Each industry has its own standards for acceptable attrition rates, and these rates can also differ between skilled and unskilled positions.

However, there is more to attrition than a shrinking workforce. As employees leave an organization, they take with them much-needed skills and qualifications that they developed during their tenure. On the other hand, junior professionals with promising qualifications can then move up into higher-level positions, or business owners can introduce more diversity in experience or expertise. Because of the expenses associated with training new employees, any type of employee attrition is typically seen as having a monetary cost.

Although it is possible for a company to use employee attrition to its benefit in some circumstances, such as relying on it to control labor costs without issuing mass layoffs, in general, relatively high attrition is problematic for companies. HR professionals often assume a leadership role in designing company compensation programs, work culture and motivation systems that help the organization retain top employees.

OVERVIEW OF OUR ANALYSIS

In this project, we analyse which factors lead to employee attrition and which of them have the greatest influence. We use an HR dataset published by IBM.

The data was obtained from Kaggle, a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that countless strategies can be applied to any predictive modelling task and that it is impossible to know beforehand which technique or analyst will be most effective. The dataset is available at https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset/data

ANALYSIS

Read the Dataset into R

attrition.df<-read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
View(attrition.df)
dim(attrition.df)
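A first column named something like `ï..Age` instead of `Age` is what R can produce when a CSV file starts with a UTF-8 byte-order mark (BOM) that is not stripped on import. The sketch below, which uses a small temporary file and a made-up `demo.df` rather than the real dataset, shows how `fileEncoding = "UTF-8-BOM"` gives a clean header. Whether the Kaggle file actually carries a BOM is an assumption here.

```r
# Hypothetical two-row file that starts with a UTF-8 byte-order mark (BOM).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)                  # the BOM bytes
writeBin(charToRaw("Age,Attrition\n41,Yes\n49,No\n"), con)  # header + two rows
close(con)

# "UTF-8-BOM" tells read.csv to drop the BOM before parsing the header.
demo.df <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(demo.df))
unlink(tmp)
```

If the real file is read this way, the first column is simply called `Age`, and any code that refers to it by a mangled name has to be adjusted accordingly.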

Changing Attrition Column

1 = No

2 = Yes

car::some(attrition.df)
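# Recode Attrition with explicit levels so that No = 1 and Yes = 2,
# whether read.csv returned the column as a factor or as character.
attrition.df$Attrition <- as.numeric(factor(attrition.df$Attrition, levels = c("No", "Yes")))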
head(attrition.df)
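The 1 = No, 2 = Yes coding depends on the order of the factor levels. From R 4.0 onwards `read.csv` keeps text columns as character by default, so calling `as.numeric` on them directly would return `NA`. The following is a minimal sketch on a made-up vector (`attrition_raw` is not part of the dataset) that makes the level order explicit.

```r
# Hypothetical attrition labels, as they would appear in the raw file.
attrition_raw <- c("Yes", "No", "No", "Yes", "No")

# Fixing the level order makes the coding explicit: No -> 1, Yes -> 2.
attrition_num <- as.numeric(factor(attrition_raw, levels = c("No", "Yes")))
print(attrition_num)

# Subtracting 1 gives a 0/1 indicator whose mean is the share of leavers.
leaver_share <- mean(attrition_num - 1)
print(leaver_share)
```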

Describing the Dataset

psych::describe(attrition.df)[,3:9]

VISUALISING EACH COLUMN

1.Gender

mytable<-with(attrition.df,table(Gender))
mytable

There are more male employees than female employees in this data set.
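Statements such as "more" or "most" are easier to check when the counts are turned into percentages. A minimal sketch with `prop.table`, using a made-up `gender` vector in place of `attrition.df$Gender`:

```r
# Hypothetical gender column; the real values come from attrition.df$Gender.
gender <- c("Male", "Female", "Male", "Male", "Female")

counts <- table(gender)
shares <- round(100 * prop.table(counts), 1)  # percentage of each category
print(shares)
```

The same pattern can be applied to any of the categorical columns tabulated in this section.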

2.Age

# Age is the first column; indexing by position works even if the import mangled its name.
hist(attrition.df[[1]],main="Distribution of age",xlab="Age",ylab="Count",col=blues9)

Most employees are between 30 and 40 years old.
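A reading taken from a histogram can be cross-checked by binning the values and counting them. The sketch below uses `cut` on a made-up `age` vector; the break points are an arbitrary choice for illustration.

```r
# Hypothetical ages; the real values are in the Age column of attrition.df.
age <- c(23, 29, 31, 34, 35, 36, 38, 41, 45, 52)

# Bin into (20,30], (30,40], (40,50], (50,60] and count each bin.
age_bins <- cut(age, breaks = c(20, 30, 40, 50, 60))
bin_counts <- table(age_bins)
print(bin_counts)

# Label of the bin with the largest count.
print(names(bin_counts)[which.max(bin_counts)])
```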

3.Attrition

hist(attrition.df$Attrition,main="Distribution of Attrition",xlab="Attrition",ylab="Count",col="yellowgreen")

Most of the employees in this dataset did not leave the company.

4.Business Travel

mytable<-with(attrition.df,table(BusinessTravel))
mytable

Most of the employees rarely travel.

5.Department

mytable<-with(attrition.df,table(Department))
mytable

Research and development is the largest department, which suggests that it is the company’s main focus.

6.Distance From Home

hist(attrition.df$DistanceFromHome,main="Distance from home distribution",xlab="Distance from Home",ylab="Count",col="Pink")

Most of the people in this dataset live in close proximity to the office.

7.Level Of Education

Education : 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’

mytable<-with(attrition.df,table(Education))
mytable

Most of the employees have a bachelor’s degree.

8.Environment Satisfaction

EnvironmentSatisfaction : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

mytable<-with(attrition.df,table(EnvironmentSatisfaction))
mytable

Most of the employees have high environment satisfaction.

9.Job Involvement

JobInvolvement : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

mytable<-with(attrition.df,table(JobInvolvement))
mytable

Most of the employees have high job involvement.

10.Job Level

mytable<-with(attrition.df,table(JobLevel))
mytable

Most of the employees are at a low job level.

11.Job Role

mytable<-with(attrition.df,table(JobRole))
mytable

Sales executive is the most common job role in this company.

12.Job Satisfaction

JobSatisfaction : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

mytable<-with(attrition.df,table(JobSatisfaction))
mytable

Very high job satisfaction is the most common rating among the employees.

13.Marital Status

mytable<-with(attrition.df,table(MaritalStatus))
mytable

Married employees form the largest group in this company.

14.Monthly Income

hist(attrition.df$MonthlyIncome,main="Distribution of Monthly Income",xlab="Monthly Income",ylab="Count",col="orange")

The distribution is right-skewed: most of the employees are at the lower end of the income range.
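One quick way to back up a reading of skewness is to compare the mean with the median. This is a sketch on a made-up `income` vector, not on the dataset itself.

```r
# Hypothetical monthly incomes with a few high earners.
income <- c(2100, 2500, 2700, 3000, 3200, 4100, 5200, 9800, 17000, 19500)

income_mean <- mean(income)
income_median <- median(income)
print(c(mean = income_mean, median = income_median))

# A mean well above the median points to a long right tail.
right_skewed <- income_mean > income_median
print(right_skewed)
```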

15.Number Of Companies Worked With

hist(attrition.df$NumCompaniesWorked,main="Number of companies Worked with",xlab="Number of companies",ylab="Count",col="orange")

Most of the employees have changed companies fewer than two times.

16.Overtime

mytable<-with(attrition.df,table(OverTime))
mytable

Most of the employees in this firm do not work overtime.

17.Percent Salary Hike

hist(attrition.df$PercentSalaryHike,main="Distribution of salary hike",xlab="Salary hike(%)",ylab="Count",col="red")

A salary hike of 12-14% is the most common.

18.Performance Rating

PerformanceRating : 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’

mytable<-with(attrition.df,table(PerformanceRating))
mytable

Most of the employees have a performance rating of 3 (‘Excellent’).

19.Relationship Satisfaction

RelationshipSatisfaction : 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’

mytable<-with(attrition.df,table(RelationshipSatisfaction))
mytable

Relationship satisfaction is high in this company.

20.Work Life Balance

WorkLifeBalance : 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’

mytable<-with(attrition.df,table(WorkLifeBalance))
mytable

Most of the employees rate their work-life balance as ‘Better’ (level 3).

21.Number Of Trainings

hist(attrition.df$TrainingTimesLastYear,main="Number of times the employee has gone under training",xlab="Training count",ylab="Count",col="brown")

22.Years At Company

hist(attrition.df$YearsAtCompany,main="Years at company",xlab="Number of years",ylab="Count",col="yellow")

Most of the employees at IBM are new and have served the company for less than 10 years.

23.Years In Current Role

hist(attrition.df$YearsInCurrentRole,main="Years in current role",xlab="Years count",ylab="Count",col="yellow")

Most of the employees have been in the same role for a long period.

24.Years Since Last Promotion

hist(attrition.df$YearsSinceLastPromotion,main="Years since last promotion",xlab="Years count",ylab="Count",col="yellow")

Most of the employees have been promoted in the last five years.

25.Years With Current Manager

hist(attrition.df$YearsWithCurrManager,main="Years with current manager",xlab="Years count",ylab="Count",col="yellow")

Most of the employees have been with the same manager for less than 5 years.

RELATIONSHIP BETWEEN ATTRITION AND OTHER VARIABLES

1.Job Role vs Distance From Home vs Attrition

mytable<-xtabs(~JobRole+DistanceFromHome+Attrition,data=attrition.df)
mytable
margin.table(mytable,1)

Sales executives tend to live farther from the office, and they account for the largest share of attrition.

This suggests that distance from home affects employee attrition.
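A three-way table is hard to read at a glance. One alternative, sketched here on a made-up `demo.df` whose column names follow the dataset but whose values are invented, is to compare the mean distance for each combination of job role and attrition.

```r
# Hypothetical rows; column names follow the dataset, values are made up.
demo.df <- data.frame(
  JobRole = c("Sales Executive", "Sales Executive", "Sales Executive",
              "Research Scientist", "Research Scientist", "Research Scientist"),
  DistanceFromHome = c(20, 4, 6, 3, 9, 5),
  Attrition = c("Yes", "No", "No", "No", "Yes", "No")
)

# Mean distance for each JobRole/Attrition combination.
dist_means <- aggregate(DistanceFromHome ~ JobRole + Attrition,
                        data = demo.df, FUN = mean)
print(dist_means)
```

If leavers show a clearly higher mean distance than stayers within the same role, that would support the reading that distance matters.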

2.Effect of Monthly Income, Age and Gender on Attrition

# Give the first column (Age) a clean name in case the import mangled it.
names(attrition.df)[1] <- "Age"
# Group means of Attrition, Age and MonthlyIncome by Gender.
aggregate(cbind(Attrition, Age, MonthlyIncome) ~ Gender,
          data = attrition.df, mean)
boxplot(MonthlyIncome ~ Attrition, data = attrition.df, main = "Attrition based on Monthly Income", ylab = "Monthly income", xlab = "Attrition")
boxplot(Age ~ Attrition, data = attrition.df, main = "Attrition based on age", ylab = "Age", xlab = "Attrition")
car::scatterplot(Attrition ~ MonthlyIncome | Gender, data = attrition.df,
                 xlab = "MonthlyIncome", ylab = "Attrition",
                 main = "How Monthly income affects attrition")
car::scatterplot(Attrition ~ Age | Gender, data = attrition.df,
                 xlab = "Age", ylab = "Attrition",
                 main = "Effect of age on attrition")

Employees who leave are mostly male and have a low monthly income. They also belong to a younger age group.
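Grouping by `Attrition` rather than by `Gender` compares leavers and stayers directly. The sketch below uses a made-up `demo.df` with invented values; only the column names follow the dataset.

```r
# Hypothetical rows; values are made up for illustration.
demo.df <- data.frame(
  Attrition = c("Yes", "Yes", "No", "No", "No", "No"),
  Age = c(26, 30, 38, 45, 41, 36),
  MonthlyIncome = c(2400, 3000, 6500, 9000, 7200, 5300)
)

# Mean age and mean monthly income for leavers ("Yes") and stayers ("No").
group_means <- aggregate(cbind(Age, MonthlyIncome) ~ Attrition,
                         data = demo.df, FUN = mean)
print(group_means)
```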

3.Effect of Number Of Companies worked and Distance From Home on Attrition

# Group means of Attrition, NumCompaniesWorked and DistanceFromHome by Gender.
aggregate(cbind(Attrition, NumCompaniesWorked, DistanceFromHome) ~ Gender,
          data = attrition.df, mean)
car::scatterplot(Attrition ~ NumCompaniesWorked | Gender, data = attrition.df,
                 xlab = "Number of companies worked in", ylab = "Attrition",
                 main = "Effect of number of companies on attrition")
car::scatterplot(Attrition ~ DistanceFromHome | Gender, data = attrition.df,
                 xlab = "DistanceFromHome", ylab = "Attrition",
                 main = "Distance from home")

4.Attrition vs PercentSalaryHike vs TrainingTimesLastYear

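# scatterplotMatrix is the current name; scatterplot.matrix is no longer available in recent car versions.
car::scatterplotMatrix(~Attrition+PercentSalaryHike+TrainingTimesLastYear,data=attrition.df,
                       main="Attrition versus PercentSalaryHike and TrainingTime")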

We can see that attrition is highest among employees who received the fewest trainings.

5.Attrition vs Other Variables

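# scatterplotMatrix is the current name; scatterplot.matrix is no longer available in recent car versions.
car::scatterplotMatrix(~Attrition+YearsAtCompany+YearsInCurrentRole+YearsWithCurrManager+YearsSinceLastPromotion,data=attrition.df,
                       main="Attrition versus other variables")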

From the scatter plot we can see how each variable trends with Attrition.

The plot makes it easy to see which variables affect the attrition of valuable employees. Some show a positive relationship with attrition, while others show a negative one.

EFFECT OF FIXED VARIABLES ON ATTRITION

1.Business Travel

mytable<-xtabs(~Attrition+BusinessTravel,data=attrition.df)
mytable

Most attrition takes place when employees travel rarely.
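Raw counts favour the largest category, so it helps to look at the attrition rate within each category as well. The following is a minimal sketch using `prop.table` with `margin = 2`; `demo.df` and its values are made up for illustration.

```r
# Hypothetical data: 10 employees who travel rarely (3 left) and
# 4 who travel frequently (2 left).
demo.df <- data.frame(
  BusinessTravel = c(rep("Travel_Rarely", 10), rep("Travel_Frequently", 4)),
  Attrition = c(rep("Yes", 3), rep("No", 7), rep("Yes", 2), rep("No", 2))
)

counts <- xtabs(~ Attrition + BusinessTravel, data = demo.df)
print(counts)

# margin = 2 divides each column by its total, giving the share of
# leavers and stayers within each travel category.
rates <- prop.table(counts, margin = 2)
print(round(rates, 2))
```

In this made-up example the "rarely" group has more leavers in absolute terms but a lower attrition rate, which is why both views are worth checking. The same call applies to every cross-table in this section.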

2.Department

mytable<-xtabs(~Attrition+Department,data=attrition.df)
mytable

Most attrition is in the research and development department.

3.Education

mytable<-xtabs(~Attrition+Education,data=attrition.df)
mytable

Most attrition is among employees with education level 3 (‘Bachelor’).

4.Environment Satisfaction

mytable<-xtabs(~Attrition+EnvironmentSatisfaction,data=attrition.df)
mytable

The table suggests that attrition is associated with low environment satisfaction.

5.Job Involvement

mytable<-xtabs(~Attrition+JobInvolvement,data=attrition.df)
mytable

Attrition is higher among employees with medium job involvement.

6.Job Level

mytable<-xtabs(~Attrition+JobLevel,data=attrition.df)
mytable

Attrition is maximum in lowest job level.

7.Job Role

mytable<-xtabs(~Attrition+JobRole,data=attrition.df)
mytable

Maximum attrition is in sales executive category.

8.Job Satisfaction

mytable<-xtabs(~Attrition+JobSatisfaction,data=attrition.df)
mytable

Attrition is highest at job satisfaction level 3.

9.Marital Status

mytable<-xtabs(~Attrition+MaritalStatus,data=attrition.df)
mytable

Single employees account for the most attrition.

10.Overtime

mytable<-xtabs(~Attrition+OverTime,data=attrition.df)
mytable

Employees who work overtime are more likely to leave.
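One possible way to check whether such a difference is more than chance is a chi-squared test of independence on the 2x2 table. This test is not part of the original analysis, and the counts below are invented for illustration.

```r
# Hypothetical 2x2 counts of Attrition by OverTime; not taken from the dataset.
counts <- matrix(c(90, 10, 60, 40), nrow = 2,
                 dimnames = list(Attrition = c("No", "Yes"),
                                 OverTime = c("No", "Yes")))
print(counts)

# Null hypothesis: attrition is independent of overtime.
test <- chisq.test(counts)
print(test$p.value)
```

A p-value below the chosen significance level would indicate that attrition and overtime are not independent in the sample.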

11.Performance Rating

mytable<-xtabs(~Attrition+PerformanceRating,data=attrition.df)
mytable

Employees with the lower performance rating account for most of the attrition.

12.Relationship Satisfaction

mytable<-xtabs(~Attrition+RelationshipSatisfaction,data=attrition.df)
mytable

Attrition is highest among employees with relationship satisfaction level 3.

13.Work Life Balance

mytable<-xtabs(~Attrition+WorkLifeBalance,data=attrition.df)
mytable

Attrition is highest among employees with a work-life balance of 3.

CORRELATION

corrplot::corrplot.mixed(corr=cor(attrition.df[,c(1,2,4,6,7,10,11,13:15,17,19,20,21)],use="complete.obs"),
                         upper="pie",tl.pos="lt")
cor(attrition.df[,c(1,2,4,6,7,10,11,13:15,17,19,20,21)])

We can see that job level, age and monthly income are highly correlated with one another, and all of these correlations are positive.
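Strongly correlated pairs can also be listed programmatically instead of being read off the plot. The sketch below runs on simulated data, and the 0.7 cutoff is an arbitrary assumption, not a value taken from this analysis.

```r
set.seed(1)
n <- 200

# Simulated columns: MonthlyIncome is built from JobLevel, distance is unrelated.
JobLevel <- sample(1:5, n, replace = TRUE)
demo.df <- data.frame(
  JobLevel = JobLevel,
  MonthlyIncome = 2000 * JobLevel + rnorm(n, sd = 500),
  DistanceFromHome = runif(n, 1, 29)
)

cm <- cor(demo.df)

# Keep each pair once (upper triangle) and flag |r| above the chosen cutoff.
strong <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(cm)[strong[, "row"]],
  var2 = colnames(cm)[strong[, "col"]],
  r = round(cm[strong], 2)
)
print(strong_pairs)
```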

Correlation with other variables :

corrplot::corrplot.mixed(corr=cor(attrition.df[,c(24:26,28:35)],use="complete.obs"),upper="pie",tl.pos="lt")
cor(attrition.df[,c(24:26,28:35)])

We see that performance rating is highly correlated with percent salary hike. Also, total working years, years at company, years in current role, years since last promotion and years with current manager are intercorrelated.

T-TEST

We test the null hypothesis that the mean percent salary hike is the same for both performance ratings :

t.test(PercentSalaryHike~PerformanceRating,data=attrition.df)

In this case we can reject the null hypothesis, so the mean salary hike differs between the two performance ratings.

The other four variables are intercorrelated, so there is no need to test them separately.
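The decision to reject the null hypothesis comes from comparing the p-value of the test with a significance level. A minimal sketch on simulated data follows; the group means, the sample sizes and the 0.05 level are assumptions made for illustration, not values from the dataset.

```r
set.seed(42)

# Simulated salary hikes (%) for two performance ratings.
demo.df <- data.frame(
  PerformanceRating = rep(c(3, 4), each = 50),
  PercentSalaryHike = c(rnorm(50, mean = 14, sd = 2),
                        rnorm(50, mean = 22, sd = 2))
)

# Welch two-sample t-test (the default of t.test).
result <- t.test(PercentSalaryHike ~ PerformanceRating, data = demo.df)

alpha <- 0.05                          # assumed significance level
reject_null <- result$p.value < alpha  # TRUE means the group means differ
print(result$p.value)
print(reject_null)
```

The formula interface of `t.test` needs a grouping variable with exactly two levels, which is why it fits a rating column that takes only two distinct values.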