This analysis deals with exploring different categorical and quantitative variables and their relationship to employee attrition. We will begin this analysis with a general overview to see how the data set is organized, calculate and visualize the correlation between the different variables and employee attrition, and generate a generalized linear model to reveal the good predictors of employee attrition. The data represented in this analysis is artificial/hypothetical and can be found on kaggle.com.
Key:
Education 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’
EnvironmentSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
JobInvolvement 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
JobSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
PerformanceRating 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’
RelationshipSatisfaction 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’
WorkLifeBalance 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’
The data analysis will be broken down by the following sections:
Proportion of Attrition Within Dataset
Pearson’s Correlation - Heat Map
Positive Correlation to Attrition
Neutral Correlation to Attrition
Negative Correlation to Attrition
Logistic Linear Regression Model
Categorical Variable Analysis
Predictors of Interest - Numeric vs Categorical
ggplot(employee_data, aes(x = Attrition)) +
geom_bar(position = "stack", fill = wes_palette("GrandBudapest2", n = 2)) +
theme_dark() +
labs(x = "Attrition",
y = "Count",
title = "Less Attrition in this Data Set",
caption = "Source: IBM HR Analytics") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
employee_data %>%
group_by(Attrition) %>%
summarise(n = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
## Attrition n
## <chr> <int>
## 1 No 1233
## 2 Yes 237
corr_data <- employee_data %>%
mutate(attrition = ifelse(Attrition == "No", 0, 1),
gender = ifelse(Gender == "Female", 0, 1),
overtime = ifelse(OverTime == "No", 0, 1)) %>%
select(Age, attrition, DistanceFromHome, Education, NumCompaniesWorked, EnvironmentSatisfaction, gender, HourlyRate, JobSatisfaction, PercentSalaryHike, overtime, TotalWorkingYears, WorkLifeBalance, YearsAtCompany:YearsWithCurrManager)
ggcorrplot(cor(corr_data), hc.order = TRUE, lab = TRUE, lab_size = 2) +
labs(title = "Correlation Between Variables and Attrition",
subtitle = "Netural and Positive Correlation",
caption = "Source: IBM HR Analytics") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
First we have narrowed our data set to focus on variables of interest. To ensure that the correlation heat map correctly ran, all variables must be a numeric value.
Based on this correlation plot, we are more interested in how attrition fares with the other employee factors
We can separate by positive correlation, neutral correlation. and negative correlation
This will be our overhead analysis and we will now dive a bit deeper into each group and see if it matches our linear regression model (which we will formulate later in our analysis).
In this section, we will evaluate the positive correlations within our data set in relation to attrition. According to our heat map, we have 2:
OverTime
Distance from Home
This indicates that as overtime increases and work distance from home increases, so does the likelihood of attrition as shown below:
ot_bar <- ggplot(employee_data, aes(x = OverTime, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark() +
labs(x = "Over Time",
y = "Proportion",
title = "Over Time Employees",
subtitle = "Have More Attrition",
caption = "Source: IBM HR Analytics") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
dist_box <- ggplot(employee_data, aes(x = Attrition, y = DistanceFromHome)) +
geom_boxplot(fill = wes_palette("Moonrise3", n = 2)) +
theme_dark() +
labs(x = "Attrition",
y = "Distance from Home",
title = "More Distance from Work to Home",
subtitle = "Has More Attrition",
caption = "Source: IBM HR Analytics") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
multiplot(ot_bar, dist_box, cols = 2)
This bar chart tells us, that there is more attrition in those that decide to work overtime vs those that do not. Also, this box plot tells us, that there is more attrition in those that have to commute greater distances than those that do not.
Although both plots show a positive correlation to attrition, is it statistically significant?
t.test(corr_data$overtime ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: corr_data$overtime by corr_data$attrition
## t = -8.7046, df = 304.63, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3696299 -0.2333247
## sample estimates:
## mean in group 0 mean in group 1
## 0.2343877 0.5358650
t.test(corr_data$DistanceFromHome ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: corr_data$DistanceFromHome by corr_data$attrition
## t = -2.8882, df = 322.72, p-value = 0.004137
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.8870025 -0.5475146
## sample estimates:
## mean in group 0 mean in group 1
## 8.915653 10.632911
Null Hypothesis 1: states that there is no significant difference between the attrition of those who work and do not work overtime.
Null Hypothesis 2: states that there is no significant difference between the attrition of those who work closer to home and those who work further from home.
Due to our p-values being less than 5% and the confidence interval does not include zero, we can reject both of the null hypotheses above with 95% confidence and state that there is a statistically significant difference in attrition for each respective variable.
Therefore, overtime workers and commuters (average of >10.63 miles) have a higher likelihood of attrition.
In this section, we will evaluate the neutral correlation within our data set in relation to attrition. According to our heat map, we have 7 (+- .05 from zero):
This indicates, that as each variable increases there will be no change in the likelihood of attrition as shown below:
hr_att <- ggplot(employee_data, aes(x = Attrition, y = HourlyRate)) +
geom_boxplot(fill = wes_palette("GrandBudapest1", n = 2)) +
theme_dark() +
labs(y = "Hourly Rate"
)
perc_att <- ggplot(employee_data, aes(x = Attrition, y = PercentSalaryHike)) +
geom_boxplot(fill = wes_palette("GrandBudapest1", n = 2)) +
theme_dark() +
labs(y = "% Salary Hike"
)
years_att <- ggplot(employee_data, aes(x = Attrition, y = YearsSinceLastPromotion)) +
geom_boxplot(fill = wes_palette("GrandBudapest1", n = 2)) +
theme_dark() +
labs(y = "Yrs Since Promo"
)
numcomp_att <- ggplot(employee_data, aes(x = Attrition, y = NumCompaniesWorked)) +
geom_boxplot(fill = wes_palette("GrandBudapest1", n = 2)) +
theme_dark() +
labs(y = "# Companies Worked"
)
edu_att <- ggplot(employee_data, aes(x = Attrition, y = Education)) +
geom_boxplot(fill = wes_palette("GrandBudapest1", n = 2)) +
theme_dark() +
labs(y = "Education"
)
gender_bar<- ggplot(employee_data, aes(x = Gender, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark()
multiplot(hr_att, perc_att, years_att, numcomp_att, edu_att, gender_bar, cols = 2)
As we can see, the averages of each variable are similar regardless of attrition status. Let’s confirm this by running t.tests for each variable in relation to attrition.
Because these are neutral correlation variables, we hypothesize, that each p-value will be > 5% which claims insignificance.
hourrate_t <- t.test(corr_data$HourlyRate ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
percsalhike_t <- t.test(corr_data$PercentSalaryHike ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
yrslastpromo_t <- t.test(corr_data$YearsSinceLastPromotion ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
gender_t <- t.test(corr_data$gender ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
numcompworked_t <- t.test(corr_data$NumCompaniesWorked ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
edu_t <- t.test(corr_data$Education ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
kable1 <- tribble(
~name, ~p.value,
"Hourly Rate", hourrate_t$p.value,
"Percent Salary Hike", percsalhike_t$p.value,
"Years Since Last Promotion", yrslastpromo_t$p.value,
"Number of Companies Worked", numcompworked_t$p.value,
"Education", edu_t$p.value,
"Gender", gender_t$p.value
)
knitr::kable(kable1)
| name | p.value |
|---|---|
| Hourly Rate | 0.7913501 |
| Percent Salary Hike | 0.6144301 |
| Years Since Last Promotion | 0.1986513 |
| Number of Companies Worked | 0.1163340 |
| Education | 0.2241713 |
| Gender | 0.2542175 |
Hourly rate, percent salary hike, years since last promotion, number of companies worked, education, and gender all show no significant difference in those who did and did not leave their job. Which aligns with Pearson’s correlation test and our hypothesis!
In this section, we will evaluate the negative correlation within our data set in relation to attrition. According to our heat map, we have 8:
This indicates, that as each variable increases there will be less likelihood of attrition as shown below:
job_att <- ggplot(employee_data, aes(x = Attrition, y = JobSatisfaction)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Job Satisfaction"
)
age_att <- ggplot(employee_data, aes(x = Attrition, y = Age)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Age"
)
work_att <- ggplot(employee_data, aes(x = Attrition, y = WorkLifeBalance)) +
geom_boxplot(fill = wes_palette("GrandBudapest1", n = 2)) +
theme_dark() +
labs(y = "Work Life Balance"
)
worky_att <- ggplot(employee_data, aes(x = Attrition, y = TotalWorkingYears)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Total Working Yrs"
)
yearscurr_att <- ggplot(employee_data, aes(x = Attrition, y = YearsInCurrentRole)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Yrs in Curr Role"
)
yearscomp_att <- ggplot(employee_data, aes(x = Attrition, y = YearsAtCompany)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Yrs at Company"
)
yearsmgr_att <- ggplot(employee_data, aes(x = Attrition, y = YearsWithCurrManager)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Yrs w/ Curr Manager"
)
env_att <- ggplot(employee_data, aes(x = Attrition, y = EnvironmentSatisfaction)) +
geom_boxplot(fill = wes_palette("Darjeeling1", n = 2)) +
theme_dark() +
labs(y = "Env Satisfaction"
)
multiplot(job_att, age_att, work_att, worky_att, yearscurr_att, yearscomp_att, yearsmgr_att, env_att, cols = 3)
jobsat_t <- t.test(corr_data$JobSatisfaction ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
age_t <- t.test(corr_data$Age ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
wrklifebal_t <- t.test(corr_data$WorkLifeBalance ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
workyrs_t <- t.test(corr_data$TotalWorkingYears ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
yrsrole_t <- t.test(corr_data$YearsInCurrentRole ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
yrscomp_t <- t.test(corr_data$YearsAtCompany ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
yrsmgr_t <- t.test(corr_data$YearsWithCurrManager ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
envsat_t <- t.test(corr_data$EnvironmentSatisfaction ~ corr_data$attrition, mu = 0, alt = "two.sided", conf = 0.95, var.eq = FALSE, paired = FALSE)
kable2 <- tribble(
~name, ~p.value,
"Job Satisfaction", jobsat_t$p.value,
"Age", age_t$p.value,
"Work Life Balance", wrklifebal_t$p.value,
"Total Working Years", workyrs_t$p.value,
"Years In Current Role", yrsrole_t$p.value,
"Years at Company", yrscomp_t$p.value,
"Years with Current Mgr", yrsmgr_t$p.value,
"Environment Satisfaction", envsat_t$p.value
)
knitr::kable(kable2)
| name | p.value |
|---|---|
| Job Satisfaction | 0.0001052 |
| Age | 0.0000000 |
| Work Life Balance | 0.0304657 |
| Total Working Years | 0.0000000 |
| Years In Current Role | 0.0000000 |
| Years at Company | 0.0000002 |
| Years with Current Mgr | 0.0000000 |
| Environment Satisfaction | 0.0002092 |
All t.tests above show a significant difference between the average of each variable with respect to attrition.
To retain employees, they are more satisfied with their job, have had more tenure and working experience at the company.
They have also stayed under their current manager and are satisfied with their environment.
As shown in the table above, the p-value is the smallest for Years In Current Role, Total Working Years, and years with current Manager.
As promised in earlier sections, we will build a binomial generalized linear model to predict the likelihood of attrition depending on all variables explored within the data analysis thus far.
When analyzing the results of the model, the variables with significant p-values (less than 5%) and large effect sizes will be categorized as good predictors of attrition.
Below is our Logistic Linear Regression Model and the summary of each variable in relation to each other and how it corresponds to the probability of attrition:
predicted <- glm(attrition ~ ., family = "binomial", data = corr_data)
summary(predicted)
##
## Call:
## glm(formula = attrition ~ ., family = "binomial", data = corr_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7683 -0.5781 -0.3673 -0.1867 3.4194
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.177261 0.733859 2.967 0.003009 **
## Age -0.040006 0.012531 -3.193 0.001410 **
## DistanceFromHome 0.030448 0.009570 3.182 0.001465 **
## Education -0.004928 0.078363 -0.063 0.949853
## NumCompaniesWorked 0.146438 0.033706 4.345 1.40e-05 ***
## EnvironmentSatisfaction -0.369854 0.073564 -5.028 4.97e-07 ***
## gender 0.263308 0.164557 1.600 0.109575
## HourlyRate -0.002323 0.003969 -0.585 0.558363
## JobSatisfaction -0.341895 0.072292 -4.729 2.25e-06 ***
## PercentSalaryHike -0.014925 0.021932 -0.681 0.496163
## overtime 1.635352 0.165674 9.871 < 2e-16 ***
## TotalWorkingYears -0.080278 0.022430 -3.579 0.000345 ***
## WorkLifeBalance -0.260008 0.108545 -2.395 0.016603 *
## YearsAtCompany 0.068049 0.034250 1.987 0.046939 *
## YearsInCurrentRole -0.125399 0.040721 -3.079 0.002074 **
## YearsSinceLastPromotion 0.143608 0.037811 3.798 0.000146 ***
## YearsWithCurrManager -0.115703 0.041306 -2.801 0.005093 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1298.6 on 1469 degrees of freedom
## Residual deviance: 1038.1 on 1453 degrees of freedom
## AIC: 1072.1
##
## Number of Fisher Scoring iterations: 6
Based on the summary: Age, Distance from Home, Environment satisfaction, job satisfaction, overtime, total working years, work life balance, years in current role, years since last promotion, and years with current manager are all good predictors.
Top 3 Predictors
It is interesting to see that Numbers of Companies Worked and Years Since Last Promotion is statistically significant and are good predictors of attrition when it showed no average difference, noted by the t.test and heat map in the neutral correlation section.
It is also surprising to see that years at company is also not statistically significant variable and a good predictor of attrition despite the t.test and heat map (in the negative correlation section) stating there was a significant difference.
In this section, we will evaluate the correlation between the categorical variables within our data set in relation to attrition. Within this data set we have 5:
We want to explore if someone who travels rarely will have a different likelihood of attrition vs someone who travels frequently and so on.
To do this I have created a bar chart for each categorical variable and visualized the proportion of attrition for each level.
As shown below each categorical variable has a set of different levels:
Business Travel: Travel Rarely, Travel Freq, Non-Travel
Department: Sales, Research & Development, HR
Education Field: Life Sciences, Other, Medical, Marketing, Tech degree, HR
Job Role: Sales Exec, Research Scientist, Lab tech, manufacturing director, HC rep, Manager, Sales rep, research director, and HR
Marital Status: Single, Married, and Divorced
Below are the bar charts for each categorical variable in relation to attrition:
ggplot(employee_data, aes(x = BusinessTravel, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark() +
coord_flip() +
labs(y = "Proportion")
ggplot(employee_data, aes(x = Department, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark() +
coord_flip() +
labs(y = "Proportion")
ggplot(employee_data, aes(x = EducationField, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark() +
coord_flip() +
labs(y = "Proportion")
ggplot(employee_data, aes(x = JobRole, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark() +
coord_flip() +
labs(y = "Proportion")
Sales representatives have the highest attrition rate of all job roles in this data set, next are those who work in HR and as a lab technician
Sales executives and research scientists have lower attrition than the above but still have high attrition rates.
Manufacturing directors, managers, healthcare representatives, and research directors all have low attrition rates.
ggplot(employee_data, aes(x = MaritalStatus, fill = Attrition)) +
geom_bar(position = "fill") +
theme_dark() +
coord_flip() +
labs(y = "Proportion")
Keep in mind that these outcomes are a representation of this particular data set population but not the general population. In order to make a statistical inference on the general population, we would need a larger/diverse data set.
Within this section of the analysis, I wanted to dive deeper into the good predictors and analyze their relationship to one another.
We will facet wrap multiple variables together and analyze the relationship shown:
facet_data <- employee_data %>%
mutate(education = mapvalues(employee_data$Education,
from = c(1, 2, 3, 4, 5),
to = c("Below College", "College", "Bachelor", "Master", "Doctor")),
jobsatisfaction = mapvalues(employee_data$JobSatisfaction,
from = c(1, 2, 3, 4),
to = c("Low", "Medium", "High", "Very High")),
environmentsatisfaction = mapvalues(employee_data$EnvironmentSatisfaction,
from = c(1, 2, 3, 4),
to = c("Low", "Medium", "High", "Very High")),
worklifebalance = mapvalues(employee_data$WorkLifeBalance,
from = c(1, 2, 3, 4),
to = c("Low", "Medium", "High", "Very High"))
)
facet_data$jobsatisfaction_f <- factor(facet_data$jobsatisfaction, levels = c("Low", "Medium", "High", "Very High"))
facet_data$education_f <- factor(facet_data$education, levels = c("Below College", "College", "Bachelor", "Master", "Doctor"))
facet_data$environmentsatisfaction_f <- factor(facet_data$environmentsatisfaction, levels = c("Low", "Medium", "High", "Very High"))
facet_data$worklifebalance_f <- factor(facet_data$worklifebalance, levels = c("Low", "Medium", "High", "Very High"))
ggplot(facet_data, aes(x = YearsInCurrentRole, y = YearsWithCurrManager)) +
geom_point() +
geom_smooth(se = FALSE) +
facet_grid(Attrition ~ jobsatisfaction_f)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
As seen in the above graph, years in current role and years with current manager has a positive linear relationship in relation to attrition and job satisfaction
Let’s look at each section separately by attrition:
Those without attrition: low job satisfaction shows a positive and then negative correlation (a peak) in years with current role and current manager.
Medium, High, and Very High job satisfaction shows a consistently positive relationship, with no noticeable decline/peak
Those with attrition: Low, medium, and high job satisfaction showed two declines in the relationship between years in current role and years with current manager.
The dips occur at different lengths of time, people with low satisfaction tend to have a decline sooner than those with medium and high job satisfaction.
As expected, those with very high job satisfaction but still left their job showed a positive correlation throughout. This could imply that individuals might have suddenly left for a better opportunity towards the end of their tenure which explains no dips throughout their time with the current manager/current role.
ggplot(facet_data, aes(x = TotalWorkingYears, y = YearsSinceLastPromotion)) +
geom_point() +
geom_smooth(se = FALSE) +
facet_grid(Attrition ~ education_f)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
As seen in the above graph, total working years and years since last promotion, there are varying relationships in relation to attrition and education
Let’s look at each section separately by attrition:
Those without attrition: over the total years of work, there is an increase in years since last promotion for those with education levels of below college and a Bachelor’s.
We see that there is a dip in years since last promotion over total working years within education levels of college and master’s. This means that these individuals had a promotion within their tenure.
It is surprising to see that college students will receive a promotion within their jobs at a sooner rate than those with Bachelor’s. However, this can be explained by the type of work.
College students (less than a Bachelor’s degree) tend to have different types of part-time/full-time positions (food service, sales associate, etc) vs Bachelors (entry-level corporate roles, etc) which might lead to earlier promotions. Individuals with Bachelor Degrees that have entry-level positions normally require a certain amount of tenure/experience with the company prior to promotions.
Those with attrition: individuals with below college and college education (less than Bachelor’s) showed attrition after promotions (as noted by the dip). This can be explained by the type of jobs obtained with that particular education level.
As expected, those with a Bachelor’s and Master’s Degree who were not promoted for a long time over the total tenure have left their jobs.
Those with doctorates, showed attrition even after promotions, this could be due to suddenly finding a better opportunity with their tenure at the company.
In conclusion, the Pearson’s correlation heat map was very accurate in correspondence to the t.test results. Those that showed statistical significance showed either a positive or negative correlation with attrition and those that did not show statistical significance showed a neutral correlation with attrition. Although the accuracy was profound, we noticed varying results from our Linear Regression Model (which is a better representation for real-world application). The number of companies an individual worked at and years since their last promotion showed statistically significant positive relationships to attrition, although these two variables showed a neutral correlation in Pearson’s correlation heat map. Also, we found in our Linear Regression Model that the amount of years at a company is not statistically significant in comparison to the other variables used in this analysis.
There are many reasons why an individual chooses to leave their current role/company. However, throughout this analysis, we have generated possible focus points to improve retention (negative correlation section). Such as acquiring tenured professionals, both in total work experience and experience in their role, as well as maintaining a good employee to manager relationship. Lastly, we have also generated good predictors of attrition that, if focused on, can predict the likelihood of attrition within an individual. Such as, Overtime (more overtime workers have a higher likelihood of attrition), Environment satisfaction, and Job satisfaction (more satisfied the less likelihood of attrition).