chisq.test(hr$salary, hr$left)
##
## Pearson's Chi-squared test
##
## data: hr$salary and hr$left
## X-squared = 381.23, df = 2, p-value < 2.2e-16
The p-value is very small (below 2.2e-16), so we reject the null hypothesis that salary level and employee attrition are independent. If the two were independent, a chi-squared statistic this large would almost never arise by chance, so the relationship between salary and whether an employee left the company is statistically significant.
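A p-value says how surprising the data would be under independence, not how strong the association is. One common complement is Cramér's V, an effect size derived from the chi-squared statistic. The sketch below is illustrative only: the helper `cramers_v` and the toy counts are hypothetical and are not part of the original analysis.

```r
# Cramér's V for an r x c contingency table:
#   V = sqrt(chi2 / (n * (min(r, c) - 1)))
# `cramers_v` is a hypothetical helper written for this sketch.
cramers_v <- function(tab) {
  # correct = FALSE so the uncorrected Pearson statistic is used
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

# Toy counts, invented for illustration only (NOT taken from the hr data).
toy <- matrix(c(60, 40,
                75, 25,
                90, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(salary = c("low", "medium", "high"),
                              left = c("0", "1")))

v_toy <- cramers_v(toy)
v_toy  # ranges from 0 (no association) to 1 (perfect association)
```

Assuming the same column names as above, `cramers_v(table(hr$salary, hr$left))` would apply the calculation to the real data.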
Employees with low salaries are far more likely to leave the company than those earning medium or high salaries.
prop_salary <- hr %>%
  group_by(salary) %>%
  summarise(
    Stayed = sum(left == 0) / n(),
    Left = sum(left == 1) / n()
  ) %>%
  mutate(salary = factor(salary, levels = c("low", "medium", "high")))
plot_ly(prop_salary) %>%
  add_bars(x = ~salary, y = ~Stayed, name = "Stayed",
           marker = list(color = "#2196F3")) %>%
  add_bars(x = ~salary, y = ~Left, name = "Left",
           marker = list(color = "#F44336")) %>%
  layout(
    barmode = "stack",
    xaxis = list(title = "Salary Level"),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Employees with low salaries are far more likely to leave the company",
    legend = list(orientation = "h", x = 0.3, y = -0.2)
  )
chisq.test(hr$Work_accident, hr$left)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: hr$Work_accident and hr$left
## X-squared = 357.56, df = 1, p-value < 2.2e-16
The p-value is far below 0.05, so we reject the null hypothesis that work accidents and employee attrition are independent. There is a statistically significant association between whether an employee experienced a workplace accident and whether they chose to leave the company.
Employees who have been involved in a workplace accident are actually less likely to leave the company.
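The direction of a 2x2 association can be made concrete with row proportions and a relative risk. The following is a minimal sketch on invented counts (not the hr data); the object names `toy`, `row_props`, and `rr` are hypothetical.

```r
# Toy 2x2 counts, invented for illustration only (NOT taken from the hr data).
toy <- matrix(c(700, 300,
                180,  20),
              nrow = 2, byrow = TRUE,
              dimnames = list(accident = c("No Accident", "Had Accident"),
                              left = c("Stayed", "Left")))

# Row-wise proportions: share of each accident group that stayed or left.
row_props <- prop.table(toy, margin = 1)
row_props

# Relative risk of leaving: accident group compared with no-accident group.
rr <- row_props["Had Accident", "Left"] / row_props["No Accident", "Left"]
rr  # a value below 1 means the accident group leaves less often
```

On the real data, the same table could presumably be built with `table(hr$Work_accident, hr$left)` and passed to `prop.table(..., margin = 1)`.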
prop_accident <- hr %>%
  group_by(Work_accident) %>%
  summarise(
    Stayed = sum(left == 0) / n(),
    Left = sum(left == 1) / n()
  ) %>%
  mutate(Work_accident = factor(Work_accident, levels = c(0, 1),
                                labels = c("No Accident", "Had Accident")))
plot_ly(prop_accident) %>%
  add_bars(x = ~Work_accident, y = ~Stayed, name = "Stayed",
           marker = list(color = "#2196F3")) %>%
  add_bars(x = ~Work_accident, y = ~Left, name = "Left",
           marker = list(color = "#F44336")) %>%
  layout(
    barmode = "stack",
    xaxis = list(title = "Work Accident Status"),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Employees who had a workplace accident are less likely to leave",
    legend = list(orientation = "h", x = 0.3, y = -0.2)
  )
chisq.test(hr$promotion_last_5years, hr$left)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: hr$promotion_last_5years and hr$left
## X-squared = 56.262, df = 1, p-value = 6.344e-14
The p-value is below the 0.05 significance threshold, so we reject the null hypothesis that promotions and employee attrition are independent. The data provide strong statistical evidence that whether an employee was promoted within the last five years is associated with whether they leave the company.
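For 2x2 tables, `chisq.test()` applies Yates' continuity correction by default, and the chi-squared approximation is usually considered reliable only when expected cell counts are not too small. If one category is rare, a quick check like the sketch below may be worthwhile. The counts are invented for illustration and the object names are hypothetical.

```r
# Toy 2x2 counts, invented for illustration only (NOT taken from the hr data).
toy <- matrix(c(760, 240,
                 28,   2),
              nrow = 2, byrow = TRUE,
              dimnames = list(promoted = c("No Promotion", "Promoted"),
                              left = c("Stayed", "Left")))

# Default for 2x2 tables: Yates' continuity correction (correct = TRUE).
res_yates <- suppressWarnings(chisq.test(toy))
# Uncorrected Pearson statistic, for comparison.
res_plain <- suppressWarnings(chisq.test(toy, correct = FALSE))

# Expected counts under independence: row total * column total / grand total.
res_yates$expected

# A common rule of thumb asks for every expected count to be at least 5.
all(res_yates$expected >= 5)

# The corrected statistic is never larger than the uncorrected one.
c(yates = unname(res_yates$statistic), plain = unname(res_plain$statistic))
```

When expected counts are very small, `fisher.test()` is a frequently used alternative for 2x2 tables.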
Employees who were not promoted in the last 5 years are significantly more likely to leave the company.
prop_promo <- hr %>%
  group_by(promotion_last_5years) %>%
  summarise(
    Stayed = sum(left == 0) / n(),
    Left = sum(left == 1) / n()
  ) %>%
  mutate(promotion_last_5years = factor(promotion_last_5years, levels = c(0, 1),
                                        labels = c("No Promotion", "Promoted")))
plot_ly(prop_promo) %>%
  add_bars(x = ~promotion_last_5years, y = ~Stayed, name = "Stayed",
           marker = list(color = "#2196F3")) %>%
  add_bars(x = ~promotion_last_5years, y = ~Left, name = "Left",
           marker = list(color = "#F44336")) %>%
  layout(
    barmode = "stack",
    xaxis = list(title = "Promotion in Last 5 Years"),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Employees not promoted in the last 5 years are more likely to leave",
    legend = list(orientation = "h", x = 0.3, y = -0.2)
  )
chisq.test(hr$Department, hr$left)
##
## Pearson's Chi-squared test
##
## data: hr$Department and hr$left
## X-squared = 86.825, df = 9, p-value = 7.042e-15
The p-value is below 0.05, so we reject the null hypothesis that department and attrition are independent. There is a statistically significant relationship between the department an employee works in and whether they leave the company: attrition rates differ across departments by more than chance alone would explain.
The HR and Accounting departments have the highest employee turnover, while Management has the lowest.
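With ten departments, the overall test does not say which ones drive the association. The standardized residuals returned by `chisq.test()` are one way to look at this. The sketch below uses invented counts and placeholder department labels (`A`, `B`, `C`), not the hr data.

```r
# Toy counts, invented for illustration only (NOT taken from the hr data).
toy <- matrix(c(70, 30,
                80, 20,
                95,  5),
              nrow = 3, byrow = TRUE,
              dimnames = list(dept = c("A", "B", "C"),
                              left = c("Stayed", "Left")))

res <- chisq.test(toy)

# Standardized residuals: positive where a cell holds more observations
# than independence would predict, negative where it holds fewer.
res$stdres

# Rank departments by their residual in the "Left" column.
left_resid <- sort(res$stdres[, "Left"], decreasing = TRUE)
left_resid
```

Residuals larger than roughly 2 in absolute value are often read as cells that depart noticeably from independence, though that cutoff is a convention rather than a formal test.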
prop_dept <- hr %>%
  group_by(Department) %>%
  summarise(
    Stayed = sum(left == 0) / n(),
    Left = sum(left == 1) / n(),
    attrition_rate = sum(left == 1) / n()
  ) %>%
  arrange(desc(attrition_rate))
plot_ly(prop_dept) %>%
  add_bars(x = ~reorder(Department, -attrition_rate), y = ~Stayed,
           name = "Stayed", marker = list(color = "#2196F3")) %>%
  add_bars(x = ~reorder(Department, -attrition_rate), y = ~Left,
           name = "Left", marker = list(color = "#F44336")) %>%
  layout(
    barmode = "stack",
    xaxis = list(title = "Department", tickangle = -30),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "HR and Accounting have the highest turnover; Management retains employees best",
    legend = list(orientation = "h", x = 0.3, y = -0.3)
  )