R Markdown
library(readr)
library(plotly)
library(dplyr)
library(ggplot2)
# Load the HR dataset: one row per employee; `left` is 1 if the employee left, 0 otherwise
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
1. Department vs. Attrition
chisq.test(hr$Department, hr$left)
##
## Pearson's Chi-squared test
##
## data: hr$Department and hr$left
## X-squared = 86.825, df = 9, p-value = 7.042e-15
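As a rough check on the output above, the p-value can be recovered from the statistic and its degrees of freedom. For an r x c table the test uses (r - 1)(c - 1) degrees of freedom, so df = 9 against the two levels of `left` implies 10 department levels. That count is inferred here from the output, not read from `hr` (with the data loaded, `length(unique(hr$Department))` gives it directly). A minimal sketch:

```r
# Degrees of freedom for an r x c contingency table: (r - 1) * (c - 1).
# Assumption: 10 department levels, inferred from df = 9 in the output above.
n_departments <- 10
n_outcomes <- 2  # left = 0 or 1
df <- (n_departments - 1) * (n_outcomes - 1)

# The p-value is the upper-tail probability of a chi-squared distribution
# with `df` degrees of freedom, evaluated at the reported statistic.
p_value <- pchisq(86.825, df = df, lower.tail = FALSE)
print(df)
print(p_value)  # should be close to the 7.042e-15 reported by chisq.test()
```

Small differences from the printed value come only from the statistic being rounded to three decimals in the output.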
- Technical: the p-value of 7.042e-15 is extremely small, meaning
that a chi-squared statistic this large (86.825, df = 9) would be very
unlikely if department and attrition were independent. Based on the
chi-squared test, whether an employee leaves the company is
significantly associated with the department they work in, although the
test alone does not measure how strong that association is.
- Non-technical: Employees' likelihood of leaving differs somewhat
from one department to another.
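With a sample this large, a tiny p-value reflects sample size as much as the strength of the relationship, which fits the hedge "somewhat" in the interpretation above. One common way to gauge strength, not part of the original analysis, is Cramér's V, which rescales the chi-squared statistic to lie between 0 and 1. This sketch assumes n = 14,999 rows (the size of the CSV loaded above; with the data in memory, `nrow(hr)` is the safer source):

```r
# Cramér's V = sqrt(X^2 / (n * (min(r, c) - 1)))
cramers_v <- function(x_squared, n, n_rows, n_cols) {
  sqrt(x_squared / (n * (min(n_rows, n_cols) - 1)))
}

# Assumptions: n = 14999 employees; 10 departments x 2 outcomes (left = 0/1)
v_department <- cramers_v(86.825, n = 14999, n_rows = 10, n_cols = 2)
print(round(v_department, 3))  # 0.076
```

Values near 0 indicate a weak association. Conventional cut-offs differ between fields, so any threshold should be treated as a rule of thumb.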
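dept_data <- hr %>%
  group_by(Department) %>%
  summarise(
    stayed = sum(left == 0) / n(),  # share of employees in the department who stayed
    left = sum(left == 1) / n()     # share who left (right-hand `left` is still the original column)
  )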
plot_ly(dept_data) %>%
  add_bars(x = ~Department, y = ~stayed, name = "Stayed",
           marker = list(color = "#1f77b4")) %>%
  add_bars(x = ~Department, y = ~left, name = "Left",
           marker = list(color = "#ff7f0e")) %>%
  layout(
    barmode = "stack",
    xaxis = list(
      title = "Department",
      tickvals = unique(dept_data$Department),  # Ensure each department has a tick value
      ticktext = unique(dept_data$Department)   # Use the department names as labels
    ),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Department Has an Effect on Attrition"
  )
2. Salary vs. Attrition
chisq.test(hr$salary, hr$left)
##
## Pearson's Chi-squared test
##
## data: hr$salary and hr$left
## X-squared = 381.23, df = 2, p-value < 2.2e-16
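The printed "p-value < 2.2e-16" is a display floor rather than the actual value: R formats p-values smaller than machine precision (`.Machine$double.eps`, about 2.2e-16) this way. For df = 2 the chi-squared upper tail has the closed form exp(-x / 2), so the real magnitude can be computed from the reported statistic. A minimal sketch:

```r
x_squared <- 381.23  # statistic reported above for salary vs. left
df <- 2

# Upper-tail probability from the chi-squared distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# With 2 degrees of freedom the upper tail is exactly exp(-x / 2)
p_closed_form <- exp(-x_squared / 2)

print(p_value)
print(p_value < .Machine$double.eps)  # TRUE: far below the printed floor
```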
- Technical: the p-value is below 2.2e-16 (R prints any p-value
smaller than machine precision this way), so a difference this large
would be very unlikely if salary and attrition were independent. The
chi-squared statistic of 381.23 (df = 2) indicates a large difference
between the observed and expected frequencies, suggesting that the
variables salary and left are not independent.
- Non-technical: Employees in the low salary band are the most likely
to leave, and those in the high salary band are the least likely.
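The comparison of observed and expected frequencies described above can be reproduced by hand. This sketch uses a small hypothetical salary-by-attrition table (the counts are invented for illustration, not taken from `hr`), computes each expected count as row total x column total / grand total, sums (O - E)^2 / E over the cells, and checks the result against `chisq.test()`:

```r
# Hypothetical counts (NOT the HR data): rows = salary band, columns = left (0 = stayed, 1 = left)
observed <- matrix(c(50, 20,
                     60, 15,
                     40,  5),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(salary = c("low", "medium", "high"),
                                   left = c("0", "1")))

# Expected count in each cell if salary and left were independent
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over all cells of (O - E)^2 / E
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# chisq.test() applies no continuity correction to tables larger than 2 x 2,
# so its statistic should match the manual one
builtin <- chisq.test(observed)
print(c(manual = x_squared, chisq.test = unname(builtin$statistic)))
```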
salary_data <- hr %>%
  group_by(salary) %>%
  summarise(
    stayed = sum(left == 0) / n(),
    left = sum(left == 1) / n()
  )
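`group_by()` returns character groups in sorted order, so the salary bands come out as high, low, medium rather than in their natural order, which matters when labelling the salary plot's axis. A small illustration on a hypothetical vector; converting to a factor with explicit levels is one way to record the intended order:

```r
salary <- c("low", "medium", "high", "medium", "low")  # hypothetical values

# Character values sort alphabetically, the order group_by() uses for groups
print(sort(unique(salary)))  # "high" "low" "medium"

# A factor with explicit levels keeps the natural low -> medium -> high order
salary_f <- factor(salary, levels = c("low", "medium", "high"))
print(levels(salary_f))      # "low" "medium" "high"
```

In the analysis itself, the same idea could be applied with `mutate(salary = factor(salary, levels = c("low", "medium", "high")))` before `group_by()`.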
plot_ly(salary_data) %>%
  add_bars(x = ~salary, y = ~stayed, name = "Stayed",
           marker = list(color = "#1f77b4")) %>%
  add_bars(x = ~salary, y = ~left, name = "Left",
           marker = list(color = "#ff7f0e")) %>%
  layout(
    barmode = "stack",
    xaxis = list(
      title = "Salary",
      # group_by() sorts salary alphabetically (high, low, medium), so fix the
      # category order; tick positions 0, 1, 2 then map to low, medium, high
      categoryorder = "array",
      categoryarray = c("low", "medium", "high"),
      tickvals = c(0, 1, 2),
      ticktext = c("Low Salary", "Medium Salary", "High Salary")
    ),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Low Salary Employees Leave More Than Others"
  )
3. Work Accident vs. Attrition
chisq.test(hr$Work_accident, hr$left)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: hr$Work_accident and hr$left
## X-squared = 357.56, df = 1, p-value < 2.2e-16
- Technical: the p-value of < 2.2e-16 is extremely small, meaning
that results like these are very unlikely to have occurred by chance if
work accidents and attrition were independent. The large chi-squared
statistic of 357.56 (df = 1, with Yates' continuity correction)
indicates that leaving the company is strongly associated with whether
an employee had a work accident.
- Non-technical: employees who have had a work accident are much
less likely to leave the company than those who have not.
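Because both variables here are binary, the table is 2 x 2 and `chisq.test()` applies Yates' continuity correction by default, as the output header notes. The statistic itself carries no direction; comparing row proportions shows which group leaves more. A sketch on a hypothetical 2 x 2 table (invented counts, not `hr`):

```r
# Hypothetical counts: rows = Work_accident (0 = no, 1 = yes), columns = left (0 = stayed, 1 = left)
observed <- matrix(c(80, 30,
                     45,  5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Work_accident = c("0", "1"), left = c("0", "1")))

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic without and with Yates' correction (each |O - E| shrunk by 0.5;
# chisq.test() uses min(0.5, |O - E|), which is identical here because every |O - E| > 0.5)
uncorrected <- sum((observed - expected)^2 / expected)
yates <- sum((abs(observed - expected) - 0.5)^2 / expected)

builtin <- chisq.test(observed)  # correct = TRUE is the default
print(c(manual = yates, chisq.test = unname(builtin$statistic)))

# The statistic would be the same whichever group left more;
# row proportions show the direction
print(prop.table(observed, margin = 1))
```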
accident_data <- hr %>%
  group_by(Work_accident) %>%
  summarise(
    stayed = sum(left == 0) / n(),
    left = sum(left == 1) / n()
  )
plot_ly(accident_data) %>%
  add_bars(x = ~Work_accident, y = ~stayed, name = "Stayed",
           marker = list(color = "#1f77b4")) %>%
  add_bars(x = ~Work_accident, y = ~left, name = "Left",
           marker = list(color = "#ff7f0e")) %>%
  layout(
    barmode = "stack",
    xaxis = list(
      title = "Accidents at Work",
      tickvals = c(0, 1),
      ticktext = c("No Accident", "Accident")  # Work_accident: 0 = no accident, 1 = accident
    ),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Employees With a Work Accident Leave Less Often"
  )