HR Analytics Employee Attrition & Performance

Econ 148: analytical and statistical packages for economics 1

Author

Kent Ivan C. Nayre

Published

October 23, 2024

1 Project overview

In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition.

The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.

## the datatable function from the DT package creates an HTML widget that displays the dataset
## install the DT package if it is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |> 
  DT::datatable()

2 Data wrangling and management

Libraries

Task: Load the necessary libraries

Before we start working on the dataset, we need to load the libraries that will be used for data wrangling, analysis, and visualization. Make sure to load the following libraries here. To install a package, use the install.packages function. Some packages are installed later in this project, so install them as needed and load them here.

# load all your libraries here
library(readr) 
library(readxl) 
library(dplyr) 
library(janitor) 
library(ggplot2)
library(DT)
library(report)
library(ggstatsplot)

2.1 Data importation

Task 2.1. Merging dataset
  • Import the two datasets Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.

  • Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. See the dplyr documentation for more information about the left_join function.

  • Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from DT package.

## import the two data here
employee_dta <- read.csv("C:\\Kent\\Anstat_Midterm\\Employee.csv")
perf_rating_dta <- read.csv("C:\\Kent\\Anstat_Midterm\\PerformanceRating.csv")

## merge employee_dta and perf_rating_dta using left_join function.
## save the merged dataset as hr_perf_dta

hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = "EmployeeID")

## Use the datatable from DT package to display the merged dataset
DT::datatable(hr_perf_dta)
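
A quick check after a join is to compare row counts, because left_join returns one row per match and therefore repeats an employee once for each matching rating record. The sketch below uses two small made-up tables (not the project data) to show the effect; the Attrition and ManagerRating values are invented for illustration.

```r
library(dplyr)

# Toy tables for illustration only (not the project data)
employees <- data.frame(
  EmployeeID = c("E1", "E2", "E3"),
  Attrition  = c("No", "Yes", "No")
)
ratings <- data.frame(
  EmployeeID    = c("E1", "E1", "E2"),
  ManagerRating = c(3, 4, 2)
)

merged <- left_join(employees, ratings, by = "EmployeeID")

# E1 appears twice (two ratings); E3 is kept with NA (no rating)
nrow(employees)   # 3
nrow(merged)      # 4

# Count how many rows each employee contributes after the join
rows_per_employee <- count(merged, EmployeeID, name = "n_rows")
print(rows_per_employee)
```

If an analysis should count each employee once, the merged data would first need to be collapsed to one row per employee; whether that is appropriate depends on the question being asked.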

2.2 Data management

Task 2.2. Standardizing variable names
  • Using the clean_names function from the janitor package, standardize the variable names so that they follow the recommended naming convention.

  • Save the renamed variables as hr_perf_dta to update the dataset.

## clean names using the janitor packages and save as hr_perf_dta
hr_perf_dta <- clean_names(hr_perf_dta)

## display the renamed hr_perf_dta using datatable function

datatable(hr_perf_dta)
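
As a small illustration of what clean_names does, the sketch below applies it to a toy data frame whose column names follow the original style used in the task descriptions (WorkLifeBalance, ManagerRating, SelfRating). The values are made up; only the renaming is of interest.

```r
library(janitor)

# Toy data frame for illustration only (values are invented)
raw <- data.frame(
  WorkLifeBalance = c(3, 4),
  ManagerRating   = c(2, 5),
  SelfRating      = c(4, 4)
)

# clean_names converts the column names to snake_case by default
cleaned <- clean_names(raw)
names(cleaned)
# "work_life_balance" "manager_rating" "self_rating"
```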

Task 2.3. Recode data entries
  • Create a new variable cat_education wherein education is 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.

  • Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values accordingly as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.

  • Create new variables cat_work_life_balance, cat_self_rating, cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code accordingly as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.

  • Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code accordingly as No = 0, and Yes = 1.

  • Save all the changes in the hr_perf_dta. Note that saving the changes with the same name will update the dataset with the new variables created.

## create cat_education

## inside mutate, columns are referenced by name; no select() is needed
hr_perf_dta <- mutate(hr_perf_dta, cat_education = case_when(
  education == 1 ~ "No formal education",
  education == 2 ~ "High school",
  education == 3 ~ "Bachelor",
  education == 4 ~ "Masters",
  education == 5 ~ "Doctorate"
))

## create cat_envi_sat,  cat_job_sat, and cat_relation_sat

hr_perf_dta <- mutate(hr_perf_dta, cat_envi_sat = case_when(
  environment_satisfaction == 1 ~ "Very dissatisfied",
  environment_satisfaction == 2 ~ "Dissatisfied",
  environment_satisfaction == 3 ~ "Neutral",
  environment_satisfaction == 4 ~ "Satisfied",
  environment_satisfaction == 5 ~ "Very satisfied"
))

hr_perf_dta <- mutate(hr_perf_dta, cat_job_sat = case_when(
  job_satisfaction == 1 ~ "Very dissatisfied",
  job_satisfaction == 2 ~ "Dissatisfied",
  job_satisfaction == 3 ~ "Neutral",
  job_satisfaction == 4 ~ "Satisfied",
  job_satisfaction == 5 ~ "Very satisfied"
))

hr_perf_dta <- mutate(hr_perf_dta, cat_relation_sat = case_when(
  relationship_satisfaction == 1 ~ "Very dissatisfied",
  relationship_satisfaction == 2 ~ "Dissatisfied",
  relationship_satisfaction == 3 ~ "Neutral",
  relationship_satisfaction == 4 ~ "Satisfied",
  relationship_satisfaction == 5 ~ "Very satisfied"
))

## create cat_work_life_balance, cat_self_rating, and cat_manager_rating

hr_perf_dta <- mutate(hr_perf_dta, cat_work_life_balance = case_when(
  work_life_balance == 1 ~ "Unacceptable",
  work_life_balance == 2 ~ "Needs improvement",
  work_life_balance == 3 ~ "Meets expectation",
  work_life_balance == 4 ~ "Exceeds expectation",
  work_life_balance == 5 ~ "Above and beyond"
))

hr_perf_dta <- mutate(hr_perf_dta, cat_self_rating = case_when(
  self_rating == 1 ~ "Unacceptable",
  self_rating == 2 ~ "Needs improvement",
  self_rating == 3 ~ "Meets expectation",
  self_rating == 4 ~ "Exceeds expectation",
  self_rating == 5 ~ "Above and beyond"
))

hr_perf_dta <- mutate(hr_perf_dta, cat_manager_rating = case_when(
  manager_rating == 1 ~ "Unacceptable",
  manager_rating == 2 ~ "Needs improvement",
  manager_rating == 3 ~ "Meets expectation",
  manager_rating == 4 ~ "Exceeds expectation",
  manager_rating == 5 ~ "Above and beyond"
))

## create bi_attrition

hr_perf_dta <- mutate(hr_perf_dta, bi_attrition = case_when(
  attrition == "No" ~ 0,
  attrition == "Yes" ~ 1
))

## print the updated hr_perf_dta using datatable function
datatable(hr_perf_dta)
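
When integer codes map one-to-one onto labels, base R's factor function with levels and labels is a compact alternative to case_when, and it also records the order of the categories for later plots. The sketch below uses a made-up vector of rating codes rather than the project data; note that it returns a factor, whereas case_when with string outputs returns a character vector.

```r
# Toy vector of rating codes (illustration only); NA stays NA
self_rating <- c(1, 3, 5, 2, NA, 4)

rating_labels <- c("Unacceptable", "Needs improvement", "Meets expectation",
                   "Exceeds expectation", "Above and beyond")

# ordered = TRUE records that the categories have a natural order
cat_self_rating <- factor(self_rating, levels = 1:5,
                          labels = rating_labels, ordered = TRUE)

print(cat_self_rating)
levels(cat_self_rating)             # the five labels, in rating order
as.character(cat_self_rating[2])    # "Meets expectation"
```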

3 Exploratory data analysis

3.1 Descriptive statistics of employee attrition

Task 3.1. Breakdown of attrition by key variables
  • Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.

  • Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate for job roles, group the dataset by job_role. Afterward, you can use the count function to get the frequency of attrition for each job role and then divide it by the total number of observations in that job role. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.

  • Plot for the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!

## selecting attrition key variables and save as `attrition_key_var_dta`
attrition_key_var_dta <- hr_perf_dta |> 
  select(attrition, job_role, department, age, salary, job_satisfaction, work_life_balance)



## compute the attrition rate across job_role and save as attrition_rate_job_role
attrition_rate_job_role <- attrition_key_var_dta |> 
  group_by(job_role) |> 
  summarise(pct_attrition = mean(attrition == "Yes")) |> 
  ungroup()


## print attrition_rate_job_role
print(attrition_rate_job_role)
# A tibble: 13 × 2
   job_role                  pct_attrition
   <chr>                             <dbl>
 1 Analytics Manager                0.131 
 2 Data Scientist                   0.430 
 3 Engineering Manager              0.0586
 4 HR Business Partner              0     
 5 HR Executive                     0.244 
 6 HR Manager                       0     
 7 Machine Learning Engineer        0.163 
 8 Manager                          0.131 
 9 Recruiter                        0.566 
10 Sales Executive                  0.347 
11 Sales Representative             0.634 
12 Senior Software Engineer         0.164 
13 Software Engineer                0.324 

## Plot the attrition rate
ggplot(data = attrition_rate_job_role, aes(x = reorder(job_role, -pct_attrition), y = pct_attrition)) +
  geom_bar(stat = "identity", 
           fill = "#8DA6E2", 
           color = "black",
           linewidth = 0.2) + 
  labs(
    title = "Attrition Rate by Job Role",
    subtitle = "Percentage of attrition across different job roles",
    x = "Job Role",
    y = "Attrition Rate (%)"
  ) +
  theme_classic() + 
  theme(plot.title = element_text(color = "#94DBD3", 
                                   size = 16, 
                                   margin = margin(b = 15), 
                                   face = "bold.italic", 
                                   hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5, 
                                     size = 9, 
                                     face = "italic"),
        axis.title.x = element_text(color = "#94DBD3", 
                                    size = 10, 
                                    margin = margin(t = 30), 
                                    face = "bold"), 
        axis.title.y = element_text(color = "#94DBD3", 
                                    size = 10, 
                                    margin = margin(r = 25), 
                                    face = "bold")) +
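  scale_y_continuous(labels = scales::percent) +  # pct_attrition is a proportion, so show it as a percentage
  coord_flip()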
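
The same computation can be repeated for the other grouping variables named in the task. One way to avoid copying the pipeline is a small helper function; attrition_rate_by below is a hypothetical name introduced here, and the toy data are invented for illustration. It assumes a character column called attrition with the values "Yes" and "No", as in the cleaned dataset.

```r
library(dplyr)

# Hypothetical helper: share of rows with attrition == "Yes" within each group
attrition_rate_by <- function(data, group_var) {
  data |>
    group_by({{ group_var }}) |>
    summarise(
      n = n(),
      pct_attrition = mean(attrition == "Yes"),
      .groups = "drop"   # return an ungrouped tibble
    ) |>
    arrange(desc(pct_attrition))
}

# Toy data for illustration only (not the project data)
toy <- data.frame(
  department = c("Sales", "Sales", "Sales", "Sales", "Technology", "Technology"),
  attrition  = c("Yes", "Yes", "No", "No", "No", "No")
)

rate_by_department <- attrition_rate_by(toy, department)
print(rate_by_department)
```

With the project data, a call such as attrition_rate_by(attrition_key_var_dta, department) would follow the same pattern. Continuous variables such as age and salary would first need to be binned, for example with cut.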

3.2 Analysis of compensation and turnover

Task 3.2. Analyzing compensation and turnover
  • Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine if there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.

  • Install the report package and use the report function to generate a report of the t-test results.

  • Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.

  • Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.

  • Provide recommendations on whether revising compensation policies could be an effective retention strategy.

## compare the average monthly income of employees who left and those who stayed
attrition_ttest_results <- t.test(salary ~ bi_attrition, data = hr_perf_dta)



## print the results of the t-test
print(attrition_ttest_results)

    Welch Two Sample t-test

data:  salary by bi_attrition
t = 18.869, df = 5524.2, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 38577.82 47523.18
sample estimates:
mean in group 0 mean in group 1 
      125007.26        81956.76 
## install the report package and use the report function to generate a report of the t-test results
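## install only if the package is missing, so the script does not reinstall on every run
if (!requireNamespace("report", quietly = TRUE)) install.packages("report")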

report(attrition_ttest_results)
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference of salary by bi_attrition
(mean in group 0 = 1.25e+05, mean in group 1 = 81956.76) suggests that the
effect is positive, statistically significant, and medium (difference =
43050.50, 95% CI [38577.82, 47523.18], t(5524.24) = 18.87, p < .001; Cohen's d
= 0.51, 95% CI [0.45, 0.56])
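
The figures in the printed t-test can also be read from the result object, which is convenient when they need to be quoted in a discussion. The sketch below runs the same kind of Welch test on simulated salaries (invented numbers and an arbitrary seed, not the project data) and extracts the difference in means, its confidence interval, and the p-value.

```r
set.seed(148)  # arbitrary seed so the toy data are reproducible

# Simulated salaries for illustration only (not the project data)
toy <- data.frame(
  bi_attrition = rep(c(0, 1), each = 50),
  salary = c(rnorm(50, mean = 120000, sd = 40000),
             rnorm(50, mean = 85000,  sd = 30000))
)

res <- t.test(salary ~ bi_attrition, data = toy)  # Welch test by default

mean_diff <- unname(res$estimate[1] - res$estimate[2])  # group 0 minus group 1
ci        <- res$conf.int                               # CI for that difference

cat("Difference in means:", round(mean_diff), "\n")
cat("95% CI:", round(ci[1]), "to", round(ci[2]), "\n")
cat("p-value:", format.pval(res$p.value), "\n")
```

Applied to attrition_ttest_results, the same fields give the difference of 43,050.50 and the interval [38,577.82, 47,523.18] shown in the output above.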
# install ggstatsplot package and use ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed


if (!requireNamespace("ggstatsplot", quietly = TRUE)) install.packages("ggstatsplot")

ggbetweenstats(
  data = hr_perf_dta,
  x = bi_attrition,
  y = salary,
  title = "Distribution of Monthly Income for Employees Who Left vs. Stayed",
  xlab = "Attrition = Stayed (0) Left (1)",
  ylab = "Monthly Income") +
  scale_y_continuous(labels = scales::comma)

# create histogram and frequency polygon of salary for employees who left and those who stayed

ggplot(hr_perf_dta, aes(x = salary, fill = as.factor(bi_attrition))) +
  geom_histogram(alpha = 0.3, position = "identity", colour = "black", bins = 30) +
  geom_freqpoly(aes(color = as.factor(bi_attrition)), bins = 30) +
  facet_wrap(~ bi_attrition, labeller = as_labeller(c("0" = "Stayed", "1" = "Left"))) +
  scale_fill_manual(values = c("#4CAF50", "#F44336")) +
  theme_minimal() +
  theme(plot.title = element_text(color = "#94DBD3", 
                                  size = 12, 
                                  margin = margin(b = 15), 
                                  face = "bold.italic", 
                                  hjust = 0.5),  
         plot.subtitle = element_text(hjust = 0.5, 
                                      size = 9, 
                                      face = "italic"),
         axis.text.x = element_text(color = "black", size = 12, angle = 45, hjust = 1), 
         axis.title.y = element_text(color = "#94DBD3", 
                                     size = 10, 
                                     margin = margin(r = 25), 
                                     face = "bold"),
        axis.title.x = element_text(color = "#94DBD3", 
                                     size = 10, 
                                     margin = margin(r = 25), 
                                     face = "bold")) +
  labs(title = "Salary Distribution for Employees Who Left vs. Stayed", 
       x = "Monthly Income", 
       y = "Count", 
       fill = "Attrition Status",
       color = "Attrition Status") +
  scale_x_continuous(labels = scales::comma)

Discussion:

The t-test and the plots show that employees who left earned less on average than those who stayed: about 81,957 versus 125,007, a difference of roughly 43,050 (95% CI [38,578, 47,523], t = 18.87, p < .001). The effect size is medium (Cohen's d = 0.51), so the gap is statistically significant, but salary alone does not separate the two groups cleanly. This points to a link between compensation and turnover, although the test shows an association rather than a cause. With that caveat, revising compensation policies, for example by increasing wages or offering performance-based incentives, is a reasonable retention strategy to consider. Since employee attrition increases recruitment and training costs, addressing the pay gap could improve workforce stability and long-term profitability.

3.3 Employee satisfaction and performance analysis

Task 3.3. Analyzing employee satisfaction and performance
  • Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and summarise functions to calculate the average performance ratings for each group.

  • Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.

  • Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.

  • Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.

  • Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.

# Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed.

avg_performance <- hr_perf_dta |> 
  group_by(bi_attrition) |> 
  summarise(
    avg_self_rating = mean(self_rating, na.rm = TRUE),
    avg_manager_rating = mean(manager_rating, na.rm = TRUE))
  

print(avg_performance)
# A tibble: 2 × 3
  bi_attrition avg_self_rating avg_manager_rating
         <dbl>           <dbl>              <dbl>
1            0            3.98               3.48
2            1            3.99               3.46
# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.

ggplot(hr_perf_dta,
       aes(x = factor(self_rating, levels = 1:5,
                      labels = c("Unacceptable", "Needs Improvement", "Meets Expectation", 
                                 "Exceeds Expectation", "Above and Beyond")),
           fill = factor(bi_attrition, labels = c("Stayed", "Left")))) +
  geom_bar(position = "dodge", width = 0.7, alpha = 0.8) +
  scale_fill_manual(values = c("#94DBD3", "#8DA6E2")) +
  labs(title = "Distribution of Self Rating by Attrition Status",
       x = "Self-Rating",
       y = "Number of Employees",
       fill = "Attrition Status") +
  theme_minimal(base_size = 15) +
  theme(plot.title = element_text(face = "bold", size = 12, hjust = 0.5, color = "#94DBD3"),
        axis.title = element_text(face = "bold", color = "#94DBD3"),
        axis.text = element_text(size = 8, color = "#495057"),
        axis.text.x = element_text(color = "black", size = 8, angle = 45, hjust = 1),
        legend.title = element_text(face = "bold", color = "#94DBD3"),
        legend.background = element_rect(fill = "#e9ecef", color = NA),
        panel.grid.major = element_line(color = "#9e9e9f")) +
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_dodge(width = 0.7), vjust = -0.3, size = 3, fontface = "bold") + ylim(0, 1600)

# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot

ggplot(hr_perf_dta,
       aes(x = factor(manager_rating, levels = 1:5,
                      labels = c("Unacceptable", "Needs Improvement", "Meets Expectation", 
                                 "Exceeds Expectation", "Above and Beyond")),
           fill = factor(bi_attrition, labels = c("Stayed", "Left")))) +
  geom_bar(position = "dodge", width = 0.7, alpha = 0.8) +
  scale_fill_manual(values = c("#94DBD3", "#8DA6E2")) +
  labs(title = "Distribution of Manager Rating by Attrition Status",
       x = "Manager Rating",
       y = "Number of Employees",
       fill = "Attrition Status") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold", size = 12, hjust = 0.5, color = "#94DBD3"),
        axis.title = element_text(face = "bold", color = "#94DBD3"),
        axis.text = element_text(size = 8, color = "#495057"),
        axis.text.x = element_text(color = "black", size = 8, angle = 45, hjust = 1),
        legend.title = element_text(face = "bold", color = "#94DBD3"),
        legend.background = element_rect(fill = "#e9ecef", color = NA),
        panel.grid.major = element_line(color = "#9e9e9f")) +
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_dodge(width = 0.7), vjust = -0.4, size = 2)
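
Raw counts in a dodged bar chart partly reflect how many rows each group has, so it can help to compare the share of each rating within a group instead. The sketch below computes within-group shares with table and prop.table on a small made-up data frame (not the project data).

```r
# Toy data for illustration only (not the project data)
toy <- data.frame(
  bi_attrition   = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  manager_rating = c(3, 3, 4, 4, 4, 5, 3, 4, 4)
)

counts <- table(attrition = toy$bi_attrition, rating = toy$manager_rating)

# margin = 1 divides each row (attrition group) by its own total,
# so every row of `shares` sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

In ggplot2, mapping the attrition status to x and the rating to fill with geom_bar(position = "fill") draws the same within-group shares as stacked bars.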

# create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
ggplot(hr_perf_dta,
  aes(x = factor(job_satisfaction,
                labels = c("Very Dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very Satisfied")),
       y = salary,
       fill = factor(bi_attrition, labels = c("Stayed", "Left")))) +
geom_boxplot(alpha = 0.7, outlier.color = "#073b4c", outlier.size = 2, linewidth = 0.8) +
scale_fill_manual(values = c("#94DBD3", "#8DA6E2")) +
labs(
  title = "Salary Distribution by Job Satisfaction and Attrition Status",
  x = "Job Satisfaction",
  y = "Salary",
  fill = "Attrition Status"
) +
theme_minimal(base_size = 15) +
theme(
  plot.title = element_text(face = "bold", size = 14, hjust = 0.5, color = "#94DBD3"),
  axis.title = element_text(face = "bold", size = 12, color = "#94DBD3"),
  axis.text = element_text(size = 8, color = "#495057"),
  legend.position = "top",
  legend.title = element_text(face = "bold", size = 12, color = "#94DBD3"),
  legend.background = element_rect(fill = "#e9ecef", color = NA),
  panel.grid.major = element_line(color = "#e9ecef", linetype = "solid")
) + scale_y_continuous(labels = scales::comma)

Discussion:

The performance ratings barely differ between employees who left and those who stayed. The average self-rating is 3.98 for those who stayed and 3.99 for those who left, and the average manager rating is 3.48 versus 3.46. The bar plots should be read with group sizes in mind: there are more rows for employees who stayed, so their bars are taller at every rating level without this indicating a different rating pattern. On this evidence, performance ratings do not identify employees who are at risk of leaving. The boxplot of salary by job satisfaction and attrition status is best read alongside the t-test in Section 3.2, which found lower average salaries among employees who left; it shows whether that gap appears at each satisfaction level. For HR, this suggests that retention efforts should not be targeted by performance rating. Reviewing pay, keeping regular manager feedback in place, and monitoring job satisfaction at all rating levels are more defensible interventions given these results.

3.4 Work-life balance and retention strategies

Task 3.4. Analyzing work-life balance and retention strategies

At this point, you are already well aware of the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:

  • Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.

  • Use visualizations to show the differences.

  • Assess whether employees with poor work-life balance are more likely to leave.

You have the freedom how you will accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.

# Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed 

work_life_balance_distrib <- hr_perf_dta |>
  group_by(work_life_balance) |>
  count(bi_attrition) |>
  mutate(worklifebal = n / sum(n))


print(work_life_balance_distrib)
# A tibble: 11 × 4
# Groups:   work_life_balance [6]
   work_life_balance bi_attrition     n worklifebal
               <int>        <dbl> <int>       <dbl>
 1                 1            0    84       0.694
 2                 1            1    37       0.306
 3                 2            0  1134       0.666
 4                 2            1   568       0.334
 5                 3            0  1090       0.653
 6                 3            1   580       0.347
 7                 4            0  1146       0.672
 8                 4            1   560       0.328
 9                 5            0   994       0.658
10                 5            1   516       0.342
11                NA            0   190       1    
#Use visualizations to show the differences  


library(ggplot2)

ggplot(hr_perf_dta,
       aes(x = factor(work_life_balance,
                      labels = c("Unacceptable", "Needs Improvement", "Meets Expectation",
                                  "Exceeds Expectation", "Above and Beyond")),
           fill = factor(bi_attrition, labels = c("Stayed", "Left")))) +
  geom_bar(position = "dodge", width = 0.7, alpha = 0.8) +
  scale_fill_manual(values = c("#94DBD3", "#8DA6E2")) +
  labs(
    title = "Distribution of Work-Life Balance Ratings by Attrition Status",
    x = "Work-Life Balance Rating",
    y = "Number of Employees",
    fill = "Attrition Status"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5, color = "#94DBD3"),
    axis.title = element_text(face = "bold", size = 12, color = "#94DBD3"),
    axis.text = element_text(size = 8, color = "#495057"),
    axis.text.x = element_text(color = "black", size = 8, angle = 45, hjust = 1),
    strip.text = element_text(face = "bold", size = 12, color = "#495057"),
    legend.position = "top",
    legend.title = element_text(face = "bold", size = 12, color = "#343a40"),
    legend.background = element_rect(fill = "#e9ecef", color = NA),
    panel.grid.major = element_line(color = "#e9ecef"),
    panel.grid.minor = element_blank()
  ) +
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_dodge(width = 0.7), vjust = -0.5, size = 3) +
  ylim(0, 1500)  

#Assess whether employees with poor work-life balance are more likely to leave   


attrition_by_wlb <- hr_perf_dta |> 
  group_by(work_life_balance) |> 
  summarize(total = n(),
            left = sum(bi_attrition),
            attrition_rate = left/total * 100)

print(attrition_by_wlb)
# A tibble: 6 × 4
  work_life_balance total  left attrition_rate
              <int> <int> <dbl>          <dbl>
1                 1   121    37           30.6
2                 2  1702   568           33.4
3                 3  1670   580           34.7
4                 4  1706   560           32.8
5                 5  1510   516           34.2
6                NA   190     0            0  

Discussion:

The work-life balance ratings do not separate employees who left from those who stayed. The attrition rate is similar at every rating level: 30.6% for a rating of 1, 33.4% for 2, 34.7% for 3, 32.8% for 4, and 34.2% for 5. The lowest rating actually has the lowest attrition rate, although it is based on only 121 rows. The bar chart shows the same pattern, with those who stayed outnumbering those who left by about two to one at each rating. The 190 rows with a missing rating all belong to employees who stayed, so they say nothing about work-life balance. In this dataset, then, poor work-life balance is not associated with a higher likelihood of leaving. Flexible work schedules, respect for employees' personal time, and stress-management support may still be worthwhile for other reasons, but these results do not support them as the main retention lever. The salary gap in Section 3.2 and the differences across job roles in Section 3.1 are stronger signals.
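
Whether attrition depends on the work-life balance rating can also be checked with a chi-squared test of independence. The sketch below rebuilds the contingency table from the counts printed in the work_life_balance_distrib output and runs chisq.test on it. It treats every row as an independent observation, which is an assumption: the merged data may contain several rows per employee.

```r
# Counts copied from the work_life_balance_distrib output above
# (rows with a missing rating are left out)
wlb_counts <- matrix(
  c(  84,  37,
    1134, 568,
    1090, 580,
    1146, 560,
     994, 516),
  ncol = 2, byrow = TRUE,
  dimnames = list(work_life_balance = as.character(1:5),
                  attrition = c("Stayed", "Left"))
)

# Attrition rate per rating, in percent
round(100 * wlb_counts[, "Left"] / rowSums(wlb_counts), 1)

# Pearson chi-squared test of independence (5 x 2 table, 4 degrees of freedom)
wlb_test <- chisq.test(wlb_counts)
print(wlb_test)
```

On these counts the p-value comes out well above 0.05, so the test gives no indication that attrition varies with the work-life balance rating.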