library(readr)

hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Test 1

t.test(hr$average_montly_hours ~ hr$left)
## 
##  Welch Two Sample t-test
## 
## data:  hr$average_montly_hours by hr$left
## t = -7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.534631  -6.183384
## sample estimates:
## mean in group 0 mean in group 1 
##        199.0602        207.4192
plot_data <- hr %>% 
  mutate(left = as.factor(ifelse(left == 1 , '1' , '0')))

plot_ly(plot_data , 
        x = ~average_montly_hours ,
        y = ~left ,
        type = 'box')

The p-value (5.907e-14) is far below 0.05, so the difference in mean average monthly hours between employees who left and employees who stayed is statistically significant.

Employees who left worked more hours per month on average (about 207.4) than employees who stayed (about 199.1).
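To make the reported `t` and `df` values less of a black box, the sketch below recomputes Welch's statistic by hand and compares it with `t.test()`. The helper name `welch_by_hand` and the simulated vectors `stayed` and `left_co` are assumptions for illustration only; they are not part of the original analysis or the HR dataset.

```r
# Hypothetical helper (not from the original analysis): Welch's t statistic
# and Welch-Satterthwaite degrees of freedom from two numeric vectors.
welch_by_hand <- function(x0, x1) {
  n0 <- length(x0)
  n1 <- length(x1)
  se0 <- var(x0) / n0  # squared standard error of the group 0 mean
  se1 <- var(x1) / n1  # squared standard error of the group 1 mean
  t_stat <- (mean(x0) - mean(x1)) / sqrt(se0 + se1)
  df <- (se0 + se1)^2 / (se0^2 / (n0 - 1) + se1^2 / (n1 - 1))
  p <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

# Simulated stand-in data (assumption), NOT the HR dataset
set.seed(1)
stayed  <- rnorm(200, mean = 199, sd = 45)
left_co <- rnorm(80,  mean = 207, sd = 60)

by_hand <- welch_by_hand(stayed, left_co)
builtin <- t.test(stayed, left_co)

print(by_hand)
print(c(builtin$statistic, builtin$parameter, p.value = builtin$p.value))
```

Applied to the real data, `welch_by_hand(hr$average_montly_hours[hr$left == 0], hr$average_montly_hours[hr$left == 1])` should reproduce the statistic shown in the output above, since `t.test()` with a formula subtracts the group 1 mean from the group 0 mean.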

Test 2

t.test(hr$satisfaction_level ~ hr$left)
## 
##  Welch Two Sample t-test
## 
## data:  hr$satisfaction_level by hr$left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1 
##       0.6668096       0.4400980
plot_data <- hr %>% 
  mutate(left = as.factor(ifelse(left == 1 , '1' , '0')))

plot_ly(plot_data , 
        x = ~satisfaction_level ,
        y = ~left ,
        type = 'box')

The p-value (< 2.2e-16) is far below 0.05, so the difference in mean satisfaction level between employees who left and employees who stayed is statistically significant.

Employees who left had lower satisfaction levels on average (about 0.44) than employees who stayed (about 0.67).

Test 3

t.test(hr$number_project ~ hr$left)
## 
##  Welch Two Sample t-test
## 
## data:  hr$number_project by hr$left
## t = -2.1663, df = 4236.5, p-value = 0.03034
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.131136535 -0.006540119
## sample estimates:
## mean in group 0 mean in group 1 
##        3.786664        3.855503
plot_data <- hr %>% 
  mutate(left = as.factor(ifelse(left == 1 , '1' , '0')))

plot_ly(plot_data , 
        x = ~number_project ,
        y = ~left ,
        type = 'box')

The p-value (0.03034) is less than 0.05, so the difference in the mean number of projects between employees who left and employees who stayed is statistically significant.

Employees who left worked on slightly more projects on average (about 3.86) than employees who stayed (about 3.79).
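With roughly 15,000 rows, even a gap as small as 3.79 versus 3.86 projects can pass the 0.05 threshold, so it may help to look at an effect size alongside the p-value. The sketch below computes Cohen's d with a pooled standard deviation. The helper `cohens_d` is a hypothetical name introduced here, not something used in the original analysis, and the tiny input vectors are made up to keep the arithmetic easy to check.

```r
# Hypothetical helper (assumption): Cohen's d using the pooled standard
# deviation, oriented as (group 1 mean - group 0 mean).
cohens_d <- function(x0, x1) {
  n0 <- length(x0)
  n1 <- length(x1)
  pooled_sd <- sqrt(((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2))
  (mean(x1) - mean(x0)) / pooled_sd
}

# Made-up toy vectors: both have variance 1 and the means differ by 1
d_toy <- cohens_d(c(1, 2, 3), c(2, 3, 4))
print(d_toy)  # 1
```

On the HR data this could be called as `cohens_d(hr$number_project[hr$left == 0], hr$number_project[hr$left == 1])`. A value near zero would suggest the difference, although statistically significant, is small in practical terms.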

Test 4

t.test(hr$last_evaluation ~ hr$left)
## 
##  Welch Two Sample t-test
## 
## data:  hr$last_evaluation by hr$left
## t = -0.72534, df = 5154.9, p-value = 0.4683
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.009772224  0.004493874
## sample estimates:
## mean in group 0 mean in group 1 
##       0.7154734       0.7181126
plot_data <- hr %>% 
  mutate(left = as.factor(ifelse(left == 1 , '1' , '0')))

plot_ly(plot_data , 
        x = ~last_evaluation ,
        y = ~left ,
        type = 'box')

The p-value is 0.4683 (greater than 0.05), so the difference in mean last evaluation score between employees who left and employees who stayed is not statistically significant.

This test provides no evidence that an employee's last evaluation is associated with whether they leave the company.
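The four tests follow the same pattern, so they could also be run in one pass and collected into a single table. The sketch below is one possible way to do that. It uses a simulated data frame `sim` with made-up columns `hours` and `evaluation` (assumptions for illustration, not the HR dataset) so that it runs on its own.

```r
# Simulated stand-in data (assumption), NOT the HR dataset
set.seed(42)
n <- 300
sim <- data.frame(
  left       = rbinom(n, 1, 0.25),   # 1 = left, 0 = stayed
  hours      = rnorm(n, 200, 50),
  evaluation = runif(n, 0.4, 1)
)

measures <- c("hours", "evaluation")

# Run a Welch t-test of each measure by `left` and keep the key numbers
results <- do.call(rbind, lapply(measures, function(v) {
  tt <- t.test(sim[[v]] ~ sim$left)
  data.frame(
    variable    = v,
    mean_stayed = unname(tt$estimate[1]),
    mean_left   = unname(tt$estimate[2]),
    p_value     = tt$p.value
  )
}))

print(results)
```

For the real data, `sim` would be replaced by `hr` and `measures` by `c("average_montly_hours", "satisfaction_level", "number_project", "last_evaluation")`.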