Justin Kaplan

Assignment 8

library(dplyr)
library(plotly)
library(readr)
HR <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')

Question 1 : A comparison of projects worked on vs. whether employees left the company

t.test(HR$number_project ~ HR$left)
## 
##  Welch Two Sample t-test
## 
## data:  HR$number_project by HR$left
## t = -2.1663, df = 4236.5, p-value = 0.03034
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.131136535 -0.006540119
## sample estimates:
## mean in group 0 mean in group 1 
##        3.786664        3.855503

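p-value interpretation: The p-value is 0.03, which is below the usual 0.05 cutoff. If there were truly no difference in mean projects between the two groups, a difference at least this large would show up only about 3% of the time, so the difference is statistically significant.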
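As a rough check on the reported p-value, it can be recomputed from the t statistic and degrees of freedom printed in the output above. This is a minimal sketch that only reuses those two printed (rounded) numbers; the variable names are illustrative.

```r
# Values copied from the Welch t-test output above (rounded as printed)
t_stat <- -2.1663
df     <- 4236.5

# Two-sided p-value: probability in both tails of the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)
p_value
```

The result should agree with the printed p-value of 0.03034 up to rounding.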

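t-test interpretation: The difference in mean projects is statistically significant but very small (about 0.07 projects, with a 95% confidence interval of roughly 0.01 to 0.13), so project count is likely not the reason employees are leaving the company.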

Non-technical interpretation: Employees who left the company worked on about 0.07 more projects, on average, than those who are still with the company.
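The size of the gap can be worked out directly from the two group means in the t-test output. A small sketch, assuming group 0 is employees who stayed and group 1 is employees who left (the variable names are illustrative):

```r
# Group means copied from the t-test output above
mean_stayed <- 3.786664   # group 0: left == 0
mean_left   <- 3.855503   # group 1: left == 1

# Difference in average number of projects (left minus stayed)
diff_projects <- mean_left - mean_stayed
round(diff_projects, 3)
```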

Graph: Employees who left the company were working on slightly more projects than those who stayed

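# left is coded 0 = stayed, 1 = left the company
plot_data <- HR %>% 
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There')))
# Use the relabeled columns from plot_data rather than the raw HR columns
plot_ly(plot_data , 
        x = ~left ,
        y = ~number_project ,
        type = 'box')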

Question 2 : A comparison between satisfaction level and whether the employee left the company

t.test(HR$satisfaction_level ~ HR$left)
## 
##  Welch Two Sample t-test
## 
## data:  HR$satisfaction_level by HR$left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1 
##       0.6668096       0.4400980

p-value interpretation: The p-value is extremely low (below 2.2e-16), which means a difference in satisfaction this large would almost never appear by chance if the two groups truly had the same average satisfaction.

t-test interpretation: Average satisfaction was about 0.227 lower among those who left (0.440) than among those who stayed (0.667), which is a large gap on a scale that runs up to 1.

non-technical interpretation: Workers who are less satisfied at work are more likely to leave the company.
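The satisfaction gap and its relation to the confidence interval can be checked from the printed output. This sketch only reuses the numbers shown above; the point estimate of the difference should sit at the midpoint of the 95% interval. Variable names are illustrative.

```r
# Group means and confidence interval copied from the t-test output above
mean_stayed <- 0.6668096   # group 0: left == 0
mean_left   <- 0.4400980   # group 1: left == 1
ci          <- c(0.2171815, 0.2362417)

# Point estimate of the difference (stayed minus left)
diff_satisfaction <- mean_stayed - mean_left

# The interval is symmetric around the estimate, so its midpoint should match
ci_midpoint <- mean(ci)

round(c(difference = diff_satisfaction, ci_midpoint = ci_midpoint), 4)
```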

Graph: Workers who are less satisfied are much more likely to leave the company

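# left is coded 0 = stayed, 1 = left the company
plot_data <- HR %>% 
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There')))
# Use the relabeled columns from plot_data rather than the raw HR columns
plot_ly(plot_data , 
        x = ~left ,
        y = ~satisfaction_level ,
        type = 'box')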

Question 3 : A comparison between work accidents and whether the employee left the company

t.test(HR$Work_accident ~ HR$left)
## 
##  Welch Two Sample t-test
## 
## data:  HR$Work_accident by HR$left
## t = 25.403, df = 10883, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.1178305 0.1375356
## sample estimates:
## mean in group 0 mean in group 1 
##      0.17500875      0.04732568

p-value interpretation: The p-value is extremely low (below 2.2e-16), which means a difference in accident rates this large would almost never appear by chance if the two groups truly had the same rate.

t-test interpretation: Surprisingly, the accident rate among people who left the company (about 4.7%) is only a little over a quarter of the rate among those who stayed (about 17.5%).

non-technical interpretation: Employees who stayed with the company were about 3.7 times as likely to have been in a work accident as those who left.
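Because Work_accident is coded 0/1, each group mean is the share of employees in that group who had an accident, so the "times as likely" figure is just the ratio of the two means. A minimal sketch using the printed means (variable names are illustrative):

```r
# Accident rates copied from the t-test output above
rate_stayed <- 0.17500875   # group 0: left == 0
rate_left   <- 0.04732568   # group 1: left == 1

# How many times higher the accident rate is among those who stayed
rate_ratio <- rate_stayed / rate_left

# The same comparison the other way around: leavers' rate as a share of stayers'
rate_share <- rate_left / rate_stayed

round(c(ratio = rate_ratio, share = rate_share), 2)
```

The ratio comes out to roughly 3.7, and the share to roughly 0.27.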

Graph: Employees who stayed with the company had a higher rate of work accidents than those who left

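# left is coded 0 = stayed, 1 = left the company
# Summarise to one accident rate per group so the bars show rates, not stacked 0/1 values
plot_data <- HR %>% 
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There'))) %>% 
  group_by(left) %>% 
  summarise(accident_rate = mean(Work_accident))
plot_ly(plot_data , 
        x = ~left ,
        y = ~accident_rate ,
        type = 'bar')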

Question 4 : A comparison between average monthly hours and whether the employee left the company

t.test(HR$average_montly_hours ~ HR$left)
## 
##  Welch Two Sample t-test
## 
## data:  HR$average_montly_hours by HR$left
## t = -7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.534631  -6.183384
## sample estimates:
## mean in group 0 mean in group 1 
##        199.0602        207.4192

p-value interpretation: The p-value is very low (about 5.9e-14), which means a difference in hours this large would almost never appear by chance if the two groups truly worked the same average hours.

t-test interpretation: Those who left the company worked about 8.36 more hours per month, on average, than those who stayed (207.4 vs. 199.1).

non-technical interpretation: Employees who left were working slightly longer hours, so if this company can find a way to trim monthly hours a little, employees may be more likely to stay.
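Since average_montly_hours is measured per month, it can help to put the gap on a weekly footing. This is a rough sketch that assumes 12 months and 52 weeks per year for the conversion; that conversion factor is an assumption, not something taken from the data.

```r
# Group means copied from the t-test output above (hours per month)
hours_stayed <- 199.0602   # group 0: left == 0
hours_left   <- 207.4192   # group 1: left == 1

# Extra hours per month worked by those who left
diff_monthly <- hours_left - hours_stayed

# Approximate weekly equivalent (assumption: 12 months / 52 weeks per year)
diff_weekly <- diff_monthly * 12 / 52

round(c(monthly = diff_monthly, weekly = diff_weekly), 2)
```

Under that assumption the gap is a little under 2 hours per week.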

Graph: Employees who left the company worked slightly more hours per month than those who stayed

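# left is coded 0 = stayed, 1 = left the company
plot_data <- HR %>% 
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There')))
# Use the relabeled columns from plot_data rather than the raw HR columns
plot_ly(plot_data , 
        x = ~left ,
        y = ~average_montly_hours ,
        type = 'box')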