Justin Kaplan
Assignment 8
library(dplyr)
library(plotly)
library(readr)
HR <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
Question 1 : A comparison of projects worked on vs. whether
employees left the company
t.test(HR$number_project ~ HR$left)
##
## Welch Two Sample t-test
##
## data: HR$number_project by HR$left
## t = -2.1663, df = 4236.5, p-value = 0.03034
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.131136535 -0.006540119
## sample estimates:
## mean in group 0 mean in group 1
## 3.786664 3.855503
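p-value interpretation: The p-value is 0.03, which is below the usual
0.05 cutoff. If there were no real difference in projects worked between
the two groups, a gap at least this large would show up only about 3% of
the time, so the difference is statistically significant.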
t-test interpretation: The difference in mean projects worked is
very small (3.86 vs. 3.79, with a 95% confidence interval of roughly
0.01 to 0.13 projects), so it is likely not the reason employees are
leaving the company.
Non-technical interpretation: Employees that left the company worked
on about 0.07 more projects on average than those who are still with the
company.
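To make the numbers in the t-test output less of a black box, here is a minimal sketch of how the Welch statistic, degrees of freedom, and p-value can be rebuilt by hand. The two vectors are hypothetical toy data made up for illustration, not values from the HR dataset, and the variable names are my own.

```r
# Hypothetical toy data (NOT from the HR dataset): projects per employee
stayed <- c(3, 4, 4, 5, 3, 4, 2, 5)
left   <- c(2, 6, 7, 2, 6, 5, 2, 6)

# Per-group variance of the mean
v1 <- var(stayed) / length(stayed)
v2 <- var(left) / length(left)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(stayed) - mean(left)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(stayed) - 1) + v2^2 / (length(left) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# t.test() runs the Welch version by default (var.equal = FALSE)
res <- t.test(stayed, left)
print(c(manual = t_manual, t_test = unname(res$statistic)))
```

The sign of t only reflects which group is subtracted from which. In the output above it is negative because group 0 (stayed) has the lower mean.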
Graph: Employees who left the company were working on slightly more
projects than those who stayed.
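plot_data <- HR %>%
  # left == 1 marks employees who left; 0 marks those still with the company
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There')))
# Use formulas (~) so the relabeled columns in plot_data are what gets plotted
plot_ly(plot_data ,
  x = ~left ,
  y = ~number_project ,
  type = 'box')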
Question 2 : A comparison between satisfaction level and whether the
employee left the company
t.test(HR$satisfaction_level ~ HR$left)
##
## Welch Two Sample t-test
##
## data: HR$satisfaction_level by HR$left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1
## 0.6668096 0.4400980
p-value interpretation: The p-value is extremely low (below 2.2e-16),
which means that a difference in satisfaction this large would almost
never appear by chance if the two groups were really the same.
t-test interpretation: The difference in mean satisfaction between
those who stayed and those who left was 0.227 (0.667 vs. 0.440), which
is a large amount on a 0-to-1 scale.
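As a quick check on the size of the gap, the difference in means can be recomputed from the sample estimates printed above. It should sit at the center of the 95 percent confidence interval, while the interval's endpoints are only the bounds. This is a small sketch using the reported numbers; the variable names are my own.

```r
# Values copied from the t.test output above
mean_stayed <- 0.6668096   # mean in group 0
mean_left   <- 0.4400980   # mean in group 1
ci          <- c(0.2171815, 0.2362417)

# Point estimate of the difference in mean satisfaction
diff_means <- mean_stayed - mean_left

# The Welch interval is symmetric, so its midpoint is the point estimate
ci_mid <- mean(ci)

print(round(c(difference = diff_means, ci_midpoint = ci_mid), 3))
```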
non-technical interpretation: Workers who are less satisfied at work
are more likely to leave the company.
Graph: Workers who are less satisfied are much more likely to leave
the company
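plot_data <- HR %>%
  # left == 1 marks employees who left; 0 marks those still with the company
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There')))
# Use formulas (~) so the relabeled columns in plot_data are what gets plotted
plot_ly(plot_data ,
  x = ~left ,
  y = ~satisfaction_level ,
  type = 'box')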
Question 3 : A comparison between work accidents and whether the
employee left the company
t.test(HR$Work_accident ~ HR$left)
##
## Welch Two Sample t-test
##
## data: HR$Work_accident by HR$left
## t = 25.403, df = 10883, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.1178305 0.1375356
## sample estimates:
## mean in group 0 mean in group 1
## 0.17500875 0.04732568
p-value interpretation: The p-value is extremely low, which means
that a gap in accident rates this large would almost never appear by
chance if the two groups were really the same.
t-test interpretation: Surprisingly, the people who left the company
had a work accident rate (4.7%) that is a little over a quarter of the
rate among those who stayed (17.5%).
non-technical interpretation: Employees who stay with the company
are about 3.7 times as likely to have been in an accident.
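Work_accident looks like a 0/1 column (an assumption based on the means above), so each group mean can be read as the share of that group with an accident. A short sketch of the ratio arithmetic using the reported means, with variable names of my own:

```r
# Group means copied from the t.test output above
rate_stayed <- 0.17500875   # mean in group 0
rate_left   <- 0.04732568   # mean in group 1

# How many times more common accidents are among those who stayed
ratio_stayed_to_left <- rate_stayed / rate_left

# The same comparison from the other side
ratio_left_to_stayed <- rate_left / rate_stayed

print(round(ratio_stayed_to_left, 1))  # 3.7
print(round(ratio_left_to_stayed, 2))  # 0.27
```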
Graph:
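plot_data <- HR %>%
  # left == 1 marks employees who left; 0 marks those still with the company
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There'))) %>%
  # One bar per group: the share of employees with a work accident
  group_by(left) %>%
  summarise(accident_rate = mean(Work_accident))
plot_ly(plot_data ,
  x = ~left ,
  y = ~accident_rate ,
  type = 'bar')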
Question 4 : A comparison between average monthly hours and whether
the employee left the company
t.test(HR$average_montly_hours ~ HR$left)
##
## Welch Two Sample t-test
##
## data: HR$average_montly_hours by HR$left
## t = -7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -10.534631 -6.183384
## sample estimates:
## mean in group 0 mean in group 1
## 199.0602 207.4192
p-value interpretation: The p-value is very low, which means that a
difference in hours this large would almost never appear by chance if
the two groups really worked the same hours.
t-test interpretation: Those who left the company worked 8.36 hours
more a month on average than those who stayed.
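non-technical interpretation: Employees who left were working
slightly longer hours. If this company can find a way to trim monthly
hours by just a little bit, employees might be more likely to stay with
the company, although this test alone does not show that the hours
caused them to leave.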
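To put the gap in context, it can help to express it as a share of the hours worked by those who stayed. The column is average_montly_hours, so the figures are per month. A minimal sketch from the reported means, with variable names of my own:

```r
# Group means copied from the t.test output above (hours per month)
hours_stayed <- 199.0602   # mean in group 0
hours_left   <- 207.4192   # mean in group 1

# Absolute gap in average monthly hours
diff_hours <- hours_left - hours_stayed

# Gap as a percentage of the hours worked by those who stayed
pct_more <- 100 * diff_hours / hours_stayed

print(round(diff_hours, 2))  # 8.36
print(round(pct_more, 1))    # 4.2
```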
Graph:
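plot_data <- HR %>%
  # left == 1 marks employees who left; 0 marks those still with the company
  mutate(left = as.factor(ifelse(left == 1 , 'Left' , 'Still There')))
# Use formulas (~) so the relabeled columns in plot_data are what gets plotted
plot_ly(plot_data ,
  x = ~left ,
  y = ~average_montly_hours ,
  type = 'box')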