library(readr)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
t.test(mtcars$mpg ~ mtcars$am)
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
The p-value (0.001374) is well below 0.05, so the difference in mean mpg between the two am groups is statistically significant.
The 95% confidence interval for the difference (automatic minus manual) runs from -11.28 to -3.21, so manual cars average at least about 3.2 mpg more than automatic cars.
Manual cars in this sample are more fuel efficient (24.39 vs 17.15 mpg on average).
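To make the printed statistic less of a black box, the following is a minimal sketch that recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand from the built-in mtcars data. The variable names (`auto`, `manual`, `t_stat`, `df_welch`, `p_value`) are illustrative and not part of the original analysis.

```r
# Sketch: reproduce the Welch two-sample t-test by hand (illustrative names)
auto   <- mtcars$mpg[mtcars$am == 0]  # automatic transmission
manual <- mtcars$mpg[mtcars$am == 1]  # manual transmission

# squared standard error of each group mean
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```

The three values should agree with the t, df, and p-value lines of the t.test() printout above.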
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
plot_data <- mtcars %>%
  mutate(Transmission = as.factor(ifelse(am == 0, 'Automatic', 'Manual')))
plot_ly(plot_data,
        x = ~Transmission,
        y = ~mpg,
        type = 'box')
# perform the t-test
t_test_2 <- t.test(average_montly_hours ~ left, data = hr)
# display the result
t_test_2
##
## Welch Two Sample t-test
##
## data: average_montly_hours by left
## t = -7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -10.534631 -6.183384
## sample estimates:
## mean in group 0 mean in group 1
## 199.0602 207.4192
We reject H0 (p-value < 0.01): employees who left worked more hours per month on average (207.4 vs 199.1).
The 95% confidence interval puts the difference between about 6.2 and 10.5 hours per month, so employees who left tended to work longer hours.
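Reading means, intervals, and p-values off the printout is error-prone, so here is a small sketch of a helper that flattens a t.test() result into a one-row data frame. `summarise_ttest` is a hypothetical name, not a function from any package, and the example runs on the built-in mtcars data so that it works without downloading the HR file.

```r
# Hypothetical helper (not part of the original analysis): flatten the htest
# object returned by a two-sample t.test() into a one-row data frame
summarise_ttest <- function(res) {
  data.frame(
    mean_group0 = unname(res$estimate[1]),
    mean_group1 = unname(res$estimate[2]),
    diff        = unname(res$estimate[2] - res$estimate[1]),  # group 1 minus group 0
    ci_low      = -res$conf.int[2],  # t.test() reports the CI for group 0 minus group 1,
    ci_high     = -res$conf.int[1],  # so the bounds are negated and swapped
    p_value     = res$p.value
  )
}

# Demonstrated on mtcars so the block runs on its own
summarise_ttest(t.test(mpg ~ am, data = mtcars))
```

Assuming the HR data has been loaded as above, the same call would be `summarise_ttest(t_test_2)`.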
# visualization for average monthly hours by attrition
plot_ly(hr, x = ~factor(left, labels = c("Stayed", "Left")), y = ~average_montly_hours, type = 'box') %>%
  layout(title = "employees who left tend to work more hours",
         xaxis = list(title = "attrition status"),
         yaxis = list(title = "average monthly hours"))
# perform the t-test
t_test_3 <- t.test(last_evaluation ~ left, data = hr)
# display the result
t_test_3
##
## Welch Two Sample t-test
##
## data: last_evaluation by left
## t = -0.72534, df = 5154.9, p-value = 0.4683
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.009772224 0.004493874
## sample estimates:
## mean in group 0 mean in group 1
## 0.7154734 0.7181126
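The p-value is 0.4683, so the difference in last evaluation scores is not statistically significant; the 95% confidence interval (-0.0098 to 0.0045) includes 0.
Employees who left have a marginally higher mean evaluation score (0.718 vs 0.715), but the test gives no evidence of a real difference between the groups.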
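As a quick sanity check, the p-value and the confidence interval should tell the same story: a two-sided p-value above 0.05 goes together with a 95% interval that contains 0. The sketch below uses the numbers copied from the printed output of t_test_3; the variable names are illustrative.

```r
# Values copied from the printed output of t_test_3 above
ci      <- c(-0.009772224, 0.004493874)  # 95% CI for the difference in means
p_value <- 0.4683

# Does the interval contain 0?
includes_zero <- ci[1] <= 0 && 0 <= ci[2]
includes_zero    # TRUE

# Is the result significant at the 0.05 level?
p_value < 0.05   # FALSE
```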
# perform the t-test
t_test_4 <- t.test(number_project ~ left, data = hr)
# display the result
t_test_4
##
## Welch Two Sample t-test
##
## data: number_project by left
## t = -2.1663, df = 4236.5, p-value = 0.03034
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.131136535 -0.006540119
## sample estimates:
## mean in group 0 mean in group 1
## 3.786664 3.855503
The p-value is 0.0303, so the difference in the number of projects between the two groups is significant at the 0.05 level but not at the 0.01 level.
Employees who left were involved in slightly more projects on average than those who stayed (3.86 vs 3.79), a difference of roughly 0.01 to 0.13 projects.
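With roughly 15,000 rows, even a small difference in means can reach statistical significance, so it may help to look at a standardized effect size as well. The sketch below computes Cohen's d with a pooled standard deviation; `cohens_d` is a hypothetical helper written for illustration, not part of the original analysis, and it is demonstrated on mtcars so the block runs on its own.

```r
# Hypothetical helper: Cohen's d (second group minus first group),
# using the pooled standard deviation of the two groups
cohens_d <- function(x, group) {
  groups <- split(x, group)
  stopifnot(length(groups) == 2)
  n <- lengths(groups)                      # group sizes
  m <- vapply(groups, mean, numeric(1))     # group means
  v <- vapply(groups, var, numeric(1))      # group variances
  pooled_sd <- sqrt(sum((n - 1) * v) / (sum(n) - 2))
  unname((m[2] - m[1]) / pooled_sd)
}

# Demonstrated on the built-in mtcars data
d <- cohens_d(mtcars$mpg, mtcars$am)
d
```

Assuming the HR data is loaded as above, `cohens_d(hr$number_project, hr$left)` would give the corresponding effect size for the number of projects.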
# visualization for number of projects by attrition
plot_ly(hr, x = ~factor(left, labels = c("stayed", "left")), y = ~number_project, type = 'box') %>%
  layout(title = "employees who left were involved in more projects",
         xaxis = list(title = "attrition status"),
         yaxis = list(title = "number of projects"))