library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(plotly)

## Loading required package: ggplot2

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')

## Rows: 14999 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(hr)

## spc_tbl_ [14,999 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ satisfaction_level   : num [1:14999] 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num [1:14999] 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : num [1:14999] 2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : num [1:14999] 157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : num [1:14999] 3 6 4 5 3 3 4 5 5 3 ...
##  $ Work_accident        : num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : num [1:14999] 1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Department           : chr [1:14999] "sales" "sales" "sales" "sales" ...
##  $ salary               : chr [1:14999] "low" "medium" "medium" "low" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   satisfaction_level = col_double(),
##   ..   last_evaluation = col_double(),
##   ..   number_project = col_double(),
##   ..   average_montly_hours = col_double(),
##   ..   time_spend_company = col_double(),
##   ..   Work_accident = col_double(),
##   ..   left = col_double(),
##   ..   promotion_last_5years = col_double(),
##   ..   Department = col_character(),
##   ..   salary = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

hr1 <- hr %>%
  mutate(Employee_Status = as.factor(ifelse(left == 0, 'stayed', 'left')))
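A side effect of this recoding is worth keeping in mind when reading the output further down. `as.factor()` sorts levels alphabetically, so `left` becomes the first group and `stayed` the second, whereas grouping on the numeric `left` column puts 0 (stayed) first. The same test therefore reports a t statistic and confidence interval with opposite signs depending on which grouping variable is used. The sketch below illustrates this on a small toy data frame; the `toy` values are invented for illustration and are not taken from the HR data.

```r
# Toy values invented for illustration only; not taken from the HR data.
toy <- data.frame(
  left  = c(0, 0, 0, 1, 1, 1),
  hours = c(150, 160, 170, 200, 220, 240)
)
toy$status <- as.factor(ifelse(toy$left == 0, "stayed", "left"))

# Factor levels are sorted alphabetically, so "left" is the first group.
levels(toy$status)

# Grouping by the 0/1 column estimates mean(stayed) - mean(left) ...
by_numeric <- t.test(hours ~ left, data = toy)
# ... while grouping by the factor estimates mean(left) - mean(stayed).
by_factor <- t.test(hours ~ status, data = toy)

# Same magnitude and p-value, opposite sign.
unname(by_numeric$statistic)
unname(by_factor$statistic)
```

The conclusion of the test does not change; only the direction in which the difference is reported does.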

Perform four (4) t-tests using any appropriate variables (continuous) by the variable left. Note that the variable left describes whether the employee stayed at the company (left = 0), or not (left = 1).

t.test(hr1$average_montly_hours ~ hr1$Employee_Status)

## 
##  Welch Two Sample t-test
## 
## data:  hr1$average_montly_hours by hr1$Employee_Status
## t = 7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group left and group stayed is not equal to 0
## 95 percent confidence interval:
##   6.183384 10.534631
## sample estimates:
##   mean in group left mean in group stayed 
##             207.4192             199.0602

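There is a statistically significant difference between the means (p = 5.907e-14): employees who left worked more hours per month on average (207.4 versus 199.1).

The 95 percent confidence interval puts the gap at between about 6.2 and 10.5 hours per month, so employees who left worked at least 6 hours more.

Relative to the mean for employees who stayed, employees who left worked at least 3 percent more hours on average.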
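The hour and percentage figures in this interpretation can be recovered from the printed output with a little arithmetic. The following is a minimal sketch that copies the two group means and the confidence interval from the output above; the variable names are chosen here for illustration.

```r
# Values copied from the printed t-test output above.
mean_left   <- 207.4192
mean_stayed <- 199.0602
ci          <- c(6.183384, 10.534631)  # 95% CI for mean(left) - mean(stayed)

# Point estimate of the difference in mean monthly hours.
diff_hours <- mean_left - mean_stayed

# Express the estimate and the CI bounds as a percentage of the stayed-group mean.
pct_diff <- 100 * diff_hours / mean_stayed
pct_ci   <- 100 * ci / mean_stayed

round(diff_hours, 2)
round(pct_diff, 1)
round(pct_ci, 1)
```

Using the lower bound of the interval, rather than the point estimate, is what justifies the "at least" wording.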

t.test(hr1$satisfaction_level ~ hr1$Employee_Status)

## 
##  Welch Two Sample t-test
## 
## data:  hr1$satisfaction_level by hr1$Employee_Status
## t = -46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group left and group stayed is not equal to 0
## 95 percent confidence interval:
##  -0.2362417 -0.2171815
## sample estimates:
##   mean in group left mean in group stayed 
##            0.4400980            0.6668096

This suggests that, on average, employees who left the company had a lower satisfaction level than those who stayed (0.44 versus 0.67).

Since p < 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in satisfaction_level between employees who stayed and those who left.

The 95 percent confidence interval places the mean satisfaction_level of employees who left between about 0.22 and 0.24 points below that of employees who stayed.

t.test(hr1$last_evaluation ~ hr1$Employee_Status)

## 
##  Welch Two Sample t-test
## 
## data:  hr1$last_evaluation by hr1$Employee_Status
## t = 0.72534, df = 5154.9, p-value = 0.4683
## alternative hypothesis: true difference in means between group left and group stayed is not equal to 0
## 95 percent confidence interval:
##  -0.004493874  0.009772224
## sample estimates:
##   mean in group left mean in group stayed 
##            0.7181126            0.7154734

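Since p = 0.4683 is greater than 0.05, we fail to reject the null hypothesis: the test finds no statistically significant difference in last_evaluation between employees who left and those who stayed.

The mean scores are nearly identical (0.718 versus 0.715), and the 95 percent confidence interval for the difference includes 0.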

t.test(hr1$time_spend_company ~ hr1$left)

## 
##  Welch Two Sample t-test
## 
## data:  hr1$time_spend_company by hr1$left
## t = -22.631, df = 9625.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.5394767 -0.4534706
## sample estimates:
## mean in group 0 mean in group 1 
##        3.380032        3.876505

Employees who left had a significantly different average time_spend_company than those who stayed (p < 2.2e-16).

The mean was higher for those who left (left = 1), 3.88, than for those who stayed (left = 0), 3.38, and the 95 percent confidence interval puts the gap at between about 0.45 and 0.54.

For each of the four t-tests:

Perform the t-test (.5 point) Choose any two appropriate variables from the data and perform the t-test, displaying the results.

t_test_satisfaction <- t.test(satisfaction_level ~ left, data = hr, var.equal = FALSE)
cat("T-test for Satisfaction Level:\n")

## T-test for Satisfaction Level:

print(t_test_satisfaction)

## 
##  Welch Two Sample t-test
## 
## data:  satisfaction_level by left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1 
##       0.6668096       0.4400980

t_test_hours <- t.test(average_montly_hours ~ left, data = hr, var.equal = FALSE)
cat("\nT-test for Average Monthly Hours:\n")

## 
## T-test for Average Monthly Hours:

print(t_test_hours)

## 
##  Welch Two Sample t-test
## 
## data:  average_montly_hours by left
## t = -7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.534631  -6.183384
## sample estimates:
## mean in group 0 mean in group 1 
##        199.0602        207.4192

t_test_evaluation <- t.test(last_evaluation ~ left, data = hr, var.equal = FALSE)
cat("\nT-test for Last Evaluation:\n")

## 
## T-test for Last Evaluation:

print(t_test_evaluation)

## 
##  Welch Two Sample t-test
## 
## data:  last_evaluation by left
## t = -0.72534, df = 5154.9, p-value = 0.4683
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.009772224  0.004493874
## sample estimates:
## mean in group 0 mean in group 1 
##       0.7154734       0.7181126

t_test_time_spend <- t.test(time_spend_company ~ left, data = hr, var.equal = FALSE)
cat("\nT-test for time spend company:\n")

## 
## T-test for time spend company:

print(t_test_time_spend)

## 
##  Welch Two Sample t-test
## 
## data:  time_spend_company by left
## t = -22.631, df = 9625.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.5394767 -0.4534706
## sample estimates:
## mean in group 0 mean in group 1 
##        3.380032        3.876505
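The four calls above differ only in the response variable, so they could also be generated in a loop. The sketch below shows one way to do that with `reformulate()`. To keep it self-contained it uses the built-in `mtcars` data as a stand-in, with the 0/1 column `am` playing the role of `left`; the names `vars`, `tests`, and `summary_tbl` are illustrative and do not appear in the original analysis.

```r
# mtcars ships with R and stands in for `hr`; `am` (0/1) plays the role of `left`.
vars <- c("mpg", "hp", "wt")

# Build `response ~ am` for each variable and run a Welch t-test.
tests <- lapply(vars, function(v) {
  t.test(reformulate("am", response = v), data = mtcars, var.equal = FALSE)
})
names(tests) <- vars

# Collect the key numbers in one data frame.
summary_tbl <- data.frame(
  variable = vars,
  t        = sapply(tests, function(x) unname(x$statistic)),
  df       = sapply(tests, function(x) unname(x$parameter)),
  p_value  = sapply(tests, function(x) x$p.value),
  row.names = NULL
)
summary_tbl
```

For the HR data, the same pattern would presumably apply with `vars` set to the four column names used above, `"am"` replaced by `"left"`, and `data = hr`.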

Interpret the results in technical terms (.5 point) For each t-test, explain what the test’s p-value means (significance).

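T-test for Satisfaction Level: p-value < 2.2e-16, so the difference in means is statistically significant at the 0.05 level.

T-test for Average Monthly Hours: p-value = 5.9e-14, also significant at the 0.05 level.

T-test for Last Evaluation: p-value = 0.468, which is above 0.05, so we fail to reject the null hypothesis of equal means.

T-test for Time Spend Company: p-value < 2.2e-16, significant at the 0.05 level.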
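The p-values listed here can be read directly from the stored test objects instead of being retyped from the printout. A minimal sketch follows, again using the built-in `mtcars` data as a stand-in so that it runs on its own; the names `res`, `shown`, and `alpha` are illustrative. It also shows why a very small p-value is printed as `< 2.2e-16`: the printed value is capped at machine epsilon, and the underlying number is tiny but not exactly zero.

```r
# mtcars is a built-in stand-in; the objects created above with `hr`
# (for example t_test_satisfaction) have the same p.value field.
res <- t.test(mpg ~ am, data = mtcars)

# The p-value is stored as a plain number.
res$p.value

# Values below machine epsilon are displayed with a leading "<" rather than as 0.
shown <- format.pval(1e-300, digits = 4)
shown

# Decision at the conventional 5 percent level.
alpha <- 0.05
res$p.value < alpha
```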

Interpret the results in non-technical terms (1 point) For each t-test, what do the results mean in non-technical terms.

Employees who left were, on average, clearly less satisfied with their jobs than employees who stayed.

Employees who left also worked more hours per month, on average, than employees who stayed.

Performance evaluation scores were essentially the same for the two groups, so evaluations alone do not separate employees who left from those who stayed.

Employees who left had, on average, been at the company longer than employees who stayed.

Create a plot that helps visualize the t-test (.5 point) For each t-test, create a graph to help visualize the difference between means, if any. The title must be the non-technical interpretation.

plot_ly(hr1,
        x = ~Employee_Status,
        y = ~average_montly_hours,
        type = 'box',
        color = ~Employee_Status,
        boxmean = TRUE) %>%
  layout(title = "Average Monthly Hours by Employee Status")

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

plot_ly(hr1, 
        x = ~Employee_Status, 
        y = ~satisfaction_level, 
        type = 'box', 
        color = ~Employee_Status, 
        boxmean = TRUE) %>%
  layout(title = "Satisfaction Level by Employee Status")

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

plot_ly(hr1, 
        x = ~Employee_Status, 
        y = ~last_evaluation, 
        type = 'box', 
        color = ~Employee_Status, 
        boxmean = TRUE) %>%
  layout(title = "Last Evaluation by Employee Status")

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

plot_ly(hr1, 
        x = ~Employee_Status, 
        y = ~time_spend_company, 
        type = 'box', 
        color = ~Employee_Status, 
        boxmean = TRUE) %>%
  layout(title = "Time Spent at Company by Employee Status")

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Assignment 7

Connor Lewis, Jack Levine

2024-11-12
