library(tidyverse)  # attaches readr (read_csv), dplyr (mutate, %>%) and ggplot2
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(hr)  # View first few rows
## # A tibble: 6 × 10
##   satisfaction_level last_evaluation number_project average_montly_hours
##                <dbl>           <dbl>          <dbl>                <dbl>
## 1               0.38            0.53              2                  157
## 2               0.8             0.86              5                  262
## 3               0.11            0.88              7                  272
## 4               0.72            0.87              5                  223
## 5               0.37            0.52              2                  159
## 6               0.41            0.5               2                  153
## # ℹ 6 more variables: time_spend_company <dbl>, Work_accident <dbl>,
## #   left <dbl>, promotion_last_5years <dbl>, Department <chr>, salary <chr>
# Welch two-sample t-test (the t.test() default, var.equal = FALSE); group 0 = stayed, 1 = left
t_test1 <- t.test(hr$satisfaction_level ~ hr$left)
t_test1  # Display results
## 
##  Welch Two Sample t-test
## 
## data:  hr$satisfaction_level by hr$left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1 
##       0.6668096       0.4400980
#1st t-test: Satisfaction Level

#Technical Interpretation:

#The Welch two-sample t-test compares mean satisfaction_level between employees who stayed (left = 0) and those who left (left = 1), under the null hypothesis that the two means are equal. The result is t = 46.64 with about 5,167 degrees of freedom and p < 2.2e-16. This is far below the conventional 0.05 threshold, so we reject the null hypothesis. The 95% confidence interval for the difference in means (stayed minus left) runs from 0.217 to 0.236 and does not include zero.
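#The Welch statistic can be reproduced by hand, which shows what t.test() computes: the difference in means divided by its standard error, with degrees of freedom from the Welch-Satterthwaite approximation. This is a minimal sketch. welch_t() is a hypothetical helper (not part of base R), and the toy vectors are illustrative, not HR data.

welch_t <- function(x, y) {
  nx <- length(x); ny <- length(y)
  se2_x <- var(x) / nx                 # squared standard error of mean(x)
  se2_y <- var(y) / ny                 # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  dof <- (se2_x + se2_y)^2 / (se2_x^2 / (nx - 1) + se2_y^2 / (ny - 1))
  p_value <- 2 * pt(-abs(t_stat), dof) # two-sided p-value
  c(t = t_stat, df = dof, p = p_value)
}

# Toy check against stats::t.test(), whose default is the Welch test
toy_x <- c(0.62, 0.71, 0.80, 0.55, 0.90, 0.66)
toy_y <- c(0.40, 0.11, 0.45, 0.38, 0.52)
manual  <- welch_t(toy_x, toy_y)
builtin <- t.test(toy_x, toy_y)
all.equal(unname(manual["t"]), unname(builtin$statistic))   # TRUE
all.equal(unname(manual["df"]), unname(builtin$parameter))  # TRUE

#Applied as welch_t(hr$satisfaction_level[hr$left == 0], hr$satisfaction_level[hr$left == 1]), it should reproduce the t and df reported for t_test1. The formula interface treats group 0 as the first sample.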

#Non-Technical Interpretation:

#Employees who left were markedly less satisfied than those who stayed. Their average satisfaction level was 0.44, compared with 0.67 for those who stayed. This is consistent with dissatisfaction being a factor in attrition, although the test shows an association, not a cause.
t_test2 <- t.test(hr$last_evaluation ~ hr$left)
t_test2  # Display results
## 
##  Welch Two Sample t-test
## 
## data:  hr$last_evaluation by hr$left
## t = -0.72534, df = 5154.9, p-value = 0.4683
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.009772224  0.004493874
## sample estimates:
## mean in group 0 mean in group 1 
##       0.7154734       0.7181126
#2nd t-test: Last Evaluation Score

#Technical Interpretation:

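#The t-test on last_evaluation checks whether mean performance evaluation scores differ between employees who left and those who stayed. Here t = -0.73 with about 5,155 degrees of freedom and p = 0.468. That is well above 0.05, so we fail to reject the null hypothesis. The 95% confidence interval for the difference (-0.0098 to 0.0045) includes zero, and the group means are nearly identical (0.715 for those who stayed, 0.718 for those who left). Failing to reject does not prove the means are equal; it only means the data show no detectable difference.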
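#For a two-sided test, the 95% interval and the 0.05 threshold use the same t distribution. So "p < 0.05" and "the 95% confidence interval excludes zero" agree. This minimal sketch checks both from an htest object. summarize_test() is a hypothetical helper and assumes the default conf.level = 0.95 paired with alpha = 0.05.

summarize_test <- function(test, alpha = 0.05) {
  ci <- test$conf.int
  list(
    diff_in_means = unname(test$estimate[1] - test$estimate[2]),  # group 0 minus group 1
    p_value       = test$p.value,
    reject_null   = test$p.value < alpha,
    ci_excludes_0 = ci[1] > 0 || ci[2] < 0
  )
}

# Toy samples (not HR data): one clearly different pair, one similar pair
different <- summarize_test(t.test(c(0.62, 0.71, 0.80, 0.55, 0.90, 0.66),
                                   c(0.40, 0.11, 0.45, 0.38, 0.52)))
similar   <- summarize_test(t.test(c(0.70, 0.65, 0.80, 0.72, 0.68),
                                   c(0.71, 0.66, 0.79, 0.74, 0.69)))
c(different$reject_null, different$ci_excludes_0)  # TRUE TRUE
c(similar$reject_null, similar$ci_excludes_0)      # FALSE FALSE

#For the evaluation test, summarize_test(t_test2) should report FALSE for both checks, matching the interval of -0.0098 to 0.0045.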

#Non-Technical Interpretation:

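#On average, employees who left received essentially the same evaluation scores as those who stayed. The t-test compares only averages, so it cannot tell us whether leavers were concentrated among both very high and very low performers, a pattern that could still produce similar means. Checking that requires looking at the full distribution of scores in each group, for example in the evaluation-score boxplot below.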
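#One way to look beyond the averages is to compare several quantiles of the score in each group. This is a minimal sketch. quantiles_by_group() is a hypothetical helper. The toy scores are constructed so both groups share a mean of 0.52 while group 1 is spread toward both ends.

quantiles_by_group <- function(x, group, probs = c(0.10, 0.25, 0.50, 0.75, 0.90)) {
  # split() gives one vector per group; sapply() returns quantiles x groups, so transpose
  t(sapply(split(x, group), quantile, probs = probs))
}

toy_score <- c(0.50, 0.52, 0.55, 0.53, 0.50,   # group 0: clustered near the middle
               0.20, 0.25, 0.85, 0.80, 0.50)   # group 1: spread toward both ends
toy_left  <- rep(c(0, 1), each = 5)
tapply(toy_score, toy_left, mean)              # equal means (0.52 each)
quantiles_by_group(toy_score, toy_left)        # but the 10% and 90% quantiles differ widely

#On the real data the same call would be quantiles_by_group(hr$last_evaluation, hr$left).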
t_test3 <- t.test(hr$average_montly_hours ~ hr$left)
t_test3  # Display results
## 
##  Welch Two Sample t-test
## 
## data:  hr$average_montly_hours by hr$left
## t = -7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.534631  -6.183384
## sample estimates:
## mean in group 0 mean in group 1 
##        199.0602        207.4192
#3rd t-test: Average Monthly Hours

#Technical Interpretation:

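#The t-test on average_montly_hours gives t = -7.53 with about 4,875 degrees of freedom and p = 5.9e-14, so the difference in mean monthly hours between the two groups is statistically significant. Employees who left averaged 207.4 hours per month versus 199.1 for those who stayed. The 95% confidence interval for the difference (stayed minus left) is -10.5 to -6.2 hours. Statistical significance alone does not say whether a gap of this size is practically important.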
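#An effect size puts the hours gap on a scale that does not grow with sample size, which helps judge practical importance. This minimal sketch uses Cohen's d with a pooled standard deviation, a measure not used in the original analysis. cohens_d() is a hypothetical helper. By Cohen's common rule of thumb, |d| around 0.2, 0.5 and 0.8 is read as small, medium and large.

cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  # Pooled SD weights each group's variance by its degrees of freedom
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

cohens_d(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6))  # toy check: -1 / sqrt(2.5), about -0.63

#For the HR data: cohens_d(hr$average_montly_hours[hr$left == 0], hr$average_montly_hours[hr$left == 1]).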

#Non-Technical Interpretation:

#Employees who left worked about 8 more hours per month on average than those who stayed (207 versus 199), a gap of roughly 4%. Longer hours are consistent with burnout contributing to attrition. However, the difference in averages is modest and does not establish burnout as the cause.
t_test4 <- t.test(hr$time_spend_company ~ hr$left)
t_test4  # Display results
## 
##  Welch Two Sample t-test
## 
## data:  hr$time_spend_company by hr$left
## t = -22.631, df = 9625.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.5394767 -0.4534706
## sample estimates:
## mean in group 0 mean in group 1 
##        3.380032        3.876505
#4th t-test: Time Spent at Company

#Technical Interpretation:

#The t-test on time_spend_company gives t = -22.63 with about 9,626 degrees of freedom and p < 2.2e-16, so mean tenure differs significantly between the groups. Employees who left had spent 3.88 years at the company on average, versus 3.38 for those who stayed. The 95% confidence interval for the difference (stayed minus left) is -0.54 to -0.45 years.
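#Four t-tests were run on the same employees, which raises the chance that at least one "significant" result is a false positive. This minimal sketch uses base R's p.adjust() with the Holm method. The p-values below are illustrative, not the HR results.

toy_p  <- c(test_a = 0.001, test_b = 0.02, test_c = 0.04, test_d = 0.30)
holm_p <- p.adjust(toy_p, method = "holm")   # Holm step-down adjustment
holm_p                                       # 0.004 0.060 0.080 0.300
toy_p  < 0.05                                # three tests pass unadjusted
holm_p < 0.05                                # only test_a passes after adjustment

#Applied to the four tests in this section, p.adjust(c(t_test1$p.value, t_test2$p.value, t_test3$p.value, t_test4$p.value), method = "holm") leaves the conclusions unchanged. The three small p-values stay far below 0.05, and the last_evaluation result stays non-significant.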

#Non-Technical Interpretation:

#Employees who left had been with the company about half a year longer on average than those who stayed (3.9 versus 3.4 years). This does not support the idea that newer employees leave more often, which might have suggested poor onboarding or early job dissatisfaction. It is more consistent with dissatisfaction building over time, although a comparison of averages cannot show why people left.
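#Comparing mean tenure cannot show whether departures cluster at particular tenure lengths. Tabulating the share of employees who left at each tenure value addresses that directly. This is a minimal sketch. attrition_by_group() is a hypothetical helper and assumes left is coded 0 = stayed, 1 = left.

attrition_by_group <- function(left, group) {
  tapply(left, group, mean)   # mean of a 0/1 indicator = proportion who left
}

toy_tenure <- c(2, 2, 2, 3, 3, 3, 3, 5, 5, 5)   # toy data, not HR records
toy_status <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 0)
attrition_by_group(toy_status, toy_tenure)      # 2: 0.00, 3: 0.50, 5: 0.67

#For the HR data: attrition_by_group(hr$left, hr$time_spend_company).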
# Boxplots comparing each variable between employees who stayed (0) and left (1)
hr %>%
  mutate(left = as.factor(left)) %>%
  ggplot(aes(x = left, y = satisfaction_level, fill = left)) +
  geom_boxplot() +
  labs(title = "Employee Satisfaction Level by Attrition", x = "Left Company (0 = No, 1 = Yes)", y = "Satisfaction Level") +
  theme_minimal()

hr %>% 
  mutate(left = as.factor(left)) %>%
  ggplot(aes(x = left, y = last_evaluation, fill = left)) +
  geom_boxplot() +
  labs(title = "Last Evaluation Scores by Attrition", x = "Left Company (0 = No, 1 = Yes)", y = "Evaluation Score") +
  theme_minimal()

hr %>% 
  mutate(left = as.factor(left)) %>%
  ggplot(aes(x = left, y = average_montly_hours, fill = left)) +
  geom_boxplot() +
  labs(title = "Average Monthly Hours by Attrition", x = "Left Company (0 = No, 1 = Yes)", y = "Average Monthly Hours") +
  theme_minimal()

hr %>% 
  mutate(left = as.factor(left)) %>%
  ggplot(aes(x = left, y = time_spend_company, fill = left)) +
  geom_boxplot() +
  labs(title = "Time Spent at Company by Attrition", x = "Left Company (0 = No, 1 = Yes)", y = "Years at Company") +
  theme_minimal()