R Markdown

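library(dplyr)   # %>% and mutate()
library(plotly)  # plot_ly() and layout()
# Add readable labels for the 0/1 indicator columns.
hr1 <- hr %>% 
  mutate(Employee_Status = ifelse(left == 0, 'Stayed', 'Left'),
         WorkAccidents = ifelse(Work_accident == 0, 'none', 'yes'))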

Perform four (4) t-tests using any appropriate variables (continuous) by the variable left. Note that the variable left describes whether the employee left the company (left = 1) or not (left = 0).

For each of the four t-tests:

A. Perform the t-test (.5 point) Choose any two appropriate variables from the data and perform the t-test, displaying the results.

B. Interpret the results in technical terms (.5 point) For each t-test, explain what the test’s p-value means (significance).

C. Interpret the results in non-technical terms (1 point) For each t-test, what do the results mean in non-technical terms.

D. Create a plot that helps visualize the t-test (.5 point) For each t-test, create a graph to help visualize the difference between means, if any. The title must be the non-technical interpretation.
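
For parts A and B, it can help to store the result of t.test() and read the p-value and confidence interval from the returned object instead of copying them from the printed output. The following is a minimal sketch on simulated data: the `demo` data frame, its `hours` and `status` columns, and the chosen means and standard deviations are made up for illustration and are not taken from the hr data.

```r
# Simulated stand-in data (NOT the hr data): two groups with different means.
set.seed(42)
demo <- data.frame(
  hours  = c(rnorm(200, mean = 207, sd = 60), rnorm(200, mean = 199, sd = 45)),
  status = rep(c("Left", "Stayed"), each = 200)
)

# Welch two-sample t-test (the default for t.test, i.e. var.equal = FALSE).
res <- t.test(hours ~ status, data = demo)

res$p.value    # p-value of the test
res$conf.int   # 95% confidence interval for mean(Left) - mean(Stayed)
res$estimate   # the two group means

# Decision at the 5% level: TRUE means the difference in means is significant.
res$p.value < 0.05
```

The same fields are available on the result of any of the four tests below, for example after assigning `res <- t.test(...)` with the hr1 columns.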


1a.

t.test(hr1$average_montly_hours ~ hr1$Employee_Status)
## 
##  Welch Two Sample t-test
## 
## data:  hr1$average_montly_hours by hr1$Employee_Status
## t = 7.5323, df = 4875.1, p-value = 5.907e-14
## alternative hypothesis: true difference in means between group Left and group Stayed is not equal to 0
## 95 percent confidence interval:
##   6.183384 10.534631
## sample estimates:
##   mean in group Left mean in group Stayed 
##             207.4192             199.0602

1b.

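The p-value (5.907e-14) is far below 0.05, so we reject the null hypothesis that the two group means are equal. Employees who left worked significantly more hours per month on average; the 95% confidence interval puts the difference between about 6.2 and 10.5 hours.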

1c.

Employees who left worked more hours on average, at least 3% more.
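
The "at least 3%" figure can be checked with a little arithmetic on the numbers printed in 1a. This is a rough sketch: it divides the confidence interval for the difference by the sample mean of the Stayed group, treating that mean as fixed, so the percentages are approximate. The variable names are only illustrative.

```r
# Values copied from the t-test output in 1a.
mean_left   <- 207.4192
mean_stayed <- 199.0602
ci          <- c(6.183384, 10.534631)  # 95% CI for the difference in means

# Difference expressed as a percentage of the "Stayed" mean.
pct_point <- 100 * (mean_left - mean_stayed) / mean_stayed
pct_ci    <- 100 * ci / mean_stayed

round(pct_point, 1)  # about 4.2
round(pct_ci, 1)     # about 3.1 and 5.3
```

The lower end of the interval, roughly 3.1%, is where the "at least 3%" wording comes from.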

1d.

plot_ly(hr1 , 
        x = ~Employee_Status ,
        y = ~average_montly_hours ,
        type = 'box',
        color = ~Employee_Status,
        colors= c('#29a21a','blue')
) %>% 
  layout(title = 'Employees who left worked more hours on average, at least 3% more',
         yaxis = list(title = 'Average Monthly Hours', range = c(0,350)),
         xaxis = list(title = 'Employee Status'))
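
A box plot marks the median of each group, while the t-test compares means. One optional way to make the means visible is plotly's `boxmean` attribute for box traces, which draws a dashed line at each group's mean. The sketch below uses simulated data (the `demo` data frame and its columns are hypothetical, not the hr data) so that it runs on its own.

```r
library(plotly)

# Simulated stand-in data (NOT the hr data).
set.seed(1)
demo <- data.frame(
  hours  = c(rnorm(100, mean = 207, sd = 60), rnorm(100, mean = 199, sd = 45)),
  status = rep(c("Left", "Stayed"), each = 100)
)

# boxmean = TRUE adds a dashed mean line inside each box.
p <- plot_ly(demo,
             x = ~status,
             y = ~hours,
             type = 'box',
             boxmean = TRUE,
             color = ~status,
             colors = c('#29a21a', 'blue')) %>%
  layout(title = 'Simulated example: dashed lines show group means',
         yaxis = list(title = 'Hours'),
         xaxis = list(title = 'Status'))

p
```

Adding `boxmean = TRUE` to any of the plot_ly() calls in this report would show the same mean lines for the real data.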

2a.

t.test(hr1$average_montly_hours ~ hr1$promotion_last_5years)
## 
##  Welch Two Sample t-test
## 
## data:  hr1$average_montly_hours by hr1$promotion_last_5years
## t = 0.44937, df = 333.03, p-value = 0.6535
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.143788  6.597589
## sample estimates:
## mean in group 0 mean in group 1 
##        201.0764        199.8495

2b.

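The p-value (0.6535) is well above 0.05, so we fail to reject the null hypothesis: there is no significant difference in average monthly hours between employees who were promoted in the last 5 years and those who were not. The 95% confidence interval for the difference (-4.14 to 6.60) includes 0.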

2c.

Working more hours per month does not increase the chances of receiving a promotion.

2d.

plot_ly(hr1 , 
        x = ~as.factor(promotion_last_5years) ,
        y = ~average_montly_hours ,
        type = 'box',
        color = ~as.factor(promotion_last_5years),
        colors= c('#ff7c00','#00e2ca')
) %>% 
  layout(title = 'Working more hours per month does not increase the chances of receiving a promotion',
         yaxis = list(title = 'Average Monthly Hours', range = c(0,350)),
         xaxis = list(title = 'Promotion in Last 5 Years'))

3a.

dept_data <- hr1 %>% 
  filter(Department %in% c("marketing", "IT"))
  
t.test(satisfaction_level ~ Department, data = dept_data)
## 
##  Welch Two Sample t-test
## 
## data:  satisfaction_level by Department
## t = -0.041877, df = 1870.2, p-value = 0.9666
## alternative hypothesis: true difference in means between group IT and group marketing is not equal to 0
## 95 percent confidence interval:
##  -0.02198374  0.02106456
## sample estimates:
##        mean in group IT mean in group marketing 
##               0.6181418               0.6186014

3b.

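The p-value (0.9666) is far above 0.05, so we fail to reject the null hypothesis: there is no significant difference in mean satisfaction level between the IT and marketing departments (0.618 vs. 0.619).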

3c.

Working in IT or Marketing will result in the same satisfaction level.

3d.

plot_ly(dept_data, 
        x = ~Department, 
        y = ~satisfaction_level, 
        type = 'box', 
        color = ~Department,
        colors = c('purple','green')) %>% 
  layout(title = 'Working in IT or Marketing will result in the same satisfaction level',
         yaxis = list(title = 'Satisfaction Level', range = c(0,1.2)),
         xaxis = list(title = 'Department'))

4a.

t.test(hr1$time_spend_company ~ hr1$Work_accident)
## 
##  Welch Two Sample t-test
## 
## data:  hr1$time_spend_company by hr1$Work_accident
## t = -0.23359, df = 2738.4, p-value = 0.8153
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.08269677  0.06509122
## sample estimates:
## mean in group 0 mean in group 1 
##        3.496960        3.505763

4b.

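The p-value (0.8153) is well above 0.05, so we fail to reject the null hypothesis: there is no significant difference in mean time spent at the company between employees with and without a work accident (3.50 vs. 3.51).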

4c.

Time at the company isn’t affected by work accidents.

4d.

plot_ly(hr1,
        x = ~WorkAccidents, 
        y = ~time_spend_company, 
        type = 'box', 
        color = ~WorkAccidents,
        colors = c('red','blue')) %>% 
  layout(title = 'Time at the company isn’t affected by work accidents',
         yaxis = list(title = 'Time Spent at Company'),
         xaxis = list(title = 'Work Accidents'))