R Markdown

library(readr) 
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1. Perform the correlation (.5 point) Choose any two appropriate variables from the data and perform the correlation, displaying the results.

2. Interpret the results in technical terms (.5 point) For each correlation, explain what the test’s p-value means (significance).

3. Interpret the results in non-technical terms (1 point) For each correlation, what do the results mean in non-technical terms?

4. Create a plot that helps visualize the correlation (.5 point) For each correlation, create a graph to help visualize the relationship between the two variables. The title must be the non-technical interpretation.

cor.test(hr$time_spend_company, hr$satisfaction_level)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$time_spend_company and hr$satisfaction_level
## t = -12.416, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.11668153 -0.08499948
## sample estimates:
##        cor 
## -0.1008661

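The p-value is less than 2.2e-16, far below the conventional 0.05 threshold, so we reject the null hypothesis that the true correlation is zero. The estimate, r = -0.1008661 (95% confidence interval -0.117 to -0.085), indicates a weak negative correlation.

The result is statistically significant, but the effect is very small: there is only a slight tendency for one variable to decrease as the other increases. With 14,999 rows, even a weak correlation like this one produces a tiny p-value.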

Employees who have spent more time at the company tend to be slightly less satisfied. The relationship is weak, and a correlation alone does not show that time at the company causes lower satisfaction.
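The distinction between statistical significance and effect size can be made concrete with a small sketch. The helper `summarize_cor()` below is hypothetical (it is not part of the original analysis); it only collects values that `cor.test()` already returns and adds r squared, the share of variance a straight-line fit would explain.

```r
# Hypothetical helper (not part of the original analysis): collect the key
# numbers from a Pearson cor.test() result in one named vector.
summarize_cor <- function(x, y) {
  ct <- cor.test(x, y)                 # Pearson is the default method
  r  <- unname(ct$estimate)
  c(r         = r,
    r_squared = r^2,                   # share of variance explained by a linear fit
    p_value   = ct$p.value,
    ci_low    = ct$conf.int[1],
    ci_high   = ct$conf.int[2])
}

# Effect size for the correlation reported above (r = -0.1008661).
r_reported <- -0.1008661
r_reported^2                           # about 0.0102

# With the data loaded, the full summary would come from:
# summarize_cor(hr$time_spend_company, hr$satisfaction_level)
```

An r squared of about 0.01 means time at the company accounts for roughly 1% of the variation in satisfaction, which is why the result is described as weak despite the very small p-value.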

ggplot(hr, aes(x = time_spend_company, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  labs(title = "The more time at the company, the slightly less satisfied employees are",
       x = "Time Spent at Company",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'

cor.test(hr$average_montly_hours, hr$number_project)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$average_montly_hours and hr$number_project
## t = 56.219, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4039037 0.4303411
## sample estimates:
##       cor 
## 0.4172106

The p-value is less than 2.2e-16, which is very small, so the correlation is statistically significant. The estimate, r = 0.4172106, indicates a moderate positive correlation between average_montly_hours and number_project.

Employees who work more hours per month on average tend to have a higher number of projects.
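Because the number of projects is a small whole-number count, a rank-based (Spearman) correlation is one possible robustness check alongside Pearson. The sketch below uses a hypothetical helper, `compare_methods()`, and made-up toy values that are not taken from the HR data; it shows only how the two estimates can be computed side by side.

```r
# Hypothetical helper: Pearson and Spearman estimates side by side.
compare_methods <- function(x, y) {
  c(pearson  = cor(x, y, method = "pearson"),
    spearman = cor(x, y, method = "spearman"))
}

# Made-up toy values for illustration only (not taken from the HR data).
projects <- c(2, 3, 3, 4, 5, 6, 7)
hours    <- c(150, 160, 200, 210, 230, 260, 300)
compare_methods(projects, hours)

# With the data loaded, the call would be:
# compare_methods(hr$number_project, hr$average_montly_hours)
```

If the two estimates are close, the Pearson result is not being driven by a few extreme values or by the discrete scale of the project count.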

ggplot(hr, aes(x = number_project, y = average_montly_hours)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "The more projects, the higher the monthly hours",
       x = "Number of Projects",
       y = "Avg Monthly Hours")
## `geom_smooth()` using formula = 'y ~ x'

cor.test(hr$satisfaction_level, hr$last_evaluation)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$last_evaluation
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08916727 0.12082195
## sample estimates:
##       cor 
## 0.1050212

The p-value is less than 2.2e-16, so the correlation is statistically significant. The estimate, r = 0.1050212, is small, so the correlation between satisfaction_level and last_evaluation is weak and positive.

More satisfied employees tend to have slightly higher scores on their last evaluation.

ggplot(hr, aes(x = last_evaluation, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "The more satisfied, the slightly higher the evaluation score",
       x = "Last Evaluation Score",
       y = "Satisfaction Score")
## `geom_smooth()` using formula = 'y ~ x'

cor.test(hr$Work_accident, hr$satisfaction_level)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$Work_accident and hr$satisfaction_level
## t = 7.2006, df = 14997, p-value = 6.279e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04273358 0.07463094
## sample estimates:
##        cor 
## 0.05869724

The p-value is 6.279e-13, so the correlation is statistically significant. The estimate, r = 0.05869724, is extremely small, so the correlation between Work_accident and satisfaction_level is very weak and positive.

Employees with a recorded work accident tend to report very slightly higher satisfaction. The difference is tiny, and it does not mean that accidents cause higher satisfaction.
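Assuming Work_accident is a 0/1 indicator, as it appears to be in this dataset, the Pearson correlation here is a point-biserial correlation, and its test is equivalent to an equal-variance two-sample t-test on the group means. The sketch below illustrates that equivalence with made-up toy values (not taken from the HR data); the variable names are placeholders.

```r
# Made-up toy values for illustration only (not taken from the HR data).
accident     <- c(0, 0, 0, 0, 1, 1, 1, 0, 1, 0)
satisfaction <- c(0.40, 0.55, 0.62, 0.48, 0.70, 0.66, 0.58, 0.51, 0.73, 0.45)

ct <- cor.test(accident, satisfaction)                     # Pearson (point-biserial)
tt <- t.test(satisfaction ~ accident, var.equal = TRUE)    # pooled-variance t-test

# The two t statistics have the same magnitude; the sign depends on which
# group t.test() subtracts from which.
abs(unname(ct$statistic))
abs(unname(tt$statistic))

# Group means are often easier to explain than r for a 0/1 variable.
tapply(satisfaction, accident, mean)
```

For a non-technical audience, reporting the mean satisfaction in each group may communicate the result more directly than a correlation coefficient.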

ggplot(hr, aes(x = Work_accident, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "brown") +
  labs(title = "Employees with a work accident are very slightly more satisfied",
       x = "Work Accident (0 = No, 1 = Yes)",
       y = "Satisfaction Score")
## `geom_smooth()` using formula = 'y ~ x'