library(readr)
library(ggplot2)
options(scipen = 999) # avoid scientific notation in printed output for the rest of the R session
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Correlation 1: number of projects and average monthly hours

cor.test(hr$number_project, hr$average_montly_hours)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$number_project and hr$average_montly_hours
## t = 56.219, df = 14997, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4039037 0.4303411
## sample estimates:
##       cor 
## 0.4172106
ggplot(hr, aes(x = number_project, y = average_montly_hours)) +
  geom_point(alpha = 0.3, size = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Employees on More Projects Tend to Log More Monthly Hours",
    x = "Number of Projects",
    y = "Average Monthly Hours"
  )
## `geom_smooth()` using formula = 'y ~ x'

The p-value is far below 0.05 (reported as < 0.00000000000000022, i.e. < 2.2e-16), so the correlation between number of projects and average monthly hours is statistically significant.

The correlation is positive and moderate (r ≈ 0.42, 95% CI 0.40 to 0.43).

Employees involved in more projects tend to spend more hours at work each month.
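As a check on the output above, the t statistic that `cor.test()` reports can be recomputed from r and the sample size alone: under the null hypothesis of zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A minimal sketch, plugging in the printed estimate (rounded to seven digits) rather than recomputing it from `hr`:

```r
# Recompute the cor.test() t statistic for Correlation 1 from r and n.
# Under H0 (true correlation = 0): t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
r <- 0.4172106  # sample estimate printed by cor.test() above
n <- 14999      # number of rows in hr, so df = 14997

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
print(round(t_stat, 3))  # should match the reported t = 56.219

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
print(p_value < 2.2e-16)  # consistent with "p-value < 2.2e-16"
```

The same formula explains why the p-values in this document are so small: with n close to 15,000, sqrt(n - 2) is over 120, so even modest values of r produce large t statistics.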

Correlation 2: satisfaction level and years spent at company

cor.test(hr$satisfaction_level, hr$time_spend_company)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$time_spend_company
## t = -12.416, df = 14997, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.11668153 -0.08499948
## sample estimates:
##        cor 
## -0.1008661
ggplot(hr, aes(x = satisfaction_level, y = time_spend_company)) +
  geom_point(alpha = 0.3, size = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Satisfaction Level Is Only Weakly Linked to Years With Company",
    x = "Satisfaction Level at Job",
    y = "Number of Years Spent at Company"
  )
## `geom_smooth()` using formula = 'y ~ x'

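The p-value is far below 0.05 (< 2.2e-16), so the correlation between satisfaction level and years spent at the company is statistically significant.

The correlation is negative but very weak (r ≈ -0.10, 95% CI -0.12 to -0.08).

Employees who have been with the company longer tend to report slightly lower satisfaction, but tenure accounts for only about 1% of the variation in satisfaction (r² ≈ 0.01).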
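The confidence interval is what shows how weak this relationship is: every plausible value of the true correlation lies between roughly -0.12 and -0.08. For Pearson's r, `cor.test()` builds that interval with the Fisher z-transformation, z = atanh(r), whose standard error is approximately 1 / sqrt(n - 3). A short sketch that reproduces the printed interval from the printed estimate:

```r
# Reproduce the 95% CI for Correlation 2 via the Fisher z-transformation.
r <- -0.1008661  # sample estimate printed by cor.test() above
n <- 14999       # number of rows in hr

z      <- atanh(r)         # Fisher z-transform of r
se     <- 1 / sqrt(n - 3)  # approximate standard error of z
z_crit <- qnorm(0.975)     # two-sided 95% critical value from the normal

# Build the interval on the z scale, then back-transform to the r scale
ci <- tanh(z + c(-1, 1) * z_crit * se)
print(round(ci, 4))  # should agree with the interval printed above
```

Working on the z scale keeps the interval inside [-1, 1] after back-transformation, which a naive r ± 1.96 * SE interval would not guarantee.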

Correlation 3: monthly hours and satisfaction level

cor.test(hr$average_montly_hours, hr$satisfaction_level)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$average_montly_hours and hr$satisfaction_level
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.036040356 -0.004045605
## sample estimates:
##         cor 
## -0.02004811
ggplot(hr, aes(x = satisfaction_level, y = average_montly_hours)) +
  geom_point(alpha = 0.3, size = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Monthly Hours Are Barely Related to Satisfaction Level",
    x = "Satisfaction Level at Job",
    y = "Average Monthly Hours"
  )
## `geom_smooth()` using formula = 'y ~ x'

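The p-value (0.014) is below 0.05, so the correlation between satisfaction level and monthly hours is statistically significant at the 5% level. With nearly 15,000 rows, however, even a very small correlation can reach significance.

The correlation is negative and very close to zero (r ≈ -0.02, 95% CI -0.036 to -0.004).

Average monthly hours have little to no linear relationship with satisfaction level.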
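To see why a correlation this close to zero still comes out significant, it helps to convert r into the share of variance the two variables have in common (r²) and to ask how large |r| must be before the test rejects at the 5% level. Inverting t = r * sqrt(n - 2) / sqrt(1 - r^2) gives r_crit = t_crit / sqrt(n - 2 + t_crit^2). A sketch using the printed estimate:

```r
# Effect size vs. statistical significance for Correlation 3.
r <- -0.02004811  # sample estimate printed by cor.test() above
n <- 14999        # number of rows in hr

# Share of variance in one variable that is linearly associated with the other
r_squared <- r^2
print(round(100 * r_squared, 2))  # as a percentage

# Smallest |r| that is significant at the 5% level (two-sided) for this n,
# obtained by inverting t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(n - 2 + t_crit^2)
print(round(r_crit, 4))

# TRUE: |r| clears the significance threshold even though r^2 is tiny
print(abs(r) > r_crit)
```

At this sample size any |r| above roughly 0.016 is significant, so the p-value alone says little about whether the relationship is large enough to matter.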

Correlation 4: last evaluation and satisfaction level

cor.test(hr$last_evaluation, hr$satisfaction_level)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$last_evaluation and hr$satisfaction_level
## t = 12.933, df = 14997, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08916727 0.12082195
## sample estimates:
##       cor 
## 0.1050212
ggplot(hr, aes(x = satisfaction_level, y = last_evaluation)) +
  geom_point(alpha = 0.3, size = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "More Satisfied Employees Tend to Get Slightly Higher Ratings",
    x = "Satisfaction Level at Job",
    y = "Last Evaluation Rating"
  )
## `geom_smooth()` using formula = 'y ~ x'

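The p-value is far below 0.05 (< 2.2e-16), so the correlation between satisfaction level and last evaluation rating is statistically significant.

The correlation is positive but very weak (r ≈ 0.11, 95% CI 0.09 to 0.12).

More satisfied employees tend to receive slightly higher evaluation ratings, though satisfaction accounts for only about 1% of the variation in ratings (r² ≈ 0.01).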
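Because the four tests above repeat the same `cor.test()` call, their results can also be collected into one table for side-by-side comparison. The helper below, `correlation_table()`, is a hypothetical function written for this sketch (not part of any package); it assumes Pearson correlation, which is the `cor.test()` default, and the `hr` data frame loaded with `read_csv()` at the top of this document.

```r
# Hypothetical helper: run cor.test() on each pair of columns and gather
# r, its 95% CI, the p-value, and r^2 into a single data frame.
correlation_table <- function(data, pairs) {
  rows <- lapply(pairs, function(p) {
    ct <- cor.test(data[[p[1]]], data[[p[2]]])  # Pearson by default
    r  <- unname(ct$estimate)
    data.frame(
      x         = p[1],
      y         = p[2],
      r         = r,
      ci_low    = ct$conf.int[1],
      ci_high   = ct$conf.int[2],
      p_value   = ct$p.value,
      r_squared = r^2
    )
  })
  do.call(rbind, rows)  # stack the one-row data frames
}

# The four column pairs tested above (column names as spelled in the CSV)
pairs <- list(
  c("number_project", "average_montly_hours"),
  c("satisfaction_level", "time_spend_company"),
  c("average_montly_hours", "satisfaction_level"),
  c("last_evaluation", "satisfaction_level")
)

# Only runs if hr has been loaded as at the top of this document
if (exists("hr")) print(correlation_table(hr, pairs))
```

Putting r² next to the p-values makes the contrast explicit: all four correlations are significant, but only Correlation 1 has an r² well above 0.01 (about 0.17).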