library(readr)
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Correlation 1

cor.test(hr$satisfaction_level , hr$last_evaluation)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$last_evaluation
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08916727 0.12082195
## sample estimates:
##       cor 
## 0.1050212

The p-value is very small (< 2.2e-16), so the correlation between satisfaction level and last evaluation is statistically significant.

The correlation is positive but weak (r = 0.105).

Employees with higher satisfaction levels tend to have slightly higher evaluations.
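To see why a weak correlation can still produce such a small p-value, the test statistic can be rebuilt from the reported numbers. This is a minimal sketch that assumes only the `cor` and `df` values printed above. The variable names are illustrative and not part of the original analysis.

```r
# Values copied from the cor.test() output above
r  <- 0.1050212
df <- 14997

# t statistic implied by r and the degrees of freedom
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Share of variance in one variable linearly associated with the other
r_squared <- r^2

round(c(t = t_stat, r_squared = r_squared), 4)
```

With roughly 15,000 rows, the square-root term is large, so even a small r yields a large t. The squared correlation is the more useful guide to practical importance here.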

ggplot(hr, aes(x = satisfaction_level, y = last_evaluation)) +
  geom_point(alpha = 0.3, size = 1) +  # semi-transparent points reduce overplotting with ~15,000 rows
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Higher satisfaction linked to slightly higher evaluations",
    x = "Satisfaction Level",
    y = "Last Evaluation"
  )
## `geom_smooth()` using formula = 'y ~ x'

# Correlation 2

cor.test(hr$average_montly_hours, hr$last_evaluation)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$average_montly_hours and hr$last_evaluation
## t = 44.237, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3255078 0.3538218
## sample estimates:
##       cor 
## 0.3397418

The p-value is very small (< 2.2e-16), so the correlation between average monthly hours and last evaluation is statistically significant.

The correlation is positive and moderate (r = 0.340).

Employees who work more hours per month on average tend to get higher evaluations.
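The 95 percent confidence interval in the output can be reproduced by hand with the Fisher z-transformation, which is the method `cor.test()` uses for Pearson correlations. The sketch below assumes only the printed `cor` and `df` values. The variable names are illustrative.

```r
# Values copied from the cor.test() output above
r <- 0.3397418
n <- 14997 + 2            # sample size is df + 2

# Fisher z-transformation and its approximate standard error
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(ci, 4)
```

The interval is narrow because the sample is large, so the estimate of a moderate correlation is quite precise.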

ggplot(hr, aes(x = average_montly_hours, y = last_evaluation)) +
  geom_point(alpha = 0.3, size = 1) +
  geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1) +
  labs(
    title = "Employees who work more hours tend to get higher evaluations",
    x = "Average Monthly Hours",
    y = "Last Evaluation"
  )
## `geom_smooth()` using formula = 'y ~ x'

# Correlation 3

cor.test(hr$satisfaction_level, hr$average_montly_hours)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$average_montly_hours
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.036040356 -0.004045605
## sample estimates:
##         cor 
## -0.02004811

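The p-value (0.014) is less than .05, so the correlation is statistically significant.

The correlation is negative and very weak (r = -0.020).

Employees are very slightly more satisfied when they work fewer hours on average, but the relationship is close to zero and has little practical importance.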
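A quick way to put this result in context is to compute the smallest correlation that would reach significance at the .05 level with this sample size. This is a rough sketch that assumes only the `df` and `cor` values printed above. The variable names are illustrative.

```r
# Degrees of freedom copied from the cor.test() output above
df <- 14997

# Two-sided critical t value at alpha = .05
t_crit <- qt(0.975, df)

# Smallest |r| that reaches significance with this many observations
r_crit <- t_crit / sqrt(t_crit^2 + df)

# Observed correlation from the output above
r_obs <- -0.02004811

c(r_crit = round(r_crit, 4), significant = abs(r_obs) > r_crit)
```

Because the threshold is so low at this sample size, passing it says little about how strong the relationship is.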

ggplot(hr, aes(x = satisfaction_level, y = average_montly_hours)) +
  geom_point(alpha = 0.3, size = 1) +  # semi-transparent points reduce overplotting with ~15,000 rows
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "More satisfied employees tend to work slightly fewer hours",
    x = "Satisfaction Level",
    y = "Average Monthly Hours"
  )
## `geom_smooth()` using formula = 'y ~ x'

# Correlation 4

cor.test(hr$time_spend_company, hr$promotion_last_5years)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$time_spend_company and hr$promotion_last_5years
## t = 8.2768, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05148468 0.08334679
## sample estimates:
##        cor 
## 0.06743293

The p-value is very small (< 2.2e-16), so the correlation between time spent at the company and promotion in the last 5 years is statistically significant.

The correlation is positive but weak (r = 0.067).

Employees who have worked at the company longer are slightly more likely to have received a promotion in the last 5 years.
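Because `promotion_last_5years` is a 0/1 variable, this Pearson correlation is a point-biserial correlation. Its test is equivalent to a two-sample t-test with equal variances comparing years at the company between the two groups. The sketch below illustrates that equivalence on simulated data. The vectors `promoted` and `years` are hypothetical stand-ins, not the HR data.

```r
set.seed(1)

# Simulated stand-ins (hypothetical): a 0/1 promotion flag and tenure in years
promoted <- rbinom(200, 1, 0.2)
years    <- rpois(200, lambda = 3 + promoted)

# Pearson correlation test with a binary variable (point-biserial)
ct <- cor.test(years, promoted)

# Two-sample t-test assuming equal variances
tt <- t.test(years ~ promoted, var.equal = TRUE)

# The t statistics match in magnitude (the sign depends on group order)
same_t <- isTRUE(all.equal(abs(unname(ct$statistic)), abs(unname(tt$statistic))))

# The p-values match as well
same_p <- isTRUE(all.equal(ct$p.value, tt$p.value))

c(same_t = same_t, same_p = same_p)
```

This is also why a boxplot split by promotion status is a natural way to display the relationship.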

ggplot(hr, aes(x = factor(promotion_last_5years), y = time_spend_company)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Employees promoted in the last 5 years tend to have spent more years at the company",
    x = "Promotion in Last 5 Years (0 = No, 1 = Yes)",
    y = "Years at Company"
  ) +
  theme_minimal()