Assignment 7: Correlations

Adrian Fernandez

2026-03-23

library(readr)
library(ggplot2)

hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')

head(hr)
## # A tibble: 6 × 10
##   satisfaction_level last_evaluation number_project average_montly_hours
##                <dbl>           <dbl>          <dbl>                <dbl>
## 1               0.38            0.53              2                  157
## 2               0.8             0.86              5                  262
## 3               0.11            0.88              7                  272
## 4               0.72            0.87              5                  223
## 5               0.37            0.52              2                  159
## 6               0.41            0.5               2                  153
## # ℹ 6 more variables: time_spend_company <dbl>, Work_accident <dbl>,
## #   left <dbl>, promotion_last_5years <dbl>, Department <chr>, salary <chr>

Correlation 1: Satisfaction Level vs. Last Evaluation

cor.test(hr$satisfaction_level, hr$last_evaluation)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$last_evaluation
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08916727 0.12082195
## sample estimates:
##       cor 
## 0.1050212

Technical Interpretation: The p-value is extremely small (well below 0.05), so the correlation between satisfaction level and last evaluation score is statistically significant. The correlation estimate is positive but weak (r ≈ 0.105, 95% CI 0.089 to 0.121), indicating a small but real direct relationship: satisfaction tends to rise slightly as evaluation scores rise.
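The sign and significance of a Pearson correlation can be checked directly from r and its degrees of freedom, because cor.test() tests H0: ρ = 0 with t = r·√df / √(1 − r²). The sketch below is a cross-check of that formula, assuming the rounded estimate and df copied from the output above. A positive t confirms the relationship is direct, not inverse, and the result should agree with the printed t = 12.933.

```r
# Values copied from the Correlation 1 cor.test() output above
r  <- 0.1050212   # sample Pearson correlation
df <- 14997       # degrees of freedom (n - 2)

# t statistic for testing H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)
p_value
```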

Non-Technical Interpretation: Employees who receive higher performance evaluation scores tend to report slightly higher satisfaction levels, although the connection is weak.

ggplot(hr, aes(x = last_evaluation, y = satisfaction_level)) +
  geom_point(alpha = 0.3, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Higher Evaluations Go with Slightly Higher Satisfaction",
       x = "Last Evaluation Score",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'

Correlation 2: Number of Projects vs. Average Monthly Hours

cor.test(hr$number_project, hr$average_montly_hours)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$number_project and hr$average_montly_hours
## t = 56.219, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4039037 0.4303411
## sample estimates:
##       cor 
## 0.4172106

Technical Interpretation: The p-value is extremely small (far below 0.05), indicating a statistically significant positive correlation between number of projects and average monthly hours. The correlation estimate is moderate and positive (r ≈ 0.42, 95% CI 0.404 to 0.430), meaning the two variables tend to increase together.
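The confidence interval that cor.test() reports for a Pearson correlation comes from the Fisher z-transformation: r is mapped to z = atanh(r), a normal interval is built on that scale with standard error 1/√(n − 3), and the endpoints are mapped back with tanh. A minimal sketch, assuming n = 14999 (the reported df of 14997 plus 2) and the rounded estimate from the output above, should reproduce the printed interval:

```r
# Rounded estimate from the Correlation 2 output; n assumed to be df + 2
r <- 0.4172106
n <- 14999

z    <- atanh(r)                        # Fisher z-transform of r
se   <- 1 / sqrt(n - 3)                 # standard error on the z scale
crit <- qnorm(0.975)                    # two-sided 95% critical value
ci   <- tanh(z + c(-1, 1) * crit * se)  # back-transform endpoints to the r scale

round(ci, 4)
```

Working on the z scale is what makes the interval slightly asymmetric around r, since the sampling distribution of r itself is skewed when the true correlation is away from 0.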

Non-Technical Interpretation: Employees who are assigned more projects tend to work more hours each month. This makes intuitive sense: a heavier workload naturally demands more time on the job.

ggplot(hr, aes(x = number_project, y = average_montly_hours)) +
  geom_point(alpha = 0.3, color = "darkorange") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Projects Means More Hours Worked per Month",
       x = "Number of Projects",
       y = "Average Monthly Hours")
## `geom_smooth()` using formula = 'y ~ x'

Correlation 3: Time Spent at Company vs. Last Evaluation

cor.test(hr$time_spend_company, hr$last_evaluation)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$time_spend_company and hr$last_evaluation
## t = 16.256, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1158309 0.1472844
## sample estimates:
##       cor 
## 0.1315907

Technical Interpretation: The p-value is extremely small (well below 0.05), indicating that the correlation is statistically significant. The positive correlation estimate (r ≈ 0.13) is weak but real, suggesting that employees with more tenure receive marginally higher evaluation scores.
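One way to put "weak but real" in perspective is the coefficient of determination, r², the share of variance two variables have in common. The sketch below squares the four estimates copied from the cor.test() outputs in this assignment; the vector names are labels chosen here for readability, not columns in the dataset.

```r
# Correlation estimates copied from the four cor.test() outputs
r <- c(
  satisfaction_vs_evaluation = 0.1050212,
  projects_vs_hours          = 0.4172106,
  tenure_vs_evaluation       = 0.1315907,
  satisfaction_vs_hours      = -0.02004811
)

# Percent of variance shared by each pair (r squared, times 100)
pct <- round(100 * r^2, 2)
pct
```

By this measure, tenure and evaluation share under 2% of their variance, while projects and hours share roughly 17%.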

Non-Technical Interpretation: Employees who have been with the company longer tend to receive slightly higher performance ratings. Experience on the job may give workers a modest edge in how they are evaluated, though the relationship is weak.

ggplot(hr, aes(x = time_spend_company, y = last_evaluation)) +
  geom_point(alpha = 0.3, color = "mediumpurple") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Experienced Employees Tend to Score Slightly Higher in Evaluations",
       x = "Years at Company",
       y = "Last Evaluation Score")
## `geom_smooth()` using formula = 'y ~ x'

Correlation 4: Satisfaction Level vs. Average Monthly Hours

cor.test(hr$satisfaction_level, hr$average_montly_hours)
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$average_montly_hours
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.036040356 -0.004045605
## sample estimates:
##         cor 
## -0.02004811

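Technical Interpretation: The p-value (0.014) is below 0.05, so the correlation is statistically significant, though far less decisively than the other three. The correlation estimate is negative but very weak (r ≈ -0.02, 95% CI -0.036 to -0.004), so employees who work more hours each month report only marginally lower satisfaction. With nearly 15,000 employees in the sample, even a negligible relationship like this one can reach statistical significance.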
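The role of sample size can be illustrated by computing the two-sided p-value for the same r at different n, using the same t formula cor.test() relies on. In this sketch, `cor_p_value` is a hypothetical helper written for illustration, and n = 500 is an arbitrary hypothetical sample size chosen for comparison, not a subset of the HR data.

```r
# Hypothetical helper: two-sided p-value for a Pearson correlation r with n observations
cor_p_value <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)  # t statistic for H0: correlation = 0
  2 * pt(-abs(t_stat), df)                # two-sided tail probability
}

r <- -0.02004811                     # estimate from the Correlation 4 output

p_full  <- cor_p_value(r, n = 14999) # full HR sample (df 14997 + 2)
p_small <- cor_p_value(r, n = 500)   # same r, hypothetical sample of 500

p_full
p_small
```

The full-sample p-value should land near the reported 0.014, while the same correlation in a sample of 500 would be nowhere near significant.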

Non-Technical Interpretation: Employees who work longer hours each month report very slightly lower job satisfaction on average, but the effect is tiny: hours worked account for well under 1% of the differences in satisfaction. Overwork may play a small role in dissatisfaction, but this correlation alone does not show that it drives attrition.

ggplot(hr, aes(x = average_montly_hours, y = satisfaction_level)) +
  geom_point(alpha = 0.3, color = "tomato") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Longer Hours Show Only a Negligible Link to Lower Satisfaction",
       x = "Average Monthly Hours",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'