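# readr for fast CSV import, ggplot2 for the scatter plots
library(readr)
library(ggplot2)
# Load the HR dataset (14,999 employees, 10 columns) directly from GitHub
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')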
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
correlation <- cor.test(hr$satisfaction_level, hr$average_montly_hours)
print(correlation)
##
## Pearson's product-moment correlation
##
## data: hr$satisfaction_level and hr$average_montly_hours
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.036040356 -0.004045605
## sample estimates:
## cor
## -0.02004811
The p-value of 0.01408 is below 0.05, so the correlation is statistically significant: a correlation this far from zero would be unlikely if the true correlation were zero. The magnitude of the relationship (r = -0.02004811), however, is so small that it is practically meaningless. With nearly 15,000 observations, even a negligible correlation can reach statistical significance, so a significant result here does not imply a meaningful relationship between the variables.
The number of hours worked is essentially unrelated to satisfaction in any practical sense, even though the test detects a very slight, almost negligible negative relationship.
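To illustrate why such a tiny correlation is still significant, the sketch below recomputes the t statistic and p-value from the reported r and degrees of freedom, then repeats the calculation for the same r in a hypothetical sample of only 100 employees. The helper names `t_from_r` and `p_from_t` and the sample size of 100 are assumptions made for illustration; they are not part of the original analysis.

```r
# Values copied from the cor.test() output above
r  <- -0.02004811   # sample correlation
df <- 14997         # degrees of freedom (n - 2)

# Hypothetical helpers: t statistic implied by r, and its two-sided p-value
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)
p_from_t <- function(t, df) 2 * pt(-abs(t), df)

t_full <- t_from_r(r, df)
p_full <- p_from_t(t_full, df)

# Hypothetical scenario: the same r in a sample of 100 employees (df = 98)
t_small <- t_from_r(r, df = 98)
p_small <- p_from_t(t_small, df = 98)

round(c(t_full = t_full, p_full = p_full,
        t_small = t_small, p_small = p_small), 4)
```

The full-sample values should reproduce the t and p-value printed above, while the small-sample p-value is nowhere near 0.05. The significance comes from the sample size, not from the strength of the relationship.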
ggplot(hr, aes(x = average_montly_hours, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Minimal Negative Impact of Hours Worked on Employee Satisfaction",
       x = "Average Monthly Hours",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'
cor2 <- cor.test(hr$last_evaluation, hr$time_spend_company)
print(cor2)
##
## Pearson's product-moment correlation
##
## data: hr$last_evaluation and hr$time_spend_company
## t = 16.256, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1158309 0.1472844
## sample estimates:
## cor
## 0.1315907
The printed output shows only p < 2.2e-16; the exact p-value, 6.429866e-59, is extremely small. The correlation between time_spend_company and last_evaluation is therefore statistically significant: a correlation this large would be extremely unlikely if the true correlation were zero.
Employees who have worked longer at the company tend to get slightly better evaluation ratings, but the relationship is weak (r = 0.13).
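The exact p-value is stored in the test object and can be read with `cor2$p.value`. As a rough check that does not need the data, the sketch below recomputes it from the t statistic and degrees of freedom in the printed output. Because the printed t is rounded, the result should agree with the exact value only approximately.

```r
# t statistic and degrees of freedom copied from the cor.test() output above
t_stat <- 16.256
df     <- 14997

# Two-sided p-value from the t distribution
p_exact <- 2 * pt(-abs(t_stat), df)

# The printout abbreviates any p-value below machine epsilon (about 2.2e-16)
p_exact < .Machine$double.eps   # TRUE, hence "< 2.2e-16" in the output
format(p_exact, digits = 3)
```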
ggplot(hr, aes(x = time_spend_company, y = last_evaluation)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Time Spent at Company May Relate to Higher Last Evaluations",
       x = "Time Spent at Company",
       y = "Last Evaluation")
## `geom_smooth()` using formula = 'y ~ x'
cor3 <- cor.test(hr$satisfaction_level, hr$number_project)
print(cor3)
##
## Pearson's product-moment correlation
##
## data: hr$satisfaction_level and hr$number_project
## t = -17.69, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1586105 -0.1272570
## sample estimates:
## cor
## -0.1429696
The exact p-value of 2.526836e-69 (printed as p < 2.2e-16) is far below 0.05, so the correlation between satisfaction_level and number_project is statistically significant. A correlation of this size would be very unlikely to arise from sampling variability alone if the true correlation were zero.
Employees with more projects tend to report slightly lower satisfaction, but the correlation is weak (r = -0.14). The number of projects is therefore not a major factor in satisfaction levels, and other factors likely play a more important role. The weak relationship is probably real, but in practical terms it offers little value for decision-making.
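The confidence interval in the output supports the same reading: every plausible value of the correlation is small. `cor.test()` bases this interval on Fisher's z transform, and the sketch below reproduces it by hand from the reported r and sample size. The variable names are chosen here for illustration.

```r
# Values copied from the cor.test() output above
r <- -0.1429696   # sample correlation
n <- 14999        # number of rows (df + 2)

z      <- atanh(r)          # Fisher z transform of r
se     <- 1 / sqrt(n - 3)   # approximate standard error of z
z_crit <- qnorm(0.975)      # critical value for a 95% interval

# Build the interval on the z scale, then back-transform to the r scale
ci <- tanh(z + c(-1, 1) * z_crit * se)
round(ci, 4)
```

The result should match the printed interval of roughly -0.159 to -0.127. The interval is narrow because the sample is large, but it sits entirely in the range of weak correlations.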
ggplot(hr, aes(x = number_project, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Projects May Relate to Lower Employee Satisfaction",
       x = "Number of Projects",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'
cor4 <- cor.test(hr$satisfaction_level, hr$time_spend_company)
print(cor4)
##
## Pearson's product-moment correlation
##
## data: hr$satisfaction_level and hr$time_spend_company
## t = -12.416, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.11668153 -0.08499948
## sample estimates:
## cor
## -0.1008661
The exact p-value of 3.203473e-35 (printed as p < 2.2e-16) indicates that the relationship between satisfaction_level and time_spend_company is statistically significant: a correlation this far from zero would be very unlikely if the true correlation were zero.
The weak negative correlation (r = -0.10) suggests a slight tendency for satisfaction to decrease as employees stay longer at the company, but the effect is small and may not be meaningful in real-world decision-making.
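One way to put all four results on a common practical scale is the squared correlation, which gives the share of variance in one variable that is linearly associated with the other. The sketch below uses the estimates printed in the four test outputs of this report. The labels in the vector are hypothetical names added for readability.

```r
# Correlation estimates copied from the four cor.test() outputs in this report
r <- c(
  satisfaction_vs_hours    = -0.02004811,
  evaluation_vs_tenure     =  0.1315907,
  satisfaction_vs_projects = -0.1429696,
  satisfaction_vs_tenure   = -0.1008661
)

# Squared correlation, expressed as a percentage of shared variance
pct_shared <- round(100 * r^2, 2)
pct_shared
```

Every pair shares under 3% of its variance, which is consistent with reading all four relationships as statistically detectable but practically weak.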
ggplot(hr, aes(x = time_spend_company, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Time Spent at Company May Relate to Lower Satisfaction",
       x = "Time Spent at Company",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'