library(readr)
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cor.test(hr$satisfaction_level, hr$last_evaluation)
##
## Pearson's product-moment correlation
##
## data: hr$satisfaction_level and hr$last_evaluation
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08916727 0.12082195
## sample estimates:
## cor
## 0.1050212
Correlation coefficient: The correlation coefficient (cor) is 0.105. This indicates a weak positive correlation between satisfaction_level and last_evaluation. A positive value means that as one variable increases, the other tends to increase slightly as well.
P-value: The p-value is < 2.2e-16, which is much smaller than the commonly used threshold of 0.05, so the correlation is statistically significant. In other words, a correlation this large would be very unlikely to appear if there were no linear relationship between satisfaction and evaluation.
The correlation coefficient of 0.105 means that there is a weak positive relationship between an employee's satisfaction level and their last evaluation score. Employees who are more satisfied tend to have slightly higher evaluation scores, but satisfaction accounts for only about 1% of the variation in evaluations (0.105² ≈ 0.011). The p-value of < 2.2e-16 tells us that this relationship is statistically significant. With almost 15,000 employees, even a weak correlation is estimated precisely enough to be distinguished from zero, as the narrow 95 percent confidence interval (0.089 to 0.121) shows.
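As a check on the output above, the t statistic that cor.test() reports can be reproduced from the correlation and the sample size alone, using the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below plugs in the printed values (r = 0.1050212, n = 14999). It also computes r^2, the share of variance the two variables have in common, which shows how small the effect is despite the tiny p-value.

```r
# Recompute the t statistic and p-value for a Pearson correlation
# from r and n only (values copied from the cor.test() output above)
r <- 0.1050212   # sample correlation between satisfaction_level and last_evaluation
n <- 14999       # number of employees (df = n - 2 = 14997)

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # test statistic for H0: rho = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value
r_squared <- r^2                              # proportion of shared variance

round(t_stat, 3)     # should match the t reported by cor.test()
signif(p_value, 3)   # far below 2.2e-16
round(r_squared, 3)  # roughly 1% of the variance
```

The same calculation explains why large samples make weak correlations significant: for a fixed r, t grows with sqrt(n - 2).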
ggplot(hr, aes(x = satisfaction_level, y = last_evaluation)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Higher Satisfaction Is Only Weakly Associated with Better Evaluations",
       x = "Satisfaction Level",
       y = "Last Evaluation")
## `geom_smooth()` using formula = 'y ~ x'
cor.test(hr$average_montly_hours, hr$time_spend_company)
##
## Pearson's product-moment correlation
##
## data: hr$average_montly_hours and hr$time_spend_company
## t = 15.774, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1119801 0.1434654
## sample estimates:
## cor
## 0.1277549
Correlation coefficient (cor): The correlation coefficient is 0.128, which indicates a weak positive correlation between the number of hours worked per month (average_montly_hours) and the number of years spent at the company (time_spend_company). This means that employees who have spent more years at the company tend to work slightly more hours per month.
P-value: The p-value is < 2.2e-16, which is much smaller than 0.05, meaning the correlation is statistically significant. This indicates that the observed relationship between average_montly_hours and time_spend_company is highly unlikely to be due to random chance.
Correlation Interpretation: Employees who have been with the company for a longer period tend to work somewhat more hours per month, but the relationship is weak: tenure accounts for only about 1.6% of the variation in monthly hours (0.128² ≈ 0.016).
P-value Interpretation: The extremely low p-value tells us that the observed correlation is statistically significant, meaning it is very unlikely to be caused by random chance. We can say that tenure and monthly hours are associated, but the association is too weak to predict an individual employee's hours from their years at the company.
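The 95 percent confidence interval printed by cor.test() (0.1119801 to 0.1434654) is, according to R's documentation for cor.test(), based on Fisher's z-transform. A minimal sketch of that calculation, using the printed r and the number of rows in hr, is below. The interval is narrow because n is large, which is why even this weak correlation is clearly separated from zero.

```r
# Fisher z-transform confidence interval for a Pearson correlation
r <- 0.1277549   # sample correlation printed by cor.test() above
n <- 14999       # number of rows in hr

z <- atanh(r)                          # Fisher z-transform of r
se <- 1 / sqrt(n - 3)                  # approximate standard error of z
crit <- qnorm(0.975)                   # two-sided 95% critical value
ci <- tanh(z + c(-1, 1) * crit * se)   # back-transform to the correlation scale

round(ci, 4)
```

Working on the z scale and transforming back keeps the interval inside [-1, 1] and makes it asymmetric around r when r is far from zero.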
ggplot(hr, aes(x = average_montly_hours, y = time_spend_company)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Time at the Company Is Only Weakly Associated with Monthly Hours",
       x = "Average Monthly Hours",
       y = "Time Spent at the Company (Years)")
## `geom_smooth()` using formula = 'y ~ x'
cor.test(hr$number_project, hr$average_montly_hours)
Correlation coefficient (cor): The correlation coefficient is positive, indicating a positive correlation between the number of projects an employee works on (number_project) and the average number of hours worked per month (average_montly_hours). This suggests that as the number of projects increases, the number of hours worked per month also tends to increase.
P-value: The p-value is < 2.2e-16, which is much smaller than 0.05, indicating that the correlation is statistically significant. This means the relationship between the number of projects and hours worked is unlikely to be due to random chance.
Correlation Interpretation: There is a moderate to strong relationship between the number of projects employees work on and the number of hours they work. Employees who work on more projects tend to work more hours each month.
P-value Interpretation: The very small p-value tells us that this relationship is statistically significant. A correlation this strong would be very unlikely to appear by chance if the number of projects and monthly hours were unrelated.
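Retyping the coefficient and p-value into the text by hand is error-prone. One alternative is to read them from the object that cor.test() returns: an "htest" object whose estimate, p.value, and conf.int components hold the printed values. The sketch below uses simulated, hypothetical data (the vectors projects and hours are invented for illustration) so it runs without downloading the dataset.

```r
# Pull the numbers out of a cor.test() result instead of retyping them.
# Simulated (hypothetical) data, not the hr dataset.
set.seed(42)
projects <- sample(2:7, 200, replace = TRUE)         # hypothetical project counts
hours <- 150 + 15 * projects + rnorm(200, sd = 30)   # hypothetical monthly hours

res <- cor.test(projects, hours)
r_hat <- unname(res$estimate)   # sample correlation (printed as "cor")
p_val <- res$p.value            # two-sided p-value
ci <- res$conf.int              # 95% confidence interval

cat(sprintf("r = %.3f, 95%% CI [%.3f, %.3f], p = %.3g\n",
            r_hat, ci[1], ci[2], p_val))
```

With the real data, the same pattern applies to `res <- cor.test(hr$number_project, hr$average_montly_hours)`.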
ggplot(hr, aes(x = number_project, y = average_montly_hours)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Employees Who Work on More Projects Tend to Work More Hours",
       x = "Number of Projects",
       y = "Average Monthly Hours")
## `geom_smooth()` using formula = 'y ~ x'
cor.test(hr$time_spend_company, hr$Work_accident)
##
## Pearson's product-moment correlation
##
## data: hr$time_spend_company and hr$Work_accident
## t = 0.25967, df = 14997, p-value = 0.7951
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.01388386 0.01812361
## sample estimates:
## cor
## 0.002120418
Correlation coefficient (cor): The correlation coefficient is 0.002, which is essentially zero. There is no meaningful linear relationship between time_spend_company and Work_accident: employees who have been at the company longer are not noticeably more or less likely to have had a work accident.
P-value: The p-value is 0.795, which is far above 0.05, so the correlation is not statistically significant. The 95 percent confidence interval (-0.014 to 0.018) includes 0, which is consistent with there being no relationship at all.
Correlation Interpretation: There is no detectable relationship between the number of years an employee has spent at the company and whether or not they have had a work-related accident. Because Work_accident is coded 0/1, this Pearson correlation is a point-biserial correlation, and a value this close to 0 means that average tenure is nearly the same for employees with and without an accident.
P-value Interpretation: The p-value of 0.795 tells us that a correlation at least this large could easily arise by chance even if tenure and accidents were unrelated. We therefore have no evidence that tenure is associated with work accidents in this data.
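Since Work_accident only takes the values 0 and 1, the Pearson test above is equivalent to a pooled-variance two-sample t-test comparing the mean tenure of employees with and without an accident. Both give the same |t|, degrees of freedom, and p-value. The sketch below illustrates this on simulated, hypothetical data (the accident and tenure vectors are invented, not taken from hr).

```r
# With a 0/1 variable, Pearson's correlation test is equivalent to a
# pooled-variance two-sample t-test. Simulated (hypothetical) data only.
set.seed(7)
n <- 500
accident <- rbinom(n, 1, 0.15)              # hypothetical 0/1 accident flag
tenure <- sample(2:10, n, replace = TRUE)   # hypothetical years at the company
sim <- data.frame(tenure, accident)

ct <- cor.test(sim$tenure, sim$accident)                        # point-biserial test
tt <- t.test(tenure ~ accident, data = sim, var.equal = TRUE)   # pooled t-test

# Same |t|, df, and p-value; the sign of t depends on the group order
c(cor_t = unname(ct$statistic), ttest_t = unname(tt$statistic))
c(cor_p = ct$p.value, ttest_p = tt$p.value)
```

On the real data, the corresponding comparison would be `t.test(time_spend_company ~ Work_accident, data = hr, var.equal = TRUE)`, which frames the result as a difference in mean tenure rather than a correlation.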
ggplot(hr, aes(x = time_spend_company, y = Work_accident)) +
  geom_jitter(width = 0.2, height = 0.05, alpha = 0.2) +  # jitter: both variables are discrete
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Time at the Company Shows No Clear Relationship with Work Accidents",
       x = "Time Spent at the Company (Years)",
       y = "Work Accident (0 = No, 1 = Yes)")
## `geom_smooth()` using formula = 'y ~ x'