library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1. Perform the correlation (.5 point) Choose any two appropriate variables from the data and perform the correlation, displaying the results.

2. Interpret the results in technical terms (.5 point) For each correlation, explain what the test’s p-value means (significance).

3. Interpret the results in non-technical terms (1 point) For each correlation, explain what the results mean in non-technical terms.

4. Create a plot that helps visualize the correlation (.5 point) For each correlation, create a graph to help visualize the relationship between the two variables. The title must be the non-technical interpretation.

cor_test_result1 <- cor.test(hr$satisfaction_level, hr$last_evaluation)
ggplot(hr, aes(x = satisfaction_level, y = last_evaluation)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Higher satisfaction levels are associated with higher evaluation scores") +
  xlab("Satisfaction Level") +
  ylab("Last Evaluation Score")
## `geom_smooth()` using formula = 'y ~ x'

cat("Correlation Coefficient:", cor_test_result1$estimate, "\n")
## Correlation Coefficient: 0.1050212
cat("p-value:", cor_test_result1$p.value, "\n")
## p-value: 4.704312e-38
- This p-value is far below the common significance threshold of 0.05, so the correlation between satisfaction_level and last_evaluation is statistically significant. We can reject the null hypothesis that there is no linear relationship between the two variables.
- A p-value this close to 0 is overwhelming evidence against the null hypothesis: the observed correlation is highly unlikely to be due to random variation alone. Significance is not the same as strength, however. A coefficient of 0.105 is a weak positive correlation, and the tiny p-value largely reflects the large sample (14,999 rows).
- In plain terms, employees who report higher satisfaction tend to receive slightly higher evaluation scores. The link is weak, but with this many employees it is very unlikely to be a coincidence. Satisfaction accounts for only about 1% of the differences in evaluation scores, and a correlation alone does not show that one causes the other.
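The reported p-value can be reproduced from the correlation coefficient and the sample size alone, which makes it easier to see why a small coefficient can still be highly significant. The following is a minimal sketch, not part of the original analysis. It assumes all 14,999 rows have complete values for both variables, and it hard-codes the coefficient printed above rather than reloading the data.

```r
# Values copied from the output above; n is the row count reported by read_csv().
# Assumption: no missing values, so every row contributes to the correlation.
r <- 0.1050212
n <- 14999

# Test statistic for H0: true correlation = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Share of variance in one variable linearly accounted for by the other.
r_squared <- r^2

cat("t statistic:", t_stat, "\n")
cat("p-value:", p_value, "\n")
cat("r squared:", r_squared, "\n")
```

Because the test statistic grows with the square root of the sample size, a coefficient near 0.1 is enough to produce a very small p-value when n is close to 15,000.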
cor_test_result3 <- cor.test(hr$satisfaction_level, hr$number_project)
ggplot(hr, aes(x = number_project, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Employees with more projects tend to report slightly lower satisfaction") +
  xlab("Number of Projects") +
  ylab("Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'

cat("Correlation Coefficient:", cor_test_result3$estimate, "\n")
## Correlation Coefficient: -0.1429696
cat("p-value:", cor_test_result3$p.value, "\n")
## p-value: 2.526836e-69
- This p-value is far below the conventional significance level of 0.05, which indicates that the correlation between satisfaction_level and number_project is highly statistically significant. You can confidently reject the null hypothesis, which states that there is no relationship between the two variables.
- The results point to a weak but statistically significant negative relationship between how many projects employees work on and their satisfaction at work. In plain terms, employees who take on more projects tend to report slightly lower satisfaction, although the number of projects accounts for only about 2% of the differences in satisfaction. Management may still want to keep an eye on project workload when thinking about employee engagement, bearing in mind that a correlation does not show cause and effect.
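A confidence interval is a useful companion to the p-value, because it shows how precisely the coefficient is estimated and how far it is from zero. `cor.test()` returns one in its `conf.int` element. The sketch below reproduces the 95% interval by hand with the Fisher z-transformation, again assuming 14,999 complete rows and hard-coding the coefficient printed above.

```r
# Values copied from the output above.
# Assumption: n = 14999 complete pairs (no missing values).
r <- -0.1429696
n <- 14999

# Fisher z-transformation makes the sampling distribution roughly normal.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% interval on the z scale, then transformed back to the correlation scale.
z_crit <- qnorm(0.975)
ci <- tanh(c(z - z_crit * se, z + z_crit * se))

cat("95% CI for r:", ci[1], "to", ci[2], "\n")
```

An interval that is narrow and entirely below zero indicates a negative correlation that is estimated precisely but is still small in size.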
cor_test_result4 <- cor.test(hr$time_spend_company, hr$satisfaction_level)
ggplot(hr, aes(x = time_spend_company, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Employees with longer tenure tend to report slightly lower satisfaction") +
  xlab("Time Spent at Company (years)") +
  ylab("Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'

cat("Correlation Coefficient:", cor_test_result4$estimate, "\n")
## Correlation Coefficient: -0.1008661
cat("p-value:", cor_test_result4$p.value, "\n")
## p-value: 3.203473e-35
- This p-value is far below the conventional threshold of 0.05, suggesting that the correlation between time_spend_company and satisfaction_level is highly statistically significant. You can confidently reject the null hypothesis, which posits that there is no relationship between the two variables.
- In plain terms, employees who have spent more years at the company tend to report slightly lower job satisfaction. The relationship is weak (the coefficient is about -0.10), but it is statistically significant, so the trend is unlikely to be a coincidence and may be worth considering in employee retention and satisfaction strategies.
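All three correlations above are small in size yet have very small p-values. A short simulation can illustrate why this happens with a sample of this size. The data below are simulated, not taken from the HR dataset; the names `x`, `y`, and `sim`, the seed, and the true correlation of 0.1 are illustrative assumptions.

```r
# Simulated data only: two standard normal variables with a true correlation of 0.1.
set.seed(42)
n <- 15000
true_r <- 0.1

x <- rnorm(n)
y <- true_r * x + sqrt(1 - true_r^2) * rnorm(n)

# Same Pearson test as used in the analysis above.
sim <- cor.test(x, y)

cat("Estimated correlation:", sim$estimate, "\n")
cat("p-value:", sim$p.value, "\n")
cat("Variance explained (r squared):", sim$estimate^2, "\n")
```

With around 15,000 observations, even a weak true correlation is detected with near certainty, so the p-value answers "is there any linear relationship?" while the coefficient and its square answer "how large is it?".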