Assignment

library(readr)
library(ggplot2)

hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')

## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Satisfaction Level vs. Average Monthly Hours

Perform the correlation.

correlation <- cor.test(hr$satisfaction_level, hr$average_montly_hours)
print(correlation)

## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$average_montly_hours
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.036040356 -0.004045605
## sample estimates:
##         cor 
## -0.02004811

Explain what the test’s p-value means.

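The p-value of 0.01408 is the probability of observing a correlation at least this far from zero (in either direction) if there were truly no linear relationship between satisfaction level and average monthly hours. Because it is below the conventional 0.05 threshold, we reject the null hypothesis of zero correlation. Statistical significance is not the same as practical importance, however: the estimated correlation is only -0.020 (95% CI -0.036 to -0.004), so the relationship, while probably not exactly zero, is too weak to be meaningful. The large sample of 14,999 employees is what allows such a tiny correlation to reach significance.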
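To see why such a small correlation still counts as significant, the sketch below computes the smallest |r| a two-sided Pearson test would flag for this sample size. It assumes the conventional α = 0.05 (the assignment does not state a threshold) and uses only base R's `qt()`; for n = 14,999 the cut-off works out to roughly 0.016, just below the observed |r| of 0.020.

```r
# Smallest |r| declared significant by a two-sided Pearson test,
# assuming alpha = 0.05 and n = 14999 rows (from the read_csv output).
n     <- 14999
df    <- n - 2
alpha <- 0.05

# cor.test uses t = r * sqrt(df / (1 - r^2)); solving t = t_crit for r gives:
t_crit <- qt(1 - alpha / 2, df)
r_crit <- t_crit / sqrt(df + t_crit^2)

r_observed <- -0.02004811  # estimate reported by cor.test above
round(r_crit, 4)
abs(r_observed) > r_crit   # TRUE means the observed r clears the cut-off
```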

What do the results mean in non-technical terms?

How many hours employees work has essentially no bearing on how satisfied they are. The test does detect a very slight negative relationship, but it is so small that it is negligible in practice.
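One way to put "negligible" into numbers is the squared correlation, r², which is the share of the variation in satisfaction that a straight-line fit on monthly hours accounts for. This sketch simply squares the estimate printed above; with r = -0.020, hours account for roughly 0.04% of the variation in satisfaction.

```r
# Share of variance in satisfaction_level explained by a linear fit on
# average_montly_hours: the squared Pearson correlation.
r  <- -0.02004811   # estimate from cor.test above
r2 <- r^2
sprintf("r^2 = %.5f (about %.2f%% of the variance)", r2, 100 * r2)
```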

Create a graph to help visualize the relationship between the two variables.

ggplot(hr, aes(x = average_montly_hours, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Minimal Negative Impact of Hours Worked on Employee Satisfaction",
       x = "Average Monthly Hours",
       y = "Satisfaction Level")

## `geom_smooth()` using formula = 'y ~ x'

Last Evaluation vs. Time Spent at Company

Perform the correlation.

cor2 <- cor.test(hr$last_evaluation, hr$time_spend_company)
print(cor2)

## 
##  Pearson's product-moment correlation
## 
## data:  hr$last_evaluation and hr$time_spend_company
## t = 16.256, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1158309 0.1472844
## sample estimates:
##       cor 
## 0.1315907

Explain what the test’s p-value means.

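The printed output shows the p-value only as < 2.2e-16; its exact value (stored in cor2$p.value) is 6.429866e-59, an extremely small number. It means that if there were truly no correlation between time_spend_company and last_evaluation, a correlation as large as the one observed (0.13) would almost never arise by chance in a sample of this size. The correlation is therefore statistically significant.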
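The exact figure can be rebuilt from the printed test statistic. For a Pearson test, `cor.test()` takes the two-sided p-value from the t distribution with n - 2 degrees of freedom, so the sketch below plugs in the rounded t and df shown in the output. Because t is rounded to three decimals, the result agrees with 6.43e-59 only approximately.

```r
# Rebuild the two-sided p-value from the t statistic and df printed above.
# print() displays anything this small as "< 2.2e-16"; the full value is
# kept in the htest object's p.value element (e.g. cor2$p.value).
t_stat <- 16.256   # rounded t from the cor2 output
df     <- 14997
p_two_sided <- 2 * pt(-abs(t_stat), df)
p_two_sided
```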

What do the results mean in non-technical terms?

Employees who have been at the company longer tend to receive slightly higher evaluation scores, but the relationship is weak (r ≈ 0.13).
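The claim that the relationship is "not strong" holds across the whole confidence interval, whose upper end is still only about 0.15. `cor.test()` builds that interval with Fisher's z-transformation, and the sketch below reproduces the printed bounds from the reported r and the row count alone.

```r
# 95% confidence interval for a Pearson correlation via Fisher's z:
# transform r with atanh, add +/- z-quantile * 1/sqrt(n - 3), transform back.
r  <- 0.1315907   # estimate from cor2
n  <- 14999
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
round(ci, 4)
```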

Create a graph to help visualize the relationship between the two variables.

ggplot(hr, aes(x = time_spend_company, y = last_evaluation)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Time Spent at Company May Relate to Higher Last Evaluations",
       x = "Time Spent at Company",
       y = "Last Evaluation")

## `geom_smooth()` using formula = 'y ~ x'

Satisfaction Level vs. Number of Projects

Perform the correlation.

cor3 <- cor.test(hr$satisfaction_level, hr$number_project)
print(cor3)

## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$number_project
## t = -17.69, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1586105 -0.1272570
## sample estimates:
##        cor 
## -0.1429696

Explain what the test’s p-value means.

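The p-value of 2.526836e-69 (printed as < 2.2e-16) is far below 0.05, so the correlation between satisfaction_level and number_project is statistically significant. If the two variables were truly uncorrelated, a correlation this strong would essentially never occur by chance in a sample of 14,999 employees. This tells us the relationship is very unlikely to be zero, not that it is strong.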
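The t value and p-value in the output follow directly from r and the sample size. The sketch below applies the standard Pearson test statistic, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, to the reported estimate and recovers the printed t of -17.69.

```r
# Pearson test statistic from the reported correlation and sample size.
r  <- -0.1429696  # estimate from cor3
n  <- 14999
df <- n - 2
t_stat <- r * sqrt(df / (1 - r^2))
p_val  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
c(t = round(t_stat, 2), p = signif(p_val, 3))
```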

What do the results mean in non-technical terms?

Employees with more projects tend to report slightly lower satisfaction, but the correlation is weak (r ≈ -0.14). The number of projects is therefore not a major driver of satisfaction, and other factors likely matter more. Statistical significance tells us this weak relationship is probably real, but on its own it offers little guidance for decision-making.
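Words like "weak" are easier to compare across the four analyses when they follow a stated rule. The sketch below uses a hypothetical helper, `describe_strength()`, that labels |r| with Cohen's conventional benchmarks (0.1 small, 0.3 medium, 0.5 large); these cut-offs are a rule of thumb, not something the assignment prescribes. Applied to the four reported correlations, it labels hours vs. satisfaction as negligible and the other three as small.

```r
# Hypothetical helper: label |r| with Cohen's conventional benchmarks.
# Intervals are [0, 0.1), [0.1, 0.3), [0.3, 0.5), [0.5, 1].
describe_strength <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.1, 0.3, 0.5, 1),
      labels = c("negligible", "small", "medium", "large"),
      right = FALSE, include.lowest = TRUE)
}

# Estimates reported by the four cor.test() calls in this assignment
reported_r <- c(
  hours_vs_satisfaction    = -0.02004811,
  tenure_vs_evaluation     =  0.1315907,
  projects_vs_satisfaction = -0.1429696,
  tenure_vs_satisfaction   = -0.1008661
)
describe_strength(reported_r)
```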

Create a graph to help visualize the relationship between the two variables.

ggplot(hr, aes(x = number_project , y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Projects May Relate to Lower Employee Satisfaction",
       x = "Number of Projects",
       y = "Satisfaction Level")

## `geom_smooth()` using formula = 'y ~ x'

Satisfaction Level vs. Time Spent at Company

Perform the correlation.

cor4 <- cor.test(hr$satisfaction_level, hr$time_spend_company)
print(cor4)

## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$time_spend_company
## t = -12.416, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.11668153 -0.08499948
## sample estimates:
##        cor 
## -0.1008661

Explain what the test’s p-value means.

A p-value of 3.203473e-35 (printed as < 2.2e-16) indicates that the relationship between satisfaction_level and time_spend_company is statistically significant. If there were no true correlation, the chance of observing one at least as strong as the observed -0.10 in a sample this large would be almost zero.
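A small simulation makes the "no true correlation" scenario concrete. Assuming two independent, normally distributed variables of the same length as the HR data (an illustration only, not the real columns), the sketch below draws 1,000 chance correlations. They cluster within roughly ±1/√n ≈ 0.008 of zero, and none comes near the observed |r| of 0.10. A permutation test that shuffles one of the real columns would be the data-based analogue.

```r
# Null distribution of r under "no relationship": correlations between
# independent random normal vectors of the same size as the HR data.
set.seed(42)
n <- 14999
null_r <- replicate(1000, cor(rnorm(n), rnorm(n)))

r_observed <- -0.1008661                 # estimate from cor4
sd(null_r)                               # close to 1 / sqrt(n)
mean(abs(null_r) >= abs(r_observed))     # share of chance r at least this extreme
```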

What do the results mean in non-technical terms?

The weak negative correlation (r ≈ -0.10) suggests a slight tendency for satisfaction to decrease as employees stay longer at the company, but the effect is small and unlikely to matter much in real-world decision-making.

Create a graph to help visualize the relationship between the two variables.

ggplot(hr, aes(x = time_spend_company, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "More Time Spent at Company May Relate to Lower Satisfaction",
       x = "Time Spent at Company",
       y = "Satisfaction Level")

## `geom_smooth()` using formula = 'y ~ x'

Assignment_7

2024-10-28