library(readr)
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#1 Perform the correlation (.5 point): Choose any two appropriate variables from the data and perform the correlation, displaying the results.
#2 Interpret the results in technical terms (.5 point): For each correlation, explain what the test’s p-value means (significance).
#3 Interpret the results in non-technical terms (1 point): For each correlation, explain what the results mean in non-technical terms.
#4 Create a plot that helps visualize the correlation (.5 point): For each correlation, create a graph to help visualize the relationship between the two variables. The title must be the non-technical interpretation.
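Each correlation below is answered with the same three pieces: the estimate r, its 95% confidence interval, and the p-value. A minimal sketch of a helper that pulls those out of `cor.test()` in one place is shown here. The function name `summarize_cor()` is hypothetical (it is not part of the original analysis), and the strength labels use common rule-of-thumb cutoffs (|r| < 0.1 negligible, < 0.3 weak, < 0.5 moderate), which are an assumption and vary by field.

```r
# Hypothetical helper: summarize_cor() is not part of the original analysis.
# It wraps stats::cor.test() and returns the pieces needed for the write-up.
summarize_cor <- function(x, y, alpha = 0.05) {
  test <- cor.test(x, y)  # Pearson correlation by default
  r <- unname(test$estimate)
  # Rule-of-thumb strength labels (an assumption; cutoffs vary by field)
  strength <- if (abs(r) < 0.1) {
    "negligible"
  } else if (abs(r) < 0.3) {
    "weak"
  } else if (abs(r) < 0.5) {
    "moderate"
  } else {
    "strong"
  }
  data.frame(
    r           = r,
    ci_low      = test$conf.int[1],
    ci_high     = test$conf.int[2],
    p_value     = test$p.value,
    significant = test$p.value < alpha,  # reject H0: true correlation = 0?
    direction   = if (r > 0) "positive" else if (r < 0) "negative" else "none",
    strength    = strength
  )
}
```

For example, `summarize_cor(hr$number_project, hr$average_montly_hours)` would return one row for the second correlation below.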
#1 Average Monthly Hours vs. Satisfaction Level
cor.test(hr$average_montly_hours, hr$satisfaction_level)
##
## Pearson's product-moment correlation
##
## data: hr$average_montly_hours and hr$satisfaction_level
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.036040356 -0.004045605
## sample estimates:
## cor
## -0.02004811
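The p-value (0.014) is below the conventional 0.05 significance level, so we reject the null hypothesis that the true correlation is zero: the correlation between average monthly hours and satisfaction level is statistically significant. However, the correlation itself (r = -0.020, 95% CI -0.036 to -0.004) is negative and negligible; with almost 15,000 employees, even a tiny correlation reaches significance. In non-technical terms, the number of hours an employee works each month has practically no relationship with how satisfied they are.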
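To see why such a tiny correlation still counts as significant, the sketch below recomputes the test by hand from the printed r and the sample size, using the standard Pearson test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). It should reproduce, up to rounding, the t and p-value printed above, and r^2 shows how little of the variation in satisfaction moves together with hours (about 0.04%).

```r
# Values copied from the cor.test() output above
r  <- -0.02004811
n  <- 14999
df <- n - 2

# Pearson test statistic: t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

# Proportion of variance shared by the two variables
r_squared <- r^2

round(c(t = t_stat, p = p_value, r_squared = r_squared), 5)
```

Because sqrt(n - 2) is about 122 here, any |r| above roughly 0.016 clears the 0.05 threshold, which is why significance alone says little about strength in a dataset this large.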
ggplot(hr, aes(x = average_montly_hours, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Satisfaction level has almost no relationship with monthly hours worked",
       x = "Average Monthly Hours",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'
#2 Number of Projects vs. Average Monthly Hours
cor.test(hr$number_project, hr$average_montly_hours)
##
## Pearson's product-moment correlation
##
## data: hr$number_project and hr$average_montly_hours
## t = 56.219, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4039037 0.4303411
## sample estimates:
## cor
## 0.4172106
The p-value (< 2.2e-16) is far below 0.05, so we reject the null hypothesis of zero correlation: there is a statistically significant correlation between the number of projects and average monthly hours worked.
The correlation (r = 0.42, 95% CI 0.40 to 0.43) is positive and moderate.
In non-technical terms, employees with more projects tend to work more hours each month.
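Because `number_project` takes only a handful of whole-number values, Pearson's r can be cross-checked with Spearman's rank correlation, which only assumes a monotonic relationship. This is a swapped-in technique, not part of the original assignment, and the helper name `compare_pearson_spearman()` is hypothetical. A minimal sketch:

```r
# Hypothetical helper comparing Pearson's r with Spearman's rho.
# Spearman works on ranks, so it is less sensitive to the discrete,
# heavily tied values of a variable such as number_project.
compare_pearson_spearman <- function(x, y) {
  pearson <- cor.test(x, y, method = "pearson")
  # exact = FALSE: with ties an exact p-value cannot be computed,
  # so use the asymptotic approximation (and avoid a warning)
  spearman <- cor.test(x, y, method = "spearman", exact = FALSE)
  c(pearson_r    = unname(pearson$estimate),
    spearman_rho = unname(spearman$estimate),
    spearman_p   = spearman$p.value)
}
```

Usage would be `compare_pearson_spearman(hr$number_project, hr$average_montly_hours)`; if both estimates have the same sign and similar size, the conclusion above does not hinge on the linearity assumption.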
ggplot(hr, aes(x = number_project, y = average_montly_hours)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "orange") +
  labs(title = "Employees with more projects tend to work more hours per month",
       x = "Number of Projects",
       y = "Average Monthly Hours")
## `geom_smooth()` using formula = 'y ~ x'
#3 Last Evaluation vs. Satisfaction Level
cor.test(hr$last_evaluation, hr$satisfaction_level)
##
## Pearson's product-moment correlation
##
## data: hr$last_evaluation and hr$satisfaction_level
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08916727 0.12082195
## sample estimates:
## cor
## 0.1050212
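The p-value (< 2.2e-16) is far below 0.05, so the correlation between last evaluation score and satisfaction level is statistically significant. The correlation (r = 0.105, 95% CI 0.089 to 0.121) is positive but weak. In non-technical terms, employees with higher evaluation scores tend to be slightly more satisfied, although evaluation scores say little about any single employee's satisfaction.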
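The 95% confidence interval that `cor.test()` prints for a Pearson correlation is based on Fisher's z-transformation. A minimal sketch using the r and n from the output above recomputes that interval; it should agree with the printed bounds up to rounding, and it makes clear that even the upper end of the interval (about 0.12) is still a weak correlation.

```r
# Values copied from the cor.test() output above
r <- 0.1050212
n <- 14999

z    <- atanh(r)        # Fisher z-transform of r
se   <- 1 / sqrt(n - 3) # standard error of z
crit <- qnorm(0.975)    # two-sided 95% critical value

# Build the interval on the z scale, then back-transform to the r scale
ci <- tanh(z + c(-1, 1) * crit * se)
ci
```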
ggplot(hr, aes(x = last_evaluation, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  labs(title = "Employees with better evaluation scores tend to be slightly more satisfied",
       x = "Last Evaluation",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'
#4 Number of Projects vs. Time Spent at Company
cor.test(hr$number_project, hr$time_spend_company)
##
## Pearson's product-moment correlation
##
## data: hr$number_project and hr$time_spend_company
## t = 24.579, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1813532 0.2121217
## sample estimates:
## cor
## 0.1967859
The p-value (< 2.2e-16) is far below 0.05, so there is a statistically significant correlation between the number of projects an employee has and the time they have spent at the company.
The correlation (r = 0.20, 95% CI 0.18 to 0.21) is positive and weak.
In non-technical terms, employees who have been at the company longer tend to have slightly more projects.
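Since the correlation is positive, a quick way to confirm its direction in plain terms is to average time at the company within each project count. The helper `mean_by_group()` below is hypothetical and not part of the original analysis; it is a minimal sketch using base R's `tapply()`.

```r
# Hypothetical helper: average y within each distinct value of x.
# Group means are easier to read than a scatterplot where thousands
# of points overlap on a few whole-number x values.
mean_by_group <- function(x, y) {
  means <- tapply(y, x, mean)   # mean of y for each sorted value of x
  data.frame(
    group  = as.numeric(names(means)),
    mean_y = as.vector(means),
    n      = as.vector(table(x)) # group sizes, same sorted order
  )
}
```

Usage would be `mean_by_group(hr$number_project, hr$time_spend_company)`; with a positive r, the group means would be expected to rise overall as the number of projects increases.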
ggplot(hr, aes(x = number_project, y = time_spend_company)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Employees who have been at the company longer tend to have slightly more projects",
       x = "Number of Projects",
       y = "Time Spent at Company")
## `geom_smooth()` using formula = 'y ~ x'