R Markdown
library(readr)
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
1. Perform the correlation (.5 point): Choose any two appropriate
variables from the data and perform the correlation, displaying the
results.
2. Interpret the results in technical terms (.5 point): For each
correlation, explain what the test’s p-value means (significance).
3. Interpret the results in non-technical terms (1 point): For each
correlation, explain what the results mean in non-technical terms.
4. Create a plot that helps visualize the correlation (.5 point): For
each correlation, create a graph to help visualize the relationship
between the two variables. The title must be the non-technical
interpretation.
cor.test(hr$time_spend_company, hr$satisfaction_level)
##
## Pearson's product-moment correlation
##
## data: hr$time_spend_company and hr$satisfaction_level
## t = -12.416, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.11668153 -0.08499948
## sample estimates:
## cor
## -0.1008661
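The p-value is less than 2.2e-16, far below the usual 0.05 threshold,
so we reject the null hypothesis that the true correlation is 0: the
correlation is statistically significant. The correlation coefficient
is r = -0.1008661 (95% CI -0.117 to -0.085), a weak negative correlation.
Because r is so close to 0, the effect is very small: the significance
comes mostly from the large sample (14,999 employees), not from a
strong relationship.
In non-technical terms, employees who have been at the company longer
tend to be slightly less satisfied. The relationship is weak, and a
correlation alone does not show that time at the company causes lower
satisfaction.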
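To see why such a small r still gives a tiny p-value, the t statistic that `cor.test()` reports can be recomputed from r and the degrees of freedom alone, using t = r * sqrt(df / (1 - r^2)). This is a minimal sketch that only reuses the numbers printed in the output above; `t_stat` and `p_value` are illustrative names, not part of the original analysis.

```r
# Values copied from the cor.test() output above
r  <- -0.1008661   # sample Pearson correlation
df <- 14997        # degrees of freedom = n - 2

# t statistic for testing H0: true correlation = 0
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)   # about -12.416, matching the output above
p_value < 2.2e-16  # TRUE, which is why R prints "p-value < 2.2e-16"
```

With df close to 15,000, even |r| around 0.1 produces |t| above 12, so statistical significance here says little about how strong the relationship is.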
ggplot(hr, aes(x = time_spend_company, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  labs(title = "The longer employees stay, the slightly less satisfied they are",
       x = "Time Spent at Company",
       y = "Satisfaction Level")
## `geom_smooth()` using formula = 'y ~ x'

cor.test(hr$average_montly_hours, hr$number_project)
##
## Pearson's product-moment correlation
##
## data: hr$average_montly_hours and hr$number_project
## t = 56.219, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4039037 0.4303411
## sample estimates:
## cor
## 0.4172106
The p-value is < 2.2e-16, which is very small, so we reject the null
hypothesis that the true correlation is 0: the correlation is
statistically significant. The correlation coefficient is r = 0.4172106
(95% CI 0.404 to 0.430), a moderate positive correlation between
average_montly_hours and number_project.
Employees who work more hours per month on average tend to have more
projects.
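The 95 percent confidence interval in the output above can be reproduced with the Fisher z-transform, which is the method R's `cor.test()` uses for a Pearson correlation: transform r with `atanh()`, add and subtract `qnorm(0.975)` standard errors of 1 / sqrt(n - 3), then transform back with `tanh()`. The sketch below assumes n = df + 2 = 14999 and only reuses the printed values.

```r
# Values from the cor.test() output above
r <- 0.4172106
n <- 14997 + 2          # sample size = df + 2

# Fisher z-transform of r; its standard error is 1 / sqrt(n - 3)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
round(ci, 4)   # about 0.4039 0.4303
```

The interval is narrow because n is large, which is why the estimate of a moderate correlation is precise even though the relationship itself is far from perfect.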
ggplot(hr, aes(x = number_project, y = average_montly_hours)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "The more projects, the higher the monthly hours",
       x = "Number of Projects",
       y = "Avg Monthly Hours")
## `geom_smooth()` using formula = 'y ~ x'

cor.test(hr$satisfaction_level, hr$last_evaluation)
##
## Pearson's product-moment correlation
##
## data: hr$satisfaction_level and hr$last_evaluation
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08916727 0.12082195
## sample estimates:
## cor
## 0.1050212
The p-value is < 2.2e-16, so this correlation is also statistically
significant. However, the correlation coefficient is only r = 0.1050212,
so the correlation between satisfaction_level and last_evaluation is
weak and positive.
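One way to make "weak" and "moderate" concrete is r squared: the share of the variance in one variable that is linearly associated with the other. The sketch below squares the three coefficients printed by the `cor.test()` calls above; the vector names are illustrative labels only.

```r
# Correlation coefficients printed by the cor.test() calls above
r <- c(
  time_vs_satisfaction       = -0.1008661,
  hours_vs_projects          =  0.4172106,
  satisfaction_vs_evaluation =  0.1050212
)

# r^2: proportion of variance in one variable linearly associated with the other
r_squared <- r^2
round(100 * r_squared, 1)   # in percent
```

The two weak correlations each account for about 1 percent of the variance, while hours and projects share roughly 17 percent, which matches the weak versus moderate labels used in the interpretations.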
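For Work_accident and satisfaction_level, the correlation test gives
p = 6.279e-13, well below 0.05, so the correlation is statistically
significant. The correlation coefficient is only r = 0.05869724, so the
relationship between Work_accident and satisfaction_level is very weak
and positive.
Employees who have had a work accident report slightly higher
satisfaction, on average, than those who have not.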
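Because Work_accident is coded 0/1, its Pearson correlation with satisfaction is a point-biserial correlation: it depends only on the difference between the two group means. The sketch below uses small hypothetical vectors (not the HR data) to check that the point-biserial formula, (mean of group 1 minus mean of group 0) divided by the population SD, times sqrt(p * q), gives the same value as `cor()`.

```r
# Hypothetical toy data: 1 = had a work accident, 0 = did not
accident     <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0)
satisfaction <- c(0.40, 0.55, 0.62, 0.80, 0.35, 0.71, 0.50, 0.45, 0.66, 0.58)

p    <- mean(accident)                             # share with an accident
q    <- 1 - p                                      # share without
m1   <- mean(satisfaction[accident == 1])          # mean satisfaction, accident group
m0   <- mean(satisfaction[accident == 0])          # mean satisfaction, no-accident group
sd_n <- sqrt(mean((satisfaction - mean(satisfaction))^2))  # SD dividing by n, not n - 1

# Point-biserial correlation
r_pb <- (m1 - m0) / sd_n * sqrt(p * q)

# Same number as the ordinary Pearson correlation on the 0/1 coding
all.equal(r_pb, cor(accident, satisfaction))   # TRUE
```

A positive r here simply means the accident group has the higher mean, so on a 0/1 variable it is more natural to read the result as a difference between two groups than as "more accidents, more satisfaction".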
ggplot(hr, aes(x = Work_accident, y = satisfaction_level)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "brown") +
  labs(title = "Employees who had a work accident are slightly more satisfied",
       x = "Work Accident (0 = No, 1 = Yes)",
       y = "Satisfaction Score")
## `geom_smooth()` using formula = 'y ~ x'
