title: "assignment 8"
author: "Connor Lewis, Jack Levine"
date: "2024-11-05"
output: html_document

library(readr)
library(ggplot2)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(hr)
## spc_tbl_ [14,999 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ satisfaction_level   : num [1:14999] 0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num [1:14999] 0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : num [1:14999] 2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : num [1:14999] 157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : num [1:14999] 3 6 4 5 3 3 4 5 5 3 ...
##  $ Work_accident        : num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : num [1:14999] 1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: num [1:14999] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Department           : chr [1:14999] "sales" "sales" "sales" "sales" ...
##  $ salary               : chr [1:14999] "low" "medium" "medium" "low" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   satisfaction_level = col_double(),
##   ..   last_evaluation = col_double(),
##   ..   number_project = col_double(),
##   ..   average_montly_hours = col_double(),
##   ..   time_spend_company = col_double(),
##   ..   Work_accident = col_double(),
##   ..   left = col_double(),
##   ..   promotion_last_5years = col_double(),
##   ..   Department = col_character(),
##   ..   salary = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

Correlation 1: Between satisfaction level and last evaluation

cor1 <- cor(hr$satisfaction_level, hr$last_evaluation)

Correlation 2: Between satisfaction level and average monthly hours

cor2 <- cor(hr$satisfaction_level, hr$average_montly_hours)

Correlation 3: Between satisfaction level and number of projects

cor3 <- cor(hr$satisfaction_level, hr$number_project)

Correlation 4: Between last evaluation and average monthly hours

cor4 <- cor(hr$last_evaluation, hr$average_montly_hours)
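The four correlations above are stored but never printed. One way to see them side by side, with their p-values, is a small helper that runs `cor.test()` on each pair. This is only a sketch: `summarise_correlations` is a hypothetical name, not part of the assignment, and the demo uses R's built-in `mtcars` data so that it runs without downloading anything.

```r
# Hypothetical helper (not part of the original assignment):
# run cor.test() on each pair of columns and collect r and the p-value.
summarise_correlations <- function(data, pairs) {
  rows <- lapply(pairs, function(p) {
    test <- cor.test(data[[p[1]]], data[[p[2]]])
    data.frame(
      x = p[1],
      y = p[2],
      r = unname(test$estimate),
      p_value = test$p.value
    )
  })
  do.call(rbind, rows)
}

# Demo on the built-in mtcars data (a stand-in, not the HR data).
demo_pairs <- list(c("mpg", "wt"), c("mpg", "hp"))
result <- summarise_correlations(mtcars, demo_pairs)
print(result)

# With the assignment data, the same call would presumably look like:
# summarise_correlations(hr, list(
#   c("satisfaction_level", "last_evaluation"),
#   c("satisfaction_level", "average_montly_hours"),
#   c("satisfaction_level", "number_project"),
#   c("last_evaluation", "average_montly_hours")
# ))
```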

Perform the correlation (.5 point)

Choose any two appropriate variables from the data and perform the correlation, displaying the results.

Calculate the correlation between satisfaction_level and average_montly_hours

correlation_result <- cor(hr$satisfaction_level, hr$average_montly_hours)

Display the result

correlation_result
## [1] -0.02004811

Interpret the results in technical terms (.5 point)

For each correlation, explain what the test’s p-value means (significance).

Perform correlation test between satisfaction_level and average_montly_hours

correlation_test <- cor.test(hr$satisfaction_level, hr$average_montly_hours)

Display the results

correlation_test
## 
##  Pearson's product-moment correlation
## 
## data:  hr$satisfaction_level and hr$average_montly_hours
## t = -2.4556, df = 14997, p-value = 0.01408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.036040356 -0.004045605
## sample estimates:
##         cor 
## -0.02004811

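Correlation Coefficient (r): Indicates the strength and direction of the relationship. Here r = -0.02, which is negative but very close to 0.

r > 0: Positive correlation (as one variable increases, the other tends to increase).

r < 0: Negative correlation (as one variable increases, the other tends to decrease).

p-value: The probability of seeing a correlation at least this far from 0 if the true correlation were 0. Here p = 0.01408, which is below 0.05, so the correlation is statistically significant, and the 95 percent confidence interval (-0.036 to -0.004) excludes 0.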
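To see where the numbers in the `cor.test()` output come from, the sketch below recomputes the t statistic, the two-sided p-value, and the 95 percent confidence interval by hand. It assumes only the values printed above (r = -0.02004811 and 14,999 rows) and uses the standard Pearson formulas; the variable names are ours.

```r
# Values copied from the cor.test() output above.
r  <- -0.02004811  # sample correlation
n  <- 14999        # number of rows in hr
df <- n - 2        # degrees of freedom for the Pearson test

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

cat("t =", round(t_stat, 4), "\n")
cat("p-value =", signif(p_value, 4), "\n")
cat("95% CI:", round(ci, 6), "\n")
```

The results should agree with the reported t = -2.4556, p-value = 0.01408, and the printed interval up to rounding.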

Interpret the results in non-technical terms (1 point)

For each correlation, what do the results mean in non-technical terms?

A correlation coefficient of -0.02

A p-value of 0.014

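There is a very weak negative relationship between work hours and satisfaction: employees who work longer hours report very slightly lower satisfaction on average.

The p-value says this relationship is unlikely to be pure chance, but with almost 15,000 employees even a tiny effect shows up as significant. In practical terms, hours worked tell us almost nothing about satisfaction.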
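One way to put a number on how small the effect is: squaring r gives the share of variance in one variable that a straight-line relationship with the other accounts for. The sketch below uses only the r value printed by `cor.test()`; the variable names are ours.

```r
# Correlation copied from the cor.test() output above.
r <- -0.02004811

# Coefficient of determination: share of variance accounted for
# by a linear relationship between the two variables.
r_squared <- r^2
shared_variance_pct <- 100 * r_squared

cat(sprintf("r^2 = %.6f (about %.2f%% of the variance)\n",
            r_squared, shared_variance_pct))
```

That is roughly 0.04% of the variance, which is why the relationship is statistically significant yet practically negligible.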

Create a plot that helps visualize the correlation (.5 point)

For each correlation, create a graph to help visualize the relationship between the two variables. The title must be the non-technical interpretation.

correlation_coefficient <- cor(hr$satisfaction_level, hr$average_montly_hours)

ggplot(hr, aes(x = satisfaction_level, y = average_montly_hours)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Slight Negative Relationship: More Hours, Slightly Less Satisfaction",
    x = "Satisfaction Level",
    y = "Average Monthly Hours"
  ) +
  annotate("text", x = 0.1, y = 250, label = paste("Correlation:", round(correlation_coefficient, 2)), color = "red") +
  theme_minimal()
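With nearly 15,000 points, a plain scatter plot can hide a trend this weak. A possible refinement is to overlay a fitted straight line with `geom_smooth(method = "lm")`. The sketch below wraps that in a hypothetical helper, `plot_with_trend`, and demonstrates it on the built-in `mtcars` data so it runs without a download; neither the helper nor the demo data is part of the assignment.

```r
library(ggplot2)

# Hypothetical helper (not part of the original assignment):
# scatter plot of two columns with a fitted straight line on top.
plot_with_trend <- function(data, x_var, y_var, title) {
  ggplot(data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point(alpha = 0.2) +  # light points reduce overplotting
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
    labs(title = title, x = x_var, y = y_var) +
    theme_minimal()
}

# Demo on the built-in mtcars data (a stand-in, not the HR data).
demo_plot <- plot_with_trend(
  mtcars, "wt", "mpg",
  "Heavier Cars Tend to Get Fewer Miles per Gallon"
)
print(demo_plot)

# With the assignment data, the call would presumably be:
# plot_with_trend(hr, "satisfaction_level", "average_montly_hours",
#                 "Slight Negative Relationship: More Hours, Slightly Less Satisfaction")
```

For the HR data, the fitted line should come out nearly flat, which matches a correlation of about -0.02.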