Measure the relationship between two continuous variables
Data Sources: HR_comma_sep
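library(dplyr)  # provides rename(), mutate(), recode() and %>%
# Rename the misleadingly named `sales` column to `Occupation`
Employee_Leave_rate <- rename(HR_comma_sep, Occupation = sales)
# Drop columns 9 and 10 (Occupation and salary) so only numeric columns remain
Employee_Leave_rate1 <- Employee_Leave_rate[, -c(9:10)]
summary(Employee_Leave_rate1)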
## satisfaction_level last_evaluation number_project average_montly_hours
## Min. :0.0900 Min. :0.3600 Min. :2.000 Min. : 96.0
## 1st Qu.:0.4400 1st Qu.:0.5600 1st Qu.:3.000 1st Qu.:156.0
## Median :0.6400 Median :0.7200 Median :4.000 Median :200.0
## Mean :0.6128 Mean :0.7161 Mean :3.803 Mean :201.1
## 3rd Qu.:0.8200 3rd Qu.:0.8700 3rd Qu.:5.000 3rd Qu.:245.0
## Max. :1.0000 Max. :1.0000 Max. :7.000 Max. :310.0
## time_spend_company Work_accident left
## Min. : 2.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 3.000 Median :0.0000 Median :0.0000
## Mean : 3.498 Mean :0.1446 Mean :0.2381
## 3rd Qu.: 4.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :10.000 Max. :1.0000 Max. :1.0000
## promotion_last_5years
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.02127
## 3rd Qu.:0.00000
## Max. :1.00000
Easy glimpse into all the possible correlations
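One way to get such a glimpse is a correlation matrix of the numeric columns. The sketch below is a minimal illustration on a small simulated data frame. The `demo` data and its values are assumptions made so the example runs on its own; with the real data, `Employee_Leave_rate1` would take its place.

```r
# Simulated stand-in for Employee_Leave_rate1 (hypothetical values, not the HR data)
set.seed(42)
n <- 200
number_project <- sample(2:7, n, replace = TRUE)
demo <- data.frame(
  satisfaction_level   = runif(n, 0.09, 1),
  number_project       = number_project,
  # hours loosely tied to the number of projects so one pair is visibly correlated
  average_montly_hours = 120 + 20 * number_project + rnorm(n, sd = 30)
)

# Pearson correlation for every pair of columns, rounded for readability
cor_matrix <- round(cor(demo), 2)
print(cor_matrix)
```

Each off-diagonal cell is the same Pearson coefficient that `cor.test` reports for that pair, so the matrix can be used to pick out the pairs worth testing individually.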

1 Correlations between different variables
Satisfaction_level and last_evaluation, with alpha = 0.05:

##
## Pearson's product-moment correlation
##
## data: Employee_Leave_rate1$satisfaction_level and Employee_Leave_rate1$last_evaluation
## t = 12.933, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08916727 0.12082195
## sample estimates:
## cor
## 0.1050212
Explanation:
#1 The p-value is less than 0.05, so the correlation between satisfaction_level and last_evaluation is statistically significant.
#2 Cor is 0.11, so the two variables have a very weak positive linear relationship.
#3 Strictly speaking, the two variables have almost nothing to do with each other: with about 15,000 observations, even a tiny correlation is statistically significant.
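The t statistic in the output above follows directly from the correlation and the degrees of freedom: t = r * sqrt(df) / sqrt(1 - r^2). As a quick check, the sketch below recomputes it using only the numbers printed in the output (the variable names are illustrative).

```r
# Values copied from the cor.test output above
r  <- 0.1050212
df <- 14997

# t statistic for testing H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Share of variance in one variable linearly explained by the other
r_squared <- r^2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 3))
print(round(r_squared, 4))
```

The recomputed t agrees with the reported value of about 12.93, while r squared is only about 0.011. The test is highly significant, yet roughly 1% of the variance is shared, which is why the relationship is described as very weak.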
2 Satisfaction_level and time_spend_company, with alpha = 0.05:

##
## Pearson's product-moment correlation
##
## data: Employee_Leave_rate1$satisfaction_level and Employee_Leave_rate1$time_spend_company
## t = -12.416, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.11668153 -0.08499948
## sample estimates:
## cor
## -0.1008661
Explanation:
#1 The p-value is less than 0.05, so the correlation between satisfaction_level and time_spend_company is statistically significant.
#2 Cor is -0.10, so the two variables have a very weak negative linear relationship.
#3 The two variables move in opposite directions, but the relationship is so weak that they are practically unrelated.
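The 95 percent confidence interval reported above can be reproduced by hand. For Pearson's correlation, `cor.test` bases the interval on Fisher's z transformation; the sketch below redoes that calculation from the printed estimate and the sample size (n = df + 2 = 14999). Variable names are illustrative.

```r
# Values copied from the cor.test output above
r <- -0.1008661
n <- 14997 + 2          # df = n - 2

z    <- atanh(r)        # Fisher z transformation
se   <- 1 / sqrt(n - 3) # approximate standard error of z
crit <- qnorm(0.975)    # two-sided 95% critical value

# Back-transform the interval limits to the correlation scale
ci <- tanh(z + c(-1, 1) * crit * se)
print(round(ci, 4))
```

The result should agree with the reported interval of roughly -0.117 to -0.085. The interval excludes zero, which matches the significant p-value, but both limits are small in absolute terms.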
3 Average_monthly_hours and number_project, with alpha = 0.05:

##
## Pearson's product-moment correlation
##
## data: Employee_Leave_rate1$average_montly_hours and Employee_Leave_rate1$number_project
## t = 56.219, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4039037 0.4303411
## sample estimates:
## cor
## 0.4172106
Explanation:
#1 The p-value is less than 0.05, so the correlation between average_montly_hours and number_project is statistically significant.
#2 Cor is 0.42, so the two variables have a moderate positive linear relationship.
#3 Employees who work on more projects tend to spend more average_montly_hours at the company.
4 Last_evaluation and number_project, with alpha = 0.05:

##
## Pearson's product-moment correlation
##
## data: Employee_Leave_rate1$last_evaluation and Employee_Leave_rate1$number_project
## t = 45.656, df = 14997, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3352028 0.3633053
## sample estimates:
## cor
## 0.3493326
Explanation:
#1 The p-value is less than 0.05, so the correlation between last_evaluation and number_project is statistically significant.
#2 Cor is 0.35, so the two variables have a moderate positive linear relationship.
#3 Employees who work on more projects tend to have a higher last_evaluation.
5 Satisfaction_level and salary, with alpha = 0.05:
library(ggplot2)
# Encode salary as an ordinal number: low = 0, medium = 1, high = 2
Employee_Leave_rate2 <- Employee_Leave_rate %>%
  mutate(salary_type = recode(salary, "low" = 0, "medium" = 1, "high" = 2))
# Drop the text columns Occupation and salary (columns 9 and 10)
Employee_Leave_rate3 <- Employee_Leave_rate2[, -c(9:10)]
# Scatter plot with a fitted linear trend line
ggplot(Employee_Leave_rate3, aes(satisfaction_level, salary_type)) +
  geom_point() +
  geom_smooth(method = 'lm')

cor.test(Employee_Leave_rate3$satisfaction_level, Employee_Leave_rate3$salary_type)
##
## Pearson's product-moment correlation
##
## data: Employee_Leave_rate3$satisfaction_level and Employee_Leave_rate3$salary_type
## t = 6.1335, df = 14997, p-value = 8.81e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03404593 0.06597347
## sample estimates:
## cor
## 0.05002248
Explanation:
#1 The p-value (8.81e-10) is less than 0.05, so the correlation between satisfaction_level and salary_type is statistically significant.
#2 Cor is 0.05, so the two variables have a very weak positive linear relationship.
#3 Salary is practically unrelated to satisfaction: higher-paid employees are not all more satisfied, and lower-paid employees are not all less satisfied.
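Because salary_type is an ordinal code with only three levels, a rank-based correlation is a common alternative to Pearson's r. The sketch below is not part of the original analysis; it uses simulated data (the `demo` data frame and its values are assumptions) to show how `cor.test` could be called with `method = "spearman"`.

```r
# Simulated stand-in data (hypothetical values, not the HR data)
set.seed(1)
n <- 300
demo <- data.frame(
  satisfaction_level = runif(n, 0.09, 1),
  salary_type        = sample(0:2, n, replace = TRUE)  # 0 = low, 1 = medium, 2 = high
)

# Spearman's rank correlation; exact = FALSE because the ordinal codes contain many ties
spearman_result <- cor.test(demo$satisfaction_level, demo$salary_type,
                            method = "spearman", exact = FALSE)

print(spearman_result$estimate)  # rho
print(spearman_result$p.value)
```

With the real data, `Employee_Leave_rate3$satisfaction_level` and `Employee_Leave_rate3$salary_type` would replace the simulated columns. Spearman's rho depends only on the ordering of the salary levels, not on the arbitrary 0/1/2 spacing chosen in the recoding step.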