A company's greatest asset is the people who make up the organization. This project analyzes a data set of 1,470 rows and 35 columns. Each row represents an individual employee, and the 35 columns are variables that describe that employee. The data reflect what a Human Resources department might collect to understand its employees more thoroughly. We focus on variables related to employee attrition, that is, variables that affect whether an employee stays with a company or leaves. This information is valuable to organizations because high employee turnover costs time, resources, and money, and can also indicate problems within the organization. In particular, we look at employee income, age, and the number of companies an employee has previously worked for. Understanding these variables can help an organization see what causes employees to leave or, better yet, to stay.
To complete this project, we use RStudio, Watson Analytics, Tableau, and Excel. In RStudio we use correlation tables, descriptive analytics that identify maximum and minimum values, predictive analytics, and linear and quadratic regression models, which we evaluate by their R-squared and adjusted R-squared values. In Watson Analytics we use predictive analytics, including decision trees, word clouds, and target charts. In Tableau we present the factors we have analyzed graphically and provide a visual summary of the results. The data set was provided in Excel, and we use Excel to clean the data by converting non-numeric variables such as attrition into numeric variables that can be used in RStudio and Watson Analytics.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.7.2 ✔ stringr 1.2.0
## ✔ readr 1.1.1 ✔ forcats 0.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(corrplot)
## corrplot 0.84 loaded
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
No formal description of the data set was provided. The data set covers 1,470 employees and 35 variables describing each employee. These variables include age, attrition, travel, department, distance from home, education, gender, monthly income, number of companies worked for, job involvement, job satisfaction, marital status, work/life balance, percentage salary hike, and a number of others.
After reviewing the data, we hypothesize that the most important factors leading to employee attrition are income level, age, and the number of companies an employee has worked for. We use tools such as RStudio and Watson Analytics to determine the strongest factors of employee attrition and compare them with the ones we hypothesized. We also analyze trends in the data that affect employee attrition and whether these variables are related to each other.
The dataset for this project is composed of data regarding employee attrition and general data collected by HR, including job satisfaction, marital status, education field, and total working years, to name a few. These are all factors that we believe could affect attrition and turnover rates. It cannot be inferred that all of the data were collected at the same company; however, it does appear that the data were collected from employees who work in the life sciences or medical industry.
Load dataset
mydata = read.csv(file = "data/employee_attrition.csv")
head(mydata)
## Age Attrition BusinessTravel DailyRate Department
## 1 41 Yes Travel_Rarely 1102 Sales
## 2 49 No Travel_Frequently 279 Research & Development
## 3 37 Yes Travel_Rarely 1373 Research & Development
## 4 33 No Travel_Frequently 1392 Research & Development
## 5 27 No Travel_Rarely 591 Research & Development
## 6 32 No Travel_Frequently 1005 Research & Development
## DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1 1 2 Life Sciences 1 1
## 2 8 1 Life Sciences 1 2
## 3 2 2 Other 1 4
## 4 3 4 Life Sciences 1 5
## 5 2 1 Medical 1 7
## 6 2 2 Life Sciences 1 8
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1 2 Female 94 3 2
## 2 3 Male 61 2 2
## 3 4 Male 92 2 1
## 4 4 Female 56 3 1
## 5 1 Male 40 3 1
## 6 4 Male 79 3 1
## JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1 Sales Executive 4 Single 5993
## 2 Research Scientist 2 Married 5130
## 3 Laboratory Technician 3 Single 2090
## 4 Research Scientist 3 Married 2909
## 5 Laboratory Technician 2 Married 3468
## 6 Laboratory Technician 4 Single 3068
## MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1 19479 8 Y Yes 11
## 2 24907 1 Y No 23
## 3 2396 6 Y Yes 15
## 4 23159 1 Y Yes 11
## 5 16632 9 Y No 12
## 6 11864 0 Y No 13
## PerformanceRating RelationshipSatisfaction StandardHours
## 1 3 1 80
## 2 4 4 80
## 3 3 2 80
## 4 3 3 80
## 5 3 4 80
## 6 3 3 80
## StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## 1 0 8 0 1
## 2 1 10 3 3
## 3 0 7 3 3
## 4 0 8 3 3
## 5 1 6 3 3
## 6 0 8 2 2
## YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
## 1 6 4 0
## 2 10 7 1
## 3 0 0 0
## 4 8 7 3
## 5 2 2 2
## 6 7 7 3
## YearsWithCurrManager
## 1 5
## 2 7
## 3 0
## 4 0
## 5 2
## 6 6
Summary
summary(mydata)
## Age Attrition BusinessTravel DailyRate
## Min. :18.00 No :1233 Non-Travel : 150 Min. : 102.0
## 1st Qu.:30.00 Yes: 237 Travel_Frequently: 277 1st Qu.: 465.0
## Median :36.00 Travel_Rarely :1043 Median : 802.0
## Mean :36.92 Mean : 802.5
## 3rd Qu.:43.00 3rd Qu.:1157.0
## Max. :60.00 Max. :1499.0
##
## Department DistanceFromHome Education
## Human Resources : 63 Min. : 1.000 Min. :1.000
## Research & Development:961 1st Qu.: 2.000 1st Qu.:2.000
## Sales :446 Median : 7.000 Median :3.000
## Mean : 9.193 Mean :2.913
## 3rd Qu.:14.000 3rd Qu.:4.000
## Max. :29.000 Max. :5.000
##
## EducationField EmployeeCount EmployeeNumber
## Human Resources : 27 Min. :1 Min. : 1.0
## Life Sciences :606 1st Qu.:1 1st Qu.: 491.2
## Marketing :159 Median :1 Median :1020.5
## Medical :464 Mean :1 Mean :1024.9
## Other : 82 3rd Qu.:1 3rd Qu.:1555.8
## Technical Degree:132 Max. :1 Max. :2068.0
##
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement
## Min. :1.000 Female:588 Min. : 30.00 Min. :1.00
## 1st Qu.:2.000 Male :882 1st Qu.: 48.00 1st Qu.:2.00
## Median :3.000 Median : 66.00 Median :3.00
## Mean :2.722 Mean : 65.89 Mean :2.73
## 3rd Qu.:4.000 3rd Qu.: 83.75 3rd Qu.:3.00
## Max. :4.000 Max. :100.00 Max. :4.00
##
## JobLevel JobRole JobSatisfaction
## Min. :1.000 Sales Executive :326 Min. :1.000
## 1st Qu.:1.000 Research Scientist :292 1st Qu.:2.000
## Median :2.000 Laboratory Technician :259 Median :3.000
## Mean :2.064 Manufacturing Director :145 Mean :2.729
## 3rd Qu.:3.000 Healthcare Representative:131 3rd Qu.:4.000
## Max. :5.000 Manager :102 Max. :4.000
## (Other) :215
## MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked
## Divorced:327 Min. : 1009 Min. : 2094 Min. :0.000
## Married :673 1st Qu.: 2911 1st Qu.: 8047 1st Qu.:1.000
## Single :470 Median : 4919 Median :14236 Median :2.000
## Mean : 6503 Mean :14313 Mean :2.693
## 3rd Qu.: 8379 3rd Qu.:20462 3rd Qu.:4.000
## Max. :19999 Max. :26999 Max. :9.000
##
## Over18 OverTime PercentSalaryHike PerformanceRating
## Y:1470 No :1054 Min. :11.00 Min. :3.000
## Yes: 416 1st Qu.:12.00 1st Qu.:3.000
## Median :14.00 Median :3.000
## Mean :15.21 Mean :3.154
## 3rd Qu.:18.00 3rd Qu.:3.000
## Max. :25.00 Max. :4.000
##
## RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
## Min. :1.000 Min. :80 Min. :0.0000 Min. : 0.00
## 1st Qu.:2.000 1st Qu.:80 1st Qu.:0.0000 1st Qu.: 6.00
## Median :3.000 Median :80 Median :1.0000 Median :10.00
## Mean :2.712 Mean :80 Mean :0.7939 Mean :11.28
## 3rd Qu.:4.000 3rd Qu.:80 3rd Qu.:1.0000 3rd Qu.:15.00
## Max. :4.000 Max. :80 Max. :3.0000 Max. :40.00
##
## TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole
## Min. :0.000 Min. :1.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000 1st Qu.: 2.000
## Median :3.000 Median :3.000 Median : 5.000 Median : 3.000
## Mean :2.799 Mean :2.761 Mean : 7.008 Mean : 4.229
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 9.000 3rd Qu.: 7.000
## Max. :6.000 Max. :4.000 Max. :40.000 Max. :18.000
##
## YearsSinceLastPromotion YearsWithCurrManager
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 1.000 Median : 3.000
## Mean : 2.188 Mean : 4.123
## 3rd Qu.: 3.000 3rd Qu.: 7.000
## Max. :15.000 Max. :17.000
##
Extracting Variables
title = mydata
age = mydata$Age
attrition = mydata$Attrition
monthlyincome = mydata$MonthlyIncome
numcompaniesworked = mydata$NumCompaniesWorked
The original dataset is complete and clean. There are no missing values and no undesired special characters. For attrition, 1,233 employees are recorded as “No” and 237 as “Yes”. Employee age ranges from 18 to 60 years, with a mean of 36.92 years. Monthly income ranges from $1,009 to $19,999, with a mean of $6,503. The number of companies worked for ranges from zero, indicating that the current company is the only one the employee has worked for, to nine.
Although the dataset is already clean, we need to convert the “Yes” and “No” attrition values to numbers in order to analyze attrition. We assigned “1” to every “Yes” and “0” to every “No”. This lets us compare attrition with the other numeric variables in the dataset.
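The statement that no values are missing can be checked directly in R. The following is a minimal sketch on a small made-up data frame; the object `toy` and its single `NA` are hypothetical and exist only to show what a non-zero count looks like. On the real data one would pass `mydata` instead of `toy`.

```r
# Hypothetical toy data frame with one deliberately missing value.
toy <- data.frame(
  Age = c(41, 49, NA),
  MonthlyIncome = c(5993, 5130, 2090)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy))
na_per_column

# TRUE if the data frame contains at least one missing value.
anyNA(toy)
```

A column count of zero everywhere, together with `anyNA()` returning `FALSE`, would support the claim that the data set is complete.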
We used the replace function in Excel to change the “yes” and “no” responses for employee attrition to numeric values.
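The same recoding could also be done in R instead of Excel. The sketch below is an assumption about how that might look, not the step used in this project; `attrition_raw` holds the attrition values of the first six employees shown by `head(mydata)` above.

```r
# Attrition values of the first six rows printed by head(mydata).
attrition_raw <- c("Yes", "No", "Yes", "No", "No", "No")

# The comparison yields TRUE/FALSE; as.integer() maps TRUE to 1 and FALSE to 0.
attrition_num <- as.integer(attrition_raw == "Yes")
attrition_num

# On the full data frame the equivalent assignment would be:
# mydata$Attrition <- as.integer(mydata$Attrition == "Yes")

# Sanity check: with 237 "Yes" values among 1,470 employees, the mean of the
# recoded column should equal the attrition rate.
attrition_rate <- round(237 / 1470, 4)
attrition_rate
```

The attrition rate computed this way should agree with the mean of the Attrition column reported by `summary()` for the cleaned file.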
mydata = read.csv(file = "data/clean_employee_attrition.csv")
head(mydata)
## Age Attrition BusinessTravel DailyRate Department
## 1 41 1 Travel_Rarely 1102 Sales
## 2 49 0 Travel_Frequently 279 Research & Development
## 3 37 1 Travel_Rarely 1373 Research & Development
## 4 33 0 Travel_Frequently 1392 Research & Development
## 5 27 0 Travel_Rarely 591 Research & Development
## 6 32 0 Travel_Frequently 1005 Research & Development
## DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1 1 2 Life Sciences 1 1
## 2 8 1 Life Sciences 1 2
## 3 2 2 Other 1 4
## 4 3 4 Life Sciences 1 5
## 5 2 1 Medical 1 7
## 6 2 2 Life Sciences 1 8
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1 2 Female 94 3 2
## 2 3 Male 61 2 2
## 3 4 Male 92 2 1
## 4 4 Female 56 3 1
## 5 1 Male 40 3 1
## 6 4 Male 79 3 1
## JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1 Sales Executive 4 Single 5993
## 2 Research Scientist 2 Married 5130
## 3 Laboratory Technician 3 Single 2090
## 4 Research Scientist 3 Married 2909
## 5 Laboratory Technician 2 Married 3468
## 6 Laboratory Technician 4 Single 3068
## MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1 19479 8 Y Yes 11
## 2 24907 1 Y No 23
## 3 2396 6 Y Yes 15
## 4 23159 1 Y Yes 11
## 5 16632 9 Y No 12
## 6 11864 0 Y No 13
## PerformanceRating RelationshipSatisfaction StandardHours
## 1 3 1 80
## 2 4 4 80
## 3 3 2 80
## 4 3 3 80
## 5 3 4 80
## 6 3 3 80
## StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## 1 0 8 0 1
## 2 1 10 3 3
## 3 0 7 3 3
## 4 0 8 3 3
## 5 1 6 3 3
## 6 0 8 2 2
## YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
## 1 6 4 0
## 2 10 7 1
## 3 0 0 0
## 4 8 7 3
## 5 2 2 2
## 6 7 7 3
## YearsWithCurrManager
## 1 5
## 2 7
## 3 0
## 4 0
## 5 2
## 6 6
Extracting Variables
title = mydata
age = mydata$Age
attrition = mydata$Attrition
monthlyincome = mydata$MonthlyIncome
numcompaniesworked = mydata$NumCompaniesWorked
summary(mydata)
## Age Attrition BusinessTravel
## Min. :18.00 Min. :0.0000 Non-Travel : 150
## 1st Qu.:30.00 1st Qu.:0.0000 Travel_Frequently: 277
## Median :36.00 Median :0.0000 Travel_Rarely :1043
## Mean :36.92 Mean :0.1612
## 3rd Qu.:43.00 3rd Qu.:0.0000
## Max. :60.00 Max. :1.0000
##
## DailyRate Department DistanceFromHome
## Min. : 102.0 Human Resources : 63 Min. : 1.000
## 1st Qu.: 465.0 Research & Development:961 1st Qu.: 2.000
## Median : 802.0 Sales :446 Median : 7.000
## Mean : 802.5 Mean : 9.193
## 3rd Qu.:1157.0 3rd Qu.:14.000
## Max. :1499.0 Max. :29.000
##
## Education EducationField EmployeeCount EmployeeNumber
## Min. :1.000 Human Resources : 27 Min. :1 Min. : 1.0
## 1st Qu.:2.000 Life Sciences :606 1st Qu.:1 1st Qu.: 491.2
## Median :3.000 Marketing :159 Median :1 Median :1020.5
## Mean :2.913 Medical :464 Mean :1 Mean :1024.9
## 3rd Qu.:4.000 Other : 82 3rd Qu.:1 3rd Qu.:1555.8
## Max. :5.000 Technical Degree:132 Max. :1 Max. :2068.0
##
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement
## Min. :1.000 Female:588 Min. : 30.00 Min. :1.00
## 1st Qu.:2.000 Male :882 1st Qu.: 48.00 1st Qu.:2.00
## Median :3.000 Median : 66.00 Median :3.00
## Mean :2.722 Mean : 65.89 Mean :2.73
## 3rd Qu.:4.000 3rd Qu.: 83.75 3rd Qu.:3.00
## Max. :4.000 Max. :100.00 Max. :4.00
##
## JobLevel JobRole JobSatisfaction
## Min. :1.000 Sales Executive :326 Min. :1.000
## 1st Qu.:1.000 Research Scientist :292 1st Qu.:2.000
## Median :2.000 Laboratory Technician :259 Median :3.000
## Mean :2.064 Manufacturing Director :145 Mean :2.729
## 3rd Qu.:3.000 Healthcare Representative:131 3rd Qu.:4.000
## Max. :5.000 Manager :102 Max. :4.000
## (Other) :215
## MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked
## Divorced:327 Min. : 1009 Min. : 2094 Min. :0.000
## Married :673 1st Qu.: 2911 1st Qu.: 8047 1st Qu.:1.000
## Single :470 Median : 4919 Median :14236 Median :2.000
## Mean : 6503 Mean :14313 Mean :2.693
## 3rd Qu.: 8379 3rd Qu.:20462 3rd Qu.:4.000
## Max. :19999 Max. :26999 Max. :9.000
##
## Over18 OverTime PercentSalaryHike PerformanceRating
## Y:1470 No :1054 Min. :11.00 Min. :3.000
## Yes: 416 1st Qu.:12.00 1st Qu.:3.000
## Median :14.00 Median :3.000
## Mean :15.21 Mean :3.154
## 3rd Qu.:18.00 3rd Qu.:3.000
## Max. :25.00 Max. :4.000
##
## RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
## Min. :1.000 Min. :80 Min. :0.0000 Min. : 0.00
## 1st Qu.:2.000 1st Qu.:80 1st Qu.:0.0000 1st Qu.: 6.00
## Median :3.000 Median :80 Median :1.0000 Median :10.00
## Mean :2.712 Mean :80 Mean :0.7939 Mean :11.28
## 3rd Qu.:4.000 3rd Qu.:80 3rd Qu.:1.0000 3rd Qu.:15.00
## Max. :4.000 Max. :80 Max. :3.0000 Max. :40.00
##
## TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole
## Min. :0.000 Min. :1.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000 1st Qu.: 2.000
## Median :3.000 Median :3.000 Median : 5.000 Median : 3.000
## Mean :2.799 Mean :2.761 Mean : 7.008 Mean : 4.229
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 9.000 3rd Qu.: 7.000
## Max. :6.000 Max. :4.000 Max. :40.000 Max. :18.000
##
## YearsSinceLastPromotion YearsWithCurrManager
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 1.000 Median : 3.000
## Mean : 2.188 Mean : 4.123
## 3rd Qu.: 3.000 3rd Qu.: 7.000
## Max. :15.000 Max. :17.000
##
The minimum age in this dataset is 18 and the maximum is 60. The minimum monthly income is $1,009 and the maximum is $19,999. The minimum number of companies worked for is 0 and the maximum is 9.
Age
min(age)
## [1] 18
max(age)
## [1] 60
Monthly Income
min(monthlyincome)
## [1] 1009
max(monthlyincome)
## [1] 19999
Number Companies Worked
min(numcompaniesworked)
## [1] 0
max(numcompaniesworked)
## [1] 9
Age and Monthly Income
data_corr = cor(age, monthlyincome)
data_corr
## [1] 0.4978546
Age and Number Companies Worked
data_corr = cor(age, numcompaniesworked)
data_corr
## [1] 0.2996348
Number Companies Worked and Monthly Income
data_corr = cor(numcompaniesworked, monthlyincome)
data_corr
## [1] 0.1495152
corr = cor(mydata[c(1, 2, 4, 6, 7, 10, 11, 13, 14, 15, 17, 19, 20, 21, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34)])
corr
## Age Attrition DailyRate
## Age 1.000000000 -0.159205007 0.0106609426
## Attrition -0.159205007 1.000000000 -0.0566519919
## DailyRate 0.010660943 -0.056651992 1.0000000000
## DistanceFromHome -0.001686120 0.077923583 -0.0049853374
## Education 0.208033731 -0.031372820 -0.0168064332
## EmployeeNumber -0.010145467 -0.010577243 -0.0509904337
## EnvironmentSatisfaction 0.010146428 -0.103368978 0.0183548543
## HourlyRate 0.024286543 -0.006845550 0.0233814215
## JobInvolvement 0.029819959 -0.130015957 0.0461348740
## JobLevel 0.509604228 -0.169104751 0.0029663349
## JobSatisfaction -0.004891877 -0.103481126 0.0305710078
## MonthlyIncome 0.497854567 -0.159839582 0.0077070589
## MonthlyRate 0.028051167 0.015170213 -0.0321816015
## NumCompaniesWorked 0.299634758 0.043493739 0.0381534343
## PercentSalaryHike 0.003633585 -0.013478202 0.0227036775
## PerformanceRating 0.001903896 0.002888752 0.0004732963
## RelationshipSatisfaction 0.053534720 -0.045872279 0.0078460310
## StockOptionLevel 0.037509712 -0.137144919 0.0421427964
## TotalWorkingYears 0.680380536 -0.171063246 0.0145147387
## TrainingTimesLastYear -0.019620819 -0.059477799 0.0024525427
## WorkLifeBalance -0.021490028 -0.063939047 -0.0378480510
## YearsAtCompany 0.311308770 -0.134392214 -0.0340547676
## YearsInCurrentRole 0.212901056 -0.160545004 0.0099320150
## YearsSinceLastPromotion 0.216513368 -0.033018775 -0.0332289848
## DistanceFromHome Education EmployeeNumber
## Age -0.001686120 0.208033731 -0.010145467
## Attrition 0.077923583 -0.031372820 -0.010577243
## DailyRate -0.004985337 -0.016806433 -0.050990434
## DistanceFromHome 1.000000000 0.021041826 0.032916407
## Education 0.021041826 1.000000000 0.042070093
## EmployeeNumber 0.032916407 0.042070093 1.000000000
## EnvironmentSatisfaction -0.016075327 -0.027128313 0.017620802
## HourlyRate 0.031130586 0.016774829 0.035179212
## JobInvolvement 0.008783280 0.042437634 -0.006887923
## JobLevel 0.005302731 0.101588886 -0.018519194
## JobSatisfaction -0.003668839 -0.011296117 -0.046246735
## MonthlyIncome -0.017014445 0.094960677 -0.014828516
## MonthlyRate 0.027472864 -0.026084197 0.012648229
## NumCompaniesWorked -0.029250804 0.126316560 -0.001251032
## PercentSalaryHike 0.040235377 -0.011110941 -0.012943996
## PerformanceRating 0.027109618 -0.024538791 -0.020358825
## RelationshipSatisfaction 0.006557475 -0.009118377 -0.069861411
## StockOptionLevel 0.044871999 0.018422220 0.062226693
## TotalWorkingYears 0.004628426 0.148279697 -0.014365198
## TrainingTimesLastYear -0.036942234 -0.025100241 0.023603170
## WorkLifeBalance -0.026556004 0.009819189 0.010308641
## YearsAtCompany 0.009507720 0.069113696 -0.011240464
## YearsInCurrentRole 0.018844999 0.060235554 -0.008416312
## YearsSinceLastPromotion 0.010028836 0.054254334 -0.009019064
## EnvironmentSatisfaction HourlyRate
## Age 0.010146428 0.024286543
## Attrition -0.103368978 -0.006845550
## DailyRate 0.018354854 0.023381422
## DistanceFromHome -0.016075327 0.031130586
## Education -0.027128313 0.016774829
## EmployeeNumber 0.017620802 0.035179212
## EnvironmentSatisfaction 1.000000000 -0.049856956
## HourlyRate -0.049856956 1.000000000
## JobInvolvement -0.008277598 0.042860641
## JobLevel 0.001211699 -0.027853486
## JobSatisfaction -0.006784353 -0.071334624
## MonthlyIncome -0.006259088 -0.015794304
## MonthlyRate 0.037599623 -0.015296750
## NumCompaniesWorked 0.012594323 0.022156883
## PercentSalaryHike -0.031701195 -0.009061986
## PerformanceRating -0.029547952 -0.002171697
## RelationshipSatisfaction 0.007665384 0.001330453
## StockOptionLevel 0.003432158 0.050263399
## TotalWorkingYears -0.002693070 -0.002333682
## TrainingTimesLastYear -0.019359308 -0.008547685
## WorkLifeBalance 0.027627295 -0.004607234
## YearsAtCompany 0.001457549 -0.019581616
## YearsInCurrentRole 0.018007460 -0.024106220
## YearsSinceLastPromotion 0.016193606 -0.026715586
## JobInvolvement JobLevel JobSatisfaction
## Age 0.029819959 0.509604228 -0.0048918771
## Attrition -0.130015957 -0.169104751 -0.1034811261
## DailyRate 0.046134874 0.002966335 0.0305710078
## DistanceFromHome 0.008783280 0.005302731 -0.0036688392
## Education 0.042437634 0.101588886 -0.0112961167
## EmployeeNumber -0.006887923 -0.018519194 -0.0462467349
## EnvironmentSatisfaction -0.008277598 0.001211699 -0.0067843526
## HourlyRate 0.042860641 -0.027853486 -0.0713346244
## JobInvolvement 1.000000000 -0.012629883 -0.0214759103
## JobLevel -0.012629883 1.000000000 -0.0019437080
## JobSatisfaction -0.021475910 -0.001943708 1.0000000000
## MonthlyIncome -0.015271491 0.950299913 -0.0071567424
## MonthlyRate -0.016322079 0.039562951 0.0006439169
## NumCompaniesWorked 0.015012413 0.142501124 -0.0556994260
## PercentSalaryHike -0.017204572 -0.034730492 0.0200020394
## PerformanceRating -0.029071333 -0.021222082 0.0022971971
## RelationshipSatisfaction 0.034296821 0.021641511 -0.0124535932
## StockOptionLevel 0.021522640 0.013983911 0.0106902261
## TotalWorkingYears -0.005533182 0.782207805 -0.0201850727
## TrainingTimesLastYear -0.015337826 -0.018190550 -0.0057793350
## WorkLifeBalance -0.014616593 0.037817746 -0.0194587102
## YearsAtCompany -0.021355427 0.534738687 -0.0038026279
## YearsInCurrentRole 0.008716963 0.389446733 -0.0023047852
## YearsSinceLastPromotion -0.024184292 0.353885347 -0.0182135678
## MonthlyIncome MonthlyRate NumCompaniesWorked
## Age 0.497854567 0.0280511671 0.299634758
## Attrition -0.159839582 0.0151702125 0.043493739
## DailyRate 0.007707059 -0.0321816015 0.038153434
## DistanceFromHome -0.017014445 0.0274728635 -0.029250804
## Education 0.094960677 -0.0260841972 0.126316560
## EmployeeNumber -0.014828516 0.0126482292 -0.001251032
## EnvironmentSatisfaction -0.006259088 0.0375996229 0.012594323
## HourlyRate -0.015794304 -0.0152967496 0.022156883
## JobInvolvement -0.015271491 -0.0163220791 0.015012413
## JobLevel 0.950299913 0.0395629510 0.142501124
## JobSatisfaction -0.007156742 0.0006439169 -0.055699426
## MonthlyIncome 1.000000000 0.0348136261 0.149515216
## MonthlyRate 0.034813626 1.0000000000 0.017521353
## NumCompaniesWorked 0.149515216 0.0175213534 1.000000000
## PercentSalaryHike -0.027268586 -0.0064293459 -0.010238309
## PerformanceRating -0.017120138 -0.0098114285 -0.014094873
## RelationshipSatisfaction 0.025873436 -0.0040853293 0.052733049
## StockOptionLevel 0.005407677 -0.0343228302 0.030075475
## TotalWorkingYears 0.772893246 0.0264424712 0.237638590
## TrainingTimesLastYear -0.021736277 0.0014668806 -0.066054072
## WorkLifeBalance 0.030683082 0.0079631575 -0.008365685
## YearsAtCompany 0.514284826 -0.0236551067 -0.118421340
## YearsInCurrentRole 0.363817667 -0.0128148744 -0.090753934
## YearsSinceLastPromotion 0.344977638 0.0015667995 -0.036813892
## PercentSalaryHike PerformanceRating
## Age 0.003633585 0.0019038955
## Attrition -0.013478202 0.0028887517
## DailyRate 0.022703677 0.0004732963
## DistanceFromHome 0.040235377 0.0271096185
## Education -0.011110941 -0.0245387912
## EmployeeNumber -0.012943996 -0.0203588251
## EnvironmentSatisfaction -0.031701195 -0.0295479523
## HourlyRate -0.009061986 -0.0021716974
## JobInvolvement -0.017204572 -0.0290713334
## JobLevel -0.034730492 -0.0212220821
## JobSatisfaction 0.020002039 0.0022971971
## MonthlyIncome -0.027268586 -0.0171201382
## MonthlyRate -0.006429346 -0.0098114285
## NumCompaniesWorked -0.010238309 -0.0140948728
## PercentSalaryHike 1.000000000 0.7735499964
## PerformanceRating 0.773549996 1.0000000000
## RelationshipSatisfaction -0.040490081 -0.0313514554
## StockOptionLevel 0.007527748 0.0035064716
## TotalWorkingYears -0.020608488 0.0067436679
## TrainingTimesLastYear -0.005221012 -0.0155788817
## WorkLifeBalance -0.003279636 0.0025723613
## YearsAtCompany -0.035991262 0.0034351261
## YearsInCurrentRole -0.001520027 0.0349862604
## YearsSinceLastPromotion -0.022154313 0.0178960661
## RelationshipSatisfaction StockOptionLevel
## Age 0.053534720 0.037509712
## Attrition -0.045872279 -0.137144919
## DailyRate 0.007846031 0.042142796
## DistanceFromHome 0.006557475 0.044871999
## Education -0.009118377 0.018422220
## EmployeeNumber -0.069861411 0.062226693
## EnvironmentSatisfaction 0.007665384 0.003432158
## HourlyRate 0.001330453 0.050263399
## JobInvolvement 0.034296821 0.021522640
## JobLevel 0.021641511 0.013983911
## JobSatisfaction -0.012453593 0.010690226
## MonthlyIncome 0.025873436 0.005407677
## MonthlyRate -0.004085329 -0.034322830
## NumCompaniesWorked 0.052733049 0.030075475
## PercentSalaryHike -0.040490081 0.007527748
## PerformanceRating -0.031351455 0.003506472
## RelationshipSatisfaction 1.000000000 -0.045952491
## StockOptionLevel -0.045952491 1.000000000
## TotalWorkingYears 0.024054292 0.010135969
## TrainingTimesLastYear 0.002496526 0.011274070
## WorkLifeBalance 0.019604406 0.004128730
## YearsAtCompany 0.019366787 0.015058008
## YearsInCurrentRole -0.015122915 0.050817873
## YearsSinceLastPromotion 0.033492502 0.014352185
## TotalWorkingYears TrainingTimesLastYear
## Age 0.680380536 -0.019620819
## Attrition -0.171063246 -0.059477799
## DailyRate 0.014514739 0.002452543
## DistanceFromHome 0.004628426 -0.036942234
## Education 0.148279697 -0.025100241
## EmployeeNumber -0.014365198 0.023603170
## EnvironmentSatisfaction -0.002693070 -0.019359308
## HourlyRate -0.002333682 -0.008547685
## JobInvolvement -0.005533182 -0.015337826
## JobLevel 0.782207805 -0.018190550
## JobSatisfaction -0.020185073 -0.005779335
## MonthlyIncome 0.772893246 -0.021736277
## MonthlyRate 0.026442471 0.001466881
## NumCompaniesWorked 0.237638590 -0.066054072
## PercentSalaryHike -0.020608488 -0.005221012
## PerformanceRating 0.006743668 -0.015578882
## RelationshipSatisfaction 0.024054292 0.002496526
## StockOptionLevel 0.010135969 0.011274070
## TotalWorkingYears 1.000000000 -0.035661571
## TrainingTimesLastYear -0.035661571 1.000000000
## WorkLifeBalance 0.001007646 0.028072207
## YearsAtCompany 0.628133155 0.003568666
## YearsInCurrentRole 0.460364638 -0.005737504
## YearsSinceLastPromotion 0.404857759 -0.002066536
## WorkLifeBalance YearsAtCompany YearsInCurrentRole
## Age -0.021490028 0.311308770 0.212901056
## Attrition -0.063939047 -0.134392214 -0.160545004
## DailyRate -0.037848051 -0.034054768 0.009932015
## DistanceFromHome -0.026556004 0.009507720 0.018844999
## Education 0.009819189 0.069113696 0.060235554
## EmployeeNumber 0.010308641 -0.011240464 -0.008416312
## EnvironmentSatisfaction 0.027627295 0.001457549 0.018007460
## HourlyRate -0.004607234 -0.019581616 -0.024106220
## JobInvolvement -0.014616593 -0.021355427 0.008716963
## JobLevel 0.037817746 0.534738687 0.389446733
## JobSatisfaction -0.019458710 -0.003802628 -0.002304785
## MonthlyIncome 0.030683082 0.514284826 0.363817667
## MonthlyRate 0.007963158 -0.023655107 -0.012814874
## NumCompaniesWorked -0.008365685 -0.118421340 -0.090753934
## PercentSalaryHike -0.003279636 -0.035991262 -0.001520027
## PerformanceRating 0.002572361 0.003435126 0.034986260
## RelationshipSatisfaction 0.019604406 0.019366787 -0.015122915
## StockOptionLevel 0.004128730 0.015058008 0.050817873
## TotalWorkingYears 0.001007646 0.628133155 0.460364638
## TrainingTimesLastYear 0.028072207 0.003568666 -0.005737504
## WorkLifeBalance 1.000000000 0.012089185 0.049856498
## YearsAtCompany 0.012089185 1.000000000 0.758753737
## YearsInCurrentRole 0.049856498 0.758753737 1.000000000
## YearsSinceLastPromotion 0.008941249 0.618408865 0.548056248
## YearsSinceLastPromotion
## Age 0.216513368
## Attrition -0.033018775
## DailyRate -0.033228985
## DistanceFromHome 0.010028836
## Education 0.054254334
## EmployeeNumber -0.009019064
## EnvironmentSatisfaction 0.016193606
## HourlyRate -0.026715586
## JobInvolvement -0.024184292
## JobLevel 0.353885347
## JobSatisfaction -0.018213568
## MonthlyIncome 0.344977638
## MonthlyRate 0.001566800
## NumCompaniesWorked -0.036813892
## PercentSalaryHike -0.022154313
## PerformanceRating 0.017896066
## RelationshipSatisfaction 0.033492502
## StockOptionLevel 0.014352185
## TotalWorkingYears 0.404857759
## TrainingTimesLastYear -0.002066536
## WorkLifeBalance 0.008941249
## YearsAtCompany 0.618408865
## YearsInCurrentRole 0.548056248
## YearsSinceLastPromotion 1.000000000
corrplot(corr)
The correlation matrix shows that several variables in the data set are related. Job level and monthly income are very strongly correlated (0.95), as are total working years and monthly income (0.77) and total working years and job level (0.78). Age is strongly correlated with total working years (0.68) and moderately correlated with job level (0.51) and monthly income (0.50). Attrition is weakly and negatively correlated with both monthly income (-0.16) and age (-0.16), so older and better-paid employees are slightly less likely to leave. The correlation between attrition and the number of companies worked for is close to zero (0.04).
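The correlation matrix above selects columns by position, and the index vector stops at column 34, so YearsWithCurrManager (column 35) does not appear in it. A possible alternative, sketched here under the assumption that every non-constant numeric column is wanted, is to select columns by type. The data frame `toy` is a hypothetical stand-in built from a few columns of the first six rows printed by `head(mydata)`.

```r
# Hypothetical toy data frame; values are taken from the first six rows of head(mydata).
toy <- data.frame(
  Age = c(41, 49, 37, 33, 27, 32),
  Attrition = c(1, 0, 1, 0, 0, 0),
  Department = c("Sales", rep("Research & Development", 5)),
  MonthlyIncome = c(5993, 5130, 2090, 2909, 3468, 3068),
  StandardHours = c(80, 80, 80, 80, 80, 80)
)

# Keep only the numeric columns.
num_cols <- toy[sapply(toy, is.numeric)]

# Drop constant columns: their standard deviation is zero,
# so their correlation with anything is undefined.
num_cols <- num_cols[sapply(num_cols, function(x) sd(x) > 0)]

corr_toy <- cor(num_cols)

# Absolute correlations with Attrition, largest first
# (the first entry is Attrition with itself).
sort(abs(corr_toy["Attrition", ]), decreasing = TRUE)
```

Selecting by type avoids having to recount column positions if the file layout changes, at the cost of also including identifier columns such as EmployeeNumber unless they are dropped by name.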
Age vs. Monthly Income
p = qplot( x = age, y = monthlyincome, data = mydata ) + geom_point()
p
p + geom_smooth(method="lm" )
Age vs. Number Companies Worked
p = qplot( x = age, y = numcompaniesworked, data = mydata ) + geom_point()
p
p + geom_smooth(method="lm" )
Monthly Income vs. Number Companies Worked
p = qplot( x = monthlyincome, y = numcompaniesworked, data = mydata ) + geom_point()
p
p + geom_smooth(method="lm" )
Age vs. Monthly Income
linear_model = lm( age ~ monthlyincome, data = mydata )
summary(linear_model)
##
## Call:
## lm(formula = age ~ monthlyincome, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.218 -5.703 -1.044 4.855 26.255
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.064e+01 3.526e-01 86.91 <2e-16 ***
## monthlyincome 9.660e-04 4.392e-05 22.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.925 on 1468 degrees of freedom
## Multiple R-squared: 0.2479, Adjusted R-squared: 0.2473
## F-statistic: 483.8 on 1 and 1468 DF, p-value: < 2.2e-16
Age vs. Number Companies Worked
linear_model = lm( age ~ numcompaniesworked, data = mydata )
summary(linear_model)
##
## Call:
## lm(formula = age ~ numcompaniesworked, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.835 -6.068 -1.068 5.453 26.027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.97265 0.33445 101.58 <2e-16 ***
## numcompaniesworked 1.09578 0.09106 12.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.719 on 1468 degrees of freedom
## Multiple R-squared: 0.08978, Adjusted R-squared: 0.08916
## F-statistic: 144.8 on 1 and 1468 DF, p-value: < 2.2e-16
Monthly Income vs. Number Companies Worked
linear_model = lm( monthlyincome ~ numcompaniesworked, data = mydata )
summary(linear_model)
##
## Call:
## lm(formula = monthlyincome ~ numcompaniesworked, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6171 -3325 -1578 1960 14255
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5744.02 178.63 32.156 < 2e-16 ***
## numcompaniesworked 281.79 48.64 5.794 8.41e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4657 on 1468 degrees of freedom
## Multiple R-squared: 0.02235, Adjusted R-squared: 0.02169
## F-statistic: 33.57 on 1 and 1468 DF, p-value: 8.41e-09
For age vs. monthly income, the model has an R-squared of 0.2479 and an adjusted R-squared of 0.2473, so monthly income explains about 25% of the variance in age. This indicates a moderate linear relationship rather than a close one. For age vs. number of companies worked and monthly income vs. number of companies worked, the R-squared values are much lower (0.0898 and 0.0224). All three slopes are statistically significant (p < 0.001), so the variables are related, but the relationships involving number of companies worked are weak and have little predictive value.
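In a regression with a single predictor, R-squared equals the square of the Pearson correlation between the two variables, which ties these models back to the correlations computed earlier. The sketch below first squares the three correlations reported above, then confirms the identity on a toy fit; the vectors `age_toy` and `income_toy` are hypothetical names holding the first six rows printed by `head(mydata)`.

```r
# Pairwise correlations reported earlier in this report.
r_reported <- c(
  age_income       = 0.4978546,
  age_companies    = 0.2996348,
  companies_income = 0.1495152
)

# Squaring them should reproduce the Multiple R-squared values of the three models.
r_squared <- r_reported^2
round(r_squared, 4)

# The same identity on a small toy fit.
age_toy    <- c(41, 49, 37, 33, 27, 32)
income_toy <- c(5993, 5130, 2090, 2909, 3468, 3068)
toy_fit <- lm(age_toy ~ income_toy)
all.equal(summary(toy_fit)$r.squared, cor(age_toy, income_toy)^2)
```

Because the identity is symmetric, swapping the response and the predictor (for example, modelling monthly income from age) gives the same R-squared.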
Age vs. Monthly Income
Age = mydata$Age
Age2 = mydata$Age^2
quad_model = lm(monthlyincome ~ Age + Age2, data = mydata)
summary(quad_model)
##
## Call:
## lm(formula = monthlyincome ~ Age + Age2, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9757.6 -2634.2 -684.7 1814.4 12478.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3999.1498 1583.5766 -2.525 0.011662 *
## Age 312.8211 83.9530 3.726 0.000202 ***
## Age2 -0.7247 1.0711 -0.677 0.498781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4085 on 1467 degrees of freedom
## Multiple R-squared: 0.2481, Adjusted R-squared: 0.2471
## F-statistic: 242 on 2 and 1467 DF, p-value: < 2.2e-16
Age vs. Number Companies Worked
Age = mydata$Age
Age2 = mydata$Age^2
quad_model = lm(numcompaniesworked ~ Age + Age2, data = mydata)
summary(quad_model)
##
## Call:
## lm(formula = numcompaniesworked ~ Age + Age2, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1924 -1.6139 -0.7232 1.1223 7.5088
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.8665358 0.9235252 -2.021 0.043451 *
## Age 0.1658550 0.0489605 3.388 0.000724 ***
## Age2 -0.0010812 0.0006247 -1.731 0.083686 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.382 on 1467 degrees of freedom
## Multiple R-squared: 0.09164, Adjusted R-squared: 0.0904
## F-statistic: 74 on 2 and 1467 DF, p-value: < 2.2e-16
Monthly Income vs. Number Companies Worked
NumCompaniesWorked = mydata$NumCompaniesWorked
NumCompaniesWorked2 = mydata$NumCompaniesWorked^2
quad_model = lm(monthlyincome ~ NumCompaniesWorked + NumCompaniesWorked2, data = mydata)
summary(quad_model)
##
## Call:
## lm(formula = monthlyincome ~ NumCompaniesWorked + NumCompaniesWorked2,
## data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5708 -3229 -1545 2016 14880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5119.26 237.13 21.589 < 2e-16 ***
## NumCompaniesWorked 909.17 164.89 5.514 4.14e-08 ***
## NumCompaniesWorked2 -78.94 19.83 -3.980 7.22e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4633 on 1467 degrees of freedom
## Multiple R-squared: 0.0328, Adjusted R-squared: 0.03148
## F-statistic: 24.87 on 2 and 1467 DF, p-value: 2.379e-11
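The quadratic model has a higher R-squared than the linear model in all three cases, but R-squared can never decrease when a term is added, so the adjusted R-squared and the significance of the squared term are better guides. For Age and Monthly Income, the adjusted R-squared is slightly lower in the quadratic model (0.2471 vs. 0.2473) and the Age2 term is not significant (p = 0.499), so the quadratic model is not an improvement. For Age and Number of Companies Worked, the adjusted R-squared rises only slightly (0.0904 vs. 0.08916) and the Age2 term is only marginally significant (p = 0.084). For Number of Companies Worked and Monthly Income, the adjusted R-squared rises from 0.02169 to 0.03148 and the squared term is significant (p < 0.001), so the quadratic model is the better fit in this case, although it still explains only about 3% of the variation in income. The linear and quadratic models for the two age comparisons do not appear to use the same response variable, so those comparisons are only approximate. The positive linear coefficients match our assumptions: older employees tend to have worked for more companies and to earn more.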
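Since adding a term can never lower R-squared, a linear and a quadratic fit are usually compared with adjusted R-squared or with an F-test on the nested models. The following is a minimal sketch on simulated data (the variables `x` and `y` are hypothetical, not columns from the HR data set). It writes the squared term as `I(x^2)` inside the formula instead of storing it in a separate variable, which is one common way to do this.

```r
# Simulated data (hypothetical; not the HR data set used in this report).
# The true relationship here is linear, so the squared term should add little.
set.seed(1)
x <- runif(300, min = 18, max = 60)
y <- 50 + 3 * x + rnorm(300, sd = 20)

linear_fit <- lm(y ~ x)
quad_fit   <- lm(y ~ x + I(x^2))

# Plain R-squared cannot go down when a term is added
r2_linear <- summary(linear_fit)$r.squared
r2_quad   <- summary(quad_fit)$r.squared
c(linear = r2_linear, quadratic = r2_quad)

# Adjusted R-squared penalises the extra term, so it can go down
c(linear    = summary(linear_fit)$adj.r.squared,
  quadratic = summary(quad_fit)$adj.r.squared)

# F-test for nested models: does the squared term explain
# significantly more variation than the linear model alone?
anova(linear_fit, quad_fit)
```

A small p-value in the `anova()` table would suggest the squared term is worth keeping; a large one suggests the simpler linear model is adequate.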
knitr::include_graphics('imgs/screenshot1.png')
This chart suggests that Monthly Income and Age are not strongly correlated. This makes sense because people who are older do not necessarily make more money than their younger counterparts.
knitr::include_graphics('imgs/screenshot2.png')
This chart shows that the people who responded “no” to attrition, meaning that they stayed with the company, collectively worked at far more companies than the people who responded “yes” and left their jobs. Because this is a collective total, it also reflects how many employees are in each group, so on its own it does not show that people who have worked for more companies are more likely to leave their current position. Confirming our hypothesis that number of companies worked for is a strong predictor of attrition would require comparing the groups on a per-employee basis.
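A group's total depends on how many employees are in it, so a larger group can have a larger total even when its members have each worked at fewer companies. The sketch below illustrates this with a small made-up data frame; the column names mirror the ones used in this report, but the values are invented and are not taken from the HR data set.

```r
# Hypothetical toy data (not the HR data set): 8 employees stayed, 2 left
toy <- data.frame(
  attrition          = c(rep("No", 8), rep("Yes", 2)),
  numcompaniesworked = c(1, 2, 1, 3, 2, 1, 2, 2, 5, 6)
)

# Totals per group are driven largely by group size
totals <- aggregate(numcompaniesworked ~ attrition, data = toy, FUN = sum)

# Means per group describe a typical employee in each group
means <- aggregate(numcompaniesworked ~ attrition, data = toy, FUN = mean)

totals
means
```

In this toy example the "No" group has the larger total while the "Yes" group has the larger average, which is why a per-employee summary is the safer basis for comparing the two groups.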
knitr::include_graphics('imgs/screenshot3.png')
This chart shows that people who leave their current roles more often make more monthly income than people who stay. One possible explanation is that people who leave their positions are leaving for a promotion, a pay raise, or a better-paying firm.