Task #1: About this Project

1A) Brief Description of Project

An organization's greatest asset is its people. This project analyzes a data set of 1,470 rows and 35 columns. Each row represents an individual employee, and each column is a variable that describes that employee. The data reflects the kind of information a Human Resources department might use to understand its employees more thoroughly. We focus on variables related to employee attrition, that is, variables that affect whether an employee stays with a company or leaves. This information is valuable to organizations because high employee turnover costs time, resources, and money, and can signal deeper problems within an organization. In particular, we look at employee income, age, and the number of companies an employee has previously worked for. Understanding these variables can help an organization learn what causes employees to leave or, better yet, to stay.

1B) Resources and Packages to be Used

To complete this project, we will use RStudio, Watson Analytics, Tableau, and Excel to analyze the given data set. In RStudio, we will use correlation tables, descriptive analytics that identify minimum and maximum values, predictive analytics, and linear and quadratic regression models, which we will assess using R squared and adjusted R squared. In Watson Analytics, we will use predictive analytics, including decision trees, word clouds, and target charts. In Tableau, we will graph the factors we have analyzed and provide a visual of the results. The data set was provided in Excel, and we will use Excel to clean the data by converting nonnumeric variables such as attrition into numeric variables that can be used in RStudio and Watson Analytics.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.7.2     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(corrplot)
## corrplot 0.84 loaded
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

1C) Data Description

No formal description of the data set was provided. The data set covers a group of 1,470 employees and records 35 variables for each employee. These variables include age, attrition, travel, department, distance from home, education, gender, monthly income, number of companies worked for, job involvement, job satisfaction, marital status, work/life balance, percentage salary hike, and a number of others.

1D) Hypothesis

After reviewing the data, we believe that the most important factors leading to employee attrition are income level, the age of the employee, and the number of companies the employee has worked for. We will use tools such as RStudio and Watson Analytics to determine which factors are most strongly associated with employee attrition and compare them to the ones hypothesized. We will also analyze any trends in the data that affect employee attrition and whether these variables are related to each other.

Task #2: Data Collection

2A) Description of key metrics of the HR industry

The dataset for this project consists of data regarding employee attrition along with general data collected by HR, including job satisfaction, marital status, education field, and total working years, to name a few. These are all factors that we believe could affect attrition and turnover rates. We cannot infer that all of the data was collected at the same company; however, it does appear that all of it was collected from employees who work in the life sciences or medical industry.

2B) Load and explore dataset

Load dataset

mydata = read.csv(file = "data/employee_attrition.csv")
head(mydata)
##   Age Attrition    BusinessTravel DailyRate             Department
## 1  41       Yes     Travel_Rarely      1102                  Sales
## 2  49        No Travel_Frequently       279 Research & Development
## 3  37       Yes     Travel_Rarely      1373 Research & Development
## 4  33        No Travel_Frequently      1392 Research & Development
## 5  27        No     Travel_Rarely       591 Research & Development
## 6  32        No Travel_Frequently      1005 Research & Development
##   DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1                1         2  Life Sciences             1              1
## 2                8         1  Life Sciences             1              2
## 3                2         2          Other             1              4
## 4                3         4  Life Sciences             1              5
## 5                2         1        Medical             1              7
## 6                2         2  Life Sciences             1              8
##   EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1                       2 Female         94              3        2
## 2                       3   Male         61              2        2
## 3                       4   Male         92              2        1
## 4                       4 Female         56              3        1
## 5                       1   Male         40              3        1
## 6                       4   Male         79              3        1
##                 JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1       Sales Executive               4        Single          5993
## 2    Research Scientist               2       Married          5130
## 3 Laboratory Technician               3        Single          2090
## 4    Research Scientist               3       Married          2909
## 5 Laboratory Technician               2       Married          3468
## 6 Laboratory Technician               4        Single          3068
##   MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1       19479                  8      Y      Yes                11
## 2       24907                  1      Y       No                23
## 3        2396                  6      Y      Yes                15
## 4       23159                  1      Y      Yes                11
## 5       16632                  9      Y       No                12
## 6       11864                  0      Y       No                13
##   PerformanceRating RelationshipSatisfaction StandardHours
## 1                 3                        1            80
## 2                 4                        4            80
## 3                 3                        2            80
## 4                 3                        3            80
## 5                 3                        4            80
## 6                 3                        3            80
##   StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## 1                0                 8                     0               1
## 2                1                10                     3               3
## 3                0                 7                     3               3
## 4                0                 8                     3               3
## 5                1                 6                     3               3
## 6                0                 8                     2               2
##   YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
## 1              6                  4                       0
## 2             10                  7                       1
## 3              0                  0                       0
## 4              8                  7                       3
## 5              2                  2                       2
## 6              7                  7                       3
##   YearsWithCurrManager
## 1                    5
## 2                    7
## 3                    0
## 4                    0
## 5                    2
## 6                    6

Summary

summary(mydata)
##       Age        Attrition            BusinessTravel   DailyRate     
##  Min.   :18.00   No :1233   Non-Travel       : 150   Min.   : 102.0  
##  1st Qu.:30.00   Yes: 237   Travel_Frequently: 277   1st Qu.: 465.0  
##  Median :36.00              Travel_Rarely    :1043   Median : 802.0  
##  Mean   :36.92                                       Mean   : 802.5  
##  3rd Qu.:43.00                                       3rd Qu.:1157.0  
##  Max.   :60.00                                       Max.   :1499.0  
##                                                                      
##                   Department  DistanceFromHome   Education    
##  Human Resources       : 63   Min.   : 1.000   Min.   :1.000  
##  Research & Development:961   1st Qu.: 2.000   1st Qu.:2.000  
##  Sales                 :446   Median : 7.000   Median :3.000  
##                               Mean   : 9.193   Mean   :2.913  
##                               3rd Qu.:14.000   3rd Qu.:4.000  
##                               Max.   :29.000   Max.   :5.000  
##                                                               
##           EducationField EmployeeCount EmployeeNumber  
##  Human Resources : 27    Min.   :1     Min.   :   1.0  
##  Life Sciences   :606    1st Qu.:1     1st Qu.: 491.2  
##  Marketing       :159    Median :1     Median :1020.5  
##  Medical         :464    Mean   :1     Mean   :1024.9  
##  Other           : 82    3rd Qu.:1     3rd Qu.:1555.8  
##  Technical Degree:132    Max.   :1     Max.   :2068.0  
##                                                        
##  EnvironmentSatisfaction    Gender      HourlyRate     JobInvolvement
##  Min.   :1.000           Female:588   Min.   : 30.00   Min.   :1.00  
##  1st Qu.:2.000           Male  :882   1st Qu.: 48.00   1st Qu.:2.00  
##  Median :3.000                        Median : 66.00   Median :3.00  
##  Mean   :2.722                        Mean   : 65.89   Mean   :2.73  
##  3rd Qu.:4.000                        3rd Qu.: 83.75   3rd Qu.:3.00  
##  Max.   :4.000                        Max.   :100.00   Max.   :4.00  
##                                                                      
##     JobLevel                          JobRole    JobSatisfaction
##  Min.   :1.000   Sales Executive          :326   Min.   :1.000  
##  1st Qu.:1.000   Research Scientist       :292   1st Qu.:2.000  
##  Median :2.000   Laboratory Technician    :259   Median :3.000  
##  Mean   :2.064   Manufacturing Director   :145   Mean   :2.729  
##  3rd Qu.:3.000   Healthcare Representative:131   3rd Qu.:4.000  
##  Max.   :5.000   Manager                  :102   Max.   :4.000  
##                  (Other)                  :215                  
##   MaritalStatus MonthlyIncome    MonthlyRate    NumCompaniesWorked
##  Divorced:327   Min.   : 1009   Min.   : 2094   Min.   :0.000     
##  Married :673   1st Qu.: 2911   1st Qu.: 8047   1st Qu.:1.000     
##  Single  :470   Median : 4919   Median :14236   Median :2.000     
##                 Mean   : 6503   Mean   :14313   Mean   :2.693     
##                 3rd Qu.: 8379   3rd Qu.:20462   3rd Qu.:4.000     
##                 Max.   :19999   Max.   :26999   Max.   :9.000     
##                                                                   
##  Over18   OverTime   PercentSalaryHike PerformanceRating
##  Y:1470   No :1054   Min.   :11.00     Min.   :3.000    
##           Yes: 416   1st Qu.:12.00     1st Qu.:3.000    
##                      Median :14.00     Median :3.000    
##                      Mean   :15.21     Mean   :3.154    
##                      3rd Qu.:18.00     3rd Qu.:3.000    
##                      Max.   :25.00     Max.   :4.000    
##                                                         
##  RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
##  Min.   :1.000            Min.   :80    Min.   :0.0000   Min.   : 0.00    
##  1st Qu.:2.000            1st Qu.:80    1st Qu.:0.0000   1st Qu.: 6.00    
##  Median :3.000            Median :80    Median :1.0000   Median :10.00    
##  Mean   :2.712            Mean   :80    Mean   :0.7939   Mean   :11.28    
##  3rd Qu.:4.000            3rd Qu.:80    3rd Qu.:1.0000   3rd Qu.:15.00    
##  Max.   :4.000            Max.   :80    Max.   :3.0000   Max.   :40.00    
##                                                                           
##  TrainingTimesLastYear WorkLifeBalance YearsAtCompany   YearsInCurrentRole
##  Min.   :0.000         Min.   :1.000   Min.   : 0.000   Min.   : 0.000    
##  1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000   1st Qu.: 2.000    
##  Median :3.000         Median :3.000   Median : 5.000   Median : 3.000    
##  Mean   :2.799         Mean   :2.761   Mean   : 7.008   Mean   : 4.229    
##  3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000   3rd Qu.: 7.000    
##  Max.   :6.000         Max.   :4.000   Max.   :40.000   Max.   :18.000    
##                                                                           
##  YearsSinceLastPromotion YearsWithCurrManager
##  Min.   : 0.000          Min.   : 0.000      
##  1st Qu.: 0.000          1st Qu.: 2.000      
##  Median : 1.000          Median : 3.000      
##  Mean   : 2.188          Mean   : 4.123      
##  3rd Qu.: 3.000          3rd Qu.: 7.000      
##  Max.   :15.000          Max.   :17.000      
## 

Extracting Variables

title = mydata
age = mydata$Age
attrition = mydata$Attrition
monthlyincome = mydata$MonthlyIncome
numcompaniesworked = mydata$NumCompaniesWorked

2C) Descriptive Statistics of Data

The original dataset is complete and clean. There are no missing values and no undesired special characters. In terms of attrition, 1,233 employees responded “No” while 237 responded “Yes”. The age of employees ranges from 18 to 60 years, with a mean age of 36.92 years. Monthly income varies from $1,009 to $19,999, with a mean of $6,503. The number of companies worked for ranges from zero, indicating that the current company is the only company the employee has worked for, to nine.
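
The completeness check and the counts described above can be reproduced with a few base R calls. The following is only a minimal sketch: the data frame `toy` and its four rows are hypothetical stand-ins, not the project data, and the same calls would be applied to `mydata` in practice.

```r
# Hypothetical four-row stand-in for the HR data (illustrative values only)
toy <- data.frame(
  Age = c(41, 49, 37, 33),
  Attrition = c("Yes", "No", "Yes", "No"),
  MonthlyIncome = c(5993, 5130, 2090, 2909),
  stringsAsFactors = TRUE
)

# Count missing values in each column; all zeros means nothing is missing
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Tabulate how many employees fall under each attrition response
attrition_counts <- table(toy$Attrition)
print(attrition_counts)

# Range (minimum and maximum) and mean of a numeric variable
age_range <- range(toy$Age)
age_mean <- mean(toy$Age)
print(age_range)
print(age_mean)
```

Running `colSums(is.na(...))` on the full data set is a quick way to confirm the claim that no values are missing before moving on to data preparation.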

Task #3: Data Preparation

3A)

Although this dataset is already clean and has no missing values or unwanted characters, we need to convert the “yes” and “no” attrition responses to numeric values in order to analyze attrition. We assigned “1” to all of the “yes” responses and “0” to all of the “no” responses. With this change, we can compare attrition with the other numeric variables in the dataset.

3B) Clean the Data

We used the replace function in Excel to change the “yes” and “no” responses for employee attrition to numeric values.
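
The same recoding could also be done directly in R rather than in Excel. The sketch below is an alternative under stated assumptions, not the method used in this project: `toy`, the column name `AttritionNum`, and the temporary file are hypothetical, and the five rows are made up for illustration.

```r
# Hypothetical five-row stand-in for the attrition column (illustrative only)
toy <- data.frame(
  Attrition = c("Yes", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Recode "Yes" as 1 and everything else ("No") as 0
toy$AttritionNum <- ifelse(toy$Attrition == "Yes", 1, 0)

# The mean of a 0/1 variable is the proportion of 1s, i.e. the attrition rate
attrition_rate <- mean(toy$AttritionNum)
print(attrition_rate)

# Save the recoded data to a temporary CSV file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)
reloaded <- read.csv(tmp)
print(head(reloaded))
```

Doing the recoding in a script keeps the cleaning step reproducible, since the original file is left untouched and the transformation is recorded alongside the analysis.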

3C) Save Clean Dataset

mydata = read.csv(file = "data/clean_employee_attrition.csv")
head(mydata)
##   Age Attrition    BusinessTravel DailyRate             Department
## 1  41         1     Travel_Rarely      1102                  Sales
## 2  49         0 Travel_Frequently       279 Research & Development
## 3  37         1     Travel_Rarely      1373 Research & Development
## 4  33         0 Travel_Frequently      1392 Research & Development
## 5  27         0     Travel_Rarely       591 Research & Development
## 6  32         0 Travel_Frequently      1005 Research & Development
##   DistanceFromHome Education EducationField EmployeeCount EmployeeNumber
## 1                1         2  Life Sciences             1              1
## 2                8         1  Life Sciences             1              2
## 3                2         2          Other             1              4
## 4                3         4  Life Sciences             1              5
## 5                2         1        Medical             1              7
## 6                2         2  Life Sciences             1              8
##   EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel
## 1                       2 Female         94              3        2
## 2                       3   Male         61              2        2
## 3                       4   Male         92              2        1
## 4                       4 Female         56              3        1
## 5                       1   Male         40              3        1
## 6                       4   Male         79              3        1
##                 JobRole JobSatisfaction MaritalStatus MonthlyIncome
## 1       Sales Executive               4        Single          5993
## 2    Research Scientist               2       Married          5130
## 3 Laboratory Technician               3        Single          2090
## 4    Research Scientist               3       Married          2909
## 5 Laboratory Technician               2       Married          3468
## 6 Laboratory Technician               4        Single          3068
##   MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike
## 1       19479                  8      Y      Yes                11
## 2       24907                  1      Y       No                23
## 3        2396                  6      Y      Yes                15
## 4       23159                  1      Y      Yes                11
## 5       16632                  9      Y       No                12
## 6       11864                  0      Y       No                13
##   PerformanceRating RelationshipSatisfaction StandardHours
## 1                 3                        1            80
## 2                 4                        4            80
## 3                 3                        2            80
## 4                 3                        3            80
## 5                 3                        4            80
## 6                 3                        3            80
##   StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## 1                0                 8                     0               1
## 2                1                10                     3               3
## 3                0                 7                     3               3
## 4                0                 8                     3               3
## 5                1                 6                     3               3
## 6                0                 8                     2               2
##   YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
## 1              6                  4                       0
## 2             10                  7                       1
## 3              0                  0                       0
## 4              8                  7                       3
## 5              2                  2                       2
## 6              7                  7                       3
##   YearsWithCurrManager
## 1                    5
## 2                    7
## 3                    0
## 4                    0
## 5                    2
## 6                    6

Extracting Variables

title = mydata
age = mydata$Age
attrition = mydata$Attrition
monthlyincome = mydata$MonthlyIncome
numcompaniesworked = mydata$NumCompaniesWorked

Task #4: Data Analysis: Descriptive Statistics, Correlation

4A) Descriptive Statistics

summary(mydata)
##       Age          Attrition                BusinessTravel
##  Min.   :18.00   Min.   :0.0000   Non-Travel       : 150  
##  1st Qu.:30.00   1st Qu.:0.0000   Travel_Frequently: 277  
##  Median :36.00   Median :0.0000   Travel_Rarely    :1043  
##  Mean   :36.92   Mean   :0.1612                           
##  3rd Qu.:43.00   3rd Qu.:0.0000                           
##  Max.   :60.00   Max.   :1.0000                           
##                                                           
##    DailyRate                       Department  DistanceFromHome
##  Min.   : 102.0   Human Resources       : 63   Min.   : 1.000  
##  1st Qu.: 465.0   Research & Development:961   1st Qu.: 2.000  
##  Median : 802.0   Sales                 :446   Median : 7.000  
##  Mean   : 802.5                                Mean   : 9.193  
##  3rd Qu.:1157.0                                3rd Qu.:14.000  
##  Max.   :1499.0                                Max.   :29.000  
##                                                                
##    Education              EducationField EmployeeCount EmployeeNumber  
##  Min.   :1.000   Human Resources : 27    Min.   :1     Min.   :   1.0  
##  1st Qu.:2.000   Life Sciences   :606    1st Qu.:1     1st Qu.: 491.2  
##  Median :3.000   Marketing       :159    Median :1     Median :1020.5  
##  Mean   :2.913   Medical         :464    Mean   :1     Mean   :1024.9  
##  3rd Qu.:4.000   Other           : 82    3rd Qu.:1     3rd Qu.:1555.8  
##  Max.   :5.000   Technical Degree:132    Max.   :1     Max.   :2068.0  
##                                                                        
##  EnvironmentSatisfaction    Gender      HourlyRate     JobInvolvement
##  Min.   :1.000           Female:588   Min.   : 30.00   Min.   :1.00  
##  1st Qu.:2.000           Male  :882   1st Qu.: 48.00   1st Qu.:2.00  
##  Median :3.000                        Median : 66.00   Median :3.00  
##  Mean   :2.722                        Mean   : 65.89   Mean   :2.73  
##  3rd Qu.:4.000                        3rd Qu.: 83.75   3rd Qu.:3.00  
##  Max.   :4.000                        Max.   :100.00   Max.   :4.00  
##                                                                      
##     JobLevel                          JobRole    JobSatisfaction
##  Min.   :1.000   Sales Executive          :326   Min.   :1.000  
##  1st Qu.:1.000   Research Scientist       :292   1st Qu.:2.000  
##  Median :2.000   Laboratory Technician    :259   Median :3.000  
##  Mean   :2.064   Manufacturing Director   :145   Mean   :2.729  
##  3rd Qu.:3.000   Healthcare Representative:131   3rd Qu.:4.000  
##  Max.   :5.000   Manager                  :102   Max.   :4.000  
##                  (Other)                  :215                  
##   MaritalStatus MonthlyIncome    MonthlyRate    NumCompaniesWorked
##  Divorced:327   Min.   : 1009   Min.   : 2094   Min.   :0.000     
##  Married :673   1st Qu.: 2911   1st Qu.: 8047   1st Qu.:1.000     
##  Single  :470   Median : 4919   Median :14236   Median :2.000     
##                 Mean   : 6503   Mean   :14313   Mean   :2.693     
##                 3rd Qu.: 8379   3rd Qu.:20462   3rd Qu.:4.000     
##                 Max.   :19999   Max.   :26999   Max.   :9.000     
##                                                                   
##  Over18   OverTime   PercentSalaryHike PerformanceRating
##  Y:1470   No :1054   Min.   :11.00     Min.   :3.000    
##           Yes: 416   1st Qu.:12.00     1st Qu.:3.000    
##                      Median :14.00     Median :3.000    
##                      Mean   :15.21     Mean   :3.154    
##                      3rd Qu.:18.00     3rd Qu.:3.000    
##                      Max.   :25.00     Max.   :4.000    
##                                                         
##  RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
##  Min.   :1.000            Min.   :80    Min.   :0.0000   Min.   : 0.00    
##  1st Qu.:2.000            1st Qu.:80    1st Qu.:0.0000   1st Qu.: 6.00    
##  Median :3.000            Median :80    Median :1.0000   Median :10.00    
##  Mean   :2.712            Mean   :80    Mean   :0.7939   Mean   :11.28    
##  3rd Qu.:4.000            3rd Qu.:80    3rd Qu.:1.0000   3rd Qu.:15.00    
##  Max.   :4.000            Max.   :80    Max.   :3.0000   Max.   :40.00    
##                                                                           
##  TrainingTimesLastYear WorkLifeBalance YearsAtCompany   YearsInCurrentRole
##  Min.   :0.000         Min.   :1.000   Min.   : 0.000   Min.   : 0.000    
##  1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000   1st Qu.: 2.000    
##  Median :3.000         Median :3.000   Median : 5.000   Median : 3.000    
##  Mean   :2.799         Mean   :2.761   Mean   : 7.008   Mean   : 4.229    
##  3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000   3rd Qu.: 7.000    
##  Max.   :6.000         Max.   :4.000   Max.   :40.000   Max.   :18.000    
##                                                                           
##  YearsSinceLastPromotion YearsWithCurrManager
##  Min.   : 0.000          Min.   : 0.000      
##  1st Qu.: 0.000          1st Qu.: 2.000      
##  Median : 1.000          Median : 3.000      
##  Mean   : 2.188          Mean   : 4.123      
##  3rd Qu.: 3.000          3rd Qu.: 7.000      
##  Max.   :15.000          Max.   :17.000      
## 

The minimum age in this dataset is 18 and the maximum age is 60. The minimum monthly income is $1,009 and the maximum is $19,999. The minimum number of companies worked for is 0, while the maximum is 9.

4B) Minimums and Maximums

Age

min(age)
## [1] 18
max(age)
## [1] 60

Monthly Income

min(monthlyincome)
## [1] 1009
max(monthlyincome)
## [1] 19999

Number Companies Worked

min(numcompaniesworked)
## [1] 0
max(numcompaniesworked)
## [1] 9
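
The separate `min()` and `max()` calls above can also be collapsed into a single pass over several columns. This is a small sketch using a hypothetical three-row data frame `toy` with made-up rows; with the real data, the same call would be applied to the relevant columns of `mydata`.

```r
# Hypothetical three-row stand-in (illustrative rows, not the real data)
toy <- data.frame(
  Age = c(18, 36, 60),
  MonthlyIncome = c(1009, 4919, 19999),
  NumCompaniesWorked = c(0, 2, 9)
)

# range() returns c(min, max); sapply() applies it to every column
# and collects the results into a 2 x 3 matrix
ranges <- sapply(toy, range)
rownames(ranges) <- c("min", "max")
print(ranges)
```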

4C) Correlation Values

Age and Monthly Income

data_corr = cor(age, monthlyincome)
data_corr
## [1] 0.4978546

Age and Number Companies Worked

data_corr = cor(age, numcompaniesworked)
data_corr
## [1] 0.2996348

Number Companies Worked and Monthly Income

data_corr = cor(numcompaniesworked, monthlyincome)
data_corr
## [1] 0.1495152
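
The three pairwise correlations above can be obtained in one call by passing a data frame of the three variables to `cor()`, which returns the full Pearson correlation matrix. The sketch below uses a hypothetical five-row data frame `toy` with made-up values, so its numbers will not match the output above.

```r
# Hypothetical five-row stand-in (illustrative values only)
toy <- data.frame(
  Age = c(25, 32, 41, 50, 58),
  MonthlyIncome = c(2500, 4100, 6000, 9000, 15000),
  NumCompaniesWorked = c(1, 0, 3, 2, 6)
)

# cor() on a data frame returns a symmetric matrix of pairwise
# Pearson correlations, with 1s on the diagonal
corr_small <- cor(toy)
print(round(corr_small, 3))

# An individual pair can be looked up by row and column name
age_income <- corr_small["Age", "MonthlyIncome"]
print(age_income)
```

Looking values up by name rather than recomputing each pair avoids reusing a single variable such as `data_corr` for several different results.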

4D) Correlation Table

corr = cor(mydata[c(1, 2, 4, 6, 7, 10, 11, 13, 14, 15, 17, 19, 20, 21, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34)])
corr
##                                   Age    Attrition     DailyRate
## Age                       1.000000000 -0.159205007  0.0106609426
## Attrition                -0.159205007  1.000000000 -0.0566519919
## DailyRate                 0.010660943 -0.056651992  1.0000000000
## DistanceFromHome         -0.001686120  0.077923583 -0.0049853374
## Education                 0.208033731 -0.031372820 -0.0168064332
## EmployeeNumber           -0.010145467 -0.010577243 -0.0509904337
## EnvironmentSatisfaction   0.010146428 -0.103368978  0.0183548543
## HourlyRate                0.024286543 -0.006845550  0.0233814215
## JobInvolvement            0.029819959 -0.130015957  0.0461348740
## JobLevel                  0.509604228 -0.169104751  0.0029663349
## JobSatisfaction          -0.004891877 -0.103481126  0.0305710078
## MonthlyIncome             0.497854567 -0.159839582  0.0077070589
## MonthlyRate               0.028051167  0.015170213 -0.0321816015
## NumCompaniesWorked        0.299634758  0.043493739  0.0381534343
## PercentSalaryHike         0.003633585 -0.013478202  0.0227036775
## PerformanceRating         0.001903896  0.002888752  0.0004732963
## RelationshipSatisfaction  0.053534720 -0.045872279  0.0078460310
## StockOptionLevel          0.037509712 -0.137144919  0.0421427964
## TotalWorkingYears         0.680380536 -0.171063246  0.0145147387
## TrainingTimesLastYear    -0.019620819 -0.059477799  0.0024525427
## WorkLifeBalance          -0.021490028 -0.063939047 -0.0378480510
## YearsAtCompany            0.311308770 -0.134392214 -0.0340547676
## YearsInCurrentRole        0.212901056 -0.160545004  0.0099320150
## YearsSinceLastPromotion   0.216513368 -0.033018775 -0.0332289848
##                          DistanceFromHome    Education EmployeeNumber
## Age                          -0.001686120  0.208033731   -0.010145467
## Attrition                     0.077923583 -0.031372820   -0.010577243
## DailyRate                    -0.004985337 -0.016806433   -0.050990434
## DistanceFromHome              1.000000000  0.021041826    0.032916407
## Education                     0.021041826  1.000000000    0.042070093
## EmployeeNumber                0.032916407  0.042070093    1.000000000
## EnvironmentSatisfaction      -0.016075327 -0.027128313    0.017620802
## HourlyRate                    0.031130586  0.016774829    0.035179212
## JobInvolvement                0.008783280  0.042437634   -0.006887923
## JobLevel                      0.005302731  0.101588886   -0.018519194
## JobSatisfaction              -0.003668839 -0.011296117   -0.046246735
## MonthlyIncome                -0.017014445  0.094960677   -0.014828516
## MonthlyRate                   0.027472864 -0.026084197    0.012648229
## NumCompaniesWorked           -0.029250804  0.126316560   -0.001251032
## PercentSalaryHike             0.040235377 -0.011110941   -0.012943996
## PerformanceRating             0.027109618 -0.024538791   -0.020358825
## RelationshipSatisfaction      0.006557475 -0.009118377   -0.069861411
## StockOptionLevel              0.044871999  0.018422220    0.062226693
## TotalWorkingYears             0.004628426  0.148279697   -0.014365198
## TrainingTimesLastYear        -0.036942234 -0.025100241    0.023603170
## WorkLifeBalance              -0.026556004  0.009819189    0.010308641
## YearsAtCompany                0.009507720  0.069113696   -0.011240464
## YearsInCurrentRole            0.018844999  0.060235554   -0.008416312
## YearsSinceLastPromotion       0.010028836  0.054254334   -0.009019064
##                          EnvironmentSatisfaction   HourlyRate
## Age                                  0.010146428  0.024286543
## Attrition                           -0.103368978 -0.006845550
## DailyRate                            0.018354854  0.023381422
## DistanceFromHome                    -0.016075327  0.031130586
## Education                           -0.027128313  0.016774829
## EmployeeNumber                       0.017620802  0.035179212
## EnvironmentSatisfaction              1.000000000 -0.049856956
## HourlyRate                          -0.049856956  1.000000000
## JobInvolvement                      -0.008277598  0.042860641
## JobLevel                             0.001211699 -0.027853486
## JobSatisfaction                     -0.006784353 -0.071334624
## MonthlyIncome                       -0.006259088 -0.015794304
## MonthlyRate                          0.037599623 -0.015296750
## NumCompaniesWorked                   0.012594323  0.022156883
## PercentSalaryHike                   -0.031701195 -0.009061986
## PerformanceRating                   -0.029547952 -0.002171697
## RelationshipSatisfaction             0.007665384  0.001330453
## StockOptionLevel                     0.003432158  0.050263399
## TotalWorkingYears                   -0.002693070 -0.002333682
## TrainingTimesLastYear               -0.019359308 -0.008547685
## WorkLifeBalance                      0.027627295 -0.004607234
## YearsAtCompany                       0.001457549 -0.019581616
## YearsInCurrentRole                   0.018007460 -0.024106220
## YearsSinceLastPromotion              0.016193606 -0.026715586
##                          JobInvolvement     JobLevel JobSatisfaction
## Age                         0.029819959  0.509604228   -0.0048918771
## Attrition                  -0.130015957 -0.169104751   -0.1034811261
## DailyRate                   0.046134874  0.002966335    0.0305710078
## DistanceFromHome            0.008783280  0.005302731   -0.0036688392
## Education                   0.042437634  0.101588886   -0.0112961167
## EmployeeNumber             -0.006887923 -0.018519194   -0.0462467349
## EnvironmentSatisfaction    -0.008277598  0.001211699   -0.0067843526
## HourlyRate                  0.042860641 -0.027853486   -0.0713346244
## JobInvolvement              1.000000000 -0.012629883   -0.0214759103
## JobLevel                   -0.012629883  1.000000000   -0.0019437080
## JobSatisfaction            -0.021475910 -0.001943708    1.0000000000
## MonthlyIncome              -0.015271491  0.950299913   -0.0071567424
## MonthlyRate                -0.016322079  0.039562951    0.0006439169
## NumCompaniesWorked          0.015012413  0.142501124   -0.0556994260
## PercentSalaryHike          -0.017204572 -0.034730492    0.0200020394
## PerformanceRating          -0.029071333 -0.021222082    0.0022971971
## RelationshipSatisfaction    0.034296821  0.021641511   -0.0124535932
## StockOptionLevel            0.021522640  0.013983911    0.0106902261
## TotalWorkingYears          -0.005533182  0.782207805   -0.0201850727
## TrainingTimesLastYear      -0.015337826 -0.018190550   -0.0057793350
## WorkLifeBalance            -0.014616593  0.037817746   -0.0194587102
## YearsAtCompany             -0.021355427  0.534738687   -0.0038026279
## YearsInCurrentRole          0.008716963  0.389446733   -0.0023047852
## YearsSinceLastPromotion    -0.024184292  0.353885347   -0.0182135678
##                          MonthlyIncome   MonthlyRate NumCompaniesWorked
## Age                        0.497854567  0.0280511671        0.299634758
## Attrition                 -0.159839582  0.0151702125        0.043493739
## DailyRate                  0.007707059 -0.0321816015        0.038153434
## DistanceFromHome          -0.017014445  0.0274728635       -0.029250804
## Education                  0.094960677 -0.0260841972        0.126316560
## EmployeeNumber            -0.014828516  0.0126482292       -0.001251032
## EnvironmentSatisfaction   -0.006259088  0.0375996229        0.012594323
## HourlyRate                -0.015794304 -0.0152967496        0.022156883
## JobInvolvement            -0.015271491 -0.0163220791        0.015012413
## JobLevel                   0.950299913  0.0395629510        0.142501124
## JobSatisfaction           -0.007156742  0.0006439169       -0.055699426
## MonthlyIncome              1.000000000  0.0348136261        0.149515216
## MonthlyRate                0.034813626  1.0000000000        0.017521353
## NumCompaniesWorked         0.149515216  0.0175213534        1.000000000
## PercentSalaryHike         -0.027268586 -0.0064293459       -0.010238309
## PerformanceRating         -0.017120138 -0.0098114285       -0.014094873
## RelationshipSatisfaction   0.025873436 -0.0040853293        0.052733049
## StockOptionLevel           0.005407677 -0.0343228302        0.030075475
## TotalWorkingYears          0.772893246  0.0264424712        0.237638590
## TrainingTimesLastYear     -0.021736277  0.0014668806       -0.066054072
## WorkLifeBalance            0.030683082  0.0079631575       -0.008365685
## YearsAtCompany             0.514284826 -0.0236551067       -0.118421340
## YearsInCurrentRole         0.363817667 -0.0128148744       -0.090753934
## YearsSinceLastPromotion    0.344977638  0.0015667995       -0.036813892
##                          PercentSalaryHike PerformanceRating
## Age                            0.003633585      0.0019038955
## Attrition                     -0.013478202      0.0028887517
## DailyRate                      0.022703677      0.0004732963
## DistanceFromHome               0.040235377      0.0271096185
## Education                     -0.011110941     -0.0245387912
## EmployeeNumber                -0.012943996     -0.0203588251
## EnvironmentSatisfaction       -0.031701195     -0.0295479523
## HourlyRate                    -0.009061986     -0.0021716974
## JobInvolvement                -0.017204572     -0.0290713334
## JobLevel                      -0.034730492     -0.0212220821
## JobSatisfaction                0.020002039      0.0022971971
## MonthlyIncome                 -0.027268586     -0.0171201382
## MonthlyRate                   -0.006429346     -0.0098114285
## NumCompaniesWorked            -0.010238309     -0.0140948728
## PercentSalaryHike              1.000000000      0.7735499964
## PerformanceRating              0.773549996      1.0000000000
## RelationshipSatisfaction      -0.040490081     -0.0313514554
## StockOptionLevel               0.007527748      0.0035064716
## TotalWorkingYears             -0.020608488      0.0067436679
## TrainingTimesLastYear         -0.005221012     -0.0155788817
## WorkLifeBalance               -0.003279636      0.0025723613
## YearsAtCompany                -0.035991262      0.0034351261
## YearsInCurrentRole            -0.001520027      0.0349862604
## YearsSinceLastPromotion       -0.022154313      0.0178960661
##                          RelationshipSatisfaction StockOptionLevel
## Age                                   0.053534720      0.037509712
## Attrition                            -0.045872279     -0.137144919
## DailyRate                             0.007846031      0.042142796
## DistanceFromHome                      0.006557475      0.044871999
## Education                            -0.009118377      0.018422220
## EmployeeNumber                       -0.069861411      0.062226693
## EnvironmentSatisfaction               0.007665384      0.003432158
## HourlyRate                            0.001330453      0.050263399
## JobInvolvement                        0.034296821      0.021522640
## JobLevel                              0.021641511      0.013983911
## JobSatisfaction                      -0.012453593      0.010690226
## MonthlyIncome                         0.025873436      0.005407677
## MonthlyRate                          -0.004085329     -0.034322830
## NumCompaniesWorked                    0.052733049      0.030075475
## PercentSalaryHike                    -0.040490081      0.007527748
## PerformanceRating                    -0.031351455      0.003506472
## RelationshipSatisfaction              1.000000000     -0.045952491
## StockOptionLevel                     -0.045952491      1.000000000
## TotalWorkingYears                     0.024054292      0.010135969
## TrainingTimesLastYear                 0.002496526      0.011274070
## WorkLifeBalance                       0.019604406      0.004128730
## YearsAtCompany                        0.019366787      0.015058008
## YearsInCurrentRole                   -0.015122915      0.050817873
## YearsSinceLastPromotion               0.033492502      0.014352185
##                          TotalWorkingYears TrainingTimesLastYear
## Age                            0.680380536          -0.019620819
## Attrition                     -0.171063246          -0.059477799
## DailyRate                      0.014514739           0.002452543
## DistanceFromHome               0.004628426          -0.036942234
## Education                      0.148279697          -0.025100241
## EmployeeNumber                -0.014365198           0.023603170
## EnvironmentSatisfaction       -0.002693070          -0.019359308
## HourlyRate                    -0.002333682          -0.008547685
## JobInvolvement                -0.005533182          -0.015337826
## JobLevel                       0.782207805          -0.018190550
## JobSatisfaction               -0.020185073          -0.005779335
## MonthlyIncome                  0.772893246          -0.021736277
## MonthlyRate                    0.026442471           0.001466881
## NumCompaniesWorked             0.237638590          -0.066054072
## PercentSalaryHike             -0.020608488          -0.005221012
## PerformanceRating              0.006743668          -0.015578882
## RelationshipSatisfaction       0.024054292           0.002496526
## StockOptionLevel               0.010135969           0.011274070
## TotalWorkingYears              1.000000000          -0.035661571
## TrainingTimesLastYear         -0.035661571           1.000000000
## WorkLifeBalance                0.001007646           0.028072207
## YearsAtCompany                 0.628133155           0.003568666
## YearsInCurrentRole             0.460364638          -0.005737504
## YearsSinceLastPromotion        0.404857759          -0.002066536
##                          WorkLifeBalance YearsAtCompany YearsInCurrentRole
## Age                         -0.021490028    0.311308770        0.212901056
## Attrition                   -0.063939047   -0.134392214       -0.160545004
## DailyRate                   -0.037848051   -0.034054768        0.009932015
## DistanceFromHome            -0.026556004    0.009507720        0.018844999
## Education                    0.009819189    0.069113696        0.060235554
## EmployeeNumber               0.010308641   -0.011240464       -0.008416312
## EnvironmentSatisfaction      0.027627295    0.001457549        0.018007460
## HourlyRate                  -0.004607234   -0.019581616       -0.024106220
## JobInvolvement              -0.014616593   -0.021355427        0.008716963
## JobLevel                     0.037817746    0.534738687        0.389446733
## JobSatisfaction             -0.019458710   -0.003802628       -0.002304785
## MonthlyIncome                0.030683082    0.514284826        0.363817667
## MonthlyRate                  0.007963158   -0.023655107       -0.012814874
## NumCompaniesWorked          -0.008365685   -0.118421340       -0.090753934
## PercentSalaryHike           -0.003279636   -0.035991262       -0.001520027
## PerformanceRating            0.002572361    0.003435126        0.034986260
## RelationshipSatisfaction     0.019604406    0.019366787       -0.015122915
## StockOptionLevel             0.004128730    0.015058008        0.050817873
## TotalWorkingYears            0.001007646    0.628133155        0.460364638
## TrainingTimesLastYear        0.028072207    0.003568666       -0.005737504
## WorkLifeBalance              1.000000000    0.012089185        0.049856498
## YearsAtCompany               0.012089185    1.000000000        0.758753737
## YearsInCurrentRole           0.049856498    0.758753737        1.000000000
## YearsSinceLastPromotion      0.008941249    0.618408865        0.548056248
##                          YearsSinceLastPromotion
## Age                                  0.216513368
## Attrition                           -0.033018775
## DailyRate                           -0.033228985
## DistanceFromHome                     0.010028836
## Education                            0.054254334
## EmployeeNumber                      -0.009019064
## EnvironmentSatisfaction              0.016193606
## HourlyRate                          -0.026715586
## JobInvolvement                      -0.024184292
## JobLevel                             0.353885347
## JobSatisfaction                     -0.018213568
## MonthlyIncome                        0.344977638
## MonthlyRate                          0.001566800
## NumCompaniesWorked                  -0.036813892
## PercentSalaryHike                   -0.022154313
## PerformanceRating                    0.017896066
## RelationshipSatisfaction             0.033492502
## StockOptionLevel                     0.014352185
## TotalWorkingYears                    0.404857759
## TrainingTimesLastYear               -0.002066536
## WorkLifeBalance                      0.008941249
## YearsAtCompany                       0.618408865
## YearsInCurrentRole                   0.548056248
## YearsSinceLastPromotion              1.000000000
corrplot(corr)

This correlation table shows how the numeric variables in the data set relate to one another. Job Level and Monthly Income are very strongly correlated (0.95), and Total Working Years is strongly correlated with Job Level (0.78), Monthly Income (0.77), and Age (0.68). Age is moderately correlated with Job Level (0.51) and Monthly Income (0.50). There is a weak negative correlation between Attrition and Monthly Income (-0.16), and a slight correlation between Attrition and Age. There is essentially no correlation between Attrition and Number of Companies Worked (0.04).
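
To make the attrition-related entries easier to scan, the coefficients can be ranked by absolute size. The following is a minimal sketch that copies a subset of the Attrition row from the table above into a named vector and sorts it. The name `attrition_cor` is our own and is not part of the original analysis.

```r
# Correlations with Attrition, copied from the table above (subset only).
attrition_cor <- c(
  JobInvolvement     = -0.130015957,
  JobLevel           = -0.169104751,
  JobSatisfaction    = -0.103481126,
  MonthlyIncome      = -0.159839582,
  NumCompaniesWorked =  0.043493739,
  PerformanceRating  =  0.002888752,
  StockOptionLevel   = -0.137144919,
  TotalWorkingYears  = -0.171063246,
  YearsAtCompany     = -0.134392214,
  YearsInCurrentRole = -0.160545004
)

# Rank by absolute value so that negative correlations are not overlooked.
ranked <- attrition_cor[order(abs(attrition_cor), decreasing = TRUE)]
print(round(ranked, 3))
```

In this subset, Total Working Years and Job Level sit at the top of the ranking, while Number of Companies Worked is near the bottom.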

4E) Predictive Model

Age vs. Monthly Income

p = qplot( x = age, y = monthlyincome, data = mydata ) + geom_point()
p

p + geom_smooth(method="lm" )

Age vs. Number Companies Worked

p = qplot( x = age, y = numcompaniesworked, data = mydata ) + geom_point()
p

p + geom_smooth(method="lm" )

Monthly Income vs. Number Companies Worked

p = qplot( x = monthlyincome, y = numcompaniesworked, data = mydata ) + geom_point()
p

p + geom_smooth(method="lm" )

Task #5: Visual Analytics

5A) Predictive Model: Linear

Age vs. Monthly Income

linear_model = lm( age ~ monthlyincome, data = mydata )
summary(linear_model)
## 
## Call:
## lm(formula = age ~ monthlyincome, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.218  -5.703  -1.044   4.855  26.255 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.064e+01  3.526e-01   86.91   <2e-16 ***
## monthlyincome 9.660e-04  4.392e-05   22.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.925 on 1468 degrees of freedom
## Multiple R-squared:  0.2479, Adjusted R-squared:  0.2473 
## F-statistic: 483.8 on 1 and 1468 DF,  p-value: < 2.2e-16

Age vs. Number Companies Worked

linear_model = lm( age ~ numcompaniesworked, data = mydata )
summary(linear_model)
## 
## Call:
## lm(formula = age ~ numcompaniesworked, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.835  -6.068  -1.068   5.453  26.027 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        33.97265    0.33445  101.58   <2e-16 ***
## numcompaniesworked  1.09578    0.09106   12.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.719 on 1468 degrees of freedom
## Multiple R-squared:  0.08978,    Adjusted R-squared:  0.08916 
## F-statistic: 144.8 on 1 and 1468 DF,  p-value: < 2.2e-16

Monthly Income vs. Number Companies Worked

linear_model = lm( monthlyincome ~ numcompaniesworked, data = mydata )
summary(linear_model)
## 
## Call:
## lm(formula = monthlyincome ~ numcompaniesworked, data = mydata)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6171  -3325  -1578   1960  14255 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5744.02     178.63  32.156  < 2e-16 ***
## numcompaniesworked   281.79      48.64   5.794 8.41e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4657 on 1468 degrees of freedom
## Multiple R-squared:  0.02235,    Adjusted R-squared:  0.02169 
## F-statistic: 33.57 on 1 and 1468 DF,  p-value: 8.41e-09

5B) Efficiency of the Linear Model

For age vs. monthly income, the model has an R-squared value of 0.2479 and an adjusted R-squared value of 0.2473, so monthly income explains only about 25% of the variation in age. The slope is statistically significant (p < 2e-16), but the relationship is moderate at best, not close. For age vs. number of companies worked (R-squared 0.08978) and monthly income vs. number of companies worked (R-squared 0.02235), the R-squared and adjusted R-squared values are much lower still. These pairs of variables are therefore only weakly related, even though each slope is statistically significant.
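
For a regression with a single predictor, R-squared equals the square of the Pearson correlation between the two variables, which ties these results back to the correlation table. A quick check, using coefficients copied from the correlation table and the R-squared values reported by `summary()` (the variable names here are our own):

```r
# Pearson correlations copied from the correlation table.
r <- c(
  age_income       = 0.497854567,
  age_companies    = 0.299634758,
  income_companies = 0.149515216
)

# Multiple R-squared values reported by summary() for the three linear models.
reported_r2 <- c(
  age_income       = 0.2479,
  age_companies    = 0.08978,
  income_companies = 0.02235
)

# In simple linear regression, R-squared is the squared correlation.
comparison <- data.frame(r = r, r_squared = r^2, reported = reported_r2)
print(round(comparison, 4))
```

An R-squared of about 0.25 therefore corresponds to a correlation of about 0.50, which is moderate, not negligible.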

5C) Predictive Model: Quadratic

Age vs. Monthly Income

Age = mydata$Age
Age2 = mydata$Age^2
quad_model = lm(monthlyincome ~ Age + Age2, data = mydata)
summary(quad_model)
## 
## Call:
## lm(formula = monthlyincome ~ Age + Age2, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9757.6 -2634.2  -684.7  1814.4 12478.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3999.1498  1583.5766  -2.525 0.011662 *  
## Age           312.8211    83.9530   3.726 0.000202 ***
## Age2           -0.7247     1.0711  -0.677 0.498781    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4085 on 1467 degrees of freedom
## Multiple R-squared:  0.2481, Adjusted R-squared:  0.2471 
## F-statistic:   242 on 2 and 1467 DF,  p-value: < 2.2e-16

Age vs. Number Companies Worked

Age = mydata$Age
Age2 = mydata$Age^2
quad_model = lm(numcompaniesworked ~ Age + Age2, data = mydata)
summary(quad_model)
## 
## Call:
## lm(formula = numcompaniesworked ~ Age + Age2, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1924 -1.6139 -0.7232  1.1223  7.5088 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.8665358  0.9235252  -2.021 0.043451 *  
## Age          0.1658550  0.0489605   3.388 0.000724 ***
## Age2        -0.0010812  0.0006247  -1.731 0.083686 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.382 on 1467 degrees of freedom
## Multiple R-squared:  0.09164,    Adjusted R-squared:  0.0904 
## F-statistic:    74 on 2 and 1467 DF,  p-value: < 2.2e-16

Monthly Income vs. Number Companies Worked

NumCompaniesWorked = mydata$NumCompaniesWorked
NumCompaniesWorked2 = mydata$NumCompaniesWorked^2
quad_model = lm(monthlyincome ~ NumCompaniesWorked + NumCompaniesWorked2, data = mydata)
summary(quad_model)
## 
## Call:
## lm(formula = monthlyincome ~ NumCompaniesWorked + NumCompaniesWorked2, 
##     data = mydata)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -5708  -3229  -1545   2016  14880 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          5119.26     237.13  21.589  < 2e-16 ***
## NumCompaniesWorked    909.17     164.89   5.514 4.14e-08 ***
## NumCompaniesWorked2   -78.94      19.83  -3.980 7.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4633 on 1467 degrees of freedom
## Multiple R-squared:  0.0328, Adjusted R-squared:  0.03148 
## F-statistic: 24.87 on 2 and 1467 DF,  p-value: 2.379e-11
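
The signs of the fitted coefficients in this last model (positive linear term, negative squared term) describe a curve that rises and then falls. As a worked check, taking the fitted quadratic at face value, the peak lies at -b1 / (2 * b2). The names `b0`, `b1`, and `b2` are our own labels for the coefficients in the output above.

```r
# Coefficients copied from the summary output above.
b0 <- 5119.26   # intercept
b1 <- 909.17    # NumCompaniesWorked
b2 <- -78.94    # NumCompaniesWorked squared

# A parabola b0 + b1*x + b2*x^2 with b2 < 0 peaks where its derivative is zero.
peak_x <- -b1 / (2 * b2)
peak_income <- b0 + b1 * peak_x + b2 * peak_x^2

cat("Peak at about", round(peak_x, 2), "companies\n")
cat("Fitted monthly income at the peak:", round(peak_income), "\n")
```

The peak falls at roughly 5.76 companies. Since this model has an R-squared of only 0.0328, the curve describes a weak average tendency and should not be read as a precise turning point.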

5D) Efficiency of Quadratic Model

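The quadratic model fits the data only marginally better than the linear model. R-squared rises slightly in all three cases (0.2479 to 0.2481 for age and monthly income, 0.08978 to 0.09164 for age and number of companies worked, and 0.02235 to 0.0328 for monthly income and number of companies worked), but R-squared can never decrease when a term is added, so adjusted R-squared is the fairer comparison. Adjusted R-squared improves for age and number of companies worked (0.08916 to 0.0904) and for monthly income and number of companies worked (0.02169 to 0.03148), but falls slightly for age and monthly income (0.2473 to 0.2471), where the squared term is not significant (p = 0.499). The squared term is clearly significant only for monthly income vs. number of companies worked (p = 7.22e-05). The direction of the relationships matches our assumptions: as employees age, they have had more time to work for more companies, and income tends to rise with age and experience. Because of these assumptions, we treat the quadratic model as a modest refinement for the two models involving number of companies worked, not as a substantially more accurate fit.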
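
One way to check whether a squared term earns its place is a nested-model F-test alongside adjusted R-squared. The sketch below illustrates the idea on simulated data. The data frame `sim` and its columns `x` and `y` are hypothetical and are not the HR data set. `I(x^2)` adds the squared term directly inside the formula, so no separate squared variable is needed.

```r
# Simulated data with a purely linear relationship (hypothetical, for illustration).
set.seed(42)
n <- 200
x <- runif(n, min = 18, max = 60)
y <- 2000 + 150 * x + rnorm(n, sd = 3000)
sim <- data.frame(x = x, y = y)

# Fit the nested pair of models.
lin_fit  <- lm(y ~ x, data = sim)
quad_fit <- lm(y ~ x + I(x^2), data = sim)

# R-squared of the larger model is never lower; adjusted R-squared can be.
r2 <- c(linear    = summary(lin_fit)$r.squared,
        quadratic = summary(quad_fit)$r.squared)
adj_r2 <- c(linear    = summary(lin_fit)$adj.r.squared,
            quadratic = summary(quad_fit)$adj.r.squared)
print(r2)
print(adj_r2)

# F-test: does the squared term reduce the residual error significantly?
f_test <- anova(lin_fit, quad_fit)
print(f_test)
```

A small p-value in the second row of the `anova()` table would indicate that the squared term improves the fit beyond what chance would explain.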

5E) Predictive Model (Watson Analytics)

knitr::include_graphics('imgs/screenshot1.png')

This chart suggests that Monthly Income and Age are not strongly correlated, although the correlation table above reports a moderate correlation of about 0.50 between them. This makes sense to a degree, because people who are older do not necessarily make more money than their younger counterparts.

knitr::include_graphics('imgs/screenshot2.png')

This chart shows that the employees with “no” for attrition, meaning that they stayed with the company, collectively worked at far more companies than the employees with “yes”, who left their jobs. Our hypothesis is that the number of companies worked for is a predictor of attrition, because people who have worked for more companies may be more likely to leave their current position. A collective total depends on how many employees are in each group, however, and the correlation table reports a correlation of only about 0.04 between attrition and number of companies worked, so this chart offers limited support for the hypothesis.
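
When comparing groups by a collective total, the size of each group matters. The toy example below uses made-up numbers (not the HR data set; the data frame `toy` is hypothetical) to show how a larger group can have the higher total even when its per-employee average is lower.

```r
# Made-up toy data: six employees who stayed, two who left.
toy <- data.frame(
  Attrition          = c("No", "No", "No", "No", "No", "No", "Yes", "Yes"),
  NumCompaniesWorked = c(1, 2, 3, 1, 2, 3, 4, 5)
)

# Collective total per group versus average per employee.
totals <- tapply(toy$NumCompaniesWorked, toy$Attrition, sum)
means  <- tapply(toy$NumCompaniesWorked, toy$Attrition, mean)

print(totals)
print(means)
```

Here the "No" group has the larger total simply because it has more members, while the "Yes" group has the larger mean. Comparing group means on the real data would be a more direct test of the hypothesis.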

knitr::include_graphics('imgs/screenshot3.png')

This chart shows that people who leave their current roles more often earn a higher monthly income than people who stay. One possible explanation is that people who leave their positions are leaving for a promotion, a pay raise, or a better-paying firm.