Summary of the data

library(dplyr)        # rename()
library(funModeling)  # freq(), plot_num(), profiling_num()
Employee_Leave_reason <- HR_comma_sep
summary(Employee_Leave_reason)
##  satisfaction_level last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900     Min.   :0.3600   Min.   :2.000   Min.   : 96.0       
##  1st Qu.:0.4400     1st Qu.:0.5600   1st Qu.:3.000   1st Qu.:156.0       
##  Median :0.6400     Median :0.7200   Median :4.000   Median :200.0       
##  Mean   :0.6128     Mean   :0.7161   Mean   :3.803   Mean   :201.1       
##  3rd Qu.:0.8200     3rd Qu.:0.8700   3rd Qu.:5.000   3rd Qu.:245.0       
##  Max.   :1.0000     Max.   :1.0000   Max.   :7.000   Max.   :310.0       
##  time_spend_company Work_accident         left       
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 3.000     Median :0.0000   Median :0.0000  
##  Mean   : 3.498     Mean   :0.1446   Mean   :0.2381  
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :10.000     Max.   :1.0000   Max.   :1.0000  
##  promotion_last_5years    sales              salary         
##  Min.   :0.00000       Length:14999       Length:14999      
##  1st Qu.:0.00000       Class :character   Class :character  
##  Median :0.00000       Mode  :character   Mode  :character  
##  Mean   :0.02127                                            
##  3rd Qu.:0.00000                                            
##  Max.   :1.00000

Data Analysis

### Category Analysis: 

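### 1. Based on the frequency table, sales is the largest occupation group (27.6% of employees), followed by technical (18.13%) and support (14.86%). These figures are each group's share of the workforce, not its leaving rate, so the table alone does not show who leaves. A plausible hypothesis is that sales staff are more likely to leave because of their pay structure (salary and commission) or because their wider contacts bring better opportunities. The other occupations (IT, product_mng, marketing, RandD, accounting, hr) each account for less than 10% of employees. Across all occupations, almost 92% of employees are in the low or medium salary band.
### 2. Management is the smallest group (4.20%); higher-level workers are probably more stable in their jobs. Moreover, only a small portion of employees (8.25%) earn a high salary, which is consistent with the small size of the management group.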
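# The `sales` column actually holds the occupation, so rename it (dplyr::rename)
Employee_Leave_reason1 <- rename(Employee_Leave_reason, Occupation = sales)
# Frequency tables for the categorical variables (funModeling::freq)
freq(Employee_Leave_reason1)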

##     Occupation frequency percentage cumulative_perc
## 1        sales      4140      27.60           27.60
## 2    technical      2720      18.13           45.73
## 3      support      2229      14.86           60.59
## 4           IT      1227       8.18           68.77
## 5  product_mng       902       6.01           74.78
## 6    marketing       858       5.72           80.50
## 7        RandD       787       5.25           85.75
## 8   accounting       767       5.11           90.86
## 9           hr       739       4.93           95.79
## 10  management       630       4.20          100.00

##   salary frequency percentage cumulative_perc
## 1    low      7316      48.78           48.78
## 2 medium      6446      42.98           91.76
## 3   high      1237       8.25          100.00
## [1] "Variables processed: Occupation, salary"
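The frequency tables above report each group's share of the workforce. To see who actually leaves, a leaving rate per group is needed. The following is a minimal sketch of that calculation, assuming `left` is a 0/1 indicator as the summary suggests. The data frame `toy` and its values are hypothetical and made up for illustration; they are not the HR data.

```r
# Hypothetical toy data, NOT the HR data set: six employees in two occupations.
toy <- data.frame(
  Occupation = c("sales", "sales", "sales", "sales", "management", "management"),
  left       = c(1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Share of the workforce in each occupation (the kind of figure freq() reports).
headcount_share <- prop.table(table(toy$Occupation))

# Leaving rate within each occupation: the mean of a 0/1 indicator
# is the proportion of ones in that group.
leave_rate <- tapply(toy$left, toy$Occupation, mean)

print(round(headcount_share, 3))
print(round(leave_rate, 3))
```

In this toy example the largest group is not the one with the highest leaving rate, which is why the two figures should be read separately. On the real data the same idea would be `tapply(Employee_Leave_reason1$left, Employee_Leave_reason1$Occupation, mean)`.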
### Continuous Variables Analysis:

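### 1. satisfaction_level: negatively skewed (skewness -0.48) and platykurtic (kurtosis 2.33), so the distribution is light-tailed with few outliers. The median (0.64) lies above the mean (0.613), so at least half of the workers report a satisfaction level above the average.
### 2. last_evaluation: fairly symmetrical (skewness -0.03) and platykurtic. With a kurtosis of only 1.76, extreme values are rarer than under a normal distribution.
### 3. number_project: slightly positively skewed and platykurtic (skewness 0.34, below 0.5; kurtosis 2.50, below 3). The mean is 3.80 projects and the median is 4, with a right tail reaching 7 projects.
### 4. average_montly_hours: fairly symmetrical (skewness 0.05) and platykurtic (kurtosis 1.86). Hours are centered near the mean of 201.05 per month (median 200) but spread widely, with 80% of workers between 137 and 267 hours.
### 5. time_spend_company: highly skewed (skewness 1.85) and leptokurtic (kurtosis 7.77). This variable appears to measure tenure with the company in years rather than hours per day. The typical tenure is about 3 years (mean 3.498, median 3), but the kurtosis above 3 points to outliers: some people have stayed 6 or even 10 years.
### 6. Work_accident and promotion_last_5years: both are highly skewed and leptokurtic. They are 0/1 indicators, so the shape simply reflects how rare the events are (14.5% and 2.1% of workers, respectively).
### 7. left: highly skewed (skewness 1.23) and platykurtic (kurtosis 2.51). It is also a 0/1 indicator: 23.8% of workers left.
### Based on the data, Occupation and salary are categorical; all other variables are numeric.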
plot_num(Employee_Leave_reason1)

profiling_num(Employee_Leave_reason1)
##                variable         mean    std_dev variation_coef   p_01
## 1    satisfaction_level   0.61283352  0.2486307      0.4057067   0.09
## 2       last_evaluation   0.71610174  0.1711691      0.2390290   0.39
## 3        number_project   3.80305354  1.2325924      0.3241060   2.00
## 4  average_montly_hours 201.05033669 49.9430994      0.2484109 104.00
## 5    time_spend_company   3.49823322  1.4601362      0.4173925   2.00
## 6         Work_accident   0.14460964  0.3517186      2.4321930   0.00
## 7                  left   0.23808254  0.4259241      1.7889766   0.00
## 8 promotion_last_5years   0.02126808  0.1442815      6.7839426   0.00
##     p_05   p_25   p_50   p_75   p_95   p_99    skewness  kurtosis   iqr
## 1   0.11   0.44   0.64   0.82   0.96   0.99 -0.47631270  2.328965  0.38
## 2   0.46   0.56   0.72   0.87   0.98   1.00 -0.02661909  1.760973  0.31
## 3   2.00   3.00   4.00   5.00   6.00   7.00  0.33767184  2.504287  2.00
## 4 130.00 156.00 200.00 245.00 275.00 301.00  0.05283670  1.864997 89.00
## 5   2.00   3.00   3.00   4.00   6.00  10.00  1.85313370  7.771220  1.00
## 6   0.00   0.00   0.00   0.00   1.00   1.00  2.02094660  5.084225  0.00
## 7   0.00   0.00   0.00   0.00   1.00   1.00  1.22991957  2.512702  0.00
## 8   0.00   0.00   0.00   0.00   0.00   1.00  6.63630462 45.040539  0.00
##       range_98     range_80
## 1 [0.09, 0.99] [0.21, 0.92]
## 2    [0.39, 1] [0.49, 0.95]
## 3       [2, 7]       [2, 5]
## 4   [104, 301]   [137, 267]
## 5      [2, 10]       [2, 5]
## 6       [0, 1]       [0, 1]
## 7       [0, 1]       [0, 1]
## 8       [0, 1]       [0, 0]
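The skewness and kurtosis columns above drive the platykurtic/leptokurtic labels used in the analysis. The sketch below shows one way such values can be computed from central moments, assuming the table reports moment-based skewness and non-excess kurtosis (which the cutoff of 3 used in the text suggests). The helper names `moment_skewness`, `moment_kurtosis`, and `kurtosis_label` are hypothetical, and `left_like` is a vector rebuilt only from the sample size and the mean of `left`, not the original column.

```r
# Moment-based skewness and (non-excess) kurtosis; under these definitions
# a normal distribution scores 0 and 3 respectively.
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
moment_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Label a kurtosis value relative to the normal reference of 3.
kurtosis_label <- function(k) {
  if (k > 3) "leptokurtic" else if (k < 3) "platykurtic" else "mesokurtic"
}

# Rebuild a 0/1 vector with the same size and mean as `left` in the table.
n <- 14999
n_left <- round(0.23808254 * n)
left_like <- c(rep(1, n_left), rep(0, n - n_left))

sk <- moment_skewness(left_like)
ku <- moment_kurtosis(left_like)
print(c(skewness = sk, kurtosis = ku))
print(kurtosis_label(ku))
```

The results should come out close to 1.23 and 2.51, in line with the `left` row of the table. For a 0/1 indicator both values depend only on the proportion of ones, so for `Work_accident`, `left`, and `promotion_last_5years` they describe how rare the event is rather than the shape of a continuous distribution.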