Project Title: HR Analysis of an IT Company

NAME: nissy joy bollampally

EMAIL: nissyammu777@gmail.com

COLLEGE / COMPANY: NIT Trichy

This dataset gives details of the employees of an IT company. From it we can study which factors affect an employee's job satisfaction and monthly income.

reading the dataset into R

hr.df <- read.csv("project.csv")
View(hr.df)
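The first column is read as ï..Age rather than Age, which is the usual sign that project.csv starts with a UTF-8 byte-order mark (BOM) that R treats as part of the first header. A minimal sketch, assuming that is the cause here: it writes a tiny CSV with a BOM to a temporary file (the two rows are made-up illustrative values) and reads it back with fileEncoding = "UTF-8-BOM", which strips the mark before the header is parsed.

```r
# Write a tiny CSV that starts with the UTF-8 byte-order mark (bytes EF BB BF)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv.text <- charToRaw("Age,MonthlyIncome\n30,5000\n45,9000\n")
writeBin(c(bom, csv.text), tmp)

# "UTF-8-BOM" makes R drop the mark before parsing the header row
demo.df <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo.df)
```

The rest of this report keeps the original ï..Age name, so reading project.csv this way would also mean renaming that column in the later code.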

summarising the data

library(psych)
describe(hr.df)
##                          vars    n     mean      sd  median  trimmed
## ï..Age                      1 1470    36.92    9.14    36.0    36.47
## Attrition*                  2 1470     1.16    0.37     1.0     1.08
## BusinessTravel*             3 1470     2.61    0.67     3.0     2.76
## DailyRate                   4 1470   802.49  403.51   802.0   803.83
## Department*                 5 1470     2.26    0.53     2.0     2.25
## DistanceFromHome            6 1470     9.19    8.11     7.0     8.08
## Education                   7 1470     2.91    1.02     3.0     2.98
## EducationField*             8 1470     3.25    1.33     3.0     3.10
## EmployeeCount               9 1470     1.00    0.00     1.0     1.00
## EmployeeNumber             10 1470  1024.87  602.02  1020.5  1023.40
## EnvironmentSatisfaction    11 1470     2.72    1.09     3.0     2.78
## Gender*                    12 1470     1.60    0.49     2.0     1.62
## HourlyRate                 13 1470    65.89   20.33    66.0    66.02
## JobInvolvement             14 1470     2.73    0.71     3.0     2.74
## JobLevel                   15 1470     2.06    1.11     2.0     1.90
## JobRole*                   16 1470     5.46    2.46     6.0     5.61
## JobSatisfaction            17 1470     2.73    1.10     3.0     2.79
## MaritalStatus*             18 1470     2.10    0.73     2.0     2.12
## MonthlyIncome              19 1470  6502.93 4707.96  4919.0  5667.24
## MonthlyRate                20 1470 14313.10 7117.79 14235.5 14286.48
## NumCompaniesWorked         21 1470     2.69    2.50     2.0     2.36
## Over18*                    22 1470     1.00    0.00     1.0     1.00
## OverTime*                  23 1470     1.28    0.45     1.0     1.23
## PercentSalaryHike          24 1470    15.21    3.66    14.0    14.80
## PerformanceRating          25 1470     3.15    0.36     3.0     3.07
## RelationshipSatisfaction   26 1470     2.71    1.08     3.0     2.77
## StandardHours              27 1470    80.00    0.00    80.0    80.00
## StockOptionLevel           28 1470     0.79    0.85     1.0     0.67
## TotalWorkingYears          29 1470    11.28    7.78    10.0    10.37
## TrainingTimesLastYear      30 1470     2.80    1.29     3.0     2.72
## WorkLifeBalance            31 1470     2.76    0.71     3.0     2.77
## YearsAtCompany             32 1470     7.01    6.13     5.0     5.99
## YearsInCurrentRole         33 1470     4.23    3.62     3.0     3.85
## YearsSinceLastPromotion    34 1470     2.19    3.22     1.0     1.48
## YearsWithCurrManager       35 1470     4.12    3.57     3.0     3.77
##                              mad  min   max range  skew kurtosis     se
## ï..Age                      8.90   18    60    42  0.41    -0.41   0.24
## Attrition*                  0.00    1     2     1  1.84     1.39   0.01
## BusinessTravel*             0.00    1     3     2 -1.44     0.69   0.02
## DailyRate                 510.01  102  1499  1397  0.00    -1.21  10.52
## Department*                 0.00    1     3     2  0.17    -0.40   0.01
## DistanceFromHome            7.41    1    29    28  0.96    -0.23   0.21
## Education                   1.48    1     5     4 -0.29    -0.56   0.03
## EducationField*             1.48    1     6     5  0.55    -0.69   0.03
## EmployeeCount               0.00    1     1     0   NaN      NaN   0.00
## EmployeeNumber            790.97    1  2068  2067  0.02    -1.23  15.70
## EnvironmentSatisfaction     1.48    1     4     3 -0.32    -1.20   0.03
## Gender*                     0.00    1     2     1 -0.41    -1.83   0.01
## HourlyRate                 26.69   30   100    70 -0.03    -1.20   0.53
## JobInvolvement              0.00    1     4     3 -0.50     0.26   0.02
## JobLevel                    1.48    1     5     4  1.02     0.39   0.03
## JobRole*                    2.97    1     9     8 -0.36    -1.20   0.06
## JobSatisfaction             1.48    1     4     3 -0.33    -1.22   0.03
## MaritalStatus*              1.48    1     3     2 -0.15    -1.12   0.02
## MonthlyIncome            3260.24 1009 19999 18990  1.37     0.99 122.79
## MonthlyRate              9201.76 2094 26999 24905  0.02    -1.22 185.65
## NumCompaniesWorked          1.48    0     9     9  1.02     0.00   0.07
## Over18*                     0.00    1     1     0   NaN      NaN   0.00
## OverTime*                   0.00    1     2     1  0.96    -1.07   0.01
## PercentSalaryHike           2.97   11    25    14  0.82    -0.31   0.10
## PerformanceRating           0.00    3     4     1  1.92     1.68   0.01
## RelationshipSatisfaction    1.48    1     4     3 -0.30    -1.19   0.03
## StandardHours               0.00   80    80     0   NaN      NaN   0.00
## StockOptionLevel            1.48    0     3     3  0.97     0.35   0.02
## TotalWorkingYears           5.93    0    40    40  1.11     0.91   0.20
## TrainingTimesLastYear       1.48    0     6     6  0.55     0.48   0.03
## WorkLifeBalance             0.00    1     4     3 -0.55     0.41   0.02
## YearsAtCompany              4.45    0    40    40  1.76     3.91   0.16
## YearsInCurrentRole          4.45    0    18    18  0.92     0.47   0.09
## YearsSinceLastPromotion     1.48    0    15    15  1.98     3.59   0.08
## YearsWithCurrManager        4.45    0    17    17  0.83     0.16   0.09
summary(hr.df)
##      ï..Age      Attrition            BusinessTravel   DailyRate     
##  Min.   :18.00   No :1233   Non-Travel       : 150   Min.   : 102.0  
##  1st Qu.:30.00   Yes: 237   Travel_Frequently: 277   1st Qu.: 465.0  
##  Median :36.00              Travel_Rarely    :1043   Median : 802.0  
##  Mean   :36.92                                       Mean   : 802.5  
##  3rd Qu.:43.00                                       3rd Qu.:1157.0  
##  Max.   :60.00                                       Max.   :1499.0  
##                                                                      
##                   Department  DistanceFromHome   Education    
##  Human Resources       : 63   Min.   : 1.000   Min.   :1.000  
##  Research & Development:961   1st Qu.: 2.000   1st Qu.:2.000  
##  Sales                 :446   Median : 7.000   Median :3.000  
##                               Mean   : 9.193   Mean   :2.913  
##                               3rd Qu.:14.000   3rd Qu.:4.000  
##                               Max.   :29.000   Max.   :5.000  
##                                                               
##           EducationField EmployeeCount EmployeeNumber  
##  Human Resources : 27    Min.   :1     Min.   :   1.0  
##  Life Sciences   :606    1st Qu.:1     1st Qu.: 491.2  
##  Marketing       :159    Median :1     Median :1020.5  
##  Medical         :464    Mean   :1     Mean   :1024.9  
##  Other           : 82    3rd Qu.:1     3rd Qu.:1555.8  
##  Technical Degree:132    Max.   :1     Max.   :2068.0  
##                                                        
##  EnvironmentSatisfaction    Gender      HourlyRate     JobInvolvement
##  Min.   :1.000           Female:588   Min.   : 30.00   Min.   :1.00  
##  1st Qu.:2.000           Male  :882   1st Qu.: 48.00   1st Qu.:2.00  
##  Median :3.000                        Median : 66.00   Median :3.00  
##  Mean   :2.722                        Mean   : 65.89   Mean   :2.73  
##  3rd Qu.:4.000                        3rd Qu.: 83.75   3rd Qu.:3.00  
##  Max.   :4.000                        Max.   :100.00   Max.   :4.00  
##                                                                      
##     JobLevel                          JobRole    JobSatisfaction
##  Min.   :1.000   Sales Executive          :326   Min.   :1.000  
##  1st Qu.:1.000   Research Scientist       :292   1st Qu.:2.000  
##  Median :2.000   Laboratory Technician    :259   Median :3.000  
##  Mean   :2.064   Manufacturing Director   :145   Mean   :2.729  
##  3rd Qu.:3.000   Healthcare Representative:131   3rd Qu.:4.000  
##  Max.   :5.000   Manager                  :102   Max.   :4.000  
##                  (Other)                  :215                  
##   MaritalStatus MonthlyIncome    MonthlyRate    NumCompaniesWorked
##  Divorced:327   Min.   : 1009   Min.   : 2094   Min.   :0.000     
##  Married :673   1st Qu.: 2911   1st Qu.: 8047   1st Qu.:1.000     
##  Single  :470   Median : 4919   Median :14236   Median :2.000     
##                 Mean   : 6503   Mean   :14313   Mean   :2.693     
##                 3rd Qu.: 8379   3rd Qu.:20462   3rd Qu.:4.000     
##                 Max.   :19999   Max.   :26999   Max.   :9.000     
##                                                                   
##  Over18   OverTime   PercentSalaryHike PerformanceRating
##  Y:1470   No :1054   Min.   :11.00     Min.   :3.000    
##           Yes: 416   1st Qu.:12.00     1st Qu.:3.000    
##                      Median :14.00     Median :3.000    
##                      Mean   :15.21     Mean   :3.154    
##                      3rd Qu.:18.00     3rd Qu.:3.000    
##                      Max.   :25.00     Max.   :4.000    
##                                                         
##  RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
##  Min.   :1.000            Min.   :80    Min.   :0.0000   Min.   : 0.00    
##  1st Qu.:2.000            1st Qu.:80    1st Qu.:0.0000   1st Qu.: 6.00    
##  Median :3.000            Median :80    Median :1.0000   Median :10.00    
##  Mean   :2.712            Mean   :80    Mean   :0.7939   Mean   :11.28    
##  3rd Qu.:4.000            3rd Qu.:80    3rd Qu.:1.0000   3rd Qu.:15.00    
##  Max.   :4.000            Max.   :80    Max.   :3.0000   Max.   :40.00    
##                                                                           
##  TrainingTimesLastYear WorkLifeBalance YearsAtCompany   YearsInCurrentRole
##  Min.   :0.000         Min.   :1.000   Min.   : 0.000   Min.   : 0.000    
##  1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000   1st Qu.: 2.000    
##  Median :3.000         Median :3.000   Median : 5.000   Median : 3.000    
##  Mean   :2.799         Mean   :2.761   Mean   : 7.008   Mean   : 4.229    
##  3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000   3rd Qu.: 7.000    
##  Max.   :6.000         Max.   :4.000   Max.   :40.000   Max.   :18.000    
##                                                                           
##  YearsSinceLastPromotion YearsWithCurrManager
##  Min.   : 0.000          Min.   : 0.000      
##  1st Qu.: 0.000          1st Qu.: 2.000      
##  Median : 1.000          Median : 3.000      
##  Mean   : 2.188          Mean   : 4.123      
##  3rd Qu.: 3.000          3rd Qu.: 7.000      
##  Max.   :15.000          Max.   :17.000      
## 
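The describe() and summary() outputs show that EmployeeCount (always 1), Over18 (always Y) and StandardHours (always 80) do not vary at all, which is why their skew and kurtosis are NaN; they carry no information for the comparisons that follow. A minimal sketch of finding and dropping constant columns, run on a small made-up data frame rather than on hr.df:

```r
# Small illustrative data frame (values are made up, not taken from project.csv)
toy.df <- data.frame(
  MonthlyIncome = c(5000, 6200, 4100),
  EmployeeCount = c(1, 1, 1),
  Over18        = c("Y", "Y", "Y"),
  StandardHours = c(80, 80, 80)
)

# A column is constant when it has exactly one distinct value
is.constant <- sapply(toy.df, function(x) length(unique(x)) == 1)
names(toy.df)[is.constant]

# Keep only the columns that vary
toy.df <- toy.df[, !is.constant, drop = FALSE]
names(toy.df)
```

Applied to hr.df, the same two steps should drop exactly those three columns, though that was not run as part of this analysis.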

group means and cross-tabulations to understand how the variables in the dataset relate to one another.

aggregate((hr.df$MonthlyIncome~hr.df$JobSatisfaction), FUN=mean)
##   hr.df$JobSatisfaction hr.df$MonthlyIncome
## 1                     1            6561.571
## 2                     2            6527.329
## 3                     3            6480.495
## 4                     4            6472.732
aggregate((hr.df$MonthlyIncome~hr.df$Education), FUN=mean)
##   hr.df$Education hr.df$MonthlyIncome
## 1               1            5640.571
## 2               2            6226.645
## 3               3            6517.264
## 4               4            6832.402
## 5               5            8277.646
aggregate((hr.df$MonthlyIncome~hr.df$Gender), FUN=mean)
##   hr.df$Gender hr.df$MonthlyIncome
## 1       Female            6686.566
## 2         Male            6380.508
aggregate((hr.df$JobSatisfaction~hr.df$MaritalStatus), FUN=mean)
##   hr.df$MaritalStatus hr.df$JobSatisfaction
## 1            Divorced              2.697248
## 2             Married              2.716196
## 3              Single              2.768085
aggregate((hr.df$MonthlyIncome~hr.df$PerformanceRating), FUN=mean)
##   hr.df$PerformanceRating hr.df$MonthlyIncome
## 1                       3            6537.274
## 2                       4            6313.894
mytable<- xtabs(~JobSatisfaction+MaritalStatus,data=hr.df)
mytable
##                MaritalStatus
## JobSatisfaction Divorced Married Single
##               1       70     130     89
##               2       61     131     88
##               3       94     212    136
##               4      102     200    157
mytable1<- xtabs(~JobSatisfaction+JobRole,data=hr.df)
mytable1
##                JobRole
## JobSatisfaction Healthcare Representative Human Resources
##               1                        26              10
##               2                        19              16
##               3                        43              13
##               4                        43              13
##                JobRole
## JobSatisfaction Laboratory Technician Manager Manufacturing Director
##               1                    56      21                     26
##               2                    48      21                     32
##               3                    75      27                     49
##               4                    80      33                     38
##                JobRole
## JobSatisfaction Research Director Research Scientist Sales Executive
##               1                15                 54              69
##               2                16                 53              54
##               3                27                 90              91
##               4                22                 95             112
##                JobRole
## JobSatisfaction Sales Representative
##               1                   12
##               2                   21
##               3                   27
##               4                   23
mytable2<- xtabs(~JobSatisfaction+OverTime,data=hr.df)
mytable2
##                OverTime
## JobSatisfaction  No Yes
##               1 205  84
##               2 211  69
##               3 321 121
##               4 317 142
mytable3<- xtabs(~JobSatisfaction+TotalWorkingYears,data=hr.df)
mytable3
##                TotalWorkingYears
## JobSatisfaction  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
##               1  0 16  4  4 10 19 27 11 18 28 43  9  4  6  3  9  8 10  8
##               2  1 23  7  7 16 12 26 17 21 12 28  8 11  9  7  7  9  4  5
##               3  5 20 11 10 21 35 34 27 28 26 69  8 10 11  8 12  6 10  7
##               4  5 22  9 21 16 22 38 26 36 30 62 11 23 10 13 12 14  9  7
##                TotalWorkingYears
## JobSatisfaction 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
##               1  3  8  7  7  2  7  3  1  1  3  0  0  3  2  1  2  0  1  1
##               2  4  3  6  5  3  1  2  3  1  4  5  1  1  1  2  1  2  2  2
##               3  8 15 13  6  9  3  4  5  4  4  3  2  1  2  2  1  0  1  0
##               4  7  4  8  3  8  7  5  5  1  3  2  4  4  4  2  1  1  2  1
##                TotalWorkingYears
## JobSatisfaction 38 40
##               1  0  0
##               2  0  1
##               3  1  0
##               4  0  1
mytable4<- xtabs(~JobSatisfaction+YearsSinceLastPromotion,data=hr.df)
mytable4
##                YearsSinceLastPromotion
## JobSatisfaction   0   1   2   3   4   5   6   7   8   9  10  11  12  13
##               1 117  72  25   8   7   9  11  17   3   4   0   5   2   3
##               2 109  69  28  12  13   8   5  18   3   1   2   4   2   1
##               3 180 105  50  13  19  16   7  19   5   6   2   7   3   2
##               4 175 111  56  19  22  12   9  22   7   6   2   8   3   4
##                YearsSinceLastPromotion
## JobSatisfaction  14  15
##               1   3   3
##               2   2   3
##               3   3   5
##               4   1   2
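The raw counts are hard to compare across groups of different sizes (327 divorced, 673 married and 470 single employees). A sketch, using the counts copied from the JobSatisfaction by MaritalStatus table above, that converts them to within-group proportions with prop.table() and runs Pearson's chi-squared test of independence with chisq.test(). The chi-squared test is a formal check added here; it was not part of the original analysis.

```r
# Counts copied from the JobSatisfaction by MaritalStatus table above
counts <- matrix(c( 70,  61,  94, 102,   # Divorced
                   130, 131, 212, 200,   # Married
                    89,  88, 136, 157),  # Single
                 nrow = 4,
                 dimnames = list(JobSatisfaction = c("1", "2", "3", "4"),
                                 MaritalStatus = c("Divorced", "Married", "Single")))
counts

# Within-group proportions: each marital-status column sums to 1
round(prop.table(counts, margin = 2), 3)

# Pearson's chi-squared test of independence between the two variables
sat.marital.test <- chisq.test(counts)
sat.marital.test
```

The same calls work directly on the xtabs objects (for example chisq.test(mytable)). For mytable3 and mytable4 many cells are small, so chisq.test() is likely to warn that the chi-squared approximation may be incorrect.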

histograms to visualise different variables

hist(hr.df$JobSatisfaction,breaks=20,main = "distribution of job satisfaction",col="pink")

hist(hr.df$MonthlyIncome,breaks=20,main = "distribution of monthly income",col="pink")

hist(hr.df$JobInvolvement,breaks=20,main = "distribution of job involvement",col="pink")

hist(hr.df$JobLevel,breaks=20,main = "distribution of job level",col="pink")

hist(hr.df$PerformanceRating,breaks=5,main = "distribution of performance rating",col="pink")

hist(hr.df$TotalWorkingYears,breaks=5,main = "distribution of total working years",col="pink")

hist(hr.df$YearsAtCompany,breaks=5,main = "distribution of years at company",col="pink")
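JobSatisfaction, JobInvolvement, JobLevel and PerformanceRating take only a handful of integer values, so hist() with breaks=20 leaves empty bins between the bars. A sketch of the usual alternative, a bar chart of the level counts. To keep it self-contained, the vector is rebuilt from the JobSatisfaction counts in the cross-tabulations above (289, 280, 442 and 459 employees at levels 1 to 4); on the real data, barplot(table(hr.df$JobSatisfaction)) should draw the same chart.

```r
# Rebuild the JobSatisfaction column from its level counts in the tables above
job.sat <- rep(1:4, times = c(289, 280, 442, 459))

# table() counts each level once, so the bar chart has no empty bins
sat.counts <- table(job.sat)
sat.counts

barplot(sat.counts,
        main = "distribution of job satisfaction",
        xlab = "job satisfaction level",
        ylab = "number of employees",
        col = "pink")
```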

scatterplots

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(MonthlyIncome~ï..Age,     data=hr.df,
            spread=FALSE,
            main="Scatter plot of salary vs age",
            xlab="age",
            ylab="salary")

boxplot(MonthlyIncome~Gender,     data=hr.df,
        main="Box plot of salary vs sex",
        xlab="sex",
        ylab="salary")

scatterplot(MonthlyIncome~TotalWorkingYears,     data=hr.df,
            spread=FALSE,
            main="Scatter plot of salary vs total working experience",
            xlab="working experience",
            ylab="salary")

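boxplots of monthly income across groups

boxplot(MonthlyIncome~TotalWorkingYears, data=hr.df, xlab="total working years", ylab="monthly income")

boxplot(MonthlyIncome~ï..Age, data=hr.df, xlab="age", ylab="monthly income")

boxplot(MonthlyIncome~PerformanceRating, data=hr.df, xlab="performance rating", ylab="monthly income")

boxplot(MonthlyIncome~MaritalStatus, data=hr.df, xlab="marital status", ylab="monthly income")

boxplot(MonthlyIncome~JobSatisfaction, data=hr.df, xlab="job satisfaction", ylab="monthly income")

boxplot(MonthlyIncome~JobInvolvement, data=hr.df, xlab="job involvement", ylab="monthly income")

boxplot(MonthlyIncome~JobRole, data=hr.df, xlab="job role", ylab="monthly income")

boxplot(MonthlyIncome~JobLevel, data=hr.df, xlab="job level", ylab="monthly income")

boxplot(MonthlyIncome~Gender, data=hr.df, xlab="gender", ylab="monthly income")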

T-tests

 t.test(JobSatisfaction~OverTime,data=hr.df)
## 
##  Welch Two Sample t-test
## 
## data:  JobSatisfaction by OverTime
## t = -0.92914, df = 741.84, p-value = 0.3531
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.18695896  0.06683964
## sample estimates:
##  mean in group No mean in group Yes 
##          2.711575          2.771635

from the above t-test we can infer that mean job satisfaction does not differ significantly between employees who work overtime and those who do not, since the p-value (0.353) is > 0.05.
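Because mytable2 gives the full JobSatisfaction by OverTime breakdown, the two samples can be rebuilt exactly from its counts, and t.test() on them reproduces the Welch test above. JobSatisfaction is a 1 to 4 rating rather than a measured quantity, so the sketch also runs the Wilcoxon rank-sum test with wilcox.test(), a rank-based alternative swapped in here that was not part of the original analysis.

```r
# Rebuild JobSatisfaction and OverTime from the counts in mytable2 above
sat.no  <- rep(1:4, times = c(205, 211, 321, 317))  # OverTime = No
sat.yes <- rep(1:4, times = c(84, 69, 121, 142))    # OverTime = Yes
ot.df <- data.frame(
  JobSatisfaction = c(sat.no, sat.yes),
  OverTime = factor(rep(c("No", "Yes"), times = c(length(sat.no), length(sat.yes))))
)

# Welch two-sample t-test: same data and same test as the output above
welch <- t.test(JobSatisfaction ~ OverTime, data = ot.df)
welch$statistic
welch$estimate

# Rank-based alternative; exact = FALSE because the 1-4 scale has many ties
rank.test <- wilcox.test(JobSatisfaction ~ OverTime, data = ot.df, exact = FALSE)
rank.test$p.value
```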

 t.test(JobSatisfaction~Gender,data=hr.df)
## 
##  Welch Two Sample t-test
## 
## data:  JobSatisfaction by Gender
## t = -1.2773, df = 1266.6, p-value = 0.2017
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.18976672  0.04010685
## sample estimates:
## mean in group Female   mean in group Male 
##             2.683673             2.758503

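from the above test we can infer that there is no significant difference in mean job satisfaction between female and male employees, since the p-value (0.202) is > 0.05 and the 95 percent confidence interval for the difference contains 0.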

correlation tests

cor.test(hr.df$JobSatisfaction,hr.df$MonthlyIncome)
## 
##  Pearson's product-moment correlation
## 
## data:  hr.df$JobSatisfaction and hr.df$MonthlyIncome
## t = -0.27421, df = 1468, p-value = 0.784
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05826288  0.04398681
## sample estimates:
##          cor 
## -0.007156742

from the above test we can infer that there is no significant correlation between monthly income and job satisfaction, since the p-value of the test (0.784) is > 0.05.

cor.test(hr.df$ï..Age,hr.df$MonthlyIncome)
## 
##  Pearson's product-moment correlation
## 
## data:  hr.df$ï..Age and hr.df$MonthlyIncome
## t = 21.995, df = 1468, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4583951 0.5353551
## sample estimates:
##       cor 
## 0.4978546

from the above test we can infer that there is a significant correlation between monthly income and the age of the employee, since the p-value of the test is < 0.05; the correlation is moderate and positive (r = 0.50).

cor.test(hr.df$TotalWorkingYears,hr.df$MonthlyIncome)
## 
##  Pearson's product-moment correlation
## 
## data:  hr.df$TotalWorkingYears and hr.df$MonthlyIncome
## t = 46.669, df = 1468, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7514606 0.7926965
## sample estimates:
##       cor 
## 0.7728932

from the above test we can infer that there is a significant correlation between monthly income and total working years, since the p-value of the test is < 0.05; the correlation is strong and positive (r = 0.77).

cor.test(hr.df$Education,hr.df$MonthlyIncome)
## 
##  Pearson's product-moment correlation
## 
## data:  hr.df$Education and hr.df$MonthlyIncome
## t = 3.6549, df = 1468, p-value = 0.0002664
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04404707 0.14538229
## sample estimates:
##        cor 
## 0.09496068

from the above test we can infer that there is a significant correlation between monthly income and education, since the p-value of the test (0.00027) is < 0.05, although the correlation itself is weak (r = 0.09).
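With n = 1470, even a small correlation can be statistically significant. The sketch below recomputes each test statistic from the reported r using t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, and adds r^2, the share of the variance in MonthlyIncome that each variable explains linearly on its own. The values of r are copied from the four cor.test() outputs above.

```r
# Correlations with MonthlyIncome reported by the cor.test() outputs above
r <- c(JobSatisfaction   = -0.007156742,
       Age               =  0.4978546,
       TotalWorkingYears =  0.7728932,
       Education         =  0.09496068)
n <- 1470

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p.val  <- 2 * pt(-abs(t.stat), df = n - 2)

# Share of the variance in MonthlyIncome explained linearly by each variable
r.squared <- r^2

round(cbind(r, t.stat, p.val, r.squared), 5)
```

The recomputed t values match the outputs above (for example 3.6549 for Education), while r^2 for Education is only about 0.009, i.e. under 1% of the variance, so its small p-value says little about the strength of the relationship.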

REGRESSION MODEL-1

fit <- lm(MonthlyIncome ~ ï..Age+Education+JobInvolvement+JobLevel+JobSatisfaction+PerformanceRating+TotalWorkingYears  ,data=hr.df)
fit
## 
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobInvolvement + 
##     JobLevel + JobSatisfaction + PerformanceRating + TotalWorkingYears, 
##     data = hr.df)
## 
## Coefficients:
##       (Intercept)             ï..Age          Education  
##         -1515.160             -6.894            -24.801  
##    JobInvolvement           JobLevel    JobSatisfaction  
##           -19.712           3784.821            -16.527  
## PerformanceRating  TotalWorkingYears  
##            13.002             52.408
summary(fit)
## 
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobInvolvement + 
##     JobLevel + JobSatisfaction + PerformanceRating + TotalWorkingYears, 
##     data = hr.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5465.8  -920.7    79.0   785.0  3889.9 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -1515.160    425.779  -3.559 0.000385 ***
## ï..Age               -6.894      5.728  -1.204 0.228933    
## Education           -24.801     37.833  -0.656 0.512213    
## JobInvolvement      -19.712     53.328  -0.370 0.711702    
## JobLevel           3784.821     55.031  68.776  < 2e-16 ***
## JobSatisfaction     -16.527     34.351  -0.481 0.630508    
## PerformanceRating    13.002    105.085   0.124 0.901549    
## TotalWorkingYears    52.408      9.192   5.701 1.44e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1451 on 1462 degrees of freedom
## Multiple R-squared:  0.9055, Adjusted R-squared:  0.905 
## F-statistic:  2001 on 7 and 1462 DF,  p-value: < 2.2e-16
coefficients(fit)
##       (Intercept)            ï..Age         Education    JobInvolvement 
##      -1515.160223         -6.894373        -24.801441        -19.712164 
##          JobLevel   JobSatisfaction PerformanceRating TotalWorkingYears 
##       3784.820884        -16.526687         13.001700         52.408062

REGRESSION MODEL-2

fit1 <- lm(MonthlyIncome ~ ï..Age+Education+JobLevel+TotalWorkingYears  ,data=hr.df)
fit1
## 
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobLevel + 
##     TotalWorkingYears, data = hr.df)
## 
## Coefficients:
##       (Intercept)             ï..Age          Education  
##         -1568.480             -7.019            -25.265  
##          JobLevel  TotalWorkingYears  
##          3784.140             52.654
summary(fit1)
## 
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobLevel + 
##     TotalWorkingYears, data = hr.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5446.0  -919.2    75.0   785.0  3877.9 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -1568.480    195.595  -8.019 2.16e-15 ***
## ï..Age               -7.019      5.718  -1.228    0.220    
## Education           -25.265     37.759  -0.669    0.504    
## JobLevel           3784.140     54.913  68.912  < 2e-16 ***
## TotalWorkingYears    52.654      9.172   5.741 1.14e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1450 on 1465 degrees of freedom
## Multiple R-squared:  0.9055, Adjusted R-squared:  0.9052 
## F-statistic:  3508 on 4 and 1465 DF,  p-value: < 2.2e-16
coefficients(fit1)
##       (Intercept)            ï..Age         Education          JobLevel 
##      -1568.480036         -7.018903        -25.264873       3784.139537 
## TotalWorkingYears 
##         52.653732

REGRESSION MODEL-3

fit2 <- lm(MonthlyIncome ~ JobLevel+TotalWorkingYears  ,data=hr.df)
fit2
## 
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + TotalWorkingYears, data = hr.df)
## 
## Coefficients:
##       (Intercept)           JobLevel  TotalWorkingYears  
##          -1835.86            3788.38              46.08
summary(fit2)
## 
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + TotalWorkingYears, data = hr.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5425.2  -924.7    83.0   791.2  3917.5 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -1835.862     80.019 -22.943  < 2e-16 ***
## JobLevel           3788.378     54.843  69.077  < 2e-16 ***
## TotalWorkingYears    46.082      7.802   5.906 4.34e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1450 on 1467 degrees of freedom
## Multiple R-squared:  0.9053, Adjusted R-squared:  0.9052 
## F-statistic:  7014 on 2 and 1467 DF,  p-value: < 2.2e-16
coefficients(fit2)
##       (Intercept)          JobLevel TotalWorkingYears 
##       -1835.86170        3788.37847          46.08199

INFERENCES FROM THE DATA:

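1. Monthly income is significantly correlated with age (r = 0.50), total working years (r = 0.77) and education (r = 0.09, weak); in the regression models, job level is by far the strongest predictor. Gender was compared only through group means and a boxplot, without a formal test.

2. An employee's job satisfaction is not significantly related to their monthly income (r = -0.007, p = 0.784).

3. All three regression models explain about 90.5% of the variance in monthly income (R-squared = 0.9055, 0.9055 and 0.9053). Because R-squared can never decrease when predictors are added, adjusted R-squared is the fairer comparison: 0.905, 0.9052 and 0.9052. The second model is taken as the final model here, but Age and Education are not significant in it (p = 0.220 and 0.504), so the third model, with only JobLevel and TotalWorkingYears, fits essentially as well with fewer predictors.

4. From the second regression model, the equation for the monthly income of an employee is:

MonthlyIncome = -1568.480036 - 7.018903(Age) - 25.264873(Education) + 3784.139537(JobLevel) + 52.653732(TotalWorkingYears)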
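Two quick checks on the inferences above, both using only numbers printed in the regression outputs. The first recomputes adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1) for each model, with p the number of predictors; since the inputs are the rounded R-squared values, the results match the printed adjusted R-squared only to about four decimal places. The second evaluates the model-2 equation for a hypothetical employee; the inputs (age 35, education level 3, job level 2, 10 total working years) are made up for illustration.

```r
# R-squared as printed by summary() for the three models (rounded to 4 d.p.)
r2 <- c(model1 = 0.9055, model2 = 0.9055, model3 = 0.9053)
p  <- c(model1 = 7,      model2 = 4,      model3 = 2)   # number of predictors
n  <- 1470

# Adjusted R-squared penalises each additional predictor
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj.r2, 4)

# Model-2 coefficients: intercept, Age, Education, JobLevel, TotalWorkingYears
b <- c(-1568.480036, -7.018903, -25.264873, 3784.139537, 52.653732)

# Hypothetical employee (made-up inputs); the leading 1 multiplies the intercept
x <- c(1, 35, 3, 2, 10)
predicted.income <- sum(b * x)
predicted.income
```

The adjusted values come out as 0.9050, 0.9052 and 0.9052, in line with the summaries, and the hypothetical employee's predicted monthly income is about 6204.88.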

REFERENCES:

1. https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset

2. Professor Sameer Mathur, Indian Institute of Management (IIM), Lucknow

3. https://www.udemy.com/data-science-and-analytics-using-r/learn/v4/content