Project Title: HR Analysis of an IT Company
NAME: Nissy Joy Bollampally
COLLEGE / COMPANY: NIT Trichy
This dataset gives details of the employees of an IT company. From it we can study which factors affect an employee's job satisfaction and monthly income.
Reading the dataset into R
hr.df <- read.csv("project.csv")
View(hr.df)
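The `ï..Age` column name that appears in the outputs below is what R typically produces when `project.csv` starts with a UTF-8 byte-order mark (BOM) that gets read as ordinary characters. A minimal sketch, assuming the file does begin with a BOM: passing `fileEncoding = "UTF-8-BOM"` to `read.csv()` strips the mark, so the first column comes through as `Age`. The two-row CSV is made-up toy data written to a temporary file, not the project dataset, and `toy.df` is a hypothetical name.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (EF BB BF).
# The rows are made-up toy values, not taken from project.csv.
tmp  <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("Age,MonthlyIncome\n41,6000\n49,5100\n")
writeBin(c(bom, body), tmp)

# "UTF-8-BOM" tells R to drop the mark before parsing the header,
# so the first column is named "Age" rather than "ï..Age".
toy.df <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(toy.df))
```

If this applies to `project.csv`, every later `ï..Age` in the code would become `Age`.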
Summarising the data
library(psych)
describe(hr.df)
## vars n mean sd median trimmed
## ï..Age 1 1470 36.92 9.14 36.0 36.47
## Attrition* 2 1470 1.16 0.37 1.0 1.08
## BusinessTravel* 3 1470 2.61 0.67 3.0 2.76
## DailyRate 4 1470 802.49 403.51 802.0 803.83
## Department* 5 1470 2.26 0.53 2.0 2.25
## DistanceFromHome 6 1470 9.19 8.11 7.0 8.08
## Education 7 1470 2.91 1.02 3.0 2.98
## EducationField* 8 1470 3.25 1.33 3.0 3.10
## EmployeeCount 9 1470 1.00 0.00 1.0 1.00
## EmployeeNumber 10 1470 1024.87 602.02 1020.5 1023.40
## EnvironmentSatisfaction 11 1470 2.72 1.09 3.0 2.78
## Gender* 12 1470 1.60 0.49 2.0 1.62
## HourlyRate 13 1470 65.89 20.33 66.0 66.02
## JobInvolvement 14 1470 2.73 0.71 3.0 2.74
## JobLevel 15 1470 2.06 1.11 2.0 1.90
## JobRole* 16 1470 5.46 2.46 6.0 5.61
## JobSatisfaction 17 1470 2.73 1.10 3.0 2.79
## MaritalStatus* 18 1470 2.10 0.73 2.0 2.12
## MonthlyIncome 19 1470 6502.93 4707.96 4919.0 5667.24
## MonthlyRate 20 1470 14313.10 7117.79 14235.5 14286.48
## NumCompaniesWorked 21 1470 2.69 2.50 2.0 2.36
## Over18* 22 1470 1.00 0.00 1.0 1.00
## OverTime* 23 1470 1.28 0.45 1.0 1.23
## PercentSalaryHike 24 1470 15.21 3.66 14.0 14.80
## PerformanceRating 25 1470 3.15 0.36 3.0 3.07
## RelationshipSatisfaction 26 1470 2.71 1.08 3.0 2.77
## StandardHours 27 1470 80.00 0.00 80.0 80.00
## StockOptionLevel 28 1470 0.79 0.85 1.0 0.67
## TotalWorkingYears 29 1470 11.28 7.78 10.0 10.37
## TrainingTimesLastYear 30 1470 2.80 1.29 3.0 2.72
## WorkLifeBalance 31 1470 2.76 0.71 3.0 2.77
## YearsAtCompany 32 1470 7.01 6.13 5.0 5.99
## YearsInCurrentRole 33 1470 4.23 3.62 3.0 3.85
## YearsSinceLastPromotion 34 1470 2.19 3.22 1.0 1.48
## YearsWithCurrManager 35 1470 4.12 3.57 3.0 3.77
## mad min max range skew kurtosis se
## ï..Age 8.90 18 60 42 0.41 -0.41 0.24
## Attrition* 0.00 1 2 1 1.84 1.39 0.01
## BusinessTravel* 0.00 1 3 2 -1.44 0.69 0.02
## DailyRate 510.01 102 1499 1397 0.00 -1.21 10.52
## Department* 0.00 1 3 2 0.17 -0.40 0.01
## DistanceFromHome 7.41 1 29 28 0.96 -0.23 0.21
## Education 1.48 1 5 4 -0.29 -0.56 0.03
## EducationField* 1.48 1 6 5 0.55 -0.69 0.03
## EmployeeCount 0.00 1 1 0 NaN NaN 0.00
## EmployeeNumber 790.97 1 2068 2067 0.02 -1.23 15.70
## EnvironmentSatisfaction 1.48 1 4 3 -0.32 -1.20 0.03
## Gender* 0.00 1 2 1 -0.41 -1.83 0.01
## HourlyRate 26.69 30 100 70 -0.03 -1.20 0.53
## JobInvolvement 0.00 1 4 3 -0.50 0.26 0.02
## JobLevel 1.48 1 5 4 1.02 0.39 0.03
## JobRole* 2.97 1 9 8 -0.36 -1.20 0.06
## JobSatisfaction 1.48 1 4 3 -0.33 -1.22 0.03
## MaritalStatus* 1.48 1 3 2 -0.15 -1.12 0.02
## MonthlyIncome 3260.24 1009 19999 18990 1.37 0.99 122.79
## MonthlyRate 9201.76 2094 26999 24905 0.02 -1.22 185.65
## NumCompaniesWorked 1.48 0 9 9 1.02 0.00 0.07
## Over18* 0.00 1 1 0 NaN NaN 0.00
## OverTime* 0.00 1 2 1 0.96 -1.07 0.01
## PercentSalaryHike 2.97 11 25 14 0.82 -0.31 0.10
## PerformanceRating 0.00 3 4 1 1.92 1.68 0.01
## RelationshipSatisfaction 1.48 1 4 3 -0.30 -1.19 0.03
## StandardHours 0.00 80 80 0 NaN NaN 0.00
## StockOptionLevel 1.48 0 3 3 0.97 0.35 0.02
## TotalWorkingYears 5.93 0 40 40 1.11 0.91 0.20
## TrainingTimesLastYear 1.48 0 6 6 0.55 0.48 0.03
## WorkLifeBalance 0.00 1 4 3 -0.55 0.41 0.02
## YearsAtCompany 4.45 0 40 40 1.76 3.91 0.16
## YearsInCurrentRole 4.45 0 18 18 0.92 0.47 0.09
## YearsSinceLastPromotion 1.48 0 15 15 1.98 3.59 0.08
## YearsWithCurrManager 4.45 0 17 17 0.83 0.16 0.09
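The `describe()` output shows that EmployeeCount, Over18 and StandardHours have a standard deviation of 0 (hence the `NaN` skew and kurtosis). Every employee has the same value, so these columns carry no information for the analysis. One possible way to find and drop such columns, sketched on a made-up data frame; `toy`, `is.constant` and `toy.clean` are hypothetical names.

```r
# Toy data frame (made-up values) mimicking three constant columns seen in
# the describe() output: EmployeeCount, Over18 and StandardHours.
toy <- data.frame(
  Age           = c(41, 49, 37, 33),
  EmployeeCount = c(1, 1, 1, 1),
  Over18        = c("Y", "Y", "Y", "Y"),
  StandardHours = c(80, 80, 80, 80)
)

# A column is uninformative if it has only one distinct non-missing value.
is.constant <- vapply(toy, function(col) length(unique(na.omit(col))) == 1,
                      logical(1))
print(names(toy)[is.constant])

# Drop those columns before modelling.
toy.clean <- toy[, !is.constant, drop = FALSE]
print(names(toy.clean))
```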
summary(hr.df)
## ï..Age Attrition BusinessTravel DailyRate
## Min. :18.00 No :1233 Non-Travel : 150 Min. : 102.0
## 1st Qu.:30.00 Yes: 237 Travel_Frequently: 277 1st Qu.: 465.0
## Median :36.00 Travel_Rarely :1043 Median : 802.0
## Mean :36.92 Mean : 802.5
## 3rd Qu.:43.00 3rd Qu.:1157.0
## Max. :60.00 Max. :1499.0
##
## Department DistanceFromHome Education
## Human Resources : 63 Min. : 1.000 Min. :1.000
## Research & Development:961 1st Qu.: 2.000 1st Qu.:2.000
## Sales :446 Median : 7.000 Median :3.000
## Mean : 9.193 Mean :2.913
## 3rd Qu.:14.000 3rd Qu.:4.000
## Max. :29.000 Max. :5.000
##
## EducationField EmployeeCount EmployeeNumber
## Human Resources : 27 Min. :1 Min. : 1.0
## Life Sciences :606 1st Qu.:1 1st Qu.: 491.2
## Marketing :159 Median :1 Median :1020.5
## Medical :464 Mean :1 Mean :1024.9
## Other : 82 3rd Qu.:1 3rd Qu.:1555.8
## Technical Degree:132 Max. :1 Max. :2068.0
##
## EnvironmentSatisfaction Gender HourlyRate JobInvolvement
## Min. :1.000 Female:588 Min. : 30.00 Min. :1.00
## 1st Qu.:2.000 Male :882 1st Qu.: 48.00 1st Qu.:2.00
## Median :3.000 Median : 66.00 Median :3.00
## Mean :2.722 Mean : 65.89 Mean :2.73
## 3rd Qu.:4.000 3rd Qu.: 83.75 3rd Qu.:3.00
## Max. :4.000 Max. :100.00 Max. :4.00
##
## JobLevel JobRole JobSatisfaction
## Min. :1.000 Sales Executive :326 Min. :1.000
## 1st Qu.:1.000 Research Scientist :292 1st Qu.:2.000
## Median :2.000 Laboratory Technician :259 Median :3.000
## Mean :2.064 Manufacturing Director :145 Mean :2.729
## 3rd Qu.:3.000 Healthcare Representative:131 3rd Qu.:4.000
## Max. :5.000 Manager :102 Max. :4.000
## (Other) :215
## MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked
## Divorced:327 Min. : 1009 Min. : 2094 Min. :0.000
## Married :673 1st Qu.: 2911 1st Qu.: 8047 1st Qu.:1.000
## Single :470 Median : 4919 Median :14236 Median :2.000
## Mean : 6503 Mean :14313 Mean :2.693
## 3rd Qu.: 8379 3rd Qu.:20462 3rd Qu.:4.000
## Max. :19999 Max. :26999 Max. :9.000
##
## Over18 OverTime PercentSalaryHike PerformanceRating
## Y:1470 No :1054 Min. :11.00 Min. :3.000
## Yes: 416 1st Qu.:12.00 1st Qu.:3.000
## Median :14.00 Median :3.000
## Mean :15.21 Mean :3.154
## 3rd Qu.:18.00 3rd Qu.:3.000
## Max. :25.00 Max. :4.000
##
## RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
## Min. :1.000 Min. :80 Min. :0.0000 Min. : 0.00
## 1st Qu.:2.000 1st Qu.:80 1st Qu.:0.0000 1st Qu.: 6.00
## Median :3.000 Median :80 Median :1.0000 Median :10.00
## Mean :2.712 Mean :80 Mean :0.7939 Mean :11.28
## 3rd Qu.:4.000 3rd Qu.:80 3rd Qu.:1.0000 3rd Qu.:15.00
## Max. :4.000 Max. :80 Max. :3.0000 Max. :40.00
##
## TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole
## Min. :0.000 Min. :1.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000 1st Qu.: 2.000
## Median :3.000 Median :3.000 Median : 5.000 Median : 3.000
## Mean :2.799 Mean :2.761 Mean : 7.008 Mean : 4.229
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 9.000 3rd Qu.: 7.000
## Max. :6.000 Max. :4.000 Max. :40.000 Max. :18.000
##
## YearsSinceLastPromotion YearsWithCurrManager
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 1.000 Median : 3.000
## Mean : 2.188 Mean : 4.123
## 3rd Qu.: 3.000 3rd Qu.: 7.000
## Max. :15.000 Max. :17.000
##
Tables to understand the relationships between the variables in the dataset.
aggregate((hr.df$MonthlyIncome~hr.df$JobSatisfaction), FUN=mean)
## hr.df$JobSatisfaction hr.df$MonthlyIncome
## 1 1 6561.571
## 2 2 6527.329
## 3 3 6480.495
## 4 4 6472.732
aggregate((hr.df$MonthlyIncome~hr.df$Education), FUN=mean)
## hr.df$Education hr.df$MonthlyIncome
## 1 1 5640.571
## 2 2 6226.645
## 3 3 6517.264
## 4 4 6832.402
## 5 5 8277.646
aggregate((hr.df$MonthlyIncome~hr.df$Gender), FUN=mean)
## hr.df$Gender hr.df$MonthlyIncome
## 1 Female 6686.566
## 2 Male 6380.508
aggregate((hr.df$JobSatisfaction~hr.df$MaritalStatus), FUN=mean)
## hr.df$MaritalStatus hr.df$JobSatisfaction
## 1 Divorced 2.697248
## 2 Married 2.716196
## 3 Single 2.768085
aggregate((hr.df$MonthlyIncome~hr.df$PerformanceRating), FUN=mean)
## hr.df$PerformanceRating hr.df$MonthlyIncome
## 1 3 6537.274
## 2 4 6313.894
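The `aggregate()` calls above build the formula from `hr.df$...` columns, which is why the output headers read `hr.df$Education` and so on. A sketch of the more common form, assuming the same column names: give bare column names plus `data =`, or use `tapply()` for a named vector of group means. The five-row data frame is made-up toy data.

```r
# Made-up toy data with the same column names as hr.df.
toy <- data.frame(
  Education     = c(1, 1, 2, 3, 3),
  MonthlyIncome = c(5000, 6000, 6200, 6400, 6600)
)

# Formula + data: output columns are named "Education" and "MonthlyIncome".
by.edu <- aggregate(MonthlyIncome ~ Education, data = toy, FUN = mean)
print(by.edu)

# tapply() gives the same group means as a named vector.
by.edu.vec <- tapply(toy$MonthlyIncome, toy$Education, mean)
print(by.edu.vec)
```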
mytable<- xtabs(~JobSatisfaction+MaritalStatus,data=hr.df)
mytable
## MaritalStatus
## JobSatisfaction Divorced Married Single
## 1 70 130 89
## 2 61 131 88
## 3 94 212 136
## 4 102 200 157
mytable1<- xtabs(~JobSatisfaction+JobRole,data=hr.df)
mytable1
## JobRole
## JobSatisfaction Healthcare Representative Human Resources
## 1 26 10
## 2 19 16
## 3 43 13
## 4 43 13
## JobRole
## JobSatisfaction Laboratory Technician Manager Manufacturing Director
## 1 56 21 26
## 2 48 21 32
## 3 75 27 49
## 4 80 33 38
## JobRole
## JobSatisfaction Research Director Research Scientist Sales Executive
## 1 15 54 69
## 2 16 53 54
## 3 27 90 91
## 4 22 95 112
## JobRole
## JobSatisfaction Sales Representative
## 1 12
## 2 21
## 3 27
## 4 23
mytable2<- xtabs(~JobSatisfaction+OverTime,data=hr.df)
mytable2
## OverTime
## JobSatisfaction No Yes
## 1 205 84
## 2 211 69
## 3 321 121
## 4 317 142
mytable3<- xtabs(~JobSatisfaction+TotalWorkingYears,data=hr.df)
mytable3
## TotalWorkingYears
## JobSatisfaction 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 1 0 16 4 4 10 19 27 11 18 28 43 9 4 6 3 9 8 10 8
## 2 1 23 7 7 16 12 26 17 21 12 28 8 11 9 7 7 9 4 5
## 3 5 20 11 10 21 35 34 27 28 26 69 8 10 11 8 12 6 10 7
## 4 5 22 9 21 16 22 38 26 36 30 62 11 23 10 13 12 14 9 7
## TotalWorkingYears
## JobSatisfaction 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
## 1 3 8 7 7 2 7 3 1 1 3 0 0 3 2 1 2 0 1 1
## 2 4 3 6 5 3 1 2 3 1 4 5 1 1 1 2 1 2 2 2
## 3 8 15 13 6 9 3 4 5 4 4 3 2 1 2 2 1 0 1 0
## 4 7 4 8 3 8 7 5 5 1 3 2 4 4 4 2 1 1 2 1
## TotalWorkingYears
## JobSatisfaction 38 40
## 1 0 0
## 2 0 1
## 3 1 0
## 4 0 1
mytable4<- xtabs(~JobSatisfaction+YearsSinceLastPromotion,data=hr.df)
mytable4
## YearsSinceLastPromotion
## JobSatisfaction 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 1 117 72 25 8 7 9 11 17 3 4 0 5 2 3
## 2 109 69 28 12 13 8 5 18 3 1 2 4 2 1
## 3 180 105 50 13 19 16 7 19 5 6 2 7 3 2
## 4 175 111 56 19 22 12 9 22 7 6 2 8 3 4
## YearsSinceLastPromotion
## JobSatisfaction 14 15
## 1 3 3
## 2 2 3
## 3 3 5
## 4 1 2
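The cross-tabulations show raw counts only. To judge whether job satisfaction is associated with marital status, one further step is to look at column proportions and run a chi-squared test of independence. The sketch below re-enters the JobSatisfaction by MaritalStatus counts printed for `mytable`; with the real data loaded, `prop.table(mytable, 2)` and `chisq.test(mytable)` would work on the table directly.

```r
# Counts copied from the mytable output (JobSatisfaction by MaritalStatus).
counts <- matrix(
  c( 70, 130,  89,
     61, 131,  88,
     94, 212, 136,
    102, 200, 157),
  nrow = 4, byrow = TRUE,
  dimnames = list(JobSatisfaction = as.character(1:4),
                  MaritalStatus   = c("Divorced", "Married", "Single"))
)
tab <- as.table(counts)

# Share of each satisfaction level within each marital status (columns sum to 1).
print(round(prop.table(tab, margin = 2), 3))

# Chi-squared test of independence between the two variables.
ct <- chisq.test(tab)
print(ct)
```

The counts add up to 1470 and the column totals (327, 673, 470) match the `summary()` output, which confirms they were copied correctly.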
Histograms to visualise different variables
hist(hr.df$JobSatisfaction,breaks=20,main = "distribution of job satisfaction",col="pink")

hist(hr.df$MonthlyIncome,breaks=20,main = "distribution of monthly income",col="pink")

hist(hr.df$JobInvolvement,breaks=20,main = "distribution of job involvement",col="pink")

hist(hr.df$JobLevel,breaks=20,main = "distribution of job level",col="pink")

hist(hr.df$PerformanceRating,breaks=5,main = "distribution of performance rating",col="pink")

hist(hr.df$TotalWorkingYears,breaks=5,main = "distribution of total working years",col="pink")

hist(hr.df$YearsAtCompany,breaks=5,main = "distribution of years at company",col="pink")
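Writing each `hist()` call by hand makes it easy for a title to be copied over from another plot. A sketch of one way to keep each title tied to its variable: store column names and labels in a named vector and loop over it. The data frame is simulated toy data standing in for `hr.df`, and the label text is illustrative.

```r
# Made-up toy data standing in for hr.df.
set.seed(1)
toy <- data.frame(
  MonthlyIncome     = round(runif(50, 1000, 20000)),
  TotalWorkingYears = sample(0:40, 50, replace = TRUE)
)

# Column name -> human-readable label (labels are illustrative).
labels <- c(MonthlyIncome     = "monthly income",
            TotalWorkingYears = "total working years")

titles <- character(0)
for (v in names(labels)) {
  # The title is built from the same entry that selects the column.
  main.title <- paste("distribution of", labels[[v]])
  hist(toy[[v]], breaks = 20, main = main.title, xlab = labels[[v]], col = "pink")
  titles <- c(titles, main.title)
}
print(titles)
```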

Scatterplots
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(MonthlyIncome~ï..Age, data=hr.df,
spread=FALSE,
main="Scatter plot of salary vs age",
xlab="age",
ylab="salary")

stripchart(MonthlyIncome~Gender, data=hr.df,
vertical=TRUE, method="jitter",
main="Scatter plot of salary vs sex",
xlab="sex",
ylab="salary")

scatterplot(MonthlyIncome~TotalWorkingYears, data=hr.df,
spread=FALSE,
main="Scatter plot of salary vs total working experience",
xlab="working experience",
ylab="salary")
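In R's formula interface, `y ~ x` puts the left-hand variable on the vertical axis, so a plot titled "salary vs age" needs salary on the left of the `~`. A base-R sketch of the same kind of plot with a fitted least-squares line, using simulated data in which income is generated to rise with experience (`years`, `income` and `line.fit` are hypothetical names):

```r
# Made-up toy data: income rises with experience plus noise.
set.seed(42)
years  <- 0:30
income <- 2000 + 450 * years + rnorm(length(years), sd = 800)

# y ~ x: income on the vertical axis, experience on the horizontal axis.
plot(income ~ years,
     main = "Scatter plot of salary vs total working experience",
     xlab = "working experience", ylab = "salary")

# Overlay the least-squares line fitted with the same formula.
line.fit <- lm(income ~ years)
abline(line.fit, col = "red")
print(coef(line.fit))
```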

boxplot(hr.df$MonthlyIncome~hr.df$TotalWorkingYears)

boxplot(hr.df$MonthlyIncome~hr.df$ï..Age)

boxplot(hr.df$MonthlyIncome~hr.df$PerformanceRating)

boxplot(hr.df$MonthlyIncome~hr.df$MaritalStatus)

boxplot(hr.df$MonthlyIncome~hr.df$JobSatisfaction)

boxplot(hr.df$MonthlyIncome~hr.df$JobInvolvement)

boxplot(hr.df$MonthlyIncome~hr.df$JobRole)

boxplot(hr.df$MonthlyIncome~hr.df$JobLevel)

boxplot(hr.df$MonthlyIncome~hr.df$Gender)
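With nine job roles, the boxplot of income by role is easier to read when the boxes are sorted by median income and the long labels are rotated. A sketch using `reorder()`, with made-up incomes for three of the role names:

```r
# Made-up toy data with three of the job roles from hr.df.
toy <- data.frame(
  JobRole = rep(c("Sales Executive", "Manager", "Laboratory Technician"),
                each = 4),
  MonthlyIncome = c(6500, 7000, 6800, 7200,
                    17000, 18000, 16500, 19000,
                    3000, 3200, 2800, 3500)
)

# Order the factor levels by each role's median income (lowest first).
toy$JobRole <- reorder(factor(toy$JobRole), toy$MonthlyIncome, FUN = median)
print(levels(toy$JobRole))

# las = 2 rotates the axis labels so long role names do not overlap.
boxplot(MonthlyIncome ~ JobRole, data = toy, las = 2,
        main = "Monthly income by job role", ylab = "monthly income")
```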

T-tests
t.test(JobSatisfaction~OverTime,data=hr.df)
##
## Welch Two Sample t-test
##
## data: JobSatisfaction by OverTime
## t = -0.92914, df = 741.84, p-value = 0.3531
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.18695896 0.06683964
## sample estimates:
## mean in group No mean in group Yes
## 2.711575 2.771635
From the above t-test, there is no significant difference in mean job satisfaction between employees who work overtime and those who do not, since the p-value (0.3531) is greater than 0.05.
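The Welch statistic reported by `t.test()` is the difference in group means divided by the unpooled standard error, t = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2), with Welch-Satterthwaite degrees of freedom (which is why df = 741.84 is not a whole number). A sketch that recomputes both by hand on made-up satisfaction scores and compares them with `t.test()`:

```r
# Made-up satisfaction scores (1-4) for two hypothetical groups.
no.ot  <- c(3, 4, 2, 3, 1, 4, 3, 2, 4, 3)
yes.ot <- c(4, 3, 4, 2, 4, 3, 4)

n1 <- length(no.ot);   n2 <- length(yes.ot)
v1 <- var(no.ot) / n1; v2 <- var(yes.ot) / n2

# Welch t statistic: difference in means over the unpooled standard error.
t.manual  <- (mean(no.ot) - mean(yes.ot)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximation to the degrees of freedom.
df.manual <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

res <- t.test(no.ot, yes.ot)  # Welch is the default (var.equal = FALSE)
print(c(manual.t  = t.manual,  t.test = unname(res$statistic)))
print(c(manual.df = df.manual, t.test = unname(res$parameter)))
```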
t.test(JobSatisfaction~Gender,data=hr.df)
##
## Welch Two Sample t-test
##
## data: JobSatisfaction by Gender
## t = -1.2773, df = 1266.6, p-value = 0.2017
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.18976672 0.04010685
## sample estimates:
## mean in group Female mean in group Male
## 2.683673 2.758503
From the above test, there is no significant difference in mean job satisfaction between female and male employees, since the p-value (0.2017) is greater than 0.05.
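JobSatisfaction is an ordinal 1-4 rating, so a rank-based test is a reasonable cross-check on the t-test. A sketch with `wilcox.test()` (the Wilcoxon rank-sum, or Mann-Whitney, test) on made-up data. `exact = FALSE` requests the normal approximation with a tie correction, which R would fall back to anyway because a 1-4 scale produces many ties.

```r
# Made-up ratings for two hypothetical groups, in long format like hr.df.
toy <- data.frame(
  JobSatisfaction = c(3, 4, 2, 3, 1, 4, 3, 2, 4, 3, 2, 4),
  Gender = factor(rep(c("Female", "Male"), each = 6))
)

# Rank-based comparison of the two groups.
w <- wilcox.test(JobSatisfaction ~ Gender, data = toy, exact = FALSE)
print(w)
```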
cor.test(hr.df$JobSatisfaction,hr.df$MonthlyIncome)
##
## Pearson's product-moment correlation
##
## data: hr.df$JobSatisfaction and hr.df$MonthlyIncome
## t = -0.27421, df = 1468, p-value = 0.784
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05826288 0.04398681
## sample estimates:
## cor
## -0.007156742
From the above test, there is no significant correlation between monthly income and job satisfaction, since the p-value of the test (0.784) is greater than 0.05.
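Pearson's r measures linear association. Because job satisfaction is an ordinal rating and monthly income is right-skewed (skew 1.37 in the `describe()` output), Spearman's rank correlation is a natural companion check. A sketch on made-up data; `exact = FALSE` avoids the warning R gives when an exact p-value cannot be computed with ties.

```r
# Made-up toy data: an ordinal rating and a skewed income.
sat    <- c(1, 2, 2, 3, 3, 3, 4, 4, 1, 2)
income <- c(2500, 3100, 9800, 4200, 15000, 5100, 6100, 2900, 19000, 4000)

# Spearman's rho is Pearson's r computed on the ranks (ties get average ranks).
rho <- cor.test(sat, income, method = "spearman", exact = FALSE)
print(rho)

# Same estimate by hand.
print(cor(rank(sat), rank(income)))
```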
cor.test(hr.df$ï..Age,hr.df$MonthlyIncome)
##
## Pearson's product-moment correlation
##
## data: hr.df$ï..Age and hr.df$MonthlyIncome
## t = 21.995, df = 1468, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4583951 0.5353551
## sample estimates:
## cor
## 0.4978546
From the above test, there is a significant positive correlation (r = 0.50) between monthly income and the age of the employee, since the p-value of the test is less than 0.05.
cor.test(hr.df$TotalWorkingYears,hr.df$MonthlyIncome)
##
## Pearson's product-moment correlation
##
## data: hr.df$TotalWorkingYears and hr.df$MonthlyIncome
## t = 46.669, df = 1468, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7514606 0.7926965
## sample estimates:
## cor
## 0.7728932
From the above test, there is a significant positive correlation (r = 0.77) between monthly income and total working years, since the p-value of the test is less than 0.05.
cor.test(hr.df$Education,hr.df$MonthlyIncome)
##
## Pearson's product-moment correlation
##
## data: hr.df$Education and hr.df$MonthlyIncome
## t = 3.6549, df = 1468, p-value = 0.0002664
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04404707 0.14538229
## sample estimates:
## cor
## 0.09496068
From the above test, there is a significant but weak positive correlation (r = 0.09) between monthly income and education, since the p-value of the test (0.0003) is less than 0.05.
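For Pearson's test, the t statistic depends only on r and n: t = r * sqrt((n - 2) / (1 - r^2)), on n - 2 degrees of freedom. Plugging the four reported correlations (n = 1470) back into this formula reproduces the reported t values and p-values, a quick way to check the outputs against each other:

```r
# Correlations and t statistics as printed by the cor.test() calls above.
r.reported <- c(JobSatisfaction   = -0.007156742,
                Age               =  0.4978546,
                TotalWorkingYears =  0.7728932,
                Education         =  0.09496068)
t.reported <- c(-0.27421, 21.995, 46.669, 3.6549)
n <- 1470

# t = r * sqrt((n - 2) / (1 - r^2)), tested on n - 2 = 1468 df.
t.from.r <- r.reported * sqrt((n - 2) / (1 - r.reported^2))
print(round(t.from.r, 4))

# Two-sided p-values from the t distribution.
p.from.r <- 2 * pt(abs(t.from.r), df = n - 2, lower.tail = FALSE)
print(p.from.r)
```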
REGRESSION MODEL-1
fit <- lm(MonthlyIncome ~ ï..Age+Education+JobInvolvement+JobLevel+JobSatisfaction+PerformanceRating+TotalWorkingYears ,data=hr.df)
fit
##
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobInvolvement +
## JobLevel + JobSatisfaction + PerformanceRating + TotalWorkingYears,
## data = hr.df)
##
## Coefficients:
## (Intercept) ï..Age Education
## -1515.160 -6.894 -24.801
## JobInvolvement JobLevel JobSatisfaction
## -19.712 3784.821 -16.527
## PerformanceRating TotalWorkingYears
## 13.002 52.408
summary(fit)
##
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobInvolvement +
## JobLevel + JobSatisfaction + PerformanceRating + TotalWorkingYears,
## data = hr.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5465.8 -920.7 79.0 785.0 3889.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1515.160 425.779 -3.559 0.000385 ***
## ï..Age -6.894 5.728 -1.204 0.228933
## Education -24.801 37.833 -0.656 0.512213
## JobInvolvement -19.712 53.328 -0.370 0.711702
## JobLevel 3784.821 55.031 68.776 < 2e-16 ***
## JobSatisfaction -16.527 34.351 -0.481 0.630508
## PerformanceRating 13.002 105.085 0.124 0.901549
## TotalWorkingYears 52.408 9.192 5.701 1.44e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1451 on 1462 degrees of freedom
## Multiple R-squared: 0.9055, Adjusted R-squared: 0.905
## F-statistic: 2001 on 7 and 1462 DF, p-value: < 2.2e-16
coefficients(fit)
## (Intercept) ï..Age Education JobInvolvement
## -1515.160223 -6.894373 -24.801441 -19.712164
## JobLevel JobSatisfaction PerformanceRating TotalWorkingYears
## 3784.820884 -16.526687 13.001700 52.408062
REGRESSION MODEL-2
fit1 <- lm(MonthlyIncome ~ ï..Age+Education+JobLevel+TotalWorkingYears ,data=hr.df)
fit1
##
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobLevel +
## TotalWorkingYears, data = hr.df)
##
## Coefficients:
## (Intercept) ï..Age Education
## -1568.480 -7.019 -25.265
## JobLevel TotalWorkingYears
## 3784.140 52.654
summary(fit1)
##
## Call:
## lm(formula = MonthlyIncome ~ ï..Age + Education + JobLevel +
## TotalWorkingYears, data = hr.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5446.0 -919.2 75.0 785.0 3877.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1568.480 195.595 -8.019 2.16e-15 ***
## ï..Age -7.019 5.718 -1.228 0.220
## Education -25.265 37.759 -0.669 0.504
## JobLevel 3784.140 54.913 68.912 < 2e-16 ***
## TotalWorkingYears 52.654 9.172 5.741 1.14e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1450 on 1465 degrees of freedom
## Multiple R-squared: 0.9055, Adjusted R-squared: 0.9052
## F-statistic: 3508 on 4 and 1465 DF, p-value: < 2.2e-16
coefficients(fit1)
## (Intercept) ï..Age Education JobLevel
## -1568.480036 -7.018903 -25.264873 3784.139537
## TotalWorkingYears
## 52.653732
REGRESSION MODEL-3
fit2 <- lm(MonthlyIncome ~ JobLevel+TotalWorkingYears ,data=hr.df)
fit2
##
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + TotalWorkingYears, data = hr.df)
##
## Coefficients:
## (Intercept) JobLevel TotalWorkingYears
## -1835.86 3788.38 46.08
summary(fit2)
##
## Call:
## lm(formula = MonthlyIncome ~ JobLevel + TotalWorkingYears, data = hr.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5425.2 -924.7 83.0 791.2 3917.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1835.862 80.019 -22.943 < 2e-16 ***
## JobLevel 3788.378 54.843 69.077 < 2e-16 ***
## TotalWorkingYears 46.082 7.802 5.906 4.34e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1450 on 1467 degrees of freedom
## Multiple R-squared: 0.9053, Adjusted R-squared: 0.9052
## F-statistic: 7014 on 2 and 1467 DF, p-value: < 2.2e-16
coefficients(fit2)
## (Intercept) JobLevel TotalWorkingYears
## -1835.86170 3788.37847 46.08199
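Because model 3 is nested inside model 2 (and model 2 inside model 1), the models can also be compared with an F test via `anova()`, or with AIC, rather than by R-squared alone, which can only rise as predictors are added. With the real data this would be `anova(fit2, fit1)`. The sketch below runs the same comparison on simulated data where, by construction, only two predictors matter; `x1`, `x2`, `x3`, `small` and `large` are hypothetical names.

```r
# Simulated data: y depends on x1 and x2 only; x3 is pure noise.
set.seed(123)
n  <- 200
x1 <- sample(1:5, n, replace = TRUE)
x2 <- runif(n, 0, 40)
x3 <- rnorm(n)
y  <- -1800 + 3800 * x1 + 46 * x2 + rnorm(n, sd = 1450)

small <- lm(y ~ x1 + x2)        # reduced model
large <- lm(y ~ x1 + x2 + x3)   # adds one extra predictor

# F test: does adding x3 reduce the residual sum of squares significantly?
cmp <- anova(small, large)
print(cmp)

# R-squared never decreases when a predictor is added; AIC penalises size.
print(c(small = summary(small)$r.squared, large = summary(large)$r.squared))
print(AIC(small, large))
```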
INFERENCES FROM THE DATA:
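1. Monthly income is significantly correlated with total working years (r = 0.77), age (r = 0.50) and, weakly, education (r = 0.09). Gender was compared only through group means (Female 6686.57, Male 6380.51), not with a formal test. In the regression models, job level and total working years are the only significant predictors.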
2. An employee's job satisfaction is not significantly correlated with their monthly income (r = -0.007, p = 0.784).
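3. Models 1 and 2 have the same multiple R-squared (0.9055), so R-squared alone does not separate them. Model 2 is preferred over model 1 because it reaches a slightly higher adjusted R-squared (0.9052 vs 0.905) with three fewer predictors. Model 3 matches model 2's adjusted R-squared (0.9052) using only job level and total working years, which makes it a simpler alternative.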
4. From regression model 2, the equation for an employee's monthly income is:
MonthlyIncome = -1568.480036 - 7.018903 (Age) - 25.264873 (Education) + 3784.139537 (JobLevel) + 52.653732 (TotalWorkingYears)
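As a worked example of the equation, take a hypothetical employee aged 35 with education level 3, job level 2 and 10 total working years (these inputs are made up). The sketch evaluates the model-2 equation directly from the printed coefficients; with `fit1` in memory, `predict()` on a one-row data frame with the same column names should give the same value.

```r
# Coefficients printed by coefficients(fit1) (regression model 2).
b <- c(intercept = -1568.480036, age = -7.018903, education = -25.264873,
       joblevel = 3784.139537, totalworkingyears = 52.653732)

# A hypothetical employee (made-up inputs); 1 multiplies the intercept.
x <- c(intercept = 1, age = 35, education = 3,
       joblevel = 2, totalworkingyears = 10)

monthly.income <- sum(b * x)
print(round(monthly.income, 2))  # 6204.88
```

Job level dominates the prediction: each level adds about 3784 to monthly income, while ten years of experience add about 527.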
REFERENCES:
2. Professor Sameer Mathur, Indian Institute of Management (IIM), Lucknow