Salary: Truth or Bluff

A prospective employee told his new company that his current salary is $160k. Human resources retrieved the salaries associated with each position level at his previous company. The candidate's position level is 6.5, and we would like to build a salary-bluffing detector to check whether he told the truth.

Importing the Position_Salaries data

dataset=read.csv('Position_Salaries.csv')
# Keep only columns 2 and 3 (Level and Salary), the variables used in the models
dataset=dataset[2:3]

Fitting Linear Regression to the dataset

lin_reg=lm(formula=Salary ~. , data=dataset)
summary(lin_reg)
## 
## Call:
## lm(formula = Salary ~ ., data = dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -170818 -129720  -40379   65856  386545 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -195333     124790  -1.565  0.15615   
## Level          80879      20112   4.021  0.00383 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 182700 on 8 degrees of freedom
## Multiple R-squared:  0.669,  Adjusted R-squared:  0.6277 
## F-statistic: 16.17 on 1 and 8 DF,  p-value: 0.003833

Fitting Polynomial Regression to the Salary dataset

To generate the polynomial terms, we need to build new variables: level2 and level3 for the degree-3 model, and later level4 for the degree-4 model.

Polynomial of degree 3

dataset$level2=dataset$Level^2
dataset$level3=dataset$Level^3
poly_reg=lm(formula=Salary ~. , data=dataset)
summary(poly_reg)
## 
## Call:
## lm(formula = Salary ~ ., data = dataset)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -75695 -28148   7091  29256  49538 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -121333.3    97544.8  -1.244  0.25994   
## Level        180664.3    73114.5   2.471  0.04839 * 
## level2       -48549.0    15081.0  -3.219  0.01816 * 
## level3         4120.0      904.3   4.556  0.00387 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50260 on 6 degrees of freedom
## Multiple R-squared:  0.9812, Adjusted R-squared:  0.9718 
## F-statistic: 104.4 on 3 and 6 DF,  p-value: 1.441e-05
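
The squared and cubed columns can also be generated inside the model formula instead of being added to the data frame by hand. A minimal sketch, assuming a data frame with `Level` and `Salary` columns; the helper names `fit_poly` and `predict_at` are hypothetical and not part of the original analysis:

```r
# Hypothetical helper: fit a polynomial of the given degree in Level.
# poly(..., raw = TRUE) builds Level, Level^2, ..., Level^degree directly,
# so no extra columns need to be added to the data frame.
fit_poly <- function(data, degree) {
  lm(Salary ~ poly(Level, degree, raw = TRUE), data = data)
}

# Hypothetical helper: predict the salary at a single level.
predict_at <- function(model, level) {
  unname(predict(model, newdata = data.frame(Level = level)))
}
```

With the data above, `fit_poly(dataset, 3)` should give the same fitted values as `poly_reg`, and `predict_at(model, 6.5)` would replace a hand-built `data.frame(Level=6.5, level2=6.5^2, level3=6.5^3)`.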

Visualizing the Linear Regression results

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.2
g1=ggplot()+
  geom_point(aes(x=dataset$Level , y=dataset$Salary), 
             colour="red")+
  geom_line(aes(x=dataset$Level, y=predict(lin_reg, newdata= dataset)),
            colour="green")+
  ggtitle("Truth or Bluff(Linear Regression)")+
  xlab('Level')+
  ylab("Salary")
g1

Predicting the salary with Linear Regression

y_linear_pred=predict(lin_reg, data.frame(Level=6.5))
y_linear_pred
##        1 
## 330378.8
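
The linear model is simply Salary = intercept + slope × Level, so the prediction can be checked by hand. A small sketch using the coefficients as printed (and therefore rounded) in the `summary(lin_reg)` output; the variable names are illustrative only:

```r
# Coefficients copied from the summary(lin_reg) output (rounded as printed)
intercept <- -195333
slope <- 80879
level <- 6.5

# Salary predicted by the straight line at level 6.5
manual_pred <- intercept + slope * level
manual_pred
```

The result agrees with `predict()` to within a few dollars; the small difference comes from the rounding of the printed coefficients. Either way, the linear model predicts roughly twice the salary the candidate claimed.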

Visualizing the Polynomial Regression results

Results with degree 3

g2=ggplot()+
  geom_point(aes(x=dataset$Level , y=dataset$Salary), 
             colour="pink")+
  geom_line(aes(x=dataset$Level, y=predict(poly_reg, newdata= dataset)),
            colour="yellow")+
  ggtitle("Truth or Bluff(Poly Regression)")+
  xlab('Level')+
  ylab("Salary")
g2

y_poly3_pred=predict(poly_reg, data.frame(Level=6.5, level2=6.5^2, level3=6.5^3))
y_poly3_pred
##        1 
## 133259.5

Results with degree 4

dataset1=dataset
dataset1$level4=dataset1$Level^4
poly_reg1=lm(formula=Salary ~. , data=dataset1)
summary(poly_reg1)
## 
## Call:
## lm(formula = Salary ~ ., data = dataset1)
## 
## Residuals:
##      1      2      3      4      5      6      7      8      9     10 
##  -8357  18240   1358 -14633 -11725   6725  15997  10006 -28695  11084 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  184166.7    67768.0   2.718  0.04189 * 
## Level       -211002.3    76382.2  -2.762  0.03972 * 
## level2        94765.4    26454.2   3.582  0.01584 * 
## level3       -15463.3     3535.0  -4.374  0.00719 **
## level4          890.2      159.8   5.570  0.00257 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20510 on 5 degrees of freedom
## Multiple R-squared:  0.9974, Adjusted R-squared:  0.9953 
## F-statistic: 478.1 on 4 and 5 DF,  p-value: 1.213e-06
g3=ggplot()+
  geom_point(aes(x=dataset$Level , y=dataset$Salary), 
             colour="purple")+
  geom_line(aes(x=dataset$Level, y=predict(poly_reg1, newdata= dataset1)),
            colour="black")+
  ggtitle("Truth or Bluff(Poly Regression)")+
  xlab('Level')+
  ylab("Salary")
g3
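
Because the fitted curves above are drawn only through the ten observed levels, they appear as straight segments between the points. A sketch of a smoother plot that predicts on a finer grid, assuming the model was fitted on hand-built columns named `level2`, `level3` and `level4` as above; the helper names `make_grid` and `plot_fit` and the step of 0.1 are hypothetical choices:

```r
library(ggplot2)

# Hypothetical helper: build a fine grid of levels with the same
# hand-built polynomial columns used when fitting the models.
make_grid <- function(data, step = 0.1) {
  grid <- data.frame(Level = seq(min(data$Level), max(data$Level), by = step))
  grid$level2 <- grid$Level^2
  grid$level3 <- grid$Level^3
  grid$level4 <- grid$Level^4
  grid
}

# Hypothetical helper: plot the observed points and the fitted curve
# evaluated on the fine grid.
plot_fit <- function(data, model, step = 0.1) {
  grid <- make_grid(data, step)
  grid$Salary <- predict(model, newdata = grid)
  ggplot() +
    geom_point(data = data, aes(x = Level, y = Salary), colour = "purple") +
    geom_line(data = grid, aes(x = Level, y = Salary), colour = "black") +
    ggtitle("Truth or Bluff (Poly Regression)") +
    xlab("Level") +
    ylab("Salary")
}
```

For the degree-4 model this would be called as `plot_fit(dataset1, poly_reg1)`; columns the model does not use are ignored by `predict()`, so the same helper also works for `poly_reg`.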

y_poly4_pred=predict(poly_reg1, data.frame(Level=6.5, level2=6.5^2, level3=6.5^3, level4=6.5^4))
y_poly4_pred
##        1 
## 158862.5
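
A point prediction alone does not say how far a claimed salary may be from it before we call it a bluff. One possible check, which is not part of the original analysis, is whether the claim falls inside a prediction interval of the model. A minimal sketch; the helper name `claim_in_interval` and the 95% level are assumptions:

```r
# Hypothetical helper: TRUE if the claimed salary lies inside the
# prediction interval of the model at the given predictor values.
claim_in_interval <- function(model, newdata, claim, level = 0.95) {
  # predict.lm returns a matrix with columns "fit", "lwr" and "upr"
  bounds <- predict(model, newdata = newdata,
                    interval = "prediction", level = level)
  claim >= bounds[1, "lwr"] && claim <= bounds[1, "upr"]
}
```

For the candidate this would be called as `claim_in_interval(poly_reg1, data.frame(Level=6.5, level2=6.5^2, level3=6.5^3, level4=6.5^4), 160000)`.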

As we can see, polynomial regression fits this data much better than linear regression (adjusted R-squared of 0.9718 and 0.9953 versus 0.6277), and the degree-4 polynomial fits better than the degree-3 one. The degree-4 model predicts a salary of 158862.5 dollars at level 6.5, which is very close to the $160k the candidate claimed, so we conclude that the new employee told us the truth.
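
The comparison between the models rests on the adjusted R-squared values reported by `summary()`. A small sketch that collects them for several fitted models at once; the helper name `compare_adj_r2` is hypothetical:

```r
# Hypothetical helper: return the adjusted R-squared of each model passed in.
# Argument names (if given) are kept as names of the result.
compare_adj_r2 <- function(...) {
  models <- list(...)
  sapply(models, function(m) summary(m)$adj.r.squared)
}
```

Called as `compare_adj_r2(linear = lin_reg, cubic = poly_reg, quartic = poly_reg1)`, it should return the three adjusted R-squared values shown in the summaries above as one named vector.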