A prospective employee told his new employer that his salary at his previous company was $160k. Human resources retrieved the salaries associated with each position level at that company, and the candidate's position level is 6.5. We would like to build a salary bluffing detector to check whether he told the truth.
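# Load the data and keep only the Level and Salary columns (columns 2 and 3)
dataset = read.csv('Position_Salaries.csv')
dataset = dataset[2:3]
# Simple linear regression of Salary on Level
lin_reg = lm(formula = Salary ~ ., data = dataset)
summary(lin_reg)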
##
## Call:
## lm(formula = Salary ~ ., data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -170818 -129720 -40379 65856 386545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -195333 124790 -1.565 0.15615
## Level 80879 20112 4.021 0.00383 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 182700 on 8 degrees of freedom
## Multiple R-squared: 0.669, Adjusted R-squared: 0.6277
## F-statistic: 16.17 on 1 and 8 DF, p-value: 0.003833
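The coefficients above are ordinary least squares estimates. The sketch below computes them from the normal equations (X'X)b = X'y so they are easier to follow. It rests on a few assumptions:

- `ols_coefficients` is a hypothetical helper written for illustration.
- The toy data are illustrative and are not the contents of Position_Salaries.csv.
- lm() solves the same least-squares problem with a QR decomposition. It does not form X'X explicitly.

```r
# Hypothetical helper: OLS coefficients from the normal equations (X'X) b = X'y.
# lm() solves the same problem via QR; this version is only for illustration.
ols_coefficients <- function(formula, data) {
  mf <- model.frame(formula, data)
  X <- model.matrix(formula, mf)   # design matrix, including the intercept column
  y <- model.response(mf)          # response vector (Salary)
  drop(solve(crossprod(X), crossprod(X, y)))
}

# Illustrative toy data (assumption: not the real salaries)
toy <- data.frame(Level = 1:10, Salary = 1000 * (1:10)^2)
ols_coefficients(Salary ~ Level, toy)
coef(lm(Salary ~ Level, data = toy))  # should agree up to floating-point rounding
```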
To fit a polynomial regression with lm(), we add the higher-order terms as new variables: level2 (Level squared) and level3 (Level cubed).
# Add squared and cubed Level terms; Salary ~ . then fits a cubic in Level
dataset$level2 = dataset$Level^2
dataset$level3 = dataset$Level^3
poly_reg = lm(formula = Salary ~ ., data = dataset)
summary(poly_reg)
##
## Call:
## lm(formula = Salary ~ ., data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -75695 -28148 7091 29256 49538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -121333.3 97544.8 -1.244 0.25994
## Level 180664.3 73114.5 2.471 0.04839 *
## level2 -48549.0 15081.0 -3.219 0.01816 *
## level3 4120.0 904.3 4.556 0.00387 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50260 on 6 degrees of freedom
## Multiple R-squared: 0.9812, Adjusted R-squared: 0.9718
## F-statistic: 104.4 on 3 and 6 DF, p-value: 1.441e-05
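Building level2 and level3 by hand works, but every later call to predict() must then supply those columns as well. As an alternative, base R's `I()` and `poly(..., raw = TRUE)` can build the power terms inside the formula itself. With `raw = TRUE`, the coefficients stay on the same scale as the Level, level2 and level3 columns used here. The toy data below are an illustrative assumption, not the real salaries.

```r
# Illustrative toy data (assumption: an exact cubic, not the real salaries)
toy <- data.frame(Level = 1:10)
toy$Salary <- 50000 - 3000 * toy$Level + 800 * toy$Level^3

# Two equivalent ways to request a cubic in Level directly in the formula
fit_manual <- lm(Salary ~ Level + I(Level^2) + I(Level^3), data = toy)
fit_poly   <- lm(Salary ~ poly(Level, 3, raw = TRUE), data = toy)

# New data only needs the original Level column
new_level <- data.frame(Level = 6.5)
predict(fit_manual, newdata = new_level)  # approximately 250200 for this exact cubic
predict(fit_poly, newdata = new_level)    # same value
```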
library(ggplot2)
# Observed salaries (points) and the fitted linear regression (line)
g1 = ggplot(dataset, aes(x = Level, y = Salary)) +
  geom_point(colour = "red") +
  geom_line(aes(y = predict(lin_reg, newdata = dataset)), colour = "green") +
  ggtitle("Truth or Bluff (Linear Regression)") +
  xlab("Level") +
  ylab("Salary")
g1
y_linear_pred=predict(lin_reg, data.frame(Level=6.5))
y_linear_pred
## 1
## 330378.8
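As a check on this number, the prediction is simply the fitted line evaluated at level 6.5. Plugging in the rounded coefficients from the summary above gives:

$$
\hat{y}(6.5) = \hat{\beta}_0 + \hat{\beta}_1 \times 6.5 \approx -195333 + 80879 \times 6.5 = 330380.5
$$

This matches the printed 330378.8 up to the rounding of the displayed coefficients.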
# Observed salaries (points) and the fitted cubic regression (line)
g2 = ggplot(dataset, aes(x = Level, y = Salary)) +
  geom_point(colour = "pink") +
  geom_line(aes(y = predict(poly_reg, newdata = dataset)), colour = "yellow") +
  ggtitle("Truth or Bluff (Polynomial Regression, degree 3)") +
  xlab("Level") +
  ylab("Salary")
g2
y_poly3_pred=predict(poly_reg, data.frame(Level=6.5, level2=6.5^2, level3=6.5^3))
y_poly3_pred
## 1
## 133259.5
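Every prediction from these models needs the same hand-built power columns, as in `data.frame(Level=6.5, level2=6.5^2, level3=6.5^3)`. The hypothetical helper `add_powers` below generates those columns for any degree, using the same naming as this analysis (level2, ..., levelK). It is a sketch, not part of the original workflow.

```r
# Hypothetical helper: add level2, ..., level<degree> columns to a data frame
# that already has a Level column, matching the predictors used in these fits.
add_powers <- function(df, degree) {
  for (k in seq_len(degree)[-1]) {
    df[[paste0("level", k)]] <- df$Level^k
  }
  df
}

add_powers(data.frame(Level = 6.5), 4)
```

For example, `predict(poly_reg, add_powers(data.frame(Level = 6.5), 3))` builds the same new data as the call above. Extra columns are ignored by predict(), so a degree-4 frame also works for the cubic model.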
# Copy the data and add a fourth-degree term for a quartic fit
dataset1 = dataset
dataset1$level4 = dataset1$Level^4
poly_reg1 = lm(formula = Salary ~ ., data = dataset1)
summary(poly_reg1)
##
## Call:
## lm(formula = Salary ~ ., data = dataset1)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -8357 18240 1358 -14633 -11725 6725 15997 10006 -28695 11084
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 184166.7 67768.0 2.718 0.04189 *
## Level -211002.3 76382.2 -2.762 0.03972 *
## level2 94765.4 26454.2 3.582 0.01584 *
## level3 -15463.3 3535.0 -4.374 0.00719 **
## level4 890.2 159.8 5.570 0.00257 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20510 on 5 degrees of freedom
## Multiple R-squared: 0.9974, Adjusted R-squared: 0.9953
## F-statistic: 478.1 on 4 and 5 DF, p-value: 1.213e-06
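The three models can also be compared in one table. The sketch below fits degrees 1 through `max_degree` and collects the adjusted R-squared and residual standard error that each `summary()` reports. `compare_degrees` is a hypothetical helper, and the toy data are an illustrative assumption.

```r
# Hypothetical helper: fit polynomial degrees 1..max_degree (raw powers of Level)
# and tabulate adjusted R-squared and residual standard error for each fit.
compare_degrees <- function(df, max_degree) {
  fits <- lapply(seq_len(max_degree), function(d) {
    lm(Salary ~ poly(Level, d, raw = TRUE), data = df)
  })
  data.frame(
    degree = seq_len(max_degree),
    adj_r_squared = sapply(fits, function(f) summary(f)$adj.r.squared),
    residual_se = sapply(fits, function(f) summary(f)$sigma)
  )
}

# Illustrative toy data (assumption: not the real salaries)
toy <- data.frame(Level = 1:10,
                  Salary = 40000 * 1.4^(1:10) + rep(c(-3000, 3000), 5))
compare_degrees(toy, 4)
```

On the real data, `compare_degrees(dataset[, c("Level", "Salary")], 4)` fits the same model spaces as lin_reg, poly_reg and poly_reg1. For degrees 1, 3 and 4 it should therefore reproduce the statistics reported above. In addition, `anova(poly_reg, poly_reg1)` gives the partial F-test for adding level4.

With only 10 observations, R-squared can only rise as terms are added. Adjusted R-squared and the residual standard error at least account for the lost degrees of freedom.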
# Observed salaries (points) and the fitted quartic regression (line)
g3 = ggplot(dataset1, aes(x = Level, y = Salary)) +
  geom_point(colour = "purple") +
  geom_line(aes(y = predict(poly_reg1, newdata = dataset1)), colour = "black") +
  ggtitle("Truth or Bluff (Polynomial Regression, degree 4)") +
  xlab("Level") +
  ylab("Salary")
g3
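geom_line() only connects the ten observed levels, so the fitted polynomials above are drawn as straight segments. The sketch below gives a smoother curve by predicting on a fine grid of levels. It makes three assumptions:

- `prediction_grid` is a hypothetical helper.
- The model was fit on Level plus columns named level2, ..., levelK, as in this analysis.
- The 0.1 step is an arbitrary choice.

```r
# Hypothetical helper: predictions on a fine grid of levels for smooth plotting.
# Assumes the model uses Level plus power columns named level2, ..., level<degree>.
prediction_grid <- function(model, levels, degree, step = 0.1) {
  grid <- data.frame(Level = seq(min(levels), max(levels), by = step))
  for (k in seq_len(degree)[-1]) {
    grid[[paste0("level", k)]] <- grid$Level^k
  }
  grid$Salary <- predict(model, newdata = grid)
  grid
}

# Illustrative toy fit (assumption: an exact cubic, not the real salaries)
toy <- data.frame(Level = 1:10)
toy$level2 <- toy$Level^2
toy$level3 <- toy$Level^3
toy$Salary <- 1000 * toy$Level^3
toy_fit <- lm(Salary ~ ., data = toy)
head(prediction_grid(toy_fit, toy$Level, degree = 3))
```

For the degree-4 model, `geom_line(data = prediction_grid(poly_reg1, dataset1$Level, degree = 4), aes(x = Level, y = Salary), colour = "black")` could replace the geom_line() layer in g3.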
y_poly4_pred=predict(poly_reg1, data.frame(Level=6.5, level2=6.5^2, level3=6.5^3, level4=6.5^4))
y_poly4_pred
## 1
## 158862.5
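As we can see, the polynomial regressions fit much better than the linear regression. R-squared rises from 0.669 (linear) to 0.9812 (degree 3) and 0.9974 (degree 4), and adjusted R-squared rises from 0.6277 to 0.9718 and 0.9953. Over the same models, the residual standard error falls from 182700 to 50260 and 20510. The 4th-degree polynomial regression is therefore better than the 3rd-degree one. Its prediction for level 6.5 is 158862.5 dollars, only about $1,140 below the claimed $160k and far smaller than its residual standard error. We conclude that the new employee told the truth.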
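The conclusion above compares the claim with a single point prediction. A somewhat more principled bluffing detector asks whether the claimed salary falls outside the model's prediction interval, which `predict(..., interval = "prediction")` returns. The sketch below does this. `is_bluffing` is a hypothetical helper, and the 95% level and the toy data are assumptions for illustration.

```r
# Hypothetical helper: flag a claim as a bluff when it lies outside the
# prediction interval of a fitted lm() model at the given confidence level.
is_bluffing <- function(model, newdata, claimed, level = 0.95) {
  bounds <- predict(model, newdata = newdata, interval = "prediction", level = level)
  unname(claimed < bounds[, "lwr"] | claimed > bounds[, "upr"])
}

# Illustrative toy data (assumption: not the real salaries)
toy <- data.frame(Level = 1:10,
                  Salary = 1000 * (1:10) + rep(c(-500, 500), 5))
toy_fit <- lm(Salary ~ Level, data = toy)
is_bluffing(toy_fit, data.frame(Level = 5.5), claimed = 5500)    # FALSE
is_bluffing(toy_fit, data.frame(Level = 5.5), claimed = 100000)  # TRUE
```

Applied to the degree-4 model, `is_bluffing(poly_reg1, data.frame(Level = 6.5, level2 = 6.5^2, level3 = 6.5^3, level4 = 6.5^4), claimed = 160000)` should return FALSE. The claim is about $1,140 from the point prediction, while the half-width of the 95% prediction interval is at least the t quantile (5 degrees of freedom) times the residual standard error of 20510.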