# Import the dataset and keep only the Level and Salary columns
dataset = read.csv('Position_Salaries.csv')
dataset = dataset[2:3]
library(ggplot2)
library(rpart)
# Fit a regression tree on the whole dataset with default control parameters
regressor = rpart(formula = Salary ~ ., data = dataset)
y_tree_pred = predict(regressor, data.frame(Level = 6.5))
y_tree_pred
## [1] 249500
g2 = ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary),
             colour = "purple") +
  geom_line(aes(x = dataset$Level, y = predict(regressor, newdata = dataset)),
            colour = "black") +
  ggtitle("Truth or Bluff (Decision Tree)") +
  xlab("Level") +
  ylab("Salary")
g2
Analysis of the result: the decision tree overestimates the salary and draws a flat line across all levels, because with the default control parameters rpart makes no splits on these 10 observations; the single-leaf tree simply returns the average of the 10 salaries. To fix this problem, we need to allow the tree to make some splits.
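As a quick sanity check (a minimal sketch using only objects already defined above), a single-leaf tree predicts the overall mean, so the mean of the Salary column should match the prediction:

mean(dataset$Salary)  # should equal y_tree_pred (249500)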
# Refit the tree, lowering minsplit so nodes may be split even when very small
regressor1 = rpart(formula = Salary ~ ., data = dataset,
                   control = rpart.control(minsplit = 1))
y_tree_pred1 = predict(regressor1, data.frame(Level = 6.5))
y_tree_pred1
## 1
## 250000
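To see which splits the tree actually made, we can print the fitted model (a minimal sketch; print() on an rpart object lists each node's split condition, its number of observations, and the mean salary in each leaf):

print(regressor1)  # shows the split thresholds and the leaf means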
g3 = ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary),
             colour = "pink") +
  geom_line(aes(x = dataset$Level, y = predict(regressor1, newdata = dataset)),
            colour = "green") +
  ggtitle("Truth or Bluff (Decision Tree with splits)") +
  xlab("Level") +
  ylab("Salary")
g3
We can see the model has improved: the tree now makes splits, and we can clearly see four intervals. However, a decision tree predicts the average salary within each interval, so the fitted curve should be a step function with no slanted segments. The slants appear because we evaluate the predictions only at the 10 observed levels and geom_line joins those points with straight segments; the tree itself is a non-continuous, piecewise-constant model. In the next section we will improve the graph by predicting on a higher-resolution grid.
# A dense grid of levels so the step shape of the tree becomes visible
xgrid = seq(min(dataset$Level), max(dataset$Level), 0.01)
g4 = ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary),
             colour = "red") +
  geom_line(aes(x = xgrid, y = predict(regressor1, newdata = data.frame(Level = xgrid))),
            colour = "blue") +
  ggtitle("Truth or Bluff (Decision Tree with higher resolution)") +
  xlab("Level") +
  ylab("Salary")
g4
Now we can clearly see the four intervals: 1-6.5, 6.5-8.5, 8.5-9.5, and 9.5-10. Within each interval the prediction is flat at the average salary of the observations that fall into it. Finally, our graph shows what a good decision tree model looks like.
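As a final check (a minimal sketch; the levels 3, 7, 9, and 10 are arbitrary representatives, one from each of the intervals listed above), predicting a single level per interval returns the four plateau values directly:

# One representative level per interval: [1, 6.5), [6.5, 8.5), [8.5, 9.5), [9.5, 10]
predict(regressor1, data.frame(Level = c(3, 7, 9, 10)))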