- Goal and Background: What is the problem?
- Approach: What have you done?
- Major findings: What do you find and what is your conclusion?
# load Boston data
library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
data(Boston)
index <- sample(nrow(Boston),nrow(Boston)*0.75)
boston.train <- Boston[index,]
boston.test <- Boston[-index,]
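The split above depends on the random number generator, so the error figures reported later will change from run to run. A minimal sketch of a reproducible version of the same 75/25 split follows. The seed value and the names `train_idx`, `train_set`, and `test_set` are hypothetical and chosen only for illustration.

```r
library(MASS)  # provides the Boston data set

set.seed(2024)  # hypothetical seed; any fixed integer makes the split repeatable

# Use an explicit integer sample size instead of relying on truncation of 379.5
n_train   <- floor(0.75 * nrow(Boston))
train_idx <- sample(nrow(Boston), n_train)

train_set <- Boston[train_idx, ]   # rows used to fit the models
test_set  <- Boston[-train_idx, ]  # held-out rows used to estimate test error

dim(train_set)
dim(test_set)
```

With a fixed seed, rerunning the script yields the same training and testing rows, which makes the bagging and single-tree errors directly comparable across runs.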
library(ipred)
## Warning: package 'ipred' was built under R version 3.4.4
library(rpart)
## Warning: package 'rpart' was built under R version 3.4.4
Bagging with 100 regression trees to improve prediction accuracy.
boston.bag<- bagging(medv~., data = boston.train, nbagg=100)
boston.bag
##
## Bagging regression trees with 100 bootstrap replications
##
## Call: bagging.data.frame(formula = medv ~ ., data = boston.train, nbagg = 100)
summary(boston.bag)
## Length Class Mode
## y 379 -none- numeric
## X 13 data.frame list
## mtrees 100 -none- list
## OOB 1 -none- logical
## comb 1 -none- logical
## call 4 -none- call
Prediction on testing sample.
boston.bag.pred<- predict(boston.bag, newdata = boston.test)
mean((boston.test$medv-boston.bag.pred)^2)#mse
## [1] 11.09548
In-sample prediction (on the training sample).
boston.bag.pred_train<- predict(boston.bag, newdata = boston.train)
mean((boston.train$medv-boston.bag.pred_train)^2)#mse
## [1] 10.90266
Comparing with a single tree.
library(rpart)
boston.tree<- rpart(medv~., data = boston.train)
boston.tree.pred<- predict(boston.tree, newdata = boston.test)
mean((boston.test$medv-boston.tree.pred)^2)
## [1] 15.86878
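Bagging performs better than the single tree: its test MSE is 11.10, versus 15.87 for the single tree. The bootstrap resamples the training data with replacement, so each tree is fit to a different draw from the empirical data distribution, and the bagged prediction is the average over those trees.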
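To make the resampling idea concrete, here is a minimal hand-rolled sketch of bagging using only `rpart`. It is not how `ipred::bagging` is implemented internally (its tree settings may differ), and the seed, the number of trees, and all variable names are assumptions made for illustration.

```r
library(MASS)   # Boston data
library(rpart)  # regression trees

set.seed(1)  # assumed seed, only for repeatability
idx   <- sample(nrow(Boston), floor(0.75 * nrow(Boston)))
train <- Boston[idx, ]
test  <- Boston[-idx, ]

n_trees <- 25  # assumed number of bootstrap replications

# Each column of `preds` holds one tree's predictions on the test set
preds <- sapply(seq_len(n_trees), function(b) {
  boot_rows <- sample(nrow(train), replace = TRUE)   # bootstrap resample
  fit <- rpart(medv ~ ., data = train[boot_rows, ])  # tree fit on the resample
  predict(fit, newdata = test)
})

bagged_pred <- rowMeans(preds)  # average the trees' predictions
bagged_mse  <- mean((test$medv - bagged_pred)^2)

# Single tree fit on the original training sample, for comparison
single_fit <- rpart(medv ~ ., data = train)
single_mse <- mean((test$medv - predict(single_fit, newdata = test))^2)

c(bagged = bagged_mse, single = single_mse)
```

Averaging over trees fit to different resamples reduces the variance of the prediction, which is the usual explanation for the lower test error of the bagged model.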
# Test MSE as a function of the number of bagged trees
ntree <- seq(1, 200, 10)
MSE.test <- rep(0, length(ntree))
for (i in seq_along(ntree)) {
  # refit the bagged model with ntree[i] bootstrap replications
  boston.bag1 <- bagging(medv~., data = boston.train, nbagg = ntree[i])
  boston.bag.pred1 <- predict(boston.bag1, newdata = boston.test)
  MSE.test[i] <- mean((boston.test$medv - boston.bag.pred1)^2)
}
# plot test MSE against the number of trees
plot(ntree, MSE.test, type = 'l', col = 2, lwd = 2, xaxt = "n")
axis(1, at = ntree, las = 1)
boston.bag.oob<- bagging(medv~., data = boston.train, coob=T, nbagg=100)
boston.bag.oob
##
## Bagging regression trees with 100 bootstrap replications
##
## Call: bagging.data.frame(formula = medv ~ ., data = boston.train, coob = T,
## nbagg = 100)
##
## Out-of-bag estimate of root mean squared error: 4.2074
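Because this is a regression problem, the out-of-bag estimate is a root mean squared error (4.2074, i.e. an MSE of about 17.70), not a misclassification error. On the test sample, bagging outperforms the single tree.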
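The out-of-bag idea can be sketched by hand: each observation is predicted only by the trees whose bootstrap sample did not contain it. The following is an illustrative sketch under stated assumptions, not a reproduction of what `ipred` computes. It uses the full Boston data for brevity, default `rpart` settings, and a hypothetical seed and tree count.

```r
library(MASS)   # Boston data
library(rpart)  # regression trees

set.seed(1)      # assumed seed
n       <- nrow(Boston)
n_trees <- 50    # assumed number of bootstrap replications

pred_sum <- numeric(n)  # running sum of out-of-bag predictions per row
pred_cnt <- integer(n)  # number of trees for which each row was out of bag

for (b in seq_len(n_trees)) {
  boot_rows <- sample(n, replace = TRUE)
  oob_rows  <- setdiff(seq_len(n), boot_rows)  # rows not drawn this round
  fit <- rpart(medv ~ ., data = Boston[boot_rows, ])
  pred_sum[oob_rows] <- pred_sum[oob_rows] +
    predict(fit, newdata = Boston[oob_rows, ])
  pred_cnt[oob_rows] <- pred_cnt[oob_rows] + 1L
}

covered  <- pred_cnt > 0  # rows that were out of bag at least once
oob_pred <- pred_sum[covered] / pred_cnt[covered]
oob_rmse <- sqrt(mean((Boston$medv[covered] - oob_pred)^2))
oob_rmse
```

Because every prediction comes from trees that never saw that observation, the resulting RMSE behaves like a test-set estimate without requiring a separate hold-out sample.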