# Classification decision tree
library(tree)
## Warning: package 'tree' was built under R version 3.3.3
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.3.3
attach(Carseats)
High = ifelse(Sales <= 8, "No", "Yes")
Carseats = data.frame(Carseats, High)
tree.carseats = tree(High ~ . - Sales, Carseats)
summary(tree.carseats)
##
## Classification tree:
## tree(formula = High ~ . - Sales, data = Carseats)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Income" "CompPrice" "Population"
## [6] "Advertising" "Age" "US"
## Number of terminal nodes: 27
## Residual mean deviance: 0.4575 = 170.7 / 373
## Misclassification error rate: 0.09 = 36 / 400
# tree plot
plot(tree.carseats)
text(tree.carseats, pretty=0)
# train and test
set.seed(2)
train = sample(1:nrow(Carseats), 200)
Carseats.test = Carseats[-train, ]
High.test = High[-train]
tree.carseats = tree(High ~ . - Sales, Carseats, subset=train)
tree.pred = predict(tree.carseats, Carseats.test, type="class")
table(tree.pred, High.test)
## High.test
## tree.pred No Yes
## No 86 27
## Yes 30 57
# cross-validation in order to determine the optimal level of tree complexity
set.seed(3)
cv.carseats = cv.tree(tree.carseats, FUN=prune.misclass)
names(cv.carseats)
## [1] "size" "dev" "k" "method"
min_class_error_rate = cv.carseats$dev[which.min(cv.carseats$dev)]
final_terminal_nodes = cv.carseats$size[which.min(cv.carseats$dev)]
par(mfrow=c(1, 2))
plot(cv.carseats$size, cv.carseats$dev, type="b")
plot(cv.carseats$k, cv.carseats$dev, type="b")
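The cross-validation results plotted above can also be inspected as a table. This small addition is not part of the original output, and the exact values depend on the random seed.
# size, cross-validated deviance (here the number of misclassifications) and
# the cost-complexity parameter k, side by side
data.frame(size = cv.carseats$size, dev = cv.carseats$dev, k = cv.carseats$k)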
# Pruning tree nodes
prune.carseats = prune.misclass(tree.carseats, best=9)
plot(prune.carseats)
text(prune.carseats, pretty=0)
print(prune.carseats)
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 200 269.200 No ( 0.6000 0.4000 )
## 2) ShelveLoc: Bad,Medium 153 185.400 No ( 0.7059 0.2941 )
## 4) Price < 142 130 167.700 No ( 0.6538 0.3462 )
## 8) ShelveLoc: Bad 39 29.870 No ( 0.8718 0.1282 ) *
## 9) ShelveLoc: Medium 91 124.800 No ( 0.5604 0.4396 )
## 18) Price < 86.5 9 0.000 Yes ( 0.0000 1.0000 ) *
## 19) Price > 86.5 82 108.700 No ( 0.6220 0.3780 )
## 38) Advertising < 6.5 52 56.180 No ( 0.7692 0.2308 ) *
## 39) Advertising > 6.5 30 39.430 Yes ( 0.3667 0.6333 )
## 78) Age < 37.5 5 0.000 Yes ( 0.0000 1.0000 ) *
## 79) Age > 37.5 25 34.300 Yes ( 0.4400 0.5600 )
## 158) CompPrice < 118.5 8 8.997 No ( 0.7500 0.2500 ) *
## 159) CompPrice > 118.5 17 20.600 Yes ( 0.2941 0.7059 ) *
## 5) Price > 142 23 0.000 No ( 1.0000 0.0000 ) *
## 3) ShelveLoc: Good 47 53.400 Yes ( 0.2553 0.7447 )
## 6) Price < 142.5 38 29.590 Yes ( 0.1316 0.8684 ) *
## 7) Price > 142.5 9 9.535 No ( 0.7778 0.2222 ) *
tree.pred = predict(prune.carseats, Carseats.test, type="class")
table(tree.pred, High.test)
## High.test
## tree.pred No Yes
## No 94 24
## Yes 22 60
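For reference, the test accuracies implied by the two confusion matrices above can be computed directly; these two lines are an addition, not part of the original output. Pruning raises the test accuracy from about 71.5% to 77%.
(86 + 57) / 200   # unpruned tree: 0.715
(94 + 60) / 200   # pruned tree: 0.77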
# Bagging
Bootstrap aggregating, also called bagging, is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.
Title: Bagging Predictors
Author: Leo Breiman
Year: 1996
Aim: to improve classification accuracy by combining the classifications obtained from randomly generated training sets, i.e. by combining classifiers trained on bootstrap samples of the original data.
Given a standard training set \(D\) of size \(n\), bagging generates \(m\) new training sets \(D_i\), each of size \(n'\), by sampling from \(D\) uniformly and with replacement. By sampling with replacement, some observations may be repeated in each \(D_i\). If \(n' = n\), then for large \(n\) each \(D_i\) is expected to contain the fraction \(1-\frac{1}{e}\) (\(\approx\) 63.2%) of the unique examples of \(D\), the rest being duplicates. The \(m\) models are fitted using the above \(m\) bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
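To make the procedure concrete, the following is a minimal sketch, not part of the original lab, of bagging “by hand” on the Carseats data: it checks the \(1-\frac{1}{e}\) fraction empirically, grows one tree per bootstrap sample, and classifies by majority vote. The seed, the number of replicates B and the object names are arbitrary choices for illustration.
library(tree)
library(ISLR)
Carseats$High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
set.seed(10)   # arbitrary seed
B <- 25        # number of bootstrap replicates
n <- nrow(Carseats)
# fraction of unique observations in a bootstrap sample, close to 1 - 1/e
mean(replicate(100, length(unique(sample(1:n, n, replace = TRUE))) / n))
# grow one classification tree per bootstrap sample
boot_trees <- lapply(1:B, function(b) {
  idx <- sample(1:n, n, replace = TRUE)
  tree(High ~ . - Sales, data = Carseats[idx, ])
})
# majority vote over the B trees for the first five observations
votes <- sapply(boot_trees, function(t)
  as.character(predict(t, Carseats[1:5, ], type = "class")))
apply(votes, 1, function(v) names(which.max(table(v))))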
Bagging leads to "improvements for unstable procedures", such as artificial neural networks, classification and regression trees, and subset selection in linear regression.
Compared with KNN: bagging can mildly degrade the performance of stable methods such as K-nearest neighbours.
The out-of-bag estimate uses the observations which are left out of a bootstrap sample to estimate the misclassification error at almost no additional computational cost (an explicit computation is sketched after the bagging output below). Hothorn and Lausen propose to use the out-of-bag samples for a combination of linear discriminant analysis and classification trees, called "Double-Bagging". For example, a combination of a stabilised linear discriminant analysis with classification trees can be computed along the following lines.
Especially in a medical context it often occurs that a priori knowledge about a classifying structure is given. For example it might be known that a disease is assessed on a subgroup of the given variables or, moreover, that class memberships are assigned by a deterministically known classifying function. Hand proposes the framework of indirect classification which incorporates this a priori knowledge into a classification rule. For more information, please read the materials at: https://cran.r-project.org/web/packages/ipred/vignettes/ipred-examples.pdf
In the following we demonstrate the functionality of the package on the example of glaucoma classification. We start with an overview of the disease and data and review the implemented classification and estimation methods in the context of their application to glaucoma diagnosis. In the data, \(w_{lora}\) represents the loss of nerve fibers and is obtained by a 2-dimensional fundus photography; \(w_{cs}\) and \(w_{clv}\) describe the visual field defect. Two example datasets are included in the package. The first one contains measurements of the eye morphology only (GlaucomaM), including 62 variables for 196 observations. The second dataset (GlaucomaMVF) contains additional visual field measurements for a different set of patients. In both example datasets, the observations in the two groups are matched by age and sex to prevent any bias.
library(ipred)
## Warning: package 'ipred' was built under R version 3.3.3
library(rpart)
library(MASS)
library(TH.data)
## Warning: package 'TH.data' was built under R version 3.3.3
## Loading required package: survival
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
data(GlaucomaM, package="TH.data" )
gbag <- bagging(Class ~ ., data = GlaucomaM, coob=TRUE)
print(gbag)
##
## Bagging classification trees with 25 bootstrap replications
##
## Call: bagging.data.frame(formula = Class ~ ., data = GlaucomaM, coob = TRUE)
##
## Out-of-bag estimate of misclassification error: 0.199
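As a companion to the out-of-bag estimate reported above, the following sketch shows how such an estimate can be computed by hand with rpart: each observation is classified only by the trees whose bootstrap sample left it out. This is an illustrative addition, not part of the original text; the seed, B and object names are arbitrary.
library(rpart)
library(TH.data)
data(GlaucomaM, package = "TH.data")
set.seed(42)   # arbitrary seed
B <- 25
n <- nrow(GlaucomaM)
oob_pred <- matrix(NA_character_, nrow = n, ncol = B)
for (b in 1:B) {
  idx <- sample(1:n, n, replace = TRUE)
  fit <- rpart(Class ~ ., data = GlaucomaM[idx, ], method = "class")
  oob <- setdiff(1:n, idx)
  oob_pred[oob, b] <- as.character(predict(fit, GlaucomaM[oob, ], type = "class"))
}
# majority vote over the replications in which each observation was out-of-bag
oob_vote <- apply(oob_pred, 1, function(p) {
  p <- p[!is.na(p)]
  if (length(p) == 0) NA else names(which.max(table(p)))
})
mean(oob_vote != GlaucomaM$Class, na.rm = TRUE)   # hand-rolled OOB error estimate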
scomb <- list(list(model=slda, predict=function(object, newdata)
  predict(object, newdata)$x))
gbagc <- bagging(Class ~ ., data = GlaucomaM, comb=scomb)
predict(gbagc, newdata=GlaucomaM[c(1:3, 99:102), ])
## [1] normal normal normal glaucoma glaucoma glaucoma glaucoma
## Levels: glaucoma normal
# Random forest
Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.
The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho’s formulation, is a way to implement the “stochastic discrimination” approach to classification proposed by Eugene Kleinberg.
An extension of the algorithm was developed by Leo Breiman and Adele Cutler, and “Random Forests” is their trademark. The extension combines Breiman’s “bagging” idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance.
Here we apply bagging and random forests to the Boston data, using the randomForest package in R. The exact results obtained in this section may depend on the version of R and the version of the randomForest package installed on your computer. Recall that bagging is simply a special case of a random forest with m = p. Therefore, the randomForest() function can be used to perform both random forests and bagging. We perform bagging as follows:
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.3.3
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
library(MASS)
help(Boston)
## starting httpd help server ...
## done
set.seed(1)
# Bagging (note: the training index must be defined for the Boston data;
# following the ISLR lab, we take half of the observations)
train = sample(1:nrow(Boston), nrow(Boston)/2)
bag.boston = randomForest(medv~., data=Boston, subset=train, mtry=13, importance=TRUE)
boston.test = Boston[-train, "medv"]
yhat.bag = predict(bag.boston, newdata=Boston[-train, ])
plot(yhat.bag, boston.test)
abline(0, 1)
mean((yhat.bag - boston.test)^2)
## [1] 13.23173
# ntree=25
bag.boston = randomForest(medv~., data=Boston, subset=train,
                          mtry=13, ntree=25)
yhat.bag = predict(bag.boston, newdata=Boston[-train, ])
mean((yhat.bag - boston.test)^2)
## [1] 12.01848
Growing a random forest proceeds in exactly the same way, except that we use a smaller value of the mtry argument. By default, randomForest() uses \(p/3\) variables when building a random forest of regression trees and \(\sqrt{p}\) variables when building a random forest of classification trees. Here we use mtry = 6.
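For the Boston data, which has p = 13 predictors, those defaults would be as follows; this quick check is an addition and not part of the original output.
floor(13 / 3)     # default mtry for regression trees: 4
floor(sqrt(13))   # default mtry for classification trees: 3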
set.seed(1)
rf.boston = randomForest(medv~., data=Boston, subset=train,
                         mtry=6, importance=TRUE)
yhat.rf = predict(rf.boston, newdata=Boston[-train, ])
mean((yhat.rf - boston.test)^2)
## [1] 12.6098
importance(rf.boston)
## %IncMSE IncNodePurity
## crim 10.056635 910.80213
## zn 2.027511 66.43274
## indus 7.856146 711.18115
## chas 1.237987 69.59153
## nox 5.719429 470.51553
## rm 30.326688 5576.34277
## age 8.712729 610.05150
## dis 11.902977 1369.07530
## rad 4.618294 150.23860
## tax 5.613355 292.92698
## ptratio 11.632363 598.41235
## black 2.513735 266.24335
## lstat 31.234637 6374.49147
varImpPlot(rf.boston)
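As a small follow-up, assuming rf.boston as fitted above, the predictors can be ordered by the permutation-based importance measure %IncMSE; this is an addition, not part of the original output, and it makes the dominance of lstat and rm in the table above easy to see.
imp <- importance(rf.boston)
imp[order(imp[, "%IncMSE"], decreasing = TRUE), ]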