# Classification decision tree
library(tree)
## Warning: package 'tree' was built under R version 3.3.3
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.3.3
attach(Carseats)
High = ifelse(Sales <= 8, "No", "Yes")
Carseats = data.frame(Carseats, High)
tree.carseats = tree(High ~ . - Sales, Carseats)
summary(tree.carseats)
##
## Classification tree:
## tree(formula = High ~ . - Sales, data = Carseats)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Income" "CompPrice" "Population"
## [6] "Advertising" "Age" "US"
## Number of terminal nodes: 27
## Residual mean deviance: 0.4575 = 170.7 / 373
## Misclassification error rate: 0.09 = 36 / 400
# tree plot
plot(tree.carseats)
text(tree.carseats, pretty=0)
# train and test
set.seed(2)
train = sample(1:nrow(Carseats), 200)
Carseats.test = Carseats[-train, ]
High.test = High[-train]
tree.carseats = tree(High ~ . - Sales, Carseats, subset=train)
tree.pred = predict(tree.carseats, Carseats.test, type="class")
table(tree.pred, High.test)
## High.test
## tree.pred No Yes
## No 86 27
## Yes 30 57
# cross-validation in order to determine the optimal level of tree complexity
set.seed(3)
cv.carseats = cv.tree(tree.carseats, FUN=prune.misclass)
names(cv.carseats)
## [1] "size" "dev" "k" "method"
min_class_error_rate = cv.carseats$dev[which.min(cv.carseats$dev)]
final_terminal_nodes = cv.carseats$size[which.min(cv.carseats$dev)]
par(mfrow=c(1, 2))
plot(cv.carseats$size, cv.carseats$dev, type="b")
plot(cv.carseats$k, cv.carseats$dev, type="b")
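The cross-validation results plotted above can also be inspected as a table. This small addition is not part of the original output, and the exact values depend on the random seed.
# size, cross-validated deviance (here the number of misclassifications) and
# the cost-complexity parameter k, side by side
data.frame(size = cv.carseats$size, dev = cv.carseats$dev, k = cv.carseats$k)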
# Pruning tree nodes
prune.carseats = prune.misclass(tree.carseats, best=9)
plot(prune.carseats)
text(prune.carseats, pretty=0)
print(prune.carseats)
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 200 269.200 No ( 0.6000 0.4000 )
## 2) ShelveLoc: Bad,Medium 153 185.400 No ( 0.7059 0.2941 )
## 4) Price < 142 130 167.700 No ( 0.6538 0.3462 )
## 8) ShelveLoc: Bad 39 29.870 No ( 0.8718 0.1282 ) *
## 9) ShelveLoc: Medium 91 124.800 No ( 0.5604 0.4396 )
## 18) Price < 86.5 9 0.000 Yes ( 0.0000 1.0000 ) *
## 19) Price > 86.5 82 108.700 No ( 0.6220 0.3780 )
## 38) Advertising < 6.5 52 56.180 No ( 0.7692 0.2308 ) *
## 39) Advertising > 6.5 30 39.430 Yes ( 0.3667 0.6333 )
## 78) Age < 37.5 5 0.000 Yes ( 0.0000 1.0000 ) *
## 79) Age > 37.5 25 34.300 Yes ( 0.4400 0.5600 )
## 158) CompPrice < 118.5 8 8.997 No ( 0.7500 0.2500 ) *
## 159) CompPrice > 118.5 17 20.600 Yes ( 0.2941 0.7059 ) *
## 5) Price > 142 23 0.000 No ( 1.0000 0.0000 ) *
## 3) ShelveLoc: Good 47 53.400 Yes ( 0.2553 0.7447 )
## 6) Price < 142.5 38 29.590 Yes ( 0.1316 0.8684 ) *
## 7) Price > 142.5 9 9.535 No ( 0.7778 0.2222 ) *
tree.pred = predict(prune.carseats, Carseats.test, type="class")
table(tree.pred, High.test)
## High.test
## tree.pred No Yes
## No 94 24
## Yes 22 60
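For reference, the test accuracies implied by the two confusion matrices above can be computed directly; these two lines are an addition, not part of the original output. Pruning raises the test accuracy from about 71.5% to 77%.
(86 + 57) / 200   # unpruned tree: 0.715
(94 + 60) / 200   # pruned tree: 0.77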
# Bagging
Bootstrap aggregating, also called bagging, is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.
Title: Bagging Predictors
Author: Leo Breiman
Year: 1996
Aim: to improve classification accuracy by combining the classifications obtained from randomly generated training sets, i.e. by combining classifiers trained on bootstrap samples of the original data.
Given a standard training set \(D\) of size \(n\), bagging generates \(m\) new training sets \(D_i\), each of size \(n'\), by sampling from \(D\) uniformly and with replacement. By sampling with replacement, some observations may be repeated in each \(D_i\). If \(n' = n\), then for large \(n\) each \(D_i\) is expected to contain the fraction \(1-\frac{1}{e}\) (\(\approx\) 63.2%) of the unique examples of \(D\), the rest being duplicates. The \(m\) models are fitted using the above \(m\) bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
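To make the procedure concrete, the following is a minimal sketch, not part of the original lab, of bagging “by hand” on the Carseats data: it checks the \(1-\frac{1}{e}\) fraction empirically, grows one tree per bootstrap sample, and classifies by majority vote. The seed, the number of replicates B and the object names are arbitrary choices for illustration.
library(tree)
library(ISLR)
Carseats$High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
set.seed(10)   # arbitrary seed
B <- 25        # number of bootstrap replicates
n <- nrow(Carseats)
# fraction of unique observations in a bootstrap sample, close to 1 - 1/e
mean(replicate(100, length(unique(sample(1:n, n, replace = TRUE))) / n))
# grow one classification tree per bootstrap sample
boot_trees <- lapply(1:B, function(b) {
  idx <- sample(1:n, n, replace = TRUE)
  tree(High ~ . - Sales, data = Carseats[idx, ])
})
# majority vote over the B trees for the first five observations
votes <- sapply(boot_trees, function(t)
  as.character(predict(t, Carseats[1:5, ], type = "class")))
apply(votes, 1, function(v) names(which.max(table(v))))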
Bagging leads to "improvements for unstable procedures", such as artificial neural networks, classification and regression trees, and subset selection in linear regression.
Compared with KNN: bagging can mildly degrade the performance of stable methods such as K-nearest neighbours.
The out-of-bag estimate uses the observations which are left out of a bootstrap sample to estimate the misclassification error at almost no additional computational cost (an explicit computation is sketched after the bagging output below). Hothorn and Lausen propose to use the out-of-bag samples for a combination of linear discriminant analysis and classification trees, called "Double-Bagging". For example, a combination of a stabilised linear discriminant analysis with classification trees can be computed along the following lines.
Especially in a medical context it often occurs that a priori knowledge about a classifying structure is given. For example it might be known that a disease is assessed on a subgroup of the given variables or, moreover, that class memberships are assigned by a deterministically known classifying function. Hand proposes the framework of indirect classification which incorporates this a priori knowledge into a classification rule. For more information, please read the materials at: https://cran.r-project.org/web/packages/ipred/vignettes/ipred-examples.pdf
In the following we demonstrate the functionality of the package on the example of glaucoma classification. We start with an overview of the disease and data and review the implemented classification and estimation methods in the context of their application to glaucoma diagnosis. In the data, \(w_{lora}\) represents the loss of nerve fibers and is obtained by a 2-dimensional fundus photography; \(w_{cs}\) and \(w_{clv}\) describe the visual field defect. Two example datasets are included in the package. The first one contains measurements of the eye morphology only (GlaucomaM), including 62 variables for 196 observations. The second dataset (GlaucomaMVF) contains additional visual field measurements for a different set of patients. In both example datasets, the observations in the two groups are matched by age and sex to prevent any bias.
library(ipred)
## Warning: package 'ipred' was built under R version 3.3.3
library(rpart)
library(MASS)
library(TH.data)
## Warning: package 'TH.data' was built under R version 3.3.3
## Loading required package: survival
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
data(GlaucomaM, package="TH.data" )
gbag <- bagging(Class ~ ., data = GlaucomaM, coob=TRUE)
print(gbag)
##
## Bagging classification trees with 25 bootstrap replications
##
## Call: bagging.data.frame(formula = Class ~ ., data = GlaucomaM, coob = TRUE)
##
## Out-of-bag estimate of misclassification error: 0.199
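As a companion to the out-of-bag estimate reported above, the following sketch shows how such an estimate can be computed by hand with rpart: each observation is classified only by the trees whose bootstrap sample left it out. This is an illustrative addition, not part of the original text; the seed, B and object names are arbitrary.
library(rpart)
library(TH.data)
data(GlaucomaM, package = "TH.data")
set.seed(42)   # arbitrary seed
B <- 25
n <- nrow(GlaucomaM)
oob_pred <- matrix(NA_character_, nrow = n, ncol = B)
for (b in 1:B) {
  idx <- sample(1:n, n, replace = TRUE)
  fit <- rpart(Class ~ ., data = GlaucomaM[idx, ], method = "class")
  oob <- setdiff(1:n, idx)
  oob_pred[oob, b] <- as.character(predict(fit, GlaucomaM[oob, ], type = "class"))
}
# majority vote over the replications in which each observation was out-of-bag
oob_vote <- apply(oob_pred, 1, function(p) {
  p <- p[!is.na(p)]
  if (length(p) == 0) NA else names(which.max(table(p)))
})
mean(oob_vote != GlaucomaM$Class, na.rm = TRUE)   # hand-rolled OOB error estimate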
scomb <- list(list(model=slda, predict=function(object, newdata)
  predict(object, newdata)$x))
gbagc <- bagging(Class ~ ., data = GlaucomaM, comb=scomb)
predict(gbagc, newdata=GlaucomaM[c(1:3, 99:102), ])
## [1] normal normal normal glaucoma glaucoma glaucoma glaucoma
## Levels: glaucoma normal
# Random forest
Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.
The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho’s formulation, is a way to implement the “stochastic discrimination” approach to classification proposed by Eugene Kleinberg.
An extension of the algorithm was developed by Leo Breiman and Adele Cutler, and “Random Forests” is their trademark. The extension combines Breiman’s “bagging” idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance.
Here we apply bagging and random forests to the Boston data, using the randomForest package in R. The exact results obtained in this section may depend on the version of R and the version of the randomForest package installed on your computer. Recall that bagging is simply a special case of a random forest with m = p. Therefore, the randomForest() function can be used to perform both random forests and bagging. We perform bagging as follows:
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.3.3
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
library(MASS)
help(Boston)
## starting httpd help server ...
## done
set.seed(1)
# Bagging (note: the training index must be defined for the Boston data;
# following the ISLR lab, we take half of the observations)
train = sample(1:nrow(Boston), nrow(Boston)/2)
bag.boston = randomForest(medv~., data=Boston, subset=train, mtry=13, importance=TRUE)
boston.test = Boston[-train, "medv"]
yhat.bag = predict(bag.boston, newdata=Boston[-train, ])
plot(yhat.bag, boston.test)
abline(0, 1)
mean((yhat.bag - boston.test)^2)
## [1] 13.23173
# ntree=25
bag.boston = randomForest(medv~., data=Boston, subset=train,
                          mtry=13, ntree=25)
yhat.bag = predict(bag.boston, newdata=Boston[-train, ])
mean((yhat.bag - boston.test)^2)
## [1] 12.01848
Growing a random forest proceeds in exactly the same way, except that we use a smaller value of the mtry argument. By default, randomForest() uses \(p/3\) variables when building a random forest of regression trees and \(\sqrt{p}\) variables when building a random forest of classification trees. Here we use mtry = 6.
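For the Boston data, which has p = 13 predictors, those defaults would be as follows; this quick check is an addition and not part of the original output.
floor(13 / 3)     # default mtry for regression trees: 4
floor(sqrt(13))   # default mtry for classification trees: 3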
set.seed(1)
rf.boston = randomForest(medv~., data=Boston, subset=train,
                         mtry=6, importance=TRUE)
yhat.rf = predict(rf.boston, newdata=Boston[-train, ])
mean((yhat.rf - boston.test)^2)
## [1] 12.6098
importance(rf.boston)
## %IncMSE IncNodePurity
## crim 10.056635 910.80213
## zn 2.027511 66.43274
## indus 7.856146 711.18115
## chas 1.237987 69.59153
## nox 5.719429 470.51553
## rm 30.326688 5576.34277
## age 8.712729 610.05150
## dis 11.902977 1369.07530
## rad 4.618294 150.23860
## tax 5.613355 292.92698
## ptratio 11.632363 598.41235
## black 2.513735 266.24335
## lstat 31.234637 6374.49147
varImpPlot(rf.boston)
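As a small follow-up, assuming rf.boston as fitted above, the predictors can be ordered by the permutation-based importance measure %IncMSE; this is an addition, not part of the original output, and it makes the dominance of lstat and rm in the table above easy to see.
imp <- importance(rf.boston)
imp[order(imp[, "%IncMSE"], decreasing = TRUE), ]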