Here is an example of random forests. We load the Carseats data from the “ISLR” package, and to fit random forests we use the “party” package.
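If either package is not yet installed, it can be installed first (a one-time step, shown here only for completeness):
install.packages(c("ISLR", "party"))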
require(ISLR)
require(party)
data(Carseats)
head(Carseats)
## Sales CompPrice Income Advertising Population Price ShelveLoc Age
## 1 9.50 138 73 11 276 120 Bad 42
## 2 11.22 111 48 16 260 83 Good 65
## 3 10.06 113 35 10 269 80 Medium 59
## 4 7.40 117 100 4 466 97 Medium 55
## 5 4.15 141 64 3 340 128 Bad 38
## 6 10.81 124 113 13 501 72 Bad 78
## Education Urban US
## 1 17 Yes Yes
## 2 10 Yes Yes
## 3 12 Yes Yes
## 4 14 Yes Yes
## 5 13 Yes No
## 6 16 No Yes
Since our response variable Sales is continuous, we convert it into a binary variable high.
high <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))  # "No" if Sales <= 8, else "Yes"; factor() so ctree treats it as classification
Carseats <- data.frame(Carseats, high)
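As a quick sanity check (an extra step, output not shown here), we could look at how the stores split between the two classes:
table(Carseats$high)  # counts of "No" and "Yes" stores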
Since high is now the response variable, we remove the Sales variable from the data.
carseats <- Carseats[, c(12, 2:11)]
names(carseats)
## [1] "high" "CompPrice" "Income" "Advertising" "Population"
## [6] "Price" "ShelveLoc" "Age" "Education" "Urban"
## [11] "US"
dim(carseats)
## [1] 400 11
We divide the data into training and test sets, using 70% of the observations for training.
set.seed(500)
train <- sample(1:nrow(carseats), .7*nrow(carseats))
carseats.train <- carseats[train, ]
carseats.test <- carseats[-train, ]
dim(carseats.train)
## [1] 280 11
dim(carseats.test)
## [1] 120 11
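Since the split is random, it can be worth verifying (an optional check, output not shown) that the two classes appear in similar proportions in both sets:
prop.table(table(carseats.train$high))  # class proportions in the training set
prop.table(table(carseats.test$high))   # class proportions in the test set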
First, we build a single tree model; then we build a random forests model and see whether it improves the predictions.
carseats.tree <- ctree(high ~., data = carseats.train)
carseats.tree
##
## Conditional inference tree with 9 terminal nodes
##
## Response: high
## Inputs: CompPrice, Income, Advertising, Population, Price, ShelveLoc, Age, Education, Urban, US
## Number of observations: 280
##
## 1) ShelveLoc == {Good}; criterion = 1, statistic = 48.659
## 2) Price <= 134; criterion = 0.999, statistic = 15.62
## 3) US == {Yes}; criterion = 0.998, statistic = 13.559
## 4)* weights = 37
## 3) US == {No}
## 5)* weights = 14
## 2) Price > 134
## 6)* weights = 10
## 1) ShelveLoc == {Bad, Medium}
## 7) Price <= 87; criterion = 1, statistic = 31.944
## 8)* weights = 21
## 7) Price > 87
## 9) Advertising <= 13; criterion = 1, statistic = 20.076
## 10) Price <= 126; criterion = 0.987, statistic = 10.335
## 11) CompPrice <= 126; criterion = 0.999, statistic = 14.531
## 12)* weights = 64
## 11) CompPrice > 126
## 13) ShelveLoc == {Bad}; criterion = 0.977, statistic = 9.235
## 14)* weights = 8
## 13) ShelveLoc == {Medium}
## 15)* weights = 33
## 10) Price > 126
## 16)* weights = 61
## 9) Advertising > 13
## 17)* weights = 32
plot(carseats.tree)
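With 9 terminal nodes the default plot can get crowded; if needed, party also offers a more compact display (an optional variant of the same plot):
plot(carseats.tree, type = "simple")  # compact node labels instead of full terminal panels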
actuals <- carseats.test$high
predicted <- predict(carseats.tree, newdata = carseats.test)
table(true = actuals, pred = predicted)
## pred
## true No Yes
## No 51 20
## Yes 13 36
mean(carseats.test$high != predicted)
## [1] 0.275
Now we build the random forests model.
carseats.forests <- cforest(high ~., data = carseats.train)
carseats.forests
##
## Random Forest using Conditional Inference Trees
##
## Number of trees: 500
##
## Response: high
## Inputs: CompPrice, Income, Advertising, Population, Price, ShelveLoc, Age, Education, Urban, US
## Number of observations: 280
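The printout shows that cforest grew 500 trees with its default settings. If we wanted to change the number of trees or the number of candidate variables per split, the controls argument accepts a cforest_unbiased() specification; the sketch below is only an illustration (mtry = 3 is an arbitrary choice, and we keep using the default model carseats.forests for the predictions that follow):
carseats.forests2 <- cforest(high ~ ., data = carseats.train,
                             controls = cforest_unbiased(ntree = 500, mtry = 3))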
predicted <- predict(carseats.forests, newdata = carseats.test)
table(true = actuals, pred = predicted)
## pred
## true No Yes
## No 65 6
## Yes 14 35
mean(carseats.test$high != predicted)
## [1] 0.1666667
As we can see, with the random forests model the test misclassification rate came down from 27.5% to 16.7%. Hence, the random forests model improves prediction accuracy compared to the single tree model.
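Beyond the error rate, we could also ask which predictors the forest relies on most. The sketch below uses party's permutation-based variable importance; it can take a while on 500 trees, and the exact values depend on the random seed (output not shown):
vi <- varimp(carseats.forests)  # permutation importance of each predictor
sort(vi, decreasing = TRUE)     # predictors ordered from most to least important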