Tree Classification - Hastie work

Load the required libraries.

library(ISLR)

## Warning: package 'ISLR' was built under R version 3.1.1

library(tree)

## Warning: package 'tree' was built under R version 3.1.2

attach(Carseats)

Check the data type (class) of every variable.

sapply(Carseats, class)  # class() of each column, not of the column names

##       Sales   CompPrice      Income Advertising  Population       Price 
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
##   ShelveLoc         Age   Education       Urban          US 
##    "factor"   "numeric"   "numeric"    "factor"    "factor"
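A common slip is to pass names(Carseats) to sapply() instead of the data frame itself. names() returns a character vector, so class() is applied to the strings and every entry comes back as "character". The minimal sketch below shows the difference on a hypothetical two-column data frame (toy is an illustrative stand-in for Carseats, not part of ISLR).

```r
# Hypothetical two-column data frame standing in for Carseats.
toy <- data.frame(Sales = c(9.5, 11.2),
                  ShelveLoc = factor(c("Bad", "Good")))

# names(toy) is a character vector, so class() only ever sees strings.
wrong <- sapply(names(toy), class)

# Iterating over the data frame applies class() to each column.
right <- sapply(toy, class)

wrong
right
```

Here wrong reports "character" for both entries, while right reports "numeric" for Sales and "factor" for ShelveLoc.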

In these data, Sales is a continuous variable, so we begin by recoding it as a binary variable. We use the ifelse() function to create a variable called High, which takes the value Yes if Sales exceeds 8 and No otherwise.

High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))  # factor response, so tree() fits a classification tree
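Because the test is Sales <= 8, a value of exactly 8 is coded No; only values strictly above 8 become Yes. The sketch below checks that boundary on a few hypothetical Sales values (toy_sales is illustrative, not taken from Carseats).

```r
# Hypothetical Sales values chosen to straddle the threshold of 8.
toy_sales <- c(5.2, 8, 8.01, 11.5)

# Same rule as above: <= 8 maps to "No", > 8 maps to "Yes".
toy_high <- ifelse(toy_sales <= 8, "No", "Yes")

toy_high          # values: "No", "No", "Yes", "Yes"
table(toy_high)   # two of each class
```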

We use the data.frame() function to merge High with the rest of the Carseats data.

Carseats <- data.frame(Carseats, High)
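Whether data.frame() turns a character vector such as High into a factor depends on the stringsAsFactors default, which was TRUE before R 4.0.0 and is FALSE from R 4.0.0 on. Since tree() treats a factor response as a classification problem, it is safest to make the conversion explicit. A minimal sketch on a hypothetical toy data frame (toy stands in for Carseats so the example runs without ISLR):

```r
# Hypothetical toy data standing in for Carseats.
toy <- data.frame(Sales = c(5.2, 9.1, 7.7))
High <- ifelse(toy$Sales <= 8, "No", "Yes")

# Setting stringsAsFactors explicitly gives the same result in every R version:
# the new column is named "High" and is a factor with levels "No" and "Yes".
toy_merged <- data.frame(toy, High, stringsAsFactors = TRUE)

sapply(toy_merged, class)
levels(toy_merged$High)
```

Wrapping the ifelse() call in factor() before merging achieves the same thing.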

We now use the tree() function to fit a classification tree in order to predict High using all variables but Sales. The syntax of the tree() function is quite similar to that of the lm() function.

tree.carseats <- tree(High ~ . -Sales, Carseats)
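In the formula High ~ . - Sales, the dot expands to every column of the data except the response, and - Sales then removes Sales from the predictors. Base R's terms() shows the expansion; the sketch below uses a hypothetical toy data frame with a few Carseats-like columns.

```r
# Hypothetical toy data frame with a Carseats-like layout.
toy <- data.frame(Sales = c(9.5, 4.2, 11.1),
                  Price = c(120, 83, 80),
                  ShelveLoc = factor(c("Bad", "Good", "Medium")),
                  High = factor(c("Yes", "No", "Yes")))

# "." expands to all non-response columns; "- Sales" drops Sales again.
tt <- terms(High ~ . - Sales, data = toy)
attr(tt, "term.labels")   # the predictors the model will actually use
```

Excluding Sales matters because High is derived from it; keeping Sales would let the tree recover the response trivially.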

The summary() function lists the variables that are used as internal nodes in the tree, the number of terminal nodes, and the (training) error rate.

summary(tree.carseats)

## 
## Classification tree:
## tree(formula = High ~ . - Sales, data = Carseats)
## Variables actually used in tree construction:
## [1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Population" 
## [6] "Advertising" "Age"         "US"         
## Number of terminal nodes:  27 
## Residual mean deviance:  0.4575 = 170.7 / 373 
## Misclassification error rate: 0.09 = 36 / 400
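The training error rate is 36 / 400 = 9%. For classification trees, ISLR describes the deviance in this output as -2 * sum_m sum_k n_mk * log(p_mk), where n_mk is the number of training observations of class k in terminal node m and p_mk is that class's proportion in the node; the residual mean deviance is this deviance divided by n - |T0|, here 400 - 27 = 373. The sketch below recomputes those quantities; tree_deviance() and the counts matrix are hypothetical helpers for illustration, not functions or data from the tree package.

```r
# Deviance of a classification tree from its terminal-node class counts:
#   D = -2 * sum over nodes m and classes k of n_mk * log(p_mk),
# with p_mk = n_mk / n_m (rows are terminal nodes, columns are classes).
tree_deviance <- function(counts) {
  p <- counts / rowSums(counts)
  # Empty cells contribute 0 (the usual 0 * log(0) = 0 convention).
  contrib <- ifelse(counts > 0, counts * log(p), 0)
  -2 * sum(contrib)
}

# Hypothetical class counts for three terminal nodes (columns: No, Yes).
counts <- matrix(c(10, 0,
                    5, 5,
                    2, 8), ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("No", "Yes")))

dev <- tree_deviance(counts)
n <- sum(counts)          # 30 observations
n_leaves <- nrow(counts)  # 3 terminal nodes
dev / (n - n_leaves)      # residual mean deviance for the toy tree

# The same arithmetic applied to the summary above:
36 / 400     # misclassification error rate, 0.09
400 - 27     # denominator of the residual mean deviance, 373
170.7 / 373  # about 0.4576, matching the reported 0.4575 up to rounding
```

A pure node (all observations in one class) contributes zero deviance, so a small deviance indicates a tree that fits the training data closely.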