Loading libraries
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.1.1
library(tree)
## Warning: package 'tree' was built under R version 3.1.2
attach(Carseats)
See the data type of every variable. Note that sapply() must be applied to the data frame itself; applying it to names(Carseats) would only report the class of the column names, which is always "character".
sapply(Carseats, class)
## Sales CompPrice Income Advertising Population Price
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## ShelveLoc Age Education Urban US
## "factor" "numeric" "numeric" "factor" "factor"
In these data, Sales is a continuous variable, and so we begin by recoding it as a binary variable. We use the ifelse() function to create a variable, called High, which takes on a value of Yes if the Sales variable exceeds 8, and takes on a value of No otherwise.
High <- as.factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))  # factor response, needed for a classification tree
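To see what this recoding does, here is a minimal sketch on a made-up vector. The values in `sales` are hypothetical and are not taken from Carseats; the sketch only applies the same threshold and tabulates the resulting classes.

```r
# Hypothetical sales figures, chosen only for illustration
sales <- c(9.5, 11.2, 4.1, 8.0, 7.4, 10.8)

# Same rule as in the lab: "No" when the value is at most 8, "Yes" otherwise
high <- ifelse(sales <= 8, "No", "Yes")

# Store the labels as a factor so they are treated as classes
high <- factor(high, levels = c("No", "Yes"))

# Count how many observations fall in each class
counts <- table(high)
print(counts)
```

Note that a value of exactly 8 is labelled No, because the condition tests `<= 8`.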
We use the data.frame() function to merge High with the rest of the Carseats data.
Carseats <- data.frame(Carseats, High)
We now use the tree() function to fit a classification tree in order to predict High using all variables but Sales. The syntax of the tree() function is quite similar to that of the lm() function.
tree.carseats <- tree(High ~ . -Sales, Carseats)
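The formula `High ~ . - Sales` reads as "model High on every other column except Sales". As a hedged illustration of the formula syntax that tree() shares with lm(), the sketch below uses lm() on R's built-in mtcars data. The choice of mtcars and of the dropped variable `wt` is an assumption made for the example and is not part of this lab.

```r
# mtcars ships with base R; here it stands in for Carseats
# "." means all other columns, and "- wt" removes wt from that set
fit <- lm(mpg ~ . - wt, data = mtcars)

# Predictors that actually entered the model (intercept dropped)
used <- setdiff(names(coef(fit)), "(Intercept)")
print(used)
```

The printed names should contain every column of mtcars other than the response `mpg` and the excluded `wt`.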
The summary() function lists the variables that are used as internal nodes in the tree, the number of terminal nodes, and the (training) error rate.
summary(tree.carseats)
##
## Classification tree:
## tree(formula = High ~ . - Sales, data = Carseats)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Income" "CompPrice" "Population"
## [6] "Advertising" "Age" "US"
## Number of terminal nodes: 27
## Residual mean deviance: 0.4575 = 170.7 / 373
## Misclassification error rate: 0.09 = 36 / 400
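The two ratios in this summary can be checked by hand. A small sketch, using only the numbers printed above: the residual mean deviance is the total deviance divided by the number of observations minus the number of terminal nodes, and the misclassification error rate is the number of misclassified training observations divided by the number of observations. The variable names are illustrative.

```r
n        <- 400    # observations in Carseats
n_leaves <- 27     # terminal nodes reported by summary()
dev      <- 170.7  # total deviance reported by summary() (already rounded)
n_wrong  <- 36     # misclassified training observations

# Degrees of freedom in the denominator: 400 - 27
resid_df <- n - n_leaves

# Residual mean deviance; close to the printed 0.4575, with a small
# difference in the last digit because 170.7 is itself a rounded value
resid_mean_dev <- dev / resid_df

# Training misclassification error rate
error_rate <- n_wrong / n

print(resid_df)
print(resid_mean_dev)
print(error_rate)
```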