1 Goal

This tutorial shows how to visualize a decision tree in order to extract information and insights from it.

2 Data import

library(caret)
library(rpart.plot)

# In this example we will use iris, the classic plant classification dataset that ships with R.
data("iris")
summary(iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##

3 Creating the model

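# First we split the data into a training set (70%) and a test set (30%).
# createDataPartition samples at random, so the rows chosen differ between runs.
my_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainSet <- iris[my_index, ]
testSet <- iris[-my_index, ]

# Now we train the model on the training set, using all other columns as predictors
my_tree <- train(Species ~ ., data = trainSet, method = "rpart")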
my_tree

## CART 
## 
## 105 samples
##   4 predictor
##   3 classes: 'setosa', 'versicolor', 'virginica' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 105, 105, 105, 105, 105, 105, ... 
## Resampling results across tuning parameters:
## 
##   cp         Accuracy   Kappa    
##   0.0000000  0.9115723  0.8645299
##   0.4142857  0.7814455  0.6846417
##   0.5000000  0.4783996  0.2809499
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was cp = 0.
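Because the partition is drawn at random, the numbers above will change from run to run. A minimal sketch of how the split could be made repeatable, assuming a call to `set.seed()` before partitioning (the seed value 123 is an arbitrary choice and not part of the original tutorial):

```r
library(caret)

data("iris")

# Fixing the seed makes the random partition repeatable
# (123 is an arbitrary value chosen for this sketch)
set.seed(123)
idx_a <- createDataPartition(iris$Species, p = 0.7, list = FALSE)

set.seed(123)
idx_b <- createDataPartition(iris$Species, p = 0.7, list = FALSE)

# Same seed, same rows
identical(idx_a, idx_b)

# The partition is stratified by species, so the classes stay balanced
table(iris$Species[idx_a])
```

Note that `train()` also resamples at random (25 bootstrap repetitions in the output above), so the seed has to be set before that call as well if the accuracy figures are meant to be repeatable.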

# Now we can predict the Species of the test set and compare the predictions with the true labels
my_prediction <- predict(my_tree, newdata = testSet)
postResample(my_prediction, testSet$Species)

## Accuracy    Kappa 
##        1        1
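The accuracy and kappa values summarize performance in a single number each. To see which species are confused with which, the predictions can be cross-tabulated against the true labels. The sketch below repeats the steps above and then uses caret's `confusionMatrix()`. The seed is an assumption added for repeatability, so the exact counts may differ from the run shown in this tutorial.

```r
library(caret)

data("iris")

# Arbitrary seed, added in this sketch only to make the result repeatable
set.seed(123)
my_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainSet <- iris[my_index, ]
testSet <- iris[-my_index, ]

my_tree <- train(Species ~ ., data = trainSet, method = "rpart")
my_prediction <- predict(my_tree, newdata = testSet)

# Cross-tabulate predicted species (rows) against actual species (columns)
cm <- confusionMatrix(my_prediction, testSet$Species)
cm$table

# The same accuracy figure that postResample reports
cm$overall["Accuracy"]
```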

4 Plotting the decision tree

# To plot the tree, we pass the fitted rpart model (stored in finalModel) to the rpart.plot function
rpart.plot(my_tree$finalModel)
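The default plot can be adjusted through the arguments of `rpart.plot()`. The sketch below is one possible variation, not the tutorial's own settings. For brevity it fits the tree with `rpart()` directly on the full dataset instead of going through `train()`, which is an assumption of this example; the same arguments can be applied to `my_tree$finalModel`.

```r
library(rpart)
library(rpart.plot)

data("iris")

# Fitted directly with rpart() to keep the sketch short (an assumption;
# the tutorial obtains the same kind of object from my_tree$finalModel)
fit <- rpart(Species ~ ., data = iris, method = "class")

# type = 4 labels every node and both branches of each split;
# extra = 104 shows the per-class probabilities and the percentage
# of observations that fall in each node
rpart.plot(fit, type = 4, extra = 104)

# Text version of the same splits, useful when a plot is not practical
print(fit)
```

Reading the plot from the root down, each split shows the variable and threshold used, and each leaf shows the predicted species.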

5 Conclusion

In this tutorial we have learnt how to visualize a decision tree using the rpart.plot function. The resulting plot can be used in reports or presentations, or simply to understand which variables and thresholds the model uses to make its decisions.
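To reuse the plot in a report or presentation, it can be written to an image file with a base R graphics device. A minimal sketch, assuming a PNG output in a temporary location (the file path, the image dimensions, and the direct `rpart()` fit are all choices made for this example):

```r
library(rpart)
library(rpart.plot)

data("iris")

# Direct rpart() fit used as a stand-in for my_tree$finalModel
fit <- rpart(Species ~ ., data = iris, method = "class")

# Hypothetical output path; replace with a location in your project
out_file <- file.path(tempdir(), "iris_tree.png")

# Open a PNG device, draw the tree into it, then close the device
png(out_file, width = 800, height = 600)
rpart.plot(fit)
dev.off()

file.exists(out_file)
```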

Plot a decision tree

Luis Serra @ Ubiqum Code Academy
