The goal of this tutorial is to visualize a decision tree so that we can extract information and insights from it.
library(caret)
library(rpart.plot)
# In this example we will use Iris, an open data set for plant classification.
data("iris")
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##
##
##
# First we split the data into a training set (70%) and a test set (30%)
my_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainSet <- iris[my_index, ]
testSet <- iris[-my_index, ]
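For a factor outcome, createDataPartition() samples within each class, so the 70/30 split keeps the three species balanced. The following is a minimal base-R sketch of that idea, not caret's actual implementation; the seed value, the helper name `stratified_index`, and the `*_demo` objects are assumptions introduced for illustration.

```r
# Hypothetical helper: draw a fraction p of the row indices within each class.
stratified_index <- function(y, p) {
  rows_by_class <- split(seq_along(y), y)
  picked <- lapply(rows_by_class, function(rows) {
    # Indexing with sample.int() avoids the sample() pitfall for length-one input
    rows[sample.int(length(rows), size = round(p * length(rows)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)  # arbitrary seed, only so that the sketch is repeatable
demo_index <- stratified_index(iris$Species, p = 0.7)
train_demo <- iris[demo_index, ]
test_demo  <- iris[-demo_index, ]

# Each species contributes the same number of rows to each set
table(train_demo$Species)
table(test_demo$Species)
```

Calling set.seed() before createDataPartition() makes the tutorial's own split repeatable in the same way.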
# Now we create the model
my_tree <- train(Species ~ ., data = trainSet, method = "rpart")
my_tree
## CART
##
## 105 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 105, 105, 105, 105, 105, 105, ...
## Resampling results across tuning parameters:
##
##   cp         Accuracy   Kappa    
##   0.0000000  0.9115723  0.8645299
##   0.4142857  0.7814455  0.6846417
##   0.5000000  0.4783996  0.2809499
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.
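The cp column in the table above is rpart's complexity parameter: roughly, a split is kept only if it improves the relative fit by at least cp, so larger values give smaller trees. The following is a minimal sketch of that effect, calling rpart() directly on the full iris data rather than going through train(); the values 0 and 0.6 and the helper `count_leaves` are chosen for illustration only.

```r
library(rpart)

# Hypothetical helper: terminal nodes are marked "<leaf>" in the fitted frame
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

fit_large <- rpart(Species ~ ., data = iris, method = "class", cp = 0)
fit_small <- rpart(Species ~ ., data = iris, method = "class", cp = 0.6)

# A lower cp allows more splits, hence more leaves
count_leaves(fit_large)
count_leaves(fit_small)

# Complexity table of the larger tree, including cross-validated error (xerror)
printcp(fit_large)
```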
# And we can predict the Species in the testSet
my_prediction <- predict(my_tree, newdata = testSet)
postResample(my_prediction, testSet$Species)
## Accuracy    Kappa 
##        1        1 
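For a classification model, postResample() reports accuracy and Cohen's kappa. Both can be derived from a confusion matrix. The sketch below uses small made-up label vectors (not the tutorial's predictions) to show the arithmetic; all object names are hypothetical.

```r
# Made-up labels, for illustration only
truth <- factor(c("setosa", "setosa", "versicolor",
                  "versicolor", "virginica", "virginica"))
pred  <- factor(c("setosa", "setosa", "versicolor",
                  "virginica", "virginica", "virginica"),
                levels = levels(truth))

# Confusion matrix: rows are predictions, columns are true classes
cm <- table(Predicted = pred, Actual = truth)

# Accuracy: share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2

# Cohen's kappa: accuracy corrected for chance agreement
kappa_hat <- (accuracy - expected) / (1 - expected)

cm
accuracy   # 5 of 6 correct
kappa_hat  # 0.75
```

For the tutorial's own objects, table(my_prediction, testSet$Species) gives the corresponding confusion matrix.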
# In order to plot the tree we can use the rpart.plot function
rpart.plot(my_tree$finalModel)
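rpart.plot() accepts arguments that control what each node displays. The following is a sketch, assuming a tree fitted directly with rpart() on the full iris data so that the block runs on its own; the particular `type` and `extra` values are one reasonable choice, not necessarily the settings the author would use.

```r
library(rpart)
library(rpart.plot)

fit <- rpart(Species ~ ., data = iris, method = "class")

# type = 2: split labels are drawn below the node labels
# extra = 104: per-class probabilities plus the percentage of observations
rpart.plot(fit, type = 2, extra = 104)

# The same tree as text: the conditions that lead to each leaf
rules <- rpart.rules(fit)
print(rules)
```

The same arguments can be passed when plotting my_tree$finalModel.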
In this tutorial we have learnt how to visualize a decision tree using the rpart.plot function. The resulting plot can be used in reports or presentations, or simply to understand how the model makes its decisions.
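To reuse the figure in a report, the plot can be written to a file with a graphics device. The following is a minimal sketch, assuming a tree fitted directly with rpart() and drawn with rpart's own plot() and text() methods so that the block depends only on rpart; the file name is hypothetical, and a call to rpart.plot() can be placed between pdf() and dev.off() in the same way.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

out_file <- file.path(tempdir(), "iris_tree.pdf")  # hypothetical file name

pdf(out_file, width = 7, height = 5)  # open the PDF device
plot(fit, margin = 0.1)               # draw the branches
text(fit, use.n = TRUE)               # add split labels and class counts
dev.off()                             # close the device so the file is written

file.exists(out_file)
```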