We will use R.A. Fisher’s classic iris data set to generate a classification tree. This data set is freely available from the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Iris.
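The data set includes four measurements for each flower (sepal length, sepal width, petal length, and petal width) and its class, one of three species:
-Iris setosa
-Iris versicolor
-Iris virginica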
#Load data
data(iris)
nrow <- nrow(iris); ncol <- ncol(iris) # Row and column counts, used in inline code
iris[1:4, ] # View first four rows of data set
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
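As a quick check of the structure described above, the following sketch uses only base R to report the dimensions of the data set and the number of flowers per species. The names `dims` and `species_counts` are illustrative and are not part of the original analysis.

```r
# Base R only; `dims` and `species_counts` are illustrative names
data(iris)
dims <- dim(iris)                      # number of rows, number of columns
species_counts <- table(iris$Species)  # number of flowers per species
print(dims)
print(species_counts)
```

The three species are equally represented (50 flowers each), which is why the root node of the fitted tree reports equal class probabilities.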
The analysis fits a classification tree that predicts iris species from petal length, petal width, sepal length, and sepal width, using the rpart method through the caret package.
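# Load required package
library(caret)
# Fit a classification tree (rpart) with Species as the response and all other columns as predictors
modFit <- train(Species ~ ., data = iris, method = "rpart")
print(modFit$finalModel) # Summarize the fitted tree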
## n= 150
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 150 100 setosa (0.33333 0.33333 0.33333)
## 2) Petal.Length< 2.45 50 0 setosa (1.00000 0.00000 0.00000) *
## 3) Petal.Length>=2.45 100 50 versicolor (0.00000 0.50000 0.50000)
## 6) Petal.Width< 1.75 54 5 versicolor (0.00000 0.90741 0.09259) *
## 7) Petal.Width>=1.75 46 1 virginica (0.00000 0.02174 0.97826) *
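One way to read the printed tree is to apply its two splits by hand and tabulate the result against the true species. The sketch below does this in base R only, so it does not depend on caret or rpart. The helper `predict_by_rules` is a hypothetical name introduced for illustration, and the thresholds are copied from the output above.

```r
# Re-apply the two printed splits by hand (base R only).
# `predict_by_rules` is a hypothetical helper, not part of caret or rpart.
data(iris)
predict_by_rules <- function(petal_length, petal_width) {
  ifelse(petal_length < 2.45, "setosa",
         ifelse(petal_width < 1.75, "versicolor", "virginica"))
}
predicted <- factor(predict_by_rules(iris$Petal.Length, iris$Petal.Width),
                    levels = levels(iris$Species))
# Rows are the rule-based predictions, columns are the true species
confusion <- table(predicted = predicted, actual = iris$Species)
print(confusion)
n_errors <- sum(predicted != iris$Species)  # number of misclassified flowers
```

The off-diagonal counts correspond to the loss column of the terminal nodes: 5 flowers in node 6 and 1 flower in node 7, for 6 misclassified flowers out of 150 on the training data.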
The final analysis is presented as a decision tree, using the rattle package.
library(rattle)
fancyRpartPlot(modFit$finalModel) # Plot decision tree
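I. setosa can be completely isolated from the other species by petal length: all 50 flowers with a petal length below 2.45 cm are I. setosa. I. versicolor and I. virginica are somewhat more difficult to separate based on petal width: the split at 1.75 cm misclassifies 5 of the 54 flowers below it and 1 of the 46 flowers at or above it.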
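The separation described above can also be checked directly from the raw measurements. The following base R sketch computes the minimum and maximum petal length and petal width within each species. The helper `petal_range` and the other variable names are illustrative assumptions, not part of the original analysis.

```r
# Base R only; `petal_range` is a hypothetical helper for illustration
data(iris)
petal_range <- function(x) {
  r <- sapply(split(x, iris$Species), range)  # 2 x 3 matrix, one column per species
  rownames(r) <- c("min", "max")
  r
}
length_ranges <- petal_range(iris$Petal.Length)
width_ranges  <- petal_range(iris$Petal.Width)
print(length_ranges)
print(width_ranges)
# TRUE if the petal-width ranges of versicolor and virginica overlap
widths_overlap <- width_ranges["max", "versicolor"] >= width_ranges["min", "virginica"]
print(widths_overlap)
```

A gap between the petal lengths of I. setosa and the other two species is what allows the first split to be error free, while overlapping petal widths explain why the second split cannot be.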
This document is hosted freely on RPubs: http://rpubs.com/BlackWidoww/58757
The .Rmd source code can be found here: https://github.com/BlackWidoww/ABG250