Analysis steps

Step 1: Split the data into two subsets, about 70% for training and 30% for testing

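# Fix the random seed so the split is reproducible
set.seed(1234)
# Assign each row to group 1 (training) or 2 (test) with probabilities 0.7 and 0.3
SampleID <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[SampleID == 1, ]
testData <- iris[SampleID == 2, ]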
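
Because sample() assigns each row to a subset independently, the 70/30 split is only approximate. The sizes can be checked with a minimal sketch like the one below, which repeats the split so that it runs on its own. The names split_sizes and split_share are illustrative and not part of the original analysis.

```r
# Repeat the split from Step 1 so this block is self-contained
set.seed(1234)
SampleID <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))

# Number of rows in each subset (1 = training, 2 = test)
split_sizes <- table(SampleID)
# Share of rows in each subset; close to, but not exactly, 0.7 and 0.3
split_share <- prop.table(split_sizes)

print(split_sizes)
print(round(split_share, 2))
```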

Step 2: Build the decision tree and check its predictions

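library(party)
# Fit a conditional inference tree using all four measurements as predictors
iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = trainData)
# Confusion matrix on the training data: rows are predictions, columns are actual species
table(predict(iris_ctree), trainData$Species)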

##             
##              setosa versicolor virginica
##   setosa         40          0         0
##   versicolor      0         37         3
##   virginica       0          1        31

Conclusion: the predictions are highly reliable on the training data. Only 4 of the 112 training samples are misclassified.
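
The error count can be turned into an accuracy figure directly from the confusion matrix. The following is a minimal sketch with the counts typed in from the table above; train_cm, n_correct, n_total and train_acc are illustrative names.

```r
# Training confusion matrix copied from the output above
# (rows = predicted species, columns = actual species)
species <- c("setosa", "versicolor", "virginica")
train_cm <- matrix(
  c(40,  0,  0,
     0, 37,  3,
     0,  1, 31),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = species, actual = species)
)

# Correct predictions sit on the diagonal
n_correct <- sum(diag(train_cm))
n_total <- sum(train_cm)
train_acc <- n_correct / n_total

cat(n_total - n_correct, "errors out of", n_total, "\n")
cat("training accuracy:", round(train_acc, 3), "\n")
```

Accuracy measured on the training data tends to be optimistic, which is why Step 4 repeats the check on the test data.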

Step 3: Plot the decision tree

plot(iris_ctree, type="simple")

Conclusion: the tree above has 4 terminal nodes. The p-value at each inner node comes from the test of association between the split variable and Species, so a small value means the split is strongly supported by the data. For example, the first split on Petal.Length has p < 0.001: an instance with Petal.Length of 1.9 or less is assigned to setosa with very high confidence.
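
The split on Petal.Length can be checked against the raw data without the tree. This minimal sketch assumes the 1.9 threshold mentioned above and uses the full iris data rather than only the training subset; is_short_petal is an illustrative name.

```r
# Flag flowers whose petal length is at most 1.9
is_short_petal <- iris$Petal.Length <= 1.9

# Cross-tabulate the flag against the species labels
print(table(short_petal = is_short_petal, species = iris$Species))
```

In the built-in iris data every setosa flower falls on the short-petal side and no flower of another species does, which is consistent with the very small p-value at this split.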

Step 4: Predict the test data

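# Predict the species of the held-out test rows
testPred <- predict(iris_ctree, newdata = testData)
# Confusion matrix on the test data: rows are predictions, columns are actual species
table(testPred, testData$Species)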

##             
## testPred     setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         12         2
##   virginica       0          0        14

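Conclusion: the predictions on the test data are good too, with 36 of the 38 test samples classified correctly. Every sample predicted as setosa or virginica is correct; the only errors are 2 virginica samples predicted as versicolor.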
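
Per-class figures make the comparison between species concrete. The following is a minimal sketch with the counts typed in from the test table above; test_cm, test_acc, recall and precision are illustrative names.

```r
# Test confusion matrix copied from the output above
# (rows = predicted species, columns = actual species)
species <- c("setosa", "versicolor", "virginica")
test_cm <- matrix(
  c(10,  0,  0,
     0, 12,  2,
     0,  0, 14),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = species, actual = species)
)

# Overall share of correct predictions
test_acc <- sum(diag(test_cm)) / sum(test_cm)
# Recall: share of each actual species that was predicted correctly
recall <- diag(test_cm) / colSums(test_cm)
# Precision: share of each predicted species that was actually correct
precision <- diag(test_cm) / rowSums(test_cm)

print(round(test_acc, 3))
print(round(rbind(recall, precision), 3))
```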

Decision tree - iris case from Zhao (2013)

Yang

September 11, 2015
