Using the independent variables (sepal length, sepal width, petal length, and petal width) to predict the species of iris
Step 1: Split the data into two subsets, with roughly 70% for training and 30% for testing (rows are assigned at random, so the split is approximate)
set.seed(1234)
SampleID <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[SampleID == 1, ]
testData <- iris[SampleID == 2, ]
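Because sample() assigns each row to a group independently, the 70/30 ratio only holds on average. A quick check of the realized group sizes, as a minimal sketch that repeats the split above, might look like this (the names splitCounts and splitShares are introduced here for illustration):

```r
# Repeat the split from Step 1 so this block runs on its own
set.seed(1234)
SampleID <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))

# Count the rows that landed in each group (1 = training, 2 = test)
splitCounts <- table(SampleID)
print(splitCounts)

# Convert the counts to shares; these are close to, not exactly, 0.7 and 0.3
splitShares <- prop.table(splitCounts)
print(round(splitShares, 3))
```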
Step 2: Build the decision tree and check its predictions on the training data
library(party)
iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = trainData)
table(predict(iris_ctree), trainData$Species)
##
##              setosa versicolor virginica
##   setosa         40          0         0
##   versicolor      0         37         3
##   virginica       0          1        31
Conclusion: the tree fits the training data well, with only 4 of the 112 training samples misclassified.
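The error count can be recomputed from the table itself. The sketch below re-enters the training confusion matrix shown above by hand (the names trainTable, nCorrect, and nTotal are introduced here for illustration) rather than refitting the tree, so it runs without the party package:

```r
# Training confusion matrix copied from the output above
# (rows = predicted species, columns = actual species)
species <- c("setosa", "versicolor", "virginica")
trainTable <- matrix(c(40,  0,  0,
                        0, 37,  3,
                        0,  1, 31),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(predicted = species, actual = species))

# Correct predictions sit on the diagonal
nCorrect <- sum(diag(trainTable))
nTotal   <- sum(trainTable)

nTotal - nCorrect            # 4 misclassified samples
round(nCorrect / nTotal, 3)  # 0.964 training accuracy
```

Accuracy measured on the training data tends to be optimistic, which is why Step 4 checks the tree on the held-out test data.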
Step 3: Plot the decision tree
plot(iris_ctree, type="simple")
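Conclusion: the tree above has 4 terminal nodes. The p-value shown at each inner node comes from the test of association between the split variable and Species, so a small value means the split is statistically well supported; it is not the probability that an instance belongs to a class. For example, the split at Petal.Length <= 1.9 is highly significant (p < 0.001), and flowers with a petal length of 1.9 or less are classified as setosa.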
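To see which training flowers end up in each terminal node, the fitted tree can also be inspected as text. The following is a sketch that assumes the party package is installed and refits the tree from Steps 1 and 2 so that it runs on its own. where() is the party accessor that returns the terminal node ID for each training observation, and nodeID is a name introduced here for illustration.

```r
library(party)

# Rebuild the training set and the tree from Steps 1 and 2
set.seed(1234)
SampleID <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[SampleID == 1, ]
iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = trainData)

# Text version of the tree: split variables, cut points, and test statistics
print(iris_ctree)

# Terminal node reached by each training flower
nodeID <- where(iris_ctree)

# Species counts within each terminal node
table(nodeID, trainData$Species)
```

A terminal node whose row contains a single non-zero count is pure, meaning every training flower in it belongs to one species.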
Step 4: Predict the test data
testPred <- predict(iris_ctree, newdata = testData)
table(testPred, testData$Species)
##
## testPred     setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         12         2
##   virginica       0          0        14
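Conclusion: the predictions on the test data are good too, with 36 of the 38 test samples classified correctly. Every flower predicted as setosa or virginica is correct; the only errors are 2 virginica flowers predicted as versicolor.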
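The per-species picture can be made precise with precision (the share of predictions for a species that are correct) and recall (the share of actual flowers of a species that are found). As a sketch, the test table above is re-entered by hand; testTable, precision, recall, and accuracy are names introduced here for illustration.

```r
# Test confusion matrix copied from the output above
# (rows = predicted species, columns = actual species)
species <- c("setosa", "versicolor", "virginica")
testTable <- matrix(c(10,  0,  0,
                       0, 12,  2,
                       0,  0, 14),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(predicted = species, actual = species))

# Precision: correct predictions divided by all predictions of that species
precision <- diag(testTable) / rowSums(testTable)

# Recall: correct predictions divided by all actual flowers of that species
recall <- diag(testTable) / colSums(testTable)

# Overall accuracy: correct predictions divided by all test samples
accuracy <- sum(diag(testTable)) / sum(testTable)

round(precision, 3)  # setosa 1.000, versicolor 0.857, virginica 1.000
round(recall, 3)     # setosa 1.000, versicolor 1.000, virginica 0.875
round(accuracy, 3)   # 0.947
```

Read this way, setosa is perfect on both measures, versicolor loses precision because 2 virginica flowers were assigned to it, and virginica loses recall for the same reason.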