DecisionTree

Decision Tree

suppressMessages(library(caret))

## Warning: package 'caret' was built under R version 3.6.3

library(ggplot2)
data("iris")
set.seed(430)

names(iris)<- tolower(names(iris))
table(iris$species)

## 
##     setosa versicolor  virginica 
##         50         50         50

index <- createDataPartition(y = iris$species ,p = 0.7, list = FALSE)

train.set <- iris[index,]
test.set <- iris[-index,]

dim(train.set)

## [1] 105   5
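
Because species is a factor, createDataPartition draws the 70% sample within each class, so the training set keeps the balance of the three species. A minimal base-R sketch of that idea is below. The names iris_raw, rows_by_class, idx, train_sketch and test_sketch are illustrative only, and the rows drawn here will differ from the caret split above.

```r
# Illustrative stratified 70/30 split in base R (not caret's implementation).
iris_raw <- datasets::iris          # fresh copy with the original column names
set.seed(430)

# Group row indices by species, then sample 70% of each group.
rows_by_class <- split(seq_len(nrow(iris_raw)), iris_raw$Species)
idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.7 * length(rows)))
}))

train_sketch <- iris_raw[idx, ]
test_sketch  <- iris_raw[-idx, ]

table(train_sketch$Species)   # class counts in the training part
dim(test_sketch)              # remaining rows form the test part
```

Sampling within each class matters most when classes are unbalanced; with iris it simply guarantees an equal number of each species in the training part.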

g <- ggplot(data = train.set)

g1 <- g + geom_point(aes(x=sepal.length, y = sepal.width, color = species),pch = 19)

g2 <- g+ geom_point(aes(x=petal.length, y = petal.width, color = species), pch = 19) 

gridExtra::grid.arrange(g1,g2)

From the plots, petal length and petal width separate the species well, which makes them good candidate features for the root node.

iris.tree = train(species ~ ., 
                  data=train.set, 
                  method="rpart", 
                  trControl = trainControl(method = "cv"))

iris.tree

## CART 
## 
## 105 samples
##   4 predictor
##   3 classes: 'setosa', 'versicolor', 'virginica' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 96, 94, 94, 94, 95, 95, ... 
## Resampling results across tuning parameters:
## 
##   cp         Accuracy   Kappa    
##   0.0000000  0.9527273  0.9282660
##   0.4428571  0.7733333  0.6637422
##   0.5000000  0.3857576  0.1285714
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.

suppressMessages(library(rattle))

## Warning: package 'rattle' was built under R version 3.6.3

fancyRpartPlot(iris.tree$finalModel)

The tree automatically selects petal.width and petal.length as its splitting features.
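
One way to check which features a tree splits on without reading the plot is to look at the fitted rpart object. The sketch below is an assumption-laden stand-in: it fits rpart with default settings on the full iris data rather than reusing the caret-tuned model, and fit_sketch, node_vars and used_vars are illustrative names.

```r
library(rpart)  # recommended package that ships with R

iris_raw <- datasets::iris
# Assumption: default rpart settings on the full data, not the caret model above.
fit_sketch <- rpart(Species ~ ., data = iris_raw, method = "class")

# Each row of fit_sketch$frame is a node; leaf nodes are labelled "<leaf>".
node_vars <- as.character(fit_sketch$frame$var)
used_vars <- unique(node_vars[node_vars != "<leaf>"])
used_vars          # features that appear in at least one split

print(fit_sketch)  # text version of the tree, one line per node
```

The same inspection should work on iris.tree$finalModel, since that is the rpart object passed to fancyRpartPlot.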

Let’s predict on the training and test sets and compute the error rate.

Error rate in Training set

iris.pred = predict(iris.tree)
table(iris.pred, train.set$species)

##             
## iris.pred    setosa versicolor virginica
##   setosa         35          0         0
##   versicolor      0         34         3
##   virginica       0          1        32
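
The training table is shown without an error rate. As a small worked example, the rate can be read off the table itself: 4 of the 105 training observations lie off the diagonal. The counts below are copied from the output above; train_cm and train_error are illustrative names.

```r
# Counts copied from the training confusion table (rows = predicted, columns = actual).
classes <- c("setosa", "versicolor", "virginica")
train_cm <- matrix(c(35,  0,  0,
                      0, 34,  3,
                      0,  1, 32),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(predicted = classes, actual = classes))

# Correct predictions sit on the diagonal; everything else is an error.
train_error <- 1 - sum(diag(train_cm)) / sum(train_cm)
round(train_error, 3)
```

In the session itself, mean(predict(iris.tree) != train.set$species) should give the same value directly from the predictions.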

iris.pred = predict(iris.tree, newdata = test.set)
table(iris.pred, test.set$species)

##             
## iris.pred    setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         15         2
##   virginica       0          0        13

error.rate = round(mean(iris.pred != test.set$species),2)
error.rate

## [1] 0.04

The test error rate is about 4% (2 of 45 test observations misclassified), with the tree splitting on petal length and petal width.

Praveen Jalaja

5/1/2020