Pros:
Cons:
\[\hat{p}_{mk} = \frac{1}{N_m}\sum_{x_i \in \text{Leaf } m}\mathbb{1}(y_i = k)\]
Misclassification Error: \[ 1 - \hat{p}_{m k(m)}; \quad k(m) = \text{most common } k \]
* 0 = perfect purity
* 0.5 = no purity (two-class case)
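A minimal sketch of the two quantities above, using a made-up leaf (the 15/1 class counts are hypothetical, chosen only for illustration):

```r
# Hypothetical leaf: 15 observations of class "a" and 1 of class "b"
leaf <- factor(c(rep("a", 15), "b"))

# Estimated class proportions: p_hat[k] = (1 / N_m) * (number of y_i equal to k)
p_hat <- as.numeric(table(leaf)) / length(leaf)

# Misclassification error: 1 minus the proportion of the most common class
misclass <- 1 - max(p_hat)
misclass
```

Here the leaf is nearly pure, so the error is close to 0.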
Gini index: \[ \sum_{k \neq k'} \hat{p}_{mk} \times \hat{p}_{mk'} = \sum_{k=1}^K \hat{p}_{mk}(1-\hat{p}_{mk}) = 1 - \sum_{k=1}^K \hat{p}_{mk}^2\]
http://en.wikipedia.org/wiki/Decision_tree_learning
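The three forms of the Gini index can be checked numerically. This is only a sketch: the proportions in `p_hat` are hypothetical, not taken from any fitted tree.

```r
# Hypothetical class proportions for one leaf
p_hat <- c(15, 1) / 16

# Form 1: sum over all ordered pairs of distinct classes k != k'
pairs <- outer(p_hat, p_hat)
gini_pairs <- sum(pairs) - sum(diag(pairs))

# Form 2: sum of p * (1 - p) over classes
gini_sum <- sum(p_hat * (1 - p_hat))

# Form 3: 1 minus the sum of squared proportions
gini_sq <- 1 - sum(p_hat^2)

c(gini_pairs, gini_sum, gini_sq)
```

All three values agree, and a two-class leaf split evenly, `c(0.5, 0.5)`, would give 0.5.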
Deviance/information gain:
\[ -\sum_{k=1}^K \hat{p}_{mk} \log_2\hat{p}_{mk} \]
* 0 = perfect purity
* 1 = no purity (two-class case)
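A small sketch of this measure in R. The helper name `entropy` is hypothetical, and it assumes the usual convention that a class with proportion 0 contributes nothing to the sum:

```r
# Entropy (base 2) of a vector of class proportions
entropy <- function(p) {
  p <- p[p > 0]          # drop empty classes: 0 * log2(0) is treated as 0
  -sum(p * log2(p))
}

pure_leaf  <- entropy(c(1, 0))      # all observations in one class
mixed_leaf <- entropy(c(0.5, 0.5))  # two classes split evenly
c(pure_leaf, mixed_leaf)
```

The pure leaf gives 0 and the evenly split two-class leaf gives 1, matching the endpoints listed above.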
## Measures of impurity
# caret provides createDataPartition() and train(), so load it before first use
data(iris); library(ggplot2); library(caret)
names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
table(iris$Species)
setosa versicolor virginica
50 50 50
inTrain <- createDataPartition(y=iris$Species,
p=0.7, list=FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
dim(training); dim(testing)
[1] 105   5
[1] 45  5
qplot(Petal.Width,Sepal.Width,colour=Species,data=training)
library(caret)
modFit <- train(Species ~ .,method="rpart",data=training)
print(modFit$finalModel)
n= 105
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 105 70 setosa (0.3333 0.3333 0.3333)
  2) Petal.Length< 2.45 35 0 setosa (1.0000 0.0000 0.0000) *
  3) Petal.Length>=2.45 70 35 versicolor (0.0000 0.5000 0.5000)
    6) Petal.Length< 4.75 31 0 versicolor (0.0000 1.0000 0.0000) *
    7) Petal.Length>=4.75 39 4 virginica (0.0000 0.1026 0.8974) *
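The printed tree reads as two nested threshold rules on `Petal.Length`. As a rough sketch, the same rules can be written by hand; `predict_by_rules` is a hypothetical helper, and the thresholds are copied from the output above:

```r
# Hand-coded version of the printed splits (not part of caret or rpart)
predict_by_rules <- function(petal_length) {
  ifelse(petal_length < 2.45, "setosa",            # node 2
         ifelse(petal_length < 4.75, "versicolor", # node 6
                "virginica"))                      # node 7
}

rule_pred <- predict_by_rules(c(1.4, 4.0, 5.5))
rule_pred
```

Each observation falls into exactly one terminal node and receives that node's majority class.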
plot(modFit$finalModel, uniform=TRUE,
main="Classification Tree")
text(modFit$finalModel, use.n=TRUE, all=TRUE, cex=.8)
library(rattle)
fancyRpartPlot(modFit$finalModel)
predict(modFit,newdata=testing)
[1] setosa setosa setosa setosa setosa setosa setosa setosa
[9] setosa setosa setosa setosa setosa setosa setosa versicolor
[17] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[25] virginica versicolor virginica versicolor versicolor versicolor virginica virginica
[33] virginica versicolor virginica virginica virginica virginica virginica virginica
[41] virginica virginica virginica virginica virginica
Levels: setosa versicolor virginica
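One way to summarize predictions like these is a confusion table and an overall accuracy. In this sketch `pred` and `actual` are short hypothetical stand-ins for `predict(modFit, newdata=testing)` and `testing$Species`, so the numbers are illustrative only:

```r
# Hypothetical stand-ins for the predicted and true species
species <- c("setosa", "versicolor", "virginica")
pred   <- factor(c("setosa", "versicolor", "virginica", "versicolor"),
                 levels = species)
actual <- factor(c("setosa", "versicolor", "virginica", "virginica"),
                 levels = species)

# Cross-tabulate predicted against actual classes
confusion <- table(predicted = pred, actual = actual)

# Overall accuracy: share of observations predicted correctly
accuracy <- mean(pred == actual)

confusion
accuracy
```

With the objects from the example, the same two lines would be `table(predict(modFit, newdata=testing), testing$Species)` and `mean(predict(modFit, newdata=testing) == testing$Species)`.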