Without pruning the decision trees, compare the accuracy of the tree and rpart packages

The tree package

Plot the decision tree built with tree:

library(tree)
## Warning: package 'tree' was built under R version 3.4.4
library(rpart)
library(MASS)

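# Fit a classification tree on the training set, with no pruning step afterwards
pima.tree <- tree(type ~ ., data = Pima.tr)
plot(pima.tree)
text(pima.tree)  # add the split labels to the plot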

Accuracy of tree:

test_tree <- predict(pima.tree, Pima.te, type = "class")
compare_tree <- ifelse(test_tree == Pima.te$type, 1, 0)
sum_tree <- sum(compare_tree)
length_tree <- length(compare_tree)
accuracy_tree <- sum_tree / length_tree
accuracy_tree
## [1] 0.7138554
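
The accuracy computation above (compare, sum, count, divide) is equivalent to taking the mean of a logical comparison. A minimal sketch, assuming a hypothetical helper named `accuracy` that is not part of the original code, shown on made-up toy labels for illustration only:

```r
# Hypothetical helper: fraction of predictions that match the true labels.
accuracy <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  mean(as.character(pred) == as.character(truth))
}

# Toy labels, invented purely to illustrate the helper.
toy_pred  <- factor(c("Yes", "No", "Yes", "No"), levels = c("No", "Yes"))
toy_truth <- factor(c("Yes", "No", "No",  "No"), levels = c("No", "Yes"))
toy_acc <- accuracy(toy_pred, toy_truth)  # 3 of the 4 labels match
```

With the objects defined above, `accuracy(test_tree, Pima.te$type)` should give the same value as `accuracy_tree`.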

The rpart package

Plot the decision tree built with rpart:

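# cp = 0 turns off the complexity penalty, so no split is rejected for too small an improvement
pima.tree1 <- rpart(type ~ ., data = Pima.tr, control = rpart.control(cp = 0))
plot(pima.tree1)
text(pima.tree1)  # add the split labels to the plot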

Accuracy of rpart:

test_rpart <- predict(pima.tree1, Pima.te, type = "class")
compare_rpart <- ifelse(test_rpart == Pima.te$type, 1, 0)
sum_rpart <- sum(compare_rpart)
length_rpart <- length(compare_rpart)
accuracy_rpart <- sum_rpart / length_rpart
accuracy_rpart
## [1] 0.7319277
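
A single accuracy figure hides which class the errors fall on. As a rough sketch, a confusion matrix for the unpruned rpart tree could be built as follows; the names `fit`, `pred`, `conf_mat`, `acc`, and `recall` are illustrative and do not appear in the original code:

```r
library(rpart)
library(MASS)  # provides the Pima.tr and Pima.te data sets

# Same unpruned tree as above (cp = 0).
fit <- rpart(type ~ ., data = Pima.tr, control = rpart.control(cp = 0))
pred <- predict(fit, Pima.te, type = "class")

# Rows are predicted classes, columns are actual classes.
conf_mat <- table(predicted = pred, actual = Pima.te$type)
print(conf_mat)

# Overall accuracy is the share of the table that lies on the diagonal.
acc <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions divided by the actual count of each class.
recall <- diag(conf_mat) / colSums(conf_mat)
```

Because the fitted tree itself does not depend on the cross-validation folds, `acc` should agree with `accuracy_rpart`.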

Using the rpart package, compare the accuracy of different pruning levels and recommend the most suitable pruning level (cp value)

pima.tree2 <- rpart(type ~., Pima.tr, control=rpart.control(cp=0))
printcp(pima.tree2)
## 
## Classification tree:
## rpart(formula = type ~ ., data = Pima.tr, control = rpart.control(cp = 0))
## 
## Variables actually used in tree construction:
## [1] age bmi bp  glu ped
## 
## Root node error: 68/200 = 0.34
## 
## n= 200 
## 
##         CP nsplit rel error  xerror     xstd
## 1 0.220588      0   1.00000 1.00000 0.098518
## 2 0.161765      1   0.77941 1.02941 0.099197
## 3 0.073529      2   0.61765 0.79412 0.092331
## 4 0.058824      3   0.54412 0.77941 0.091785
## 5 0.014706      4   0.48529 0.57353 0.082399
## 6 0.000000      7   0.44118 0.75000 0.090647
plotcp(pima.tree2)
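
In the table printed above, the lowest cross-validated error (xerror = 0.57353) is at CP = 0.014706 with 4 splits, and no smaller tree comes within one xstd of it, so that cp value is a reasonable candidate. The following is a sketch of one common way to pick the cp automatically and check it on the test set, not necessarily the procedure intended here. The `set.seed` call and the names `fit`, `best_cp`, `pruned`, and `accuracy_pruned` are assumptions added for illustration. Since xerror comes from random cross-validation folds, the selected row can differ between runs.

```r
library(rpart)
library(MASS)  # provides the Pima.tr and Pima.te data sets

set.seed(777)  # assumption: fixes the cross-validation folds for repeatability
fit <- rpart(type ~ ., data = Pima.tr, control = rpart.control(cp = 0))

# cptable holds the same columns that printcp() displays.
cp_table <- fit$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Cut the full tree back to the chosen complexity.
pruned <- prune(fit, cp = best_cp)

# Test-set accuracy of the pruned tree.
pred_pruned <- predict(pruned, Pima.te, type = "class")
accuracy_pruned <- mean(pred_pruned == Pima.te$type)
```

Comparing `accuracy_pruned` with `accuracy_rpart` shows whether pruning helps on the held-out data.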

Using the randomForest package, compare the accuracy of randomForest with that of tree and rpart

Accuracy of randomForest:

library(randomForest)
## Warning: package 'randomForest' was built under R version 3.4.4
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
set.seed(777)
pima.rf <- randomForest(type ~., data = Pima.tr)

test.rf <- predict(pima.rf, Pima.te, type = "class")
compare.rf <- ifelse(test.rf == Pima.te$type, 1, 0)
sum.rf <- sum(compare.rf)
length.rf <- length(compare.rf)
accuracy.rf <- sum.rf / length.rf
accuracy.rf
## [1] 0.7590361

Using the randomForest package, compare the accuracy of different forest sizes and recommend the most suitable forest size (number of trees)

plot(pima.rf)
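
The plot shows error against the number of trees. The same numbers are stored in the fitted object, so a forest size can also be read off programmatically. A minimal sketch, assuming the default of 500 trees and using illustrative names (`rf`, `oob`, `best_ntree`, `rf_best`, `acc_best`) that are not in the original code:

```r
library(randomForest)
library(MASS)  # provides the Pima.tr and Pima.te data sets

set.seed(777)
rf <- randomForest(type ~ ., data = Pima.tr, ntree = 500)

# err.rate has one row per forest size; the "OOB" column is the
# out-of-bag error after that many trees have been grown.
oob <- rf$err.rate[, "OOB"]
best_ntree <- which.min(oob)

# Refit at the selected size and measure test-set accuracy.
set.seed(777)
rf_best <- randomForest(type ~ ., data = Pima.tr, ntree = best_ntree)
acc_best <- mean(predict(rf_best, Pima.te) == Pima.te$type)
```

The minimum of the OOB curve is noisy, so the point where the curve flattens out is often a more stable choice than the exact minimum.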