Creating factor levels
## rename the levels of the outcome factor `type`
## levels<- relabels by position, so the new names follow the existing level order
levels(mushrooms$type) <- c("poisonous", "edible")
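`levels<-` matches the new names to the existing levels by position. A safer pattern is to relabel by name with `factor(..., levels =, labels =)`, so each raw code keeps its meaning whatever order the levels are in. The following is a minimal sketch on a toy vector. It assumes the raw codes are `"e"` and `"p"` (as in the UCI mushroom data), which is an assumption; check `levels(mushrooms$type)` for your copy of the data.

```r
# Toy vector standing in for the raw outcome column; the "e"/"p" codes are an assumption
type_raw <- factor(c("p", "e", "e", "p"))
print(levels(type_raw))  # "e" "p": factor levels are sorted alphabetically

# Relabel by name: each raw code is mapped explicitly, so level order cannot swap the meaning
type <- factor(type_raw, levels = c("p", "e"), labels = c("poisonous", "edible"))
print(table(type))
```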
Splitting the dataset into training and test sets. The training set (about 80% of the rows) is used to build the model, and the test set (about 20%) is used to estimate the model's accuracy on data it has not seen.
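## randomly assign each row to group 1 (train, ~80%) or group 2 (test, ~20%)
## no seed is set, so the split differs between runs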
splitSample <- sample(1:2, size = nrow(mushrooms), prob = c(0.8,0.2), replace=TRUE)
mushroom.train <- mushrooms[splitSample==1,]
mushroom.test <- mushrooms[splitSample==2,]
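Because no seed is set, rerunning the split gives a different train/test partition and slightly different results. Below is a minimal sketch of a reproducible version of the same `sample()`-based split, run on a toy data frame. The names `toy`, `split_id` and the seed value 42 are arbitrary, hypothetical choices.

```r
set.seed(42)  # arbitrary seed: makes the random assignment reproducible
toy <- data.frame(id = 1:1000, x = rnorm(1000))

# Same approach as above: label each row 1 (train) with prob 0.8 or 2 (test) with prob 0.2
split_id <- sample(1:2, size = nrow(toy), prob = c(0.8, 0.2), replace = TRUE)
toy.train <- toy[split_id == 1, ]
toy.test  <- toy[split_id == 2, ]

# Every row lands in exactly one set; the proportions are only approximately 80/20
print(c(train = nrow(toy.train), test = nrow(toy.test)))
```

Calling `set.seed()` once before `sample()` is enough. Running the block again gives the same partition.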
Building the model: a C5.0 decision tree (a classification model, not a regression) is trained to predict whether a mushroom is poisonous. The dependent variable, type, records whether a mushroom is poisonous or edible. The independent variables are cap color, odor, gill size, gill color, and cap shape.
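## fit a C5.0 decision tree (C5.0() comes from the C50 package)
library(C50)
## note: family = "binomial" is a glm() argument; C5.0() does not use it
model.mush <- C5.0(type ~ cap_color + odor + gill_size + gill_color + cap_shape, data = mushroom.train, family = "binomial")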
summary(model.mush)
##
## Call:
## C5.0.formula(formula = type ~ cap_color + odor + gill_size + gill_color
## + cap_shape, data = mushroom.train, family = "binomial")
##
##
## C5.0 [Release 2.07 GPL Edition] Mon Mar 30 11:10:39 2020
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 6513 cases (6 attributes) from undefined.data
##
## Decision tree:
##
## odor in {creosote,fishy,foul,musty,pungent,spicy}: edible (3039)
## odor in {almond,anise,none}:
## :...cap_color in {buff,pink}:
## :...gill_color in {black,brown,buff,chocolate,orange,pink,purple,
## : : yellow}: poisonous (0)
## : gill_color in {gray,green}: edible (25)
## : gill_color in {red,white}:
## : :...cap_shape = bell: edible (6)
## : cap_shape in {conical,convex,flat,knobbed,
## : sunken}: poisonous (90/7)
## cap_color in {brown,cinnamon,gray,green,purple,red,white,yellow}:
## :...gill_size = broad:
## :...cap_color in {brown,cinnamon,gray,green,purple,red,
## : : yellow}: poisonous (2532)
## : cap_color = white:
## : :...gill_color in {black,brown,chocolate,
## : : pink}: poisonous (359)
## : gill_color in {buff,green,orange,purple,red,
## : : yellow}: edible (6)
## : gill_color in {gray,white}:
## : :...cap_shape in {bell,conical,convex,knobbed,
## : : sunken}: poisonous (188/7)
## : cap_shape = flat: edible (8)
## gill_size = narrow:
## :...cap_shape in {bell,conical}: edible (13)
## cap_shape in {convex,flat,knobbed,sunken}:
## :...gill_color in {black,brown,buff,chocolate,gray,green,orange,
## : pink,purple,red}: poisonous (152)
## gill_color in {white,yellow}:
## :...odor in {almond,anise}: poisonous (27)
## odor = none:
## :...cap_color in {brown,cinnamon,gray,green,purple,red,
## : white}: poisonous (56/12)
## cap_color = yellow: edible (12)
##
##
## Evaluation on training data (6513 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 14 26( 0.4%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 3378 (a): class poisonous
## 26 3109 (b): class edible
##
##
## Attribute usage:
##
## 100.00% odor
## 53.34% cap_color
## 51.48% gill_size
## 14.26% gill_color
## 8.48% cap_shape
##
##
## Time: 0.0 secs
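The attribute usage table shows `odor` at 100% because it is the root split, so every case passes through it. C5.0, like its predecessor C4.5, is generally described as choosing splits with an entropy-based criterion: information gain, normalized as the gain ratio. The sketch below computes that criterion in base R for one categorical predictor. It is a simplified illustration, not the C50 package's implementation, which also groups categorical values into subsets (as in `odor in {almond,anise,none}`) and prunes the tree. The functions `entropy()` and `gain_ratio()` and the toy data are hypothetical.

```r
# Shannon entropy (in bits) of a categorical vector
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                 # drop empty classes (0 * log2(0) is taken as 0)
  -sum(p * log2(p))
}

# Information gain and gain ratio for splitting y on one categorical predictor x
gain_ratio <- function(y, x) {
  x <- as.character(x)                    # avoid empty factor levels
  weights <- table(x) / length(x)         # share of cases in each branch
  cond <- sapply(split(y, x), entropy)    # entropy of y within each branch
  gain <- entropy(y) - sum(weights[names(cond)] * cond)
  c(gain = gain, ratio = gain / entropy(x))   # ratio penalizes many-valued splits
}

# Toy example: odor separates the classes perfectly, cap shape does not
y     <- c("poisonous", "poisonous", "edible", "edible")
odor  <- c("foul", "foul", "none", "none")
shape <- c("flat", "bell", "flat", "bell")
print(gain_ratio(y, odor))   # gain 1, ratio 1
print(gain_ratio(y, shape))  # gain 0, ratio 0
```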
head(predict(model.mush,mushroom.train))
## [1] edible poisonous poisonous edible poisonous poisonous
## Levels: poisonous edible
Predictions on the training set
## predicted classes for the training set
model.predict <- predict(model.mush, mushroom.train)
head(model.predict)
## [1] edible poisonous poisonous edible poisonous poisonous
## Levels: poisonous edible
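Comparing the training-set predictions with the true labels recovers the 0.4% training error reported by `summary()`. With this document's objects the comparison would be `mean(model.predict != mushroom.train$type)`. The sketch below runs the same calculation on small hypothetical factors so that it runs on its own.

```r
# Hypothetical actual and predicted labels sharing the same factor levels
lv <- c("poisonous", "edible")
actual    <- factor(c("poisonous", "edible", "edible", "poisonous", "edible"), levels = lv)
predicted <- factor(c("poisonous", "poisonous", "edible", "poisonous", "edible"), levels = lv)

error_rate <- mean(predicted != actual)   # share of misclassified cases: 1 of 5
print(error_rate)
print(table(Actual = actual, Predicted = predicted))  # base-R confusion matrix
```

Applied to the summary's counts, the same arithmetic gives 26 / 6513 ≈ 0.004, the 0.4% shown above.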
## predicted classes for the held-out test set
predictresult <- predict(model.mush, mushroom.test)
## actual class counts in the test set
xtabs(~mushroom.test$type)
## mushroom.test$type
## poisonous edible
## 830 781
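The test set is roughly balanced, so a trivial model that always predicts the majority class would already be right about half the time. That baseline gives a reference point for the model's accuracy. Here is a quick base-R check using the counts printed above.

```r
# Test-set class counts copied from the xtabs() output above
counts <- c(poisonous = 830, edible = 781)

# Accuracy of always predicting the most frequent class (the no-information rate)
baseline <- max(counts) / sum(counts)
print(round(baseline, 3))  # 0.515
```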
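Creating a confusion matrix. A confusion matrix cross-tabulates the actual and predicted class of every test mushroom, which shows how accurately the model predicts whether a mushroom is poisonous. The model worked really well: 1605 of the 1611 test mushrooms (99.6%) were classified correctly. All 6 errors were mushrooms labeled edible that the model predicted to be poisonous. That means there were no Type II errors (with poisonous as the positive class): the model never predicted edible for a mushroom labeled poisonous, which wouldn't end well… One caution: `levels<-` renames levels by position, and the tree above sends creosote-, fishy- and foul-smelling mushrooms to edible. This suggests the two labels may have been swapped when they were renamed. If so, these 6 errors are actually poisonous mushrooms predicted as edible, so it is worth checking `levels(mushrooms$type)` before renaming.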
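## cross-tabulate actual vs. predicted classes (CrossTable() comes from the gmodels package)
library(gmodels)
CrossTable(mushroom.test$type, predictresult, prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE, dnn = c('Actual', 'Predicted'))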
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 1611
##
##
## | Predicted
## Actual | poisonous | edible | Row Total |
## -------------|-----------|-----------|-----------|
## poisonous | 830 | 0 | 830 |
## | 0.515 | 0.000 | |
## -------------|-----------|-----------|-----------|
## edible | 6 | 775 | 781 |
## | 0.004 | 0.481 | |
## -------------|-----------|-----------|-----------|
## Column Total | 836 | 775 | 1611 |
## -------------|-----------|-----------|-----------|
##
##
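The counts in the confusion matrix can be turned into the usual summary measures. The base-R sketch below uses the numbers printed above and treats `poisonous` as the positive class, matching the labels as printed. It gives an accuracy of about 0.996.

```r
# Confusion matrix from the CrossTable() output above (rows = actual, columns = predicted)
cm <- matrix(c(830, 6, 0, 775), nrow = 2,
             dimnames = list(Actual    = c("poisonous", "edible"),
                             Predicted = c("poisonous", "edible")))

accuracy    <- sum(diag(cm)) / sum(cm)                               # (830 + 775) / 1611
sensitivity <- cm["poisonous", "poisonous"] / sum(cm["poisonous", ]) # poisonous predicted as poisonous
specificity <- cm["edible", "edible"] / sum(cm["edible", ])          # edible predicted as edible

print(round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 4))
```

`matrix()` fills column by column, so the first column holds the "Predicted poisonous" counts (830, 6) and the second holds the "Predicted edible" counts (0, 775).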