Creating factor levels

##relabel the factor levels of type
levels(mushrooms$type)<-c("poisonous","edible")
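
One caveat with the assignment above: levels<- renames levels by position, so the new labels must be listed in the same order as the existing levels. If the original levels were sorted alphabetically (for example "e" then "p"), that assignment would label the edible mushrooms as poisonous. The original codes of mushrooms$type are not shown here, so the sketch below assumes the hypothetical codes "e" and "p" on a toy factor, and renames by name so the mapping is explicit.

```r
# Toy factor standing in for mushrooms$type (assumed original codes: "e", "p")
type <- factor(c("p", "e", "e", "p", "e"))

# Inspect the existing order first; R sorts factor levels alphabetically by default
print(levels(type))

# Map old codes to new labels by name instead of relying on position
label_map <- c(e = "edible", p = "poisonous")
levels(type) <- unname(label_map[levels(type)])

print(table(type))
```

Running levels(mushrooms$type) before relabelling is a cheap way to confirm which order applies to the real data.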

Splitting the dataset into training and test sets. The training set is used to build the model, and the test set is used to estimate the model's accuracy on data it has not seen.

##split sample
splitSample <- sample(1:2, size = nrow(mushrooms), prob = c(0.8,0.2), replace=TRUE)
mushroom.train <- mushrooms[splitSample==1,]
mushroom.test <- mushrooms[splitSample==2,]
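
The split above assigns each row to a set independently, so the set sizes change from run to run and are only approximately 80/20. As an alternative (not the method used above), the sketch below fixes the random seed and draws exactly 80% of the row indices. The toy data frame and the seed value 123 are assumptions for illustration.

```r
# Hypothetical stand-in for the mushrooms data frame
toy <- data.frame(id = 1:1000,
                  type = factor(rep(c("edible", "poisonous"), 500)))

# Fix the seed so the same rows land in each set on every run
set.seed(123)  # arbitrary seed value (assumption)

# Draw exactly 80% of the row indices without replacement
train_idx <- sample(nrow(toy), size = floor(0.8 * nrow(toy)))
toy.train <- toy[train_idx, ]
toy.test  <- toy[-train_idx, ]

print(nrow(toy.train))  # 800
print(nrow(toy.test))   # 200
```

Calling set.seed() before the original sample() call would likewise make that split reproducible without changing how it works.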

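Fitting a classification model to predict whether a mushroom is poisonous. Although the call below passes family = "binomial", the model is a C5.0 decision tree rather than a regression; that argument is a glm() convention and does not appear to affect the fit. The dependent variable, type, records whether each mushroom is poisonous or edible. The independent variables are cap color, odor, gill size, gill color, and cap shape.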

##fit a C5.0 decision tree (C5.0 comes from the C50 package)
library(C50)
model.mush <- C5.0(type ~ cap_color + odor + gill_size + gill_color + cap_shape, data = mushroom.train, family = "binomial")
summary(model.mush)
## 
## Call:
## C5.0.formula(formula = type ~ cap_color + odor + gill_size + gill_color
##  + cap_shape, data = mushroom.train, family = "binomial")
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Mon Mar 30 11:10:39 2020
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 6513 cases (6 attributes) from undefined.data
## 
## Decision tree:
## 
## odor in {creosote,fishy,foul,musty,pungent,spicy}: edible (3039)
## odor in {almond,anise,none}:
## :...cap_color in {buff,pink}:
##     :...gill_color in {black,brown,buff,chocolate,orange,pink,purple,
##     :   :              yellow}: poisonous (0)
##     :   gill_color in {gray,green}: edible (25)
##     :   gill_color in {red,white}:
##     :   :...cap_shape = bell: edible (6)
##     :       cap_shape in {conical,convex,flat,knobbed,
##     :                     sunken}: poisonous (90/7)
##     cap_color in {brown,cinnamon,gray,green,purple,red,white,yellow}:
##     :...gill_size = broad:
##         :...cap_color in {brown,cinnamon,gray,green,purple,red,
##         :   :             yellow}: poisonous (2532)
##         :   cap_color = white:
##         :   :...gill_color in {black,brown,chocolate,
##         :       :              pink}: poisonous (359)
##         :       gill_color in {buff,green,orange,purple,red,
##         :       :              yellow}: edible (6)
##         :       gill_color in {gray,white}:
##         :       :...cap_shape in {bell,conical,convex,knobbed,
##         :           :             sunken}: poisonous (188/7)
##         :           cap_shape = flat: edible (8)
##         gill_size = narrow:
##         :...cap_shape in {bell,conical}: edible (13)
##             cap_shape in {convex,flat,knobbed,sunken}:
##             :...gill_color in {black,brown,buff,chocolate,gray,green,orange,
##                 :              pink,purple,red}: poisonous (152)
##                 gill_color in {white,yellow}:
##                 :...odor in {almond,anise}: poisonous (27)
##                     odor = none:
##                     :...cap_color in {brown,cinnamon,gray,green,purple,red,
##                         :             white}: poisonous (56/12)
##                         cap_color = yellow: edible (12)
## 
## 
## Evaluation on training data (6513 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##      14   26( 0.4%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    3378          (a): class poisonous
##      26  3109    (b): class edible
## 
## 
##  Attribute usage:
## 
##  100.00% odor
##   53.34% cap_color
##   51.48% gill_size
##   14.26% gill_color
##    8.48% cap_shape
## 
## 
## Time: 0.0 secs
head(predict(model.mush,mushroom.train))
## [1] edible    poisonous poisonous edible    poisonous poisonous
## Levels: poisonous edible

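Generating predictions with the model, first on the training set and then on the test set. The final table shows the actual class counts in the test set.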

##prediction model
model.predict <- predict(model.mush, mushroom.train)
head(model.predict)
## [1] edible    poisonous poisonous edible    poisonous poisonous
## Levels: poisonous edible
predictresult <- predict(model.mush,mushroom.test)
xtabs(~mushroom.test$type)
## mushroom.test$type
## poisonous    edible 
##       830       781

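Creating a confusion matrix to measure how accurately the model predicts whether a mushroom is poisonous. The model classified 1,605 of the 1,611 test mushrooms correctly (about 99.6%). More importantly, there were no type II errors (false negatives): no poisonous mushroom was predicted to be edible, which wouldn't end well… All 6 misclassifications were edible mushrooms predicted to be poisonous, which is the safer mistake.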

##confusion matrix (CrossTable comes from the gmodels package)
CrossTable(mushroom.test$type, predictresult, prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE, dnn = c('Actual', 'Predicted'))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1611 
## 
##  
##              | Predicted 
##       Actual | poisonous |    edible | Row Total | 
## -------------|-----------|-----------|-----------|
##    poisonous |       830 |         0 |       830 | 
##              |     0.515 |     0.000 |           | 
## -------------|-----------|-----------|-----------|
##       edible |         6 |       775 |       781 | 
##              |     0.004 |     0.481 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |       836 |       775 |      1611 | 
## -------------|-----------|-----------|-----------|
## 
##
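
The counts in the table above can be turned into summary rates with a few lines of base R. This is a minimal sketch that copies the four cell counts by hand; the names conf_mat, missed_poisonous, and false_alarm are illustrative and do not come from the original analysis.

```r
# Counts copied from the CrossTable output (rows = actual, columns = predicted)
conf_mat <- matrix(c(830,   0,
                       6, 775),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Actual    = c("poisonous", "edible"),
                                   Predicted = c("poisonous", "edible")))

# Share of all test mushrooms classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Poisonous mushrooms wrongly predicted as edible (the dangerous error)
missed_poisonous <- conf_mat["poisonous", "edible"] / sum(conf_mat["poisonous", ])

# Edible mushrooms wrongly predicted as poisonous (the harmless error)
false_alarm <- conf_mat["edible", "poisonous"] / sum(conf_mat["edible", ])

print(round(c(accuracy = accuracy,
              missed_poisonous = missed_poisonous,
              false_alarm = false_alarm), 4))
```

Reporting the two error rates separately matters here because the errors are not equally costly: a missed poisonous mushroom is far worse than a discarded edible one.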