Introduction

About Mushroom

A mushroom or toadstool is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground, on soil, or on its food source. The terms “mushroom” and “toadstool” go back centuries and were never precisely defined, nor was there consensus on application. During the 15th and 16th centuries, the terms mushrom, mushrum, muscheron, mousheroms, mussheron, or musserouns were used. Some mushroom species are edible, while others are poisonous.

Mushrooms are used extensively in cooking, in many cuisines (notably Chinese, Korean, European, and Japanese). Most mushrooms sold in supermarkets have been commercially grown on mushroom farms. The most popular of these, Agaricus bisporus, is considered safe for most people to eat because it is grown in controlled, sterilized environments.

A number of species of mushrooms are poisonous; although some resemble certain edible species, consuming them could be fatal. Eating mushrooms gathered in the wild is risky and should only be undertaken by individuals knowledgeable in mushroom identification. Common best practice is for wild mushroom pickers to focus on collecting a small number of visually distinctive, edible mushroom species that cannot be easily confused with poisonous varieties.

What we’ll do

Separating edible from poisonous species requires meticulous attention to detail; there is no single trait by which all toxic mushrooms can be identified, nor one by which all edible mushrooms can be identified. Identifying mushrooms requires a basic understanding of their macroscopic structure. These days, reliable identification often also requires microscopic analysis, but in this case we will try to determine whether a mushroom is edible or poisonous from its macroscopic features alone.

For this case, the dataset comes from Kaggle. It contains the macroscopic characteristics of edible and poisonous mushrooms. Using this dataset, we will build machine learning models with three methods: Naive Bayes, Decision Tree, and Random Forest, and compare which one performs best at predicting whether a mushroom is edible or poisonous based on its macroscopic characteristics.

Data Preparation

Import Library

Load the required libraries.

library(tidyverse)    # data wrangling with dplyr and friends
library(e1071)        # naiveBayes()
library(partykit)     # ctree() for the decision tree
library(randomForest) # random forest engine used by caret
library(caret)        # train(), trainControl(), confusionMatrix()
library(ROCR)         # prediction() and performance() for ROC/AUC

Read Data

This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981).

mushroom <- read.csv("mushrooms.csv")
head(mushroom)

Data Pre-processing

Data Wrangling

Let's look at the data structure.

str(mushroom)
## 'data.frame':    8124 obs. of  23 variables:
##  $ class                   : chr  "p" "e" "e" "p" ...
##  $ cap.shape               : chr  "x" "x" "b" "x" ...
##  $ cap.surface             : chr  "s" "s" "s" "y" ...
##  $ cap.color               : chr  "n" "y" "w" "w" ...
##  $ bruises                 : chr  "t" "t" "t" "t" ...
##  $ odor                    : chr  "p" "a" "l" "p" ...
##  $ gill.attachment         : chr  "f" "f" "f" "f" ...
##  $ gill.spacing            : chr  "c" "c" "c" "c" ...
##  $ gill.size               : chr  "n" "b" "b" "n" ...
##  $ gill.color              : chr  "k" "k" "n" "n" ...
##  $ stalk.shape             : chr  "e" "e" "e" "e" ...
##  $ stalk.root              : chr  "e" "c" "c" "e" ...
##  $ stalk.surface.above.ring: chr  "s" "s" "s" "s" ...
##  $ stalk.surface.below.ring: chr  "s" "s" "s" "s" ...
##  $ stalk.color.above.ring  : chr  "w" "w" "w" "w" ...
##  $ stalk.color.below.ring  : chr  "w" "w" "w" "w" ...
##  $ veil.type               : chr  "p" "p" "p" "p" ...
##  $ veil.color              : chr  "w" "w" "w" "w" ...
##  $ ring.number             : chr  "o" "o" "o" "o" ...
##  $ ring.type               : chr  "p" "p" "p" "p" ...
##  $ spore.print.color       : chr  "k" "n" "n" "k" ...
##  $ population              : chr  "s" "n" "n" "s" ...
##  $ habitat                 : chr  "u" "g" "m" "u" ...

We can see from the structure that all of the columns are of character type, meaning we are dealing with a purely categorical dataset. We need to convert all of them to factors.

mushroom <- mushroom %>% 
  mutate_if(is.character, as.factor)
str(mushroom)
## 'data.frame':    8124 obs. of  23 variables:
##  $ class                   : Factor w/ 2 levels "e","p": 2 1 1 2 1 1 1 1 2 1 ...
##  $ cap.shape               : Factor w/ 6 levels "b","c","f","k",..: 6 6 1 6 6 6 1 1 6 1 ...
##  $ cap.surface             : Factor w/ 4 levels "f","g","s","y": 3 3 3 4 3 4 3 4 4 3 ...
##  $ cap.color               : Factor w/ 10 levels "b","c","e","g",..: 5 10 9 9 4 10 9 9 9 10 ...
##  $ bruises                 : Factor w/ 2 levels "f","t": 2 2 2 2 1 2 2 2 2 2 ...
##  $ odor                    : Factor w/ 9 levels "a","c","f","l",..: 7 1 4 7 6 1 1 4 7 1 ...
##  $ gill.attachment         : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
##  $ gill.spacing            : Factor w/ 2 levels "c","w": 1 1 1 1 2 1 1 1 1 1 ...
##  $ gill.size               : Factor w/ 2 levels "b","n": 2 1 1 2 1 1 1 1 2 1 ...
##  $ gill.color              : Factor w/ 12 levels "b","e","g","h",..: 5 5 6 6 5 6 3 6 8 3 ...
##  $ stalk.shape             : Factor w/ 2 levels "e","t": 1 1 1 1 2 1 1 1 1 1 ...
##  $ stalk.root              : Factor w/ 5 levels "?","b","c","e",..: 4 3 3 4 4 3 3 3 4 3 ...
##  $ stalk.surface.above.ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ stalk.surface.below.ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ stalk.color.above.ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ stalk.color.below.ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ veil.type               : Factor w/ 1 level "p": 1 1 1 1 1 1 1 1 1 1 ...
##  $ veil.color              : Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ ring.number             : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
##  $ ring.type               : Factor w/ 5 levels "e","f","l","n",..: 5 5 5 5 1 5 5 5 5 5 ...
##  $ spore.print.color       : Factor w/ 9 levels "b","h","k","n",..: 3 4 4 3 4 3 3 4 3 3 ...
##  $ population              : Factor w/ 6 levels "a","c","n","s",..: 4 3 3 4 1 3 3 4 5 4 ...
##  $ habitat                 : Factor w/ 7 levels "d","g","l","m",..: 6 2 4 6 2 2 4 4 2 4 ...
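
As a side note, mutate_if() is superseded in current dplyr releases, and the same conversion is usually written with across() nowadays. A minimal sketch, shown for reference only (the mutate_if() call above produces the same result):

# Equivalent factor conversion with across() (dplyr >= 1.0).
mushroom <- mushroom %>% 
  mutate(across(where(is.character), as.factor))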

From the structure above, we can drop veil.type because it has only one level and therefore carries no information.

mushroom <- mushroom %>% 
  select(-veil.type)
str(mushroom)
## 'data.frame':    8124 obs. of  22 variables:
##  $ class                   : Factor w/ 2 levels "e","p": 2 1 1 2 1 1 1 1 2 1 ...
##  $ cap.shape               : Factor w/ 6 levels "b","c","f","k",..: 6 6 1 6 6 6 1 1 6 1 ...
##  $ cap.surface             : Factor w/ 4 levels "f","g","s","y": 3 3 3 4 3 4 3 4 4 3 ...
##  $ cap.color               : Factor w/ 10 levels "b","c","e","g",..: 5 10 9 9 4 10 9 9 9 10 ...
##  $ bruises                 : Factor w/ 2 levels "f","t": 2 2 2 2 1 2 2 2 2 2 ...
##  $ odor                    : Factor w/ 9 levels "a","c","f","l",..: 7 1 4 7 6 1 1 4 7 1 ...
##  $ gill.attachment         : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
##  $ gill.spacing            : Factor w/ 2 levels "c","w": 1 1 1 1 2 1 1 1 1 1 ...
##  $ gill.size               : Factor w/ 2 levels "b","n": 2 1 1 2 1 1 1 1 2 1 ...
##  $ gill.color              : Factor w/ 12 levels "b","e","g","h",..: 5 5 6 6 5 6 3 6 8 3 ...
##  $ stalk.shape             : Factor w/ 2 levels "e","t": 1 1 1 1 2 1 1 1 1 1 ...
##  $ stalk.root              : Factor w/ 5 levels "?","b","c","e",..: 4 3 3 4 4 3 3 3 4 3 ...
##  $ stalk.surface.above.ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ stalk.surface.below.ring: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ stalk.color.above.ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ stalk.color.below.ring  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ veil.color              : Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ ring.number             : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
##  $ ring.type               : Factor w/ 5 levels "e","f","l","n",..: 5 5 5 5 1 5 5 5 5 5 ...
##  $ spore.print.color       : Factor w/ 9 levels "b","h","k","n",..: 3 4 4 3 4 3 3 4 3 3 ...
##  $ population              : Factor w/ 6 levels "a","c","n","s",..: 4 3 3 4 1 3 3 4 5 4 ...
##  $ habitat                 : Factor w/ 7 levels "d","g","l","m",..: 6 2 4 6 2 2 4 4 2 4 ...

Let's check whether there are any NA values in the dataset.

colSums(is.na(mushroom))
##                    class                cap.shape              cap.surface 
##                        0                        0                        0 
##                cap.color                  bruises                     odor 
##                        0                        0                        0 
##          gill.attachment             gill.spacing                gill.size 
##                        0                        0                        0 
##               gill.color              stalk.shape               stalk.root 
##                        0                        0                        0 
## stalk.surface.above.ring stalk.surface.below.ring   stalk.color.above.ring 
##                        0                        0                        0 
##   stalk.color.below.ring               veil.color              ring.number 
##                        0                        0                        0 
##                ring.type        spore.print.color               population 
##                        0                        0                        0 
##                  habitat 
##                        0

Fortunately, there are no missing (NA) values, so we can continue to the next step.
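
One caveat worth noting: the original UCI version of this data encodes unknown stalk.root values as a literal "?" rather than NA, which is why "?" appears as its own factor level above. Here we simply keep it as another category; a quick count as a sanity check:

# The "?" level in stalk.root marks unknown values in the original data;
# we treat it as just another category rather than imputing it.
table(mushroom$stalk.root)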

Cross Validation

Before we build the prediction models, we need to split the data into a training set and a test set. Although a random forest does not strictly need a holdout set, since it already provides an OOB (out-of-bag) error estimate, we still apply the same split so that all models are evaluated under the same conditions. For this case, we use 80% of the data for training and 20% for testing.

RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(123)

index <- sample(x = nrow(mushroom), nrow(mushroom) * 0.80)

mushroom_train <- mushroom[index,]
mushroom_test <- mushroom[-index,]
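
As an aside, if we wanted a split that preserves the class proportions exactly, caret's createDataPartition() performs a stratified split. A minimal sketch (not used in the rest of this analysis):

# A stratified 80/20 split with caret, as an alternative to the random index above.
set.seed(123)
index_strat <- createDataPartition(mushroom$class, p = 0.8, list = FALSE)
mushroom_train_strat <- mushroom[index_strat, ]
mushroom_test_strat  <- mushroom[-index_strat, ]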

After splitting the data, we need to check whether the target classes are imbalanced.

prop.table(table(mushroom_train$class))
## 
##         e         p 
## 0.5183874 0.4816126

As we can see, the training data is not imbalanced (the same check can be run on the test split, as sketched below). Since there is no class-balance problem, we can continue to building the prediction models. For this case, we will use three models: Naive Bayes, Decision Tree, and Random Forest.
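
A quick sketch of that check for the test split:

# Class proportions in the test split, for comparison with the training split.
prop.table(table(mushroom_test$class))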

Model Fitting

Naive Bayes

This is how we create the model for Naive Bayes.

RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(123)
naive_model <- naiveBayes(x = mushroom_train %>% select(-class), 
                          y = mushroom_train$class,
                          laplace = 1)
naive_model
## 
## Naive Bayes Classifier for Discrete Predictors
## 
## Call:
## naiveBayes.default(x = mushroom_train %>% select(-class), y = mushroom_train$class, 
##     laplace = 1)
## 
## A-priori probabilities:
## mushroom_train$class
##         e         p 
## 0.5183874 0.4816126 
## 
## Conditional probabilities:
##                     cap.shape
## mushroom_train$class            b            c            f            k
##                    e 0.0915555556 0.0002962963 0.3760000000 0.0521481481
##                    p 0.0137117347 0.0012755102 0.3963647959 0.1517857143
##                     cap.shape
## mushroom_train$class            s            x
##                    e 0.0085925926 0.4714074074
##                    p 0.0003188776 0.4365433673
## 
##                     cap.surface
## mushroom_train$class            f            g            s            y
##                    e 0.3702935073 0.0002964720 0.2680106730 0.3613993478
##                    p 0.1968730057 0.0009572431 0.3596043395 0.4425654116
## 
##                     cap.color
## mushroom_train$class            b            c            e            g
##                    e 0.0103580941 0.0071026931 0.1497484463 0.2441550755
##                    p 0.0312101911 0.0031847134 0.2156050955 0.2079617834
##                     cap.color
## mushroom_train$class            n            p            r            u
##                    e 0.2989050015 0.0147972773 0.0044391832 0.0044391832
##                    p 0.2595541401 0.0235668790 0.0003184713 0.0003184713
##                     cap.color
## mushroom_train$class            w            y
##                    e 0.1701686890 0.0958863569
##                    p 0.0837579618 0.1745222930
## 
##                     bruises
## mushroom_train$class         f         t
##                    e 0.3444082 0.6555918
##                    p 0.8384419 0.1615581
## 
##                     odor
## mushroom_train$class            a            c            f            l
##                    e 0.0976909414 0.0002960332 0.0002960332 0.0941385435
##                    p 0.0003185728 0.0503345014 0.5517680790 0.0003185728
##                     odor
## mushroom_train$class            m            n            p            s
##                    e 0.0002960332 0.8063943162 0.0002960332 0.0002960332
##                    p 0.0089200382 0.0324944250 0.0669002867 0.1462249124
##                     odor
## mushroom_train$class            y
##                    e 0.0002960332
##                    p 0.1427206117
## 
##                     gill.attachment
## mushroom_train$class           a           f
##                    e 0.043903886 0.956096114
##                    p 0.005427842 0.994572158
## 
##                     gill.spacing
## mushroom_train$class          c          w
##                    e 0.71610798 0.28389202
##                    p 0.97126437 0.02873563
## 
##                     gill.size
## mushroom_train$class          b          n
##                    e 0.93177099 0.06822901
##                    p 0.43773946 0.56226054
## 
##                     gill.color
## mushroom_train$class            b            e            g            h
##                    e 0.0002957705 0.0230700976 0.0603371783 0.0499852115
##                    p 0.4315722470 0.0003182686 0.1333545512 0.1346276257
##                     gill.color
## mushroom_train$class            k            n            o            p
##                    e 0.0792664892 0.2212363206 0.0168589175 0.2023070098
##                    p 0.0162316996 0.0286441757 0.0003182686 0.1651814131
##                     gill.color
## mushroom_train$class            r            u            w            y
##                    e 0.0002957705 0.1064773736 0.2253771074 0.0144927536
##                    p 0.0057288351 0.0127307447 0.0652450668 0.0060471038
## 
##                     stalk.shape
## mushroom_train$class         e         t
##                    e 0.3835657 0.6164343
##                    p 0.4939336 0.5060664
## 
##                     stalk.root
## mushroom_train$class            ?            b            c            e
##                    e 0.1680497925 0.4555423829 0.1218138708 0.2065797273
##                    p 0.4414673046 0.4803827751 0.0108452951 0.0669856459
##                     stalk.root
## mushroom_train$class            r
##                    e 0.0480142264
##                    p 0.0003189793
## 
##                     stalk.surface.above.ring
## mushroom_train$class           f           k           s           y
##                    e 0.097242811 0.035280166 0.863029944 0.004447080
##                    p 0.037332482 0.567326101 0.393107849 0.002233567
## 
##                     stalk.surface.below.ring
## mushroom_train$class          f          k          s          y
##                    e 0.10999111 0.03172250 0.80610732 0.05217907
##                    p 0.03605616 0.55264837 0.39151244 0.01978302
## 
##                     stalk.color.above.ring
## mushroom_train$class            b            c            e            g
##                    e 0.0002960332 0.0002960332 0.0224985198 0.1358792185
##                    p 0.1137304874 0.0089200382 0.0003185728 0.0003185728
##                     stalk.color.above.ring
## mushroom_train$class            n            o            p            w
##                    e 0.0044404973 0.0438129070 0.1361752516 0.6563055062
##                    p 0.1134119146 0.0003185728 0.3265371137 0.4342147181
##                     stalk.color.above.ring
## mushroom_train$class            y
##                    e 0.0002960332
##                    p 0.0022300096
## 
##                     stalk.color.below.ring
## mushroom_train$class            b            c            e            g
##                    e 0.0002960332 0.0002960332 0.0219064535 0.1367673179
##                    p 0.1130933418 0.0089200382 0.0003185728 0.0003185728
##                     stalk.color.below.ring
## mushroom_train$class            n            o            p            w
##                    e 0.0150976909 0.0438129070 0.1343990527 0.6471284784
##                    p 0.1140490602 0.0003185728 0.3313157056 0.4252946798
##                     stalk.color.below.ring
## mushroom_train$class            y
##                    e 0.0002960332
##                    p 0.0063714559
## 
##                     veil.color
## mushroom_train$class           n           o           w           y
##                    e 0.021049511 0.023124815 0.955529202 0.000296472
##                    p 0.000319081 0.000319081 0.997128271 0.002233567
## 
##                     ring.number
## mushroom_train$class            n            o            t
##                    e 0.0002965599 0.8748517200 0.1248517200
##                    p 0.0089371210 0.9709543568 0.0201085222
## 
##                     ring.type
## mushroom_train$class            e            f            l            n
##                    e 0.2391819798 0.0121517487 0.0002963841 0.0002963841
##                    p 0.4433811802 0.0003189793 0.3358851675 0.0089314195
##                     ring.type
## mushroom_train$class            p
##                    e 0.7480735033
##                    p 0.2114832536
## 
##                     spore.print.color
## mushroom_train$class            b            h            k            n
##                    e 0.0109532268 0.0121373594 0.3975725281 0.4100059207
##                    p 0.0003185728 0.4084103218 0.0602102580 0.0570245301
##                     spore.print.color
## mushroom_train$class            o            r            u            w
##                    e 0.0127294257 0.0002960332 0.0109532268 0.1352871522
##                    p 0.0003185728 0.0200700860 0.0003185728 0.4530105129
##                     spore.print.color
## mushroom_train$class            y
##                    e 0.0100651273
##                    p 0.0003185728
## 
##                     population
## mushroom_train$class            a            c            n            s
##                    e 0.0909629630 0.0675555556 0.0957037037 0.2097777778
##                    p 0.0003188776 0.0124362245 0.0003188776 0.0937500000
##                     population
## mushroom_train$class            v            y
##                    e 0.2838518519 0.2521481481
##                    p 0.7190688776 0.1741071429
## 
##                     habitat
## mushroom_train$class            d            g            l            m
##                    e 0.4466824645 0.3335308057 0.0545023697 0.0619075829
##                    p 0.3222824354 0.1985973860 0.1517373287 0.0098820529
##                     habitat
## mushroom_train$class            p            u            w
##                    e 0.0346563981 0.0234004739 0.0453199052
##                    p 0.2496015301 0.0675804909 0.0003187759

Decision Tree

This is how we create the model for Decision Tree.

RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(123)
dtree_model <- ctree(formula = class ~.,
                     data = mushroom_train)
plot(dtree_model, type = "simple")

Random Forest

This is how we create the model for Random Forest.

#RNGkind(sample.kind = "Rounding")
#set.seed(123)

#ctrl <- trainControl(method = "repeatedcv",
#                     number = 5, 
#                     repeats = 3) 

#rf_model <- train(class ~ .,
#                   data = mushroom_train,
#                   method = "rf",
#                   trControl = ctrl)

#saveRDS(rf_model, "mushroom_randomforest.RDS")

For knitting purposes, we load a model that was previously saved as an RDS file.

rf_model <- readRDS("mushroom_randomforest.RDS")
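
A slightly more robust pattern, sketched below under the assumption of the same file name, retrains only when the saved model is missing; it keeps knitting fast while remaining reproducible:

# Retrain only if the saved model is not available; otherwise load it from disk.
if (file.exists("mushroom_randomforest.RDS")) {
  rf_model <- readRDS("mushroom_randomforest.RDS")
} else {
  set.seed(123)
  ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
  rf_model <- train(class ~ ., data = mushroom_train, method = "rf", trControl = ctrl)
  saveRDS(rf_model, "mushroom_randomforest.RDS")
}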

Model Evaluation

Naive Bayes

Let's evaluate the Naive Bayes model on the test data.

prediction_naive_class <- predict(naive_model,
                         mushroom_test,
                         type = "class")
confusionMatrix(prediction_naive_class, 
                mushroom_test$class, 
                positive = "e")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   e   p
##          e 830  61
##          p   9 725
##                                           
##                Accuracy : 0.9569          
##                  95% CI : (0.9459, 0.9663)
##     No Information Rate : 0.5163          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9136          
##                                           
##  Mcnemar's Test P-Value : 1.09e-09        
##                                           
##             Sensitivity : 0.9893          
##             Specificity : 0.9224          
##          Pos Pred Value : 0.9315          
##          Neg Pred Value : 0.9877          
##              Prevalence : 0.5163          
##          Detection Rate : 0.5108          
##    Detection Prevalence : 0.5483          
##       Balanced Accuracy : 0.9558          
##                                           
##        'Positive' Class : e               
## 

As you can see from the confusion matrix, the Naive Bayes model already achieves high accuracy, sensitivity, specificity, and precision.

prediction_naive_raw <- predict(naive_model,
                            mushroom_test, 
                            type = "raw")
data_roc <- data.frame(pred_prob = prediction_naive_raw[,"e"],
                       actual = ifelse(mushroom_test$class == "e", 1, 0))
prediction_roc <-  prediction(predictions = data_roc$pred_prob,
                    labels = data_roc$actual)
auc_number <- performance(prediction_roc, measure = "auc")
plot(performance(prediction_roc, "tpr", "fpr"))
abline(0, 1, lty = 2)
text(0.4, 0.6, paste("AUC =", round(auc_number@y.values[[1]], 2)))

Looking at the AUC, the model performs well because the value is close to 1.
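
To read off the exact value rather than the rounded annotation on the plot, the AUC can also be printed directly from the ROCR performance object:

# The AUC value is stored in the y.values slot of the performance object.
auc_number@y.values[[1]]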

Decision Tree

Let's evaluate the Decision Tree model on the test data.

prediction_dtree <- predict(dtree_model, 
                            mushroom_test)
confusionMatrix(prediction_dtree, 
                mushroom_test$class, 
                positive = "e")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   e   p
##          e 839   3
##          p   0 783
##                                           
##                Accuracy : 0.9982          
##                  95% CI : (0.9946, 0.9996)
##     No Information Rate : 0.5163          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9963          
##                                           
##  Mcnemar's Test P-Value : 0.2482          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.9962          
##          Pos Pred Value : 0.9964          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.5163          
##          Detection Rate : 0.5163          
##    Detection Prevalence : 0.5182          
##       Balanced Accuracy : 0.9981          
##                                           
##        'Positive' Class : e               
## 

As we can see, this model performs better than Naive Bayes. Even though decision trees are prone to overfitting, the test-set performance is already excellent; for extra reassurance, we can also compare it against the confusion matrix on the training data, as sketched below.
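
A minimal sketch of that optional check:

# Performance on the training data, to compare with the test set and confirm
# the tree is not simply memorizing the training observations.
prediction_dtree_train <- predict(dtree_model, mushroom_train)
confusionMatrix(prediction_dtree_train,
                mushroom_train$class,
                positive = "e")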

Random Forest

Before we evaluate the model, let's interpret which variables are the most important in this model.

varImp(rf_model)
## rf variable importance
## 
##   only 20 most important variables shown (out of 95)
## 
##                           Overall
## odorn                     100.000
## odorf                      31.304
## gill.sizen                 31.231
## stalk.rootc                16.278
## stalk.surface.above.ringk  10.503
## bruisest                    9.301
## spore.print.colorr          7.876
## stalk.surface.below.ringk   6.337
## stalk.surface.below.ringy   5.651
## ring.typep                  4.598
## odorl                       4.380
## stalk.rootr                 4.234
## spore.print.colorh          4.037
## gill.spacingw               3.792
## odorp                       2.298
## cap.colory                  2.193
## spore.print.colorw          2.098
## stalk.roote                 2.096
## odorc                       1.753
## ring.numbert                1.571
plot(varImp(rf_model))

Based on the result, odor is the most important variable in this model. If we want to improve the other models, we could perform variable selection based on this result, as sketched below.
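
As a rough illustration of that idea (a sketch only; the predictor list below is simply read off the top of the varImp output, not a tuned selection), we could refit Naive Bayes on just the most important variables:

# Refit Naive Bayes using only a handful of the most important predictors
# identified by the random forest.
top_features <- c("odor", "gill.size", "stalk.root", "spore.print.color", "bruises")

naive_model_small <- naiveBayes(x = mushroom_train %>% select(all_of(top_features)),
                                y = mushroom_train$class,
                                laplace = 1)

prediction_naive_small <- predict(naive_model_small, mushroom_test, type = "class")
confusionMatrix(prediction_naive_small, mushroom_test$class, positive = "e")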

rf_model$finalModel
## 
## Call:
##  randomForest(x = x, y = y, mtry = param$mtry) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 48
## 
##         OOB estimate of  error rate: 0%
## Confusion matrix:
##      e    p class.error
## e 3369    0           0
## p    0 3130           0

Based on the result, the OOB error estimate is 0%, which suggests the model should classify unseen data almost perfectly. Let's verify this on the test set.

prediction_rf <- predict(rf_model, mushroom_test, type = "raw")

confusionMatrix(data = prediction_rf, 
                reference = mushroom_test$class, 
                positive = "e")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   e   p
##          e 839   0
##          p   0 786
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9977, 1)
##     No Information Rate : 0.5163     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.5163     
##          Detection Rate : 0.5163     
##    Detection Prevalence : 0.5163     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : e          
## 

Based on the result, the model classifies every mushroom in the test set correctly as edible or poisonous.
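
Before concluding, the three test-set accuracies can be lined up side by side (a sketch that reuses the prediction objects created above):

# Collect the test-set accuracy of every model into one small table.
model_accuracy <- data.frame(
  model = c("Naive Bayes", "Decision Tree", "Random Forest"),
  accuracy = c(
    confusionMatrix(prediction_naive_class, mushroom_test$class)$overall["Accuracy"],
    confusionMatrix(prediction_dtree, mushroom_test$class)$overall["Accuracy"],
    confusionMatrix(prediction_rf, mushroom_test$class)$overall["Accuracy"]
  )
)
model_accuracy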

Conclusion

Random Forest has the best performance of the three, but it is hard to interpret. To interpret which variables make a mushroom poisonous or edible, we can use the Naive Bayes or Decision Tree model. In this case, the Decision Tree model performs better than Naive Bayes.
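
As a final illustration of what that interpretation could look like (a sketch), the Naive Bayes conditional probability table for odor, the strongest predictor according to the random forest, shows directly which odors are associated with the poisonous class:

# Conditional probabilities of each odor level given the class, taken from the
# fitted Naive Bayes model; large differences between the rows indicate a strong signal.
naive_model$tables$odor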