’shroom-a-thon

To Eat or Not-to-Eat?

There is no simple rule for deciding whether a mushroom is edible. Poison ivy has its little ditty, “leaflets three, let it be,” but a mushroom has to be judged on several features at once. For instance, see Rule 14 in this resource on mushroom collecting.

This deck reports the results of separating edible from poisonous mushrooms in the UCI Mushroom Data Set. The dataset was modified to remove the variable with NA values (it was not a good predictor). The remaining data included 8,123 mushrooms and 21 variables.

This deck is written in R Slidy to demonstrate modeling approaches that accurately sort the mushrooms provided into one of two classes: edible or poisonous.

Machine learning can keep you alive in the zombie apocalypse when you forage for grub…at least when it comes to picking mushrooms. Keeping yourself safe from the many other ways of becoming zombie-chow is on you.

Good luck, and eat well with this model.

Three classification methods checked

If you had to eat, which ones “work for you?”

Classification Tree

Let’s split the data into a 70% training set to do the machine learning and use the other 30% to test the model.

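library(dplyr)   # sample_frac()
library(rpart)   # rpart()

# R >= 4.0 no longer reads text columns as factors by default, so ask for it
mush <- read.csv("./data/mushroomUCI_modified.csv", stringsAsFactors = TRUE)

set.seed(524)
train <- sample_frac(mush, 0.7, replace = FALSE)
# Assumes sample_frac() keeps the original row names of mush in train
rows <- as.numeric(row.names(train))
test <- mush[-rows, ]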

fit <- rpart(result ~ . , data = train, method = "class")
predicted <- predict(fit, newdata = test, type = "class")
table(predicted, test$result)
##          
## predicted edible poison
##    edible   1262     11
##    poison      0   1164

Not bad…unless you get one of those 11 ’shrooms.
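
The split above recovers the training rows from the row names of `train`, which works only if `sample_frac()` preserves them; depending on the dplyr version, that may not hold. A minimal base-R sketch that draws the row indices directly is shown here. The `toy` data frame and its columns are stand-ins invented for illustration, not the mushroom data.

```r
# Stand-in data frame (hypothetical); the deck itself uses `mush`.
toy <- data.frame(id = 1:10,
                  result = rep(c("edible", "poison"), times = 5))

set.seed(524)
# Draw 70% of the row indices without replacement.
train_rows <- sample(nrow(toy), size = round(0.7 * nrow(toy)))

toy_train <- toy[train_rows, ]   # rows used to fit a model
toy_test  <- toy[-train_rows, ]  # held-out rows, disjoint from toy_train

nrow(toy_train)
nrow(toy_test)
```

Because the test set is built from the negated index vector, no row can land in both sets, whatever happens to the row names.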

Classification Tree

library(rattle)  # provides fancyRpartPlot()

fancyRpartPlot(fit, main = "Mushroom Classification Results",
               sub = "Variable Feature legend is available in the data dictionary")

Conditional Inference Tree

library(party)  # ctree(); the partykit package also provides a ctree()

fitC <- ctree(result ~ ., data = train)
table(predict(fitC, newdata = test), test$result)
##         
##          edible poison
##   edible   1262      6
##   poison      0   1169

This is a better predictor, but you still have 6 chances to…well…

                     (*_*)  -->  (X_X)
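
To put the two trees side by side, the confusion tables reported above can be turned into error rates. The sketch below re-enters the printed counts by hand; the helper names `make_table` and `summarize_confusion` are made up for this example.

```r
# Counts copied from the two confusion tables above
# (rows = predicted class, columns = actual class).
make_table <- function(counts) {
  matrix(counts, nrow = 2, byrow = TRUE,
         dimnames = list(predicted = c("edible", "poison"),
                         actual    = c("edible", "poison")))
}
tree_rpart <- make_table(c(1262, 11, 0, 1164))
tree_ctree <- make_table(c(1262,  6, 0, 1169))

# Hypothetical helper: overall accuracy, plus the share of poisonous
# mushrooms that were wrongly labeled edible.
summarize_confusion <- function(tab) {
  c(accuracy = sum(diag(tab)) / sum(tab),
    poison_called_edible = tab["edible", "poison"] / sum(tab[, "poison"]))
}

round(rbind(rpart = summarize_confusion(tree_rpart),
            ctree = summarize_confusion(tree_ctree)), 4)
```

In both trees every error sits in the same cell: a poisonous mushroom predicted as edible, which is the costly kind of mistake for a forager.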

Conditional Inference Tree

Dark gray bars indicate the proportion of poisonous mushrooms in each terminal leaf.

Random Forest

library(randomForest)  # randomForest()

fitRF <- randomForest(result ~ ., data = train)
predictedRF <- predict(fitRF, newdata = test, type = "class")
table(predictedRF, test$result)
##            
## predictedRF edible poison
##      edible   1262      0
##      poison      0   1175

This table shows the results of a random forest model.

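In this case, it classifies every mushroom in the test set correctly. A random forest grows many trees on bootstrap samples of the training data and combines their votes, which here removes the errors the single trees made.

A forest of many trees cannot be plotted as a single tree, so the table above stands in for a plot.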
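
The voting idea behind a forest can be illustrated with a toy example. This is only a sketch: the labels and the three prediction vectors below are invented, and a real random forest also grows each tree on a bootstrap sample with a random subset of variables at each split, which this example does not do.

```r
# Invented true labels and predictions from three imaginary trees;
# each tree makes one mistake, on a different mushroom.
truth <- c("edible", "poison", "poison", "edible", "poison")
tree1 <- c("edible", "edible", "poison", "edible", "poison")
tree2 <- c("edible", "poison", "edible", "edible", "poison")
tree3 <- c("edible", "poison", "poison", "poison", "poison")

votes <- cbind(tree1, tree2, tree3)

# Majority vote per mushroom: the most frequent label in each row.
majority <- apply(votes, 1, function(v) names(which.max(table(v))))

sum(tree1 != truth)     # errors made by one tree on its own
sum(majority != truth)  # errors left after voting
```

Because the three imaginary trees err on different mushrooms, each mistake is outvoted by the other two trees.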

Random Forest

…and here is the relative importance of the variables

##                    variable importance
## 1                      odor 964.837247
## 2         spore.print.color 455.849565
## 3                gill.color 242.741022
## 4                 gill.size 199.543320
## 5  stalk.surface.above.ring 176.112387
## 6  stalk.surface.below.ring 130.274585
## 7                 ring.type 126.560245
## 8                population 118.859445
## 9                   habitat  87.804615
## 10             gill.spacing  55.909031
## 11                  bruises  49.949402
## 12                cap.color  46.770311
## 13   stalk.color.below.ring  38.344326
## 14              ring.number  35.821492
## 15   stalk.color.above.ring  33.417914
## 16              stalk.shape  30.562732
## 17              cap.surface  17.984336
## 18                cap.shape  10.443502
## 19               veil.color   2.405235
## 20          gill.attachment   1.306411
## 21                veil.type   0.000000
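
To read the table as shares rather than raw scores, each importance can be divided by the total. The sketch below re-enters the values from the table by hand; the names `imp` and `share` are chosen for this example only.

```r
# Importance values copied by hand from the table above.
imp <- c(odor = 964.837247, spore.print.color = 455.849565,
         gill.color = 242.741022, gill.size = 199.543320,
         stalk.surface.above.ring = 176.112387,
         stalk.surface.below.ring = 130.274585,
         ring.type = 126.560245, population = 118.859445,
         habitat = 87.804615, gill.spacing = 55.909031,
         bruises = 49.949402, cap.color = 46.770311,
         stalk.color.below.ring = 38.344326, ring.number = 35.821492,
         stalk.color.above.ring = 33.417914, stalk.shape = 30.562732,
         cap.surface = 17.984336, cap.shape = 10.443502,
         veil.color = 2.405235, gill.attachment = 1.306411,
         veil.type = 0.000000)

# Each variable's share of the total importance.
share <- imp / sum(imp)

round(100 * head(share, 5), 1)  # percent of the total, top five variables
names(imp)[imp == 0]            # variables with zero importance
```

Read this way, odor alone carries roughly a third of the total importance, and veil.type contributes nothing.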