There is no simple rule for telling whether a mushroom is edible. Poison ivy has little ditties like “leaflets three, let it be,” but a mushroom has to be judged on several features at once. For instance, see Rule 14 in this resource on mushroom collecting.
The results below come from predicting which mushrooms in the UCI Mushroom Data Set are edible and which are poisonous. The dataset was modified to remove the variable with NA values (it was not a good predictor). The remaining data included 8,123 mushrooms and 21 variables.
This deck is written in R Slidy to demonstrate modeling approaches that accurately sort each mushroom into one of two classes, edible or poisonous.
Machine learning can keep you alive in the zombie apocalypse when you forage for grub…at least when it comes to picking mushrooms. Keeping yourself safe from the many other ways of becoming zombie-chow is on you.
Good luck, and eat well with this model.
If you had to eat, which ones “work for you?”
Let’s split the data into a 70% training set to do the machine learning and use the other 30% to test the model.
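library(dplyr)  # provides sample_frac()
# Read text columns as factors, which the classifiers below expect
mush <- read.csv("./data/mushroomUCI_modified.csv", stringsAsFactors = TRUE)
set.seed(524)  # make the random split reproducible
# Draw 70% of the rows, without replacement, for training
train <- sample_frac(mush, 0.7, replace = FALSE)
# Assumes sample_frac() kept the original row names, so they identify the sampled rows
rows <- as.numeric(row.names(train))
test <- mush[-rows, ]  # the rows not drawn for training form the test set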
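The split above identifies the training rows by their row names, which only works if `sample_frac()` leaves those names untouched. A minimal base-R sketch of the same 70/30 idea, sampling row positions directly so nothing depends on row names, might look like this. The data frame `toy` and its values are made up for illustration and simply stand in for `mush`.

```r
# Toy stand-in for the mushroom data (hypothetical values, not the UCI data)
toy <- data.frame(
  odor   = factor(rep(c("none", "foul", "almond", "fishy"), times = 25)),
  result = factor(rep(c("edible", "poison"), times = 50))
)

set.seed(524)
n <- nrow(toy)
# Sample 70% of the row positions without replacement
train_idx <- sample(seq_len(n), size = round(0.7 * n))

toy_train <- toy[train_idx, ]   # rows drawn for training
toy_test  <- toy[-train_idx, ]  # everything else is held out

nrow(toy_train)
nrow(toy_test)
```

Because `train_idx` holds positions rather than names, the two pieces cannot overlap and together always cover every row.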
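library(rpart)  # recursive partitioning trees
# Fit a classification tree using every predictor
fit <- rpart(result ~ . , data = train, method = "class")
# Predict a class label for each held-out mushroom
predicted <- predict(fit, newdata = test, type = "class")
# Rows are predictions, columns are the true classes
table(predicted, test$result)
##
## predicted edible poison
##   edible    1262     11
##   poison       0   1164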
Not bad…unless you get one of those 11 ’shrooms.
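The table above can be boiled down to a couple of numbers. As a rough sketch, the counts are typed in by hand here (copied from the table, not recomputed from the data), and the names `cm`, `accuracy`, and `missed_poison` are only illustrative.

```r
# Confusion matrix copied from the rpart results: rows = predicted, columns = actual
cm <- matrix(c(1262, 0, 11, 1164), nrow = 2,
             dimnames = list(predicted = c("edible", "poison"),
                             actual    = c("edible", "poison")))

# Share of all test mushrooms classified correctly
accuracy <- sum(diag(cm)) / sum(cm)

# Share of truly poisonous mushrooms that the tree called edible (the dangerous error)
missed_poison <- cm["edible", "poison"] / sum(cm[, "poison"])

round(accuracy, 4)
round(missed_poison, 4)
```

For foraging, the second number matters more than overall accuracy, since calling a poisonous mushroom edible is the costly mistake.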
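library(rattle)  # provides fancyRpartPlot()
fancyRpartPlot(fit, main = "Mushroom Classification Results",
               sub = "Variable Feature legend is available in the data dictionary")
library(party)  # assumed source of ctree(); partykit also provides one
# Fit a conditional inference tree on the same training data
fitC <- ctree(result ~ ., data = train)
table(predict(fitC, newdata = test), test$result)
##
##          edible poison
##   edible   1262      6
##   poison      0   1169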
This is a better predictor, but you still have 6 chances to…well…
(*_*) --> (X_X)
library(randomForest)
# Grow a forest of classification trees (result must be a factor)
fitRF <- randomForest(result ~ . , data = train)
predictedRF <- predict(fitRF, newdata = test, type = "class")
table(predictedRF, test$result)
##
## predictedRF edible poison
##   edible      1262      0
##   poison         0   1175
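This table shows the results of a random forest model.
On this test set it is a perfect classifier: all 2,437 mushrooms are labeled correctly. A random forest combines the votes of many trees, each grown on a bootstrap sample of the training data, which usually makes it more accurate than any single tree.
Random forests are not easy to plot, so the table stands in for a picture of the model.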
…and here is the relative importance of the variables
##                    variable importance
##  1                     odor 964.837247
##  2        spore.print.color 455.849565
##  3               gill.color 242.741022
##  4                gill.size 199.543320
##  5 stalk.surface.above.ring 176.112387
##  6 stalk.surface.below.ring 130.274585
##  7                ring.type 126.560245
##  8               population 118.859445
##  9                  habitat  87.804615
## 10             gill.spacing  55.909031
## 11                  bruises  49.949402
## 12                cap.color  46.770311
## 13   stalk.color.below.ring  38.344326
## 14              ring.number  35.821492
## 15   stalk.color.above.ring  33.417914
## 16              stalk.shape  30.562732
## 17              cap.surface  17.984336
## 18                cap.shape  10.443502
## 19               veil.color   2.405235
## 20          gill.attachment   1.306411
## 21                veil.type   0.000000
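The deck shows the importance table but not the code behind it. One way such a table could be produced is sketched below, assuming the values are the mean decrease in Gini impurity that `importance()` from the randomForest package reports for a classification forest. The sketch uses the built-in `iris` data so it runs on its own; `fit_demo` and `imp_df` are illustrative names, and for the mushroom model you would pass `fitRF` instead.

```r
library(randomForest)

set.seed(524)
# Demo forest on built-in data; substitute fitRF for the mushroom model
fit_demo <- randomForest(Species ~ ., data = iris)

# importance() returns a matrix with one row per predictor
imp <- importance(fit_demo)

# Reshape into a two-column data frame, most important variable first
imp_df <- data.frame(variable   = rownames(imp),
                     importance = imp[, "MeanDecreaseGini"],
                     row.names  = NULL)
imp_df <- imp_df[order(imp_df$importance, decreasing = TRUE), ]
rownames(imp_df) <- NULL  # renumber rows 1, 2, 3, ...
imp_df
```

A variable that takes only one value in the training data can never be used for a split, which would explain an importance of exactly zero like the one shown for veil.type.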