The Data

The mushroom data set [1] is a well-known example data set frequently used by people learning R for the first time. Sourced from The Audubon Society Field Guide to North American Mushrooms [2], it contains 22 variables describing each type of mushroom, as well as a classification indicating whether it is poisonous (p) or edible (e).

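# Load the tidyverse (readr, dplyr, ggplot2), which the code below relies on
library(tidyverse)

# Load dataset from the UCI repository; "?" marks missing values
mushrooms <- read_csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"), col_names=FALSE, na='?')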

# Add names to the dataset
names(mushrooms) <- c("category","capShape","capSurface","capColor","bruises","odor","gillAttachment","gillSpacing","gillSize","gillColor","stalkShape","stalkRoot","stalkSurfaceAbove","stalkSurfaceBelow","stalkColorAbove","stalkColorBelow","veilType","veilColor","ringNumber","ringType","sporePrintColor","population","habitat")

# Add some factors
mushrooms$category <- factor(mushrooms$category, levels=c("p","e"), labels=c("Poisonous","Edible"))

mushrooms$capColor <- factor(mushrooms$capColor, levels=c("n","b","c","g","r","p","u","e","w","y"), labels=c("brown","buff","cinnamon","grey","green","pink","purple","red","white","yellow"))

mushrooms$capShape <- factor(mushrooms$capShape, levels = c("b","c","x","f","k","s"), labels = c("bell","conical","convex","flat","knobbed","sunken"))

mushrooms$odor <- factor(mushrooms$odor, levels=c("a","l","c","y","f","m","n","p","s"), labels = c("almond","anise","creosote","fishy","foul","musty","none","pungent","spicy"))

Looking at the data, there is a somewhat even split of poisonous and edible mushrooms represented.
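
The split can be checked with a simple frequency table. The following is a minimal sketch, assuming a small hypothetical data frame `toy` that stands in for `mushrooms`; its rows are invented for illustration and are not taken from the data set.

```r
# Hypothetical stand-in for the `mushrooms` data frame; the values are
# invented for illustration and are not taken from the UCI data.
toy <- data.frame(
  category = factor(c("p", "e", "e", "p", "e"),
                    levels = c("p", "e"),
                    labels = c("Poisonous", "Edible"))
)

# Count of each class, then the same counts as proportions
classCounts <- table(toy$category)
classShares <- round(prop.table(classCounts), 3)

print(classCounts)
print(classShares)
```

Running the same two calls on `mushrooms$category` should give the split for the full data set.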

Analysis

Narrowing down the dataset to a few easy-to-recognize features, we want to see if there is an easy rule of thumb to know when a mushroom is edible.

shrooms <- mushrooms %>% select(category, capShape, capColor, odor)

Cap Color

First, let’s look at the color of the cap:

[Figure: A sample mushroom cap]

# Vector of colors
cols <- c("saddlebrown","bisque2","peru","grey","green","pink","purple","red","white","yellow")

# Plot the cap colors
ggplot(data=shrooms) + geom_bar(mapping=aes(x=category, fill=capColor)) + scale_fill_manual(values = cols) + ggtitle("Mushroom Cap Colors","Edible vs. Poisonous") + labs(fill="Cap Color")

It appears that unless you happen upon a purple- or green-capped mushroom, cap color alone gives no guarantee that a mushroom is edible.

shrooms %>% filter(capColor == "purple" | capColor == "green") %>% {round(prop.table(table(.$category,.$capColor),margin=2),3)}
##            
##             brown buff cinnamon grey green pink purple red white yellow
##   Poisonous                              0           0                 
##   Edible                                 1           1

At best the white-capped mushrooms are nearly 70% likely to be edible - not a chance I’d be willing to take.

shrooms %>% filter(capColor == "white") %>% {prop.table(table(.$category))}
## 
## Poisonous    Edible 
## 0.3076923 0.6923077
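
To compare every cap color at once rather than one filter at a time, the edible share per color can be computed in a single table and sorted. This is only a sketch under the assumption that a small hypothetical data frame `toy` stands in for `shrooms`; the rows are made up and the printed shares say nothing about the real data.

```r
# Hypothetical stand-in for `shrooms`; rows are invented for illustration.
toy <- data.frame(
  category = factor(c("Edible", "Poisonous", "Edible",
                      "Edible", "Poisonous", "Edible"),
                    levels = c("Poisonous", "Edible")),
  capColor = factor(c("white", "white", "white",
                      "brown", "brown", "green"))
)

# Column-wise proportions: each cap color's column sums to 1
shares <- prop.table(table(toy$category, toy$capColor), margin = 2)

# Edible share per color, highest first.
# sort() drops NaN entries by default, which would arise for factor
# levels that have no rows.
edibleByColor <- sort(shares["Edible", ], decreasing = TRUE)
print(round(edibleByColor, 3))
```

Swapping `toy` for `shrooms` should rank all ten cap colors by their share of edible samples, which makes it easier to check which color is really the best of the rest.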

Cap Shape

Now on to the shape of the cap. Is there some general rule we can make about a mushroom’s safety based on the shape of the cap, rather than color?

# Plot the cap shapes
ggplot(data=shrooms) + geom_bar(mapping=aes(x=category, fill=capShape)) + scale_fill_brewer(palette = "Set3") + ggtitle("Mushroom Cap Shape","Edible vs. Poisonous") + labs(fill="Cap Shape")

Here we see (again) that there is no truly safe shape except the 32 sunken-capped mushrooms:

shrooms %>% filter(capShape == "sunken") %>% {prop.table(table(.$category))}
## 
## Poisonous    Edible 
##         0         1

Bell-shaped caps look semi-safe:

shrooms %>% filter(capShape == "bell") %>% {round(prop.table(table(.$category)),3)}
## 
## Poisonous    Edible 
##     0.106     0.894

but even a nearly-90% chance is still dicey unless there’s no real choice offered to you (say, a survival situation).

Looking deeper at the bell shape, is there a combination of color and shape that can help us a bit more?

shrooms %>% filter(capShape == "bell") %>% {round(prop.table(table(.$category,.$capColor), margin=2),3)}
##            
##             brown  buff cinnamon  grey green  pink purple red white yellow
##   Poisonous 0.071 1.000          0.000       1.000            0.074  0.045
##   Edible    0.929 0.000          1.000       0.000            0.926  0.955

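It appears that grey AND bell-shaped is the safest combination in this sample (no poisonous examples), followed by yellow AND bell-shaped at 95.5% edible, but for most other bell-shaped colors there’s still some chance of getting a poisonous one.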
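
Because a proportion drawn from a handful of mushrooms is less trustworthy than one drawn from hundreds, it can help to look at the sample size next to the edible share for each shape and color pair. The sketch below assumes a hypothetical data frame `toy` in place of `shrooms`; the rows and the names `tab`, `n`, and `edibleShare` are illustrative only.

```r
# Hypothetical stand-in for `shrooms`; rows are invented for illustration.
toy <- data.frame(
  category = factor(c("Edible", "Edible", "Poisonous",
                      "Edible", "Poisonous", "Poisonous"),
                    levels = c("Poisonous", "Edible")),
  capShape = c("bell", "bell", "bell", "flat", "flat", "flat"),
  capColor = c("yellow", "yellow", "white", "white", "white", "yellow")
)

# Three-way table: shape x color x category
tab <- table(shape = toy$capShape, color = toy$capColor,
             category = toy$category)

# Number of samples in each shape/color cell
n <- margin.table(tab, c(1, 2))

# Share of edible samples in each cell (NaN where a cell is empty)
edibleShare <- tab[, , "Edible"] / n

print(n)
print(round(edibleShare, 3))
```

Reading the two tables side by side shows whether a seemingly safe combination rests on many samples or only a few.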

Odor

The odor may also be an easy way to rule out poisonous mushrooms:

# Plot the odor
ggplot(data=shrooms) + geom_bar(mapping=aes(x=category, fill=odor)) + scale_fill_brewer(palette = "Set3") + ggtitle("Mushroom Odor","Edible vs. Poisonous") + labs(fill="Odor")

Finally, we have a clearer line to draw. It appears that some odors are 100% poisonous and some are 100% edible. Interestingly, the vast majority of those with no odor are safe as well.

round(prop.table(table(shrooms$category, shrooms$odor),margin=2),3)
##            
##             almond anise creosote fishy  foul musty  none pungent spicy
##   Poisonous  0.000 0.000    1.000 1.000 1.000 1.000 0.034   1.000 1.000
##   Edible     1.000 1.000    0.000 0.000 0.000 0.000 0.966   0.000 0.000
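
One way to test an odor-based rule of thumb is to write it down as a prediction and compare it with the true labels. The sketch below is an illustration rather than part of the original analysis: the data frame `toy`, the vector `safeOdors`, and the rule itself (treat almond, anise, and odorless mushrooms as edible) are assumptions, and the toy rows are invented.

```r
# Hypothetical stand-in for `shrooms`; rows are invented for illustration.
toy <- data.frame(
  category = factor(c("Edible", "Edible", "Edible",
                      "Poisonous", "Poisonous", "Poisonous"),
                    levels = c("Poisonous", "Edible")),
  odor = c("almond", "anise", "none", "none", "foul", "pungent")
)

# Assumed rule of thumb: these odors are treated as edible
safeOdors <- c("almond", "anise", "none")

predicted <- factor(ifelse(toy$odor %in% safeOdors, "Edible", "Poisonous"),
                    levels = c("Poisonous", "Edible"))

# Rows are the true labels, columns are what the rule predicts
confusion <- table(actual = toy$category, predicted = predicted)

# Overall share of correct predictions
accuracy <- mean(predicted == toy$category)

# Share of poisonous mushrooms the rule would wrongly call edible,
# which is the costly kind of mistake here
missedPoisonous <- confusion["Poisonous", "Edible"] /
  sum(confusion["Poisonous", ])

print(confusion)
print(round(c(accuracy = accuracy, missedPoisonous = missedPoisonous), 3))
```

Separating overall accuracy from the share of missed poisonous mushrooms matters because the two kinds of error are not equally costly: skipping an edible mushroom is harmless, while eating a poisonous one is not.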

Conclusion

After looking over the data, it appears that there are a few easy-to-recognize features of mushrooms (color, shape, and smell) that one could possibly use to be sure that a particular sample is safe to eat.

Certain odors are safe, namely anise and almond, while odorless mushrooms are relatively safe in general (~96% of the samples). Mushrooms with certain cap shapes were also edible (those rare examples with a sunken cap) or mostly edible (about 89% of bell-shaped caps). Colors, on the other hand, were poor predictors by themselves unless you’re lucky enough to find a rare purple- or green-capped mushroom.

In summary, there are few hard and fast rules that make you 100% safe when choosing to eat a wild mushroom. Unless you are an expert, I’d stick to the produce section of your local supermarket.

Citations


  1. https://archive.ics.uci.edu/ml/datasets/Mushroom

  2. The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf