The mushroom data set [1] is a well-known example data set frequently used by people learning R for the first time. Sourced from The Audubon Society Field Guide to North American Mushrooms [2], it contains 22 variables describing each mushroom, plus a classification indicating whether it is poisonous (p) or edible (e).
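# Load the packages used below: readr (read_csv), dplyr (%>%, select, filter) and ggplot2
library(tidyverse)
# Load dataset from the UCI repository; "?" marks missing values in the file
mushrooms <- read_csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"), col_names=FALSE, na='?')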
# Add names to the dataset
names(mushrooms) <- c("category","capShape","capSurface","capColor","bruises","odor","gillAttachment","gillSpacing","gillSize","gillColor","stalkShape","stalkRoot","stalkSurfaceAbove","stalkSurfaceBelow","stalkColorAbove","stalkColorBelow","veilType","veilColor","ringNumber","ringType","sporePrintColor","population","habitat")
# Add some factors
mushrooms$category <- factor(mushrooms$category, levels=c("p","e"), labels=c("Poisonous","Edible"))
mushrooms$capColor <- factor(mushrooms$capColor, levels=c("n","b","c","g","r","p","u","e","w","y"), labels=c("brown","buff","cinnamon","grey","green","pink","purple","red","white","yellow"))
mushrooms$capShape <- factor(mushrooms$capShape, levels = c("b","c","x","f","k","s"), labels = c("bell","conical","convex","flat","knobbed","sunken"))
mushrooms$odor <- factor(mushrooms$odor, levels=c("a","l","c","y","f","m","n","p","s"), labels = c("almond","anise","creosote","fishy","foul","musty","none","pungent","spicy"))
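One thing to watch with `factor()`: any code that is not listed in `levels` silently becomes `NA`. A small sketch of a sanity check, using a made-up vector of odor codes so that it runs on its own (the `"z"` code is a deliberate, hypothetical typo and not something found in the real file):

```r
# Hypothetical raw codes; "z" is not a valid odor code
raw_odor <- c("a", "n", "z", "f")

odor <- factor(raw_odor,
               levels = c("a", "f", "n"),
               labels = c("almond", "foul", "none"))

# Codes that were present in the raw data but lost in the recoding
unmapped <- unique(raw_odor[is.na(odor) & !is.na(raw_odor)])
print(unmapped)
```

Running the same check on each recoded column of `mushrooms` would confirm that the level lists above cover every code in the file.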
Looking at the data, there is a somewhat even split of poisonous and edible mushrooms represented.
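The split can be checked directly by tabulating the class label. A minimal sketch, using a small made-up vector in place of `mushrooms$category` so that it runs on its own (the toy values are an assumption for illustration, not the real data):

```r
# Toy stand-in for mushrooms$category (hypothetical values, not the real data)
category <- factor(c("p", "e", "e", "p", "e"),
                   levels = c("p", "e"),
                   labels = c("Poisonous", "Edible"))

# Counts per class, then the same table as proportions
class_counts <- table(category)
class_share  <- round(prop.table(class_counts), 3)

print(class_counts)
print(class_share)
```

On the full data set, the same two calls applied to `mushrooms$category` would show the actual split.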
Narrowing down the dataset to a few easy-to-recognize features, we want to see if there is an easy rule of thumb to know when a mushroom is edible.
shrooms <- mushrooms %>% select(category, capShape, capColor, odor)
First, let’s look at the color of the cap:
# Vector of colors
cols <- c("saddlebrown","bisque2","peru","grey","green","pink","purple","red","white","yellow")
# Plot the cap colors
ggplot(data=shrooms) + geom_bar(mapping=aes(x=category, fill=capColor)) + scale_fill_manual(values = cols) + ggtitle("Mushroom Cap Colors","Edible vs. Poisonous") + labs(fill="Cap Color")
It appears that unless you happen upon a purple- or green-capped mushroom, cap color alone cannot tell you that a mushroom is edible.
shrooms %>% filter(capColor == "purple" | capColor == "green") %>% {round(prop.table(table(.$category,.$capColor),margin=2),3)}
##
##             brown buff cinnamon grey green pink purple red white yellow
##   Poisonous                              0           0
##   Edible                                 1           1
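The header above lists all ten cap colors even though only green and purple survive the filter, because filtering keeps unused factor levels and their empty columns have no proportions to show. One possible way to tidy this, sketched here in base R on a made-up data frame (the toy rows are assumptions, not the real data), is to drop the unused levels before tabulating:

```r
# Toy data frame (hypothetical rows, not the real data)
toy <- data.frame(
  category = factor(c("Edible", "Edible", "Poisonous"),
                    levels = c("Poisonous", "Edible")),
  capColor = factor(c("green", "purple", "white"),
                    levels = c("green", "purple", "white"))
)

# Keep only green and purple caps
kept <- subset(toy, capColor %in% c("green", "purple"))

# Drop the now-unused "white" level from capColor only,
# so that both category rows are still shown
kept$capColor <- droplevels(kept$capColor)

# Column-wise proportions: one column per remaining color
color_tab <- round(prop.table(table(kept$category, kept$capColor), margin = 2), 3)
print(color_tab)
```

In the dplyr pipeline above, the equivalent step would be `mutate(capColor = droplevels(capColor))` after the `filter()` call.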
Among the remaining colors, white-capped mushrooms do best at nearly 70% edible, which is not a chance I’d be willing to take.
shrooms %>% filter(capColor == "white") %>% {prop.table(table(.$category))}
##
## Poisonous Edible
## 0.3076923 0.6923077
Now on to the shape of the cap. Is there some general rule we can make about a mushroom’s safety based on the shape of the cap, rather than color?
# Plot the cap shapes
ggplot(data=shrooms) + geom_bar(mapping=aes(x=category, fill=capShape)) + scale_fill_brewer(palette = "Set3") + ggtitle("Mushroom Cap Shape","Edible vs. Poisonous") + labs(fill="Cap Shape")
Here we see (again) that there is no truly safe shape except the 32 sunken-capped mushrooms:
shrooms %>% filter(capShape == "sunken") %>% {prop.table(table(.$category))}
##
## Poisonous Edible
## 0 1
Bell-shaped caps look semi-safe:
shrooms %>% filter(capShape == "bell") %>% {round(prop.table(table(.$category)),3)}
##
## Poisonous Edible
## 0.106 0.894
Even a nearly 90% chance is still dicey unless you have no real choice (say, a survival situation).
Looking deeper at the bell shape, is there a combination of color and shape that can help us a bit more?
shrooms %>% filter(capShape == "bell") %>% {round(prop.table(table(.$category,.$capColor), margin=2),3)}
##
## brown buff cinnamon grey green pink purple red white yellow
## Poisonous 0.071 1.000 0.000 1.000 0.074 0.045
## Edible 0.929 0.000 1.000 0.000 0.926 0.955
It appears that yellow and bell-shaped is the safest combination, but even then there’s still some chance of getting a poisonous one.
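A proportion says nothing about how many mushrooms it is based on, so a 100% column can rest on a handful of samples. A minimal sketch that prints the sample size next to the column proportions, using a made-up data frame (all values below are hypothetical):

```r
# Toy data frame (hypothetical rows, not the real data)
toy <- data.frame(
  category = factor(c("Edible", "Edible", "Edible", "Poisonous", "Edible"),
                    levels = c("Poisonous", "Edible")),
  capColor = factor(c("yellow", "yellow", "yellow", "yellow", "buff"))
)

counts <- table(toy$category, toy$capColor)
shares <- round(prop.table(counts, margin = 2), 3)

# Sample size behind each column of proportions
n_per_color <- colSums(counts)

print(shares)
print(n_per_color)
```

In this toy data the `buff` column is 100% edible but rests on a single sample, while `yellow` is 75% edible across four. The same check on the bell-shaped subset would show how much weight each column deserves.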
The odor may also be an easy way to rule out poisonous mushrooms:
# Plot the odor
ggplot(data=shrooms) + geom_bar(mapping=aes(x=category, fill=odor)) + scale_fill_brewer(palette = "Set3") + ggtitle("Mushroom Odor","Edible vs. Poisonous") + labs(fill="Odor")
Finally, we have a clearer line to draw. It appears that some odors are 100% poisonous and some are 100% edible. Interestingly, the vast majority of odorless mushrooms are safe as well.
round(prop.table(table(shrooms$category, shrooms$odor),margin=2),3)
##
##             almond anise creosote fishy  foul musty  none pungent spicy
##   Poisonous  0.000 0.000    1.000 1.000 1.000 1.000 0.034   1.000 1.000
##   Edible     1.000 1.000    0.000 0.000 0.000 0.000 0.966   0.000 0.000
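The table can be turned into an explicit rule by picking out the odors whose samples all fall into one class. A sketch in base R, with the edible shares typed in by hand from the table above (the vector name `edible_share` is made up for this example, and because the shares are rounded to three decimals, "all" here means "all, to that precision"):

```r
# Share of edible samples per odor, copied from the table above
edible_share <- c(almond = 1.000, anise = 1.000, creosote = 0.000,
                  fishy = 0.000, foul = 0.000, musty = 0.000,
                  none = 0.966, pungent = 0.000, spicy = 0.000)

# Odors where every sample is edible, every sample is poisonous, or neither
always_edible    <- names(edible_share)[edible_share == 1]
always_poisonous <- names(edible_share)[edible_share == 0]
mixed            <- setdiff(names(edible_share),
                            c(always_edible, always_poisonous))

print(always_edible)
print(always_poisonous)
print(mixed)
```

Only the odorless group ends up in `mixed`, which is why odor alone still cannot clear a mushroom that has no smell.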
After looking over the data, it appears that a few easy-to-recognize features of mushrooms (color, shape, and smell) could help judge whether a particular sample is safe to eat.
Certain odors are safe, namely anise and almond, and odorless mushrooms are relatively safe in general (~97% of the samples). Some cap shapes were also edible (the rare sunken caps) or mostly edible (about 89% of bell-shaped caps). Colors, on the other hand, were poor predictors by themselves unless you’re lucky enough to find a rare purple- or green-capped mushroom.
In summary, there are few hard and fast rules that make you 100% safe when choosing to eat a wild mushroom. Unless you are an expert, I’d stick to the produce section of your local supermarket.
[2] The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf.