Importing the Mushrooms Dataset

Data Source: https://archive.ics.uci.edu/ml/datasets/Mushroom

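# stringsAsFactors = TRUE keeps columns as factors (the default before R 4.0), which the summaries below rely on.
mushrooms <- read.csv(file = "agaricus-lepiota.data.csv", header = FALSE, sep = ",", stringsAsFactors = TRUE)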

Original Mushroom Data

head(mushrooms)
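
Because the raw file has no header row, read.csv() with header = FALSE assigns the default column names V1, V2, and so on, which is why the later steps select columns by position. A minimal sketch of that behavior, assuming a small made-up file written to a temporary path (the three rows and five columns are hypothetical, not the real data):

```r
# Hypothetical miniature of a headerless, comma-separated file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("p,x,n,e,o",
             "e,b,w,t,o",
             "e,x,y,e,t"), tmp)

# header = FALSE makes R invent the names V1, V2, ...
# stringsAsFactors = TRUE stores each text column as a factor.
toy <- read.csv(tmp, header = FALSE, sep = ",", stringsAsFactors = TRUE)

print(names(toy))  # default column names
print(dim(toy))    # number of rows, number of columns
print(sapply(toy, class))
```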

Subsetting and Renaming Data

The subset keeps five columns of the original mushroom data: the class label (edible or poisonous) plus cap shape, cap color, stalk shape, and ring number. The four feature columns all describe the appearance of a mushroom.

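library(plyr)  # provides revalue() for renaming factor levels
# Keep the class label (column 1) and four appearance features (columns 2, 4, 11, 19).
mushrooms_subset <- subset(mushrooms, select = c(1, 2, 4, 11, 19))
colnames(mushrooms_subset) <- c("classes", "cap_shape", "cap_color", "stalk_shape", "ring_number")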
mushrooms_subset$classes <- revalue(mushrooms_subset$classes, 
                                    c("p" = "poisonous", "e" = "edible"))
mushrooms_subset$cap_shape <- revalue(mushrooms_subset$cap_shape,
                                      c("b"="bell", "c"="conical", "x"="convex", "f"="flat",
                                        "k"="knobbed", "s"="sunken"))
mushrooms_subset$cap_color <- revalue(mushrooms_subset$cap_color,
                                      c("n"="brown", "b"="buff", "c"="cinnamon", "g"="gray",
                                        "r"="green", "p"="pink", "u"="purple", "e"="red",
                                        "w"="white", "y"="yellow"))
mushrooms_subset$stalk_shape <- revalue(mushrooms_subset$stalk_shape,
                                        c("e" = "enlarging", "t" = "tapering"))
mushrooms_subset$ring_number <- revalue(mushrooms_subset$ring_number,
                                        c("n" = "none", "o" = "one", "t" = "two"))
head(mushrooms_subset)
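
For readers without plyr installed, the same recoding can be sketched in base R by relabeling factor levels through a named lookup vector. The data frame toy and the helper recode_levels() are hypothetical names introduced for illustration; they are not part of the original analysis.

```r
# Hypothetical toy column using the single-letter class codes.
toy <- data.frame(classes = factor(c("p", "e", "e", "p", "e")))

# Illustrative helper: rename factor levels via a named vector (code -> label).
# Levels without an entry in `map` are left unchanged.
recode_levels <- function(x, map) {
  x <- as.factor(x)
  old <- levels(x)
  hit <- old %in% names(map)
  old[hit] <- map[old[hit]]
  levels(x) <- old
  x
}

toy$classes <- recode_levels(toy$classes, c("p" = "poisonous", "e" = "edible"))
print(table(toy$classes))
```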

Edible Mushrooms

Without looking at column 1 (classes), how can we determine if a mushroom is poisonous or edible?

Proportion Table of Edible and Poisonous Mushrooms - by Cap Shape

The table below shows the proportion of edible and poisonous mushrooms for each cap shape. Observations: in the sample, all mushrooms with sunken caps are edible, and all mushrooms with conical caps are poisonous. Most mushrooms with bell caps are edible (about 89%), and most mushrooms with knobbed caps are poisonous (about 72%).

prop.table(table(mushrooms_subset$classes, mushrooms_subset$cap_shape),2)
##            
##                  bell   conical      flat   knobbed    sunken    convex
##   edible    0.8938053 0.0000000 0.5063452 0.2753623 1.0000000 0.5328228
##   poisonous 0.1061947 1.0000000 0.4936548 0.7246377 0.0000000 0.4671772
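
The second argument to prop.table() sets the margin: 2 divides each cell by its column total, so every column sums to 1 and each value reads as the share of edible or poisonous mushrooms within one cap shape. A small sketch with made-up counts (the numbers below are hypothetical, not taken from the dataset):

```r
# Hypothetical counts: rows are classes, columns are cap shapes.
counts <- matrix(c(8, 2,    # bell: 8 edible, 2 poisonous
                   0, 4),   # conical: 0 edible, 4 poisonous
                 nrow = 2,
                 dimnames = list(c("edible", "poisonous"),
                                 c("bell", "conical")))

# margin = 2: proportions within each column (each cap shape).
by_shape <- prop.table(counts, 2)
print(by_shape)

# margin = 1 would instead give proportions within each row (each class).
by_class <- prop.table(counts, 1)
print(by_class)
```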

Proportion Table of Edible and Poisonous Mushrooms - by Cap Color

The table below shows the proportion of edible and poisonous mushrooms for each cap color. Observations: in the sample, all mushrooms with green or purple caps are edible. Most mushrooms with cinnamon caps are edible (about 73%), and most mushrooms with buff caps are poisonous (about 71%).

prop.table(table(mushrooms_subset$classes, mushrooms_subset$cap_color),2)
##            
##                  buff  cinnamon       red      gray     brown      pink
##   edible    0.2857143 0.7272727 0.4160000 0.5608696 0.5534151 0.3888889
##   poisonous 0.7142857 0.2727273 0.5840000 0.4391304 0.4465849 0.6111111
##            
##                 green    purple     white    yellow
##   edible    1.0000000 1.0000000 0.6923077 0.3731343
##   poisonous 0.0000000 0.0000000 0.3076923 0.6268657

Proportion Table of Edible and Poisonous Mushrooms - by Stalk Shape

The table below shows the proportion of edible and poisonous mushrooms for each stalk shape. Stalk shape separates the two classes only weakly: about 46% of mushrooms with enlarging stalks and about 56% of mushrooms with tapering stalks are edible, so it is not a useful indicator on its own.

prop.table(table(mushrooms_subset$classes, mushrooms_subset$stalk_shape),2)
##            
##             enlarging  tapering
##   edible    0.4596132 0.5625000
##   poisonous 0.5403868 0.4375000

Proportion Table of Edible and Poisonous Mushrooms - by Ring Number

The table below shows the proportion of edible and poisonous mushrooms by the number of rings on a mushroom. All sampled mushrooms with no rings are poisonous, and most mushrooms with two rings are edible (88%). Mushrooms with one ring are split almost evenly between the two classes.

prop.table(table(mushrooms_subset$classes, mushrooms_subset$ring_number),2)
##            
##                 none      one      two
##   edible    0.000000 0.491453 0.880000
##   poisonous 1.000000 0.508547 0.120000
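
The four tables above repeat the same call with a different column each time. One way to sketch that repetition is to loop over the feature names with lapply(). The toy data frame below is hypothetical and stands in for mushrooms_subset; only the column names are borrowed from the text.

```r
# Hypothetical stand-in for mushrooms_subset with two feature columns.
toy <- data.frame(
  classes     = c("edible", "poisonous", "edible", "edible", "poisonous", "poisonous"),
  cap_shape   = c("bell", "conical", "bell", "flat", "flat", "conical"),
  ring_number = c("one", "none", "two", "one", "one", "none"),
  stringsAsFactors = TRUE
)

# Every column except the class label is treated as a feature.
features <- setdiff(names(toy), "classes")

# One column-wise proportion table per feature, stored in a named list.
prop_tables <- lapply(features, function(col) {
  prop.table(table(toy$classes, toy[[col]]), 2)
})
names(prop_tables) <- features

print(prop_tables$cap_shape)
print(prop_tables$ring_number)
```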

Subset of Edible Mushrooms

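From the proportion tables above, the following characteristics suggest that a mushroom is edible: a sunken or bell cap shape; a green, purple, or cinnamon cap color; or two rings.

The following characteristics suggest that a mushroom is poisonous: a conical or knobbed cap shape, a buff cap color, or no rings.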

Suppose we need to determine if a mushroom is edible based only on physical characteristics. We can create another subset of the mushrooms data specifying the requirements above.

# Note: & binds more tightly than |, so the != exclusions apply only to the ring_number == 'two' term.
edible_mushrooms <- subset(mushrooms_subset,
                           cap_shape == 'sunken' | cap_shape == 'bell' |
                           cap_color == 'green' | cap_color == 'purple' | cap_color == 'cinnamon' |
                           (ring_number == 'two' & cap_shape != 'conical' & ring_number != 'none' &
                            cap_color != 'buff' & cap_shape != 'knobbed'))

summary(edible_mushrooms$classes)
##    edible poisonous 
##       724        84
summary(mushrooms_subset$classes)
##    edible poisonous 
##      4208      3916
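
In R, & binds more tightly than |, so in a condition of the form a | b & c the "& c" part applies only to b. In the filter above, that means the != exclusions constrain only the ring_number == 'two' term. A minimal sketch of the difference, using a hypothetical three-row data frame and a shortened version of the condition:

```r
# Hypothetical rows chosen to expose the grouping difference.
toy <- data.frame(
  cap_shape   = c("bell", "knobbed", "flat"),
  cap_color   = c("buff", "green",   "white"),
  ring_number = c("one",  "one",     "two"),
  stringsAsFactors = FALSE
)

# As parsed by R: a | b | (c & exclusions). The exclusions guard only the last term.
as_written <- with(toy,
  cap_shape == "bell" | cap_color == "green" |
    ring_number == "two" & cap_color != "buff" & cap_shape != "knobbed")

# With explicit parentheses: (a | b | c) & exclusions.
grouped <- with(toy,
  (cap_shape == "bell" | cap_color == "green" | ring_number == "two") &
    cap_color != "buff" & cap_shape != "knobbed")

print(as_written)  # TRUE TRUE TRUE
print(grouped)     # FALSE FALSE TRUE
```

Which grouping matches the intent is a decision for the analysis; the counts reported in the text come from the filter as written.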

In the new edible mushrooms subset, 89.6% of mushrooms (724 of 808) are edible; in the full dataset, only 51.8% (4208 of 8124) are. Poisonous mushrooms still make up 10.4% of the subset, but the proportion of edible mushrooms is substantially higher than in the full dataset.
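
The two percentages follow directly from the summary() counts shown above. As a quick check of the arithmetic (pct_edible() is a hypothetical helper name introduced here):

```r
# Counts copied from the summary() output in the text.
subset_counts <- c(edible = 724, poisonous = 84)
full_counts   <- c(edible = 4208, poisonous = 3916)

# Share of edible mushrooms, as a percentage rounded to one decimal place.
pct_edible <- function(counts) {
  round(100 * counts[["edible"]] / sum(counts), 1)
}

print(pct_edible(subset_counts))  # 89.6
print(pct_edible(full_counts))    # 51.8
```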