Data Source: https://archive.ics.uci.edu/ml/datasets/Mushroom
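# stringsAsFactors = TRUE keeps columns as factors (the default before R 4.0), which summary() needs to report counts
mushrooms <- read.csv(file = "agaricus-lepiota.data.csv", header = FALSE, sep = ",", stringsAsFactors = TRUE)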
head(mushrooms)
The subset of the original mushroom data keeps the class label (poisonous or edible) plus four columns that describe a mushroom's appearance: cap shape, cap color, stalk shape, and ring number.
library(plyr)
mushrooms_subset <- subset(mushrooms, select=c(1,2,4,11,19))
colnames(mushrooms_subset) <- c("classes", "cap_shape", "cap_color", "stalk_shape", "ring_number")
mushrooms_subset$classes <- revalue(mushrooms_subset$classes,
c("p" = "poisonous", "e" = "edible"))
mushrooms_subset$cap_shape <- revalue(mushrooms_subset$cap_shape,
c("b"="bell", "c"="conical", "x"="convex", "f"="flat",
"k"="knobbed", "s"="sunken"))
mushrooms_subset$cap_color <- revalue(mushrooms_subset$cap_color,
c("n"="brown", "b"="buff", "c"="cinnamon", "g"="gray",
"r"="green", "p"="pink", "u"="purple", "e"="red",
"w"="white", "y"="yellow"))
mushrooms_subset$stalk_shape <- revalue(mushrooms_subset$stalk_shape,
c("e" = "enlarging", "t" = "tapering"))
mushrooms_subset$ring_number <- revalue(mushrooms_subset$ring_number,
c("n" = "none", "o" = "one", "t" = "two"))
head(mushrooms_subset)
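The recoding above relies on `revalue()` from plyr. As a minimal sketch of the same idea in base R, `factor()` with matching `levels` and `labels` can relabel single-letter codes. The vector `codes` below is a hypothetical toy input, not rows from the dataset, and the resulting level order follows the `levels` argument rather than the order `revalue()` would keep.

```r
# Toy vector of single-letter class codes (hypothetical values, not the real data)
codes <- c("p", "e", "e", "p")

# factor() maps each entry of `levels` to the label in the same position
classes <- factor(codes, levels = c("p", "e"), labels = c("poisonous", "edible"))

as.character(classes)
```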
Without looking at column 1 (classes), how can we determine if a mushroom is poisonous or edible?
Proportion Table of Edible and Poisonous Mushrooms - by Cap Shape
The table below shows the proportion of edible and poisonous mushrooms for each cap shape. Observations: in the sample, all mushrooms with sunken caps are edible, and all mushrooms with conical caps are poisonous. Most mushrooms with bell caps are edible (about 89%), and most mushrooms with knobbed caps are poisonous (about 72%).
prop.table(table(mushrooms_subset$classes, mushrooms_subset$cap_shape),2)
##
##                  bell   conical      flat   knobbed    sunken    convex
##   edible    0.8938053 0.0000000 0.5063452 0.2753623 1.0000000 0.5328228
##   poisonous 0.1061947 1.0000000 0.4936548 0.7246377 0.0000000 0.4671772
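The second argument to `prop.table()` determines how the proportions are normalised. The sketch below uses a small matrix of hypothetical counts (not taken from the dataset) to show that `margin = 2` divides each cell by its column total, which is why each cap shape column in the table sums to 1.

```r
# Hypothetical counts, chosen only to make the arithmetic easy to follow
counts <- matrix(c(9, 1,    # bell: 9 edible, 1 poisonous
                   0, 4),   # conical: 0 edible, 4 poisonous
                 nrow = 2,
                 dimnames = list(c("edible", "poisonous"), c("bell", "conical")))

# margin = 2 divides each cell by its column total, so every column sums to 1
col_props <- prop.table(counts, margin = 2)
col_props
colSums(col_props)
```

With `margin = 1` the rows would sum to 1 instead, which would answer a different question: of all edible mushrooms, what share has each cap shape.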
Proportion Table of Edible and Poisonous Mushrooms - by Cap Color
The table below shows the proportion of edible and poisonous mushrooms for each cap color. Interesting observations: in the sample, all mushrooms with green or purple caps are edible. Most mushrooms with cinnamon caps are edible (about 73%), and most mushrooms with buff caps are poisonous (about 71%).
prop.table(table(mushrooms_subset$classes, mushrooms_subset$cap_color),2)
##
##                  buff  cinnamon       red      gray     brown      pink
##   edible    0.2857143 0.7272727 0.4160000 0.5608696 0.5534151 0.3888889
##   poisonous 0.7142857 0.2727273 0.5840000 0.4391304 0.4465849 0.6111111
##
##                 green    purple     white    yellow
##   edible    1.0000000 1.0000000 0.6923077 0.3731343
##   poisonous 0.0000000 0.0000000 0.3076923 0.6268657
Proportion Table of Edible and Poisonous Mushrooms - by Stalk Shape
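The table below shows the proportion of edible and poisonous mushrooms for each stalk shape. Stalk shape appears to be only weakly related to edibility: about 46% of mushrooms with enlarging stalks and about 56% of mushrooms with tapering stalks are edible, so neither shape is a useful indicator on its own.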
prop.table(table(mushrooms_subset$classes, mushrooms_subset$stalk_shape),2)
##
##             enlarging  tapering
##   edible    0.4596132 0.5625000
##   poisonous 0.5403868 0.4375000
Proportion Table of Edible and Poisonous Mushrooms - by Ring Number
The table below shows the proportion of edible and poisonous mushrooms by number of rings on a mushroom. All sampled mushrooms with no rings are poisonous, and most mushrooms with two rings are edible.
prop.table(table(mushrooms_subset$classes, mushrooms_subset$ring_number),2)
##
##                 none      one      two
##   edible    0.000000 0.491453 0.880000
##   poisonous 1.000000 0.508547 0.120000
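From the proportion tables above, the following characteristics suggest that a mushroom is more likely to be edible: a sunken or bell-shaped cap; a green, purple, or cinnamon cap; or two rings.
The following characteristics suggest that a mushroom is more likely to be poisonous: a conical or knobbed cap, a buff cap, or no rings.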
Suppose we need to determine if a mushroom is edible based only on physical characteristics. We can create another subset of the mushrooms data specifying the requirements above.
# Note: in R, `&` is evaluated before `|`, so the exclusions below apply only to the ring_number == 'two' condition.
edible_mushrooms <- subset(mushrooms_subset,
  cap_shape == 'sunken' | cap_shape == 'bell' |
  cap_color == 'green' | cap_color == 'purple' | cap_color == 'cinnamon' |
  (ring_number == 'two' & cap_shape != 'conical' & ring_number != 'none' &
   cap_color != 'buff' & cap_shape != 'knobbed'))
summary(edible_mushrooms$classes)
## edible poisonous
## 724 84
summary(mushrooms_subset$classes)
## edible poisonous
## 4208 3916
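Because `&` binds more tightly than `|` in R, a filter that mixes the two without parentheses may not apply its exclusions to every clause. The sketch below illustrates this on four hypothetical mushrooms (the data frame `toy` is invented for illustration). The `strict` variant is only an assumed reading of the intent, namely "any favourable trait and no unfavourable trait", and it is not the filter that produced the counts reported here.

```r
# Toy data: four hypothetical mushrooms, not rows from the real dataset
toy <- data.frame(
  cap_shape   = c("bell", "bell",  "convex", "convex"),
  cap_color   = c("buff", "white", "white",  "buff"),
  ring_number = c("one",  "one",   "two",    "two")
)

# Without parentheses, `&` is evaluated first, so the buff exclusion
# is attached only to the ring_number == "two" clause
as_written <- subset(toy, cap_shape == "bell" | ring_number == "two" & cap_color != "buff")

# Assumed stricter intent: any favourable trait AND no unfavourable trait
strict <- subset(toy, (cap_shape == "bell" | ring_number == "two") & cap_color != "buff")

nrow(as_written)  # the bell-capped, buff-coloured mushroom is kept
nrow(strict)      # the bell-capped, buff-coloured mushroom is dropped
```

Applying the stricter form to the real data would likely change the counts, so the figures in this report should be read as results of the filter as written.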
In the new edible mushrooms subset, 89.6% of mushrooms (724 of 808) are edible; in the original dataset, only 51.8% (4208 of 8124) are. Although the edible mushrooms subset still contains poisonous mushrooms, its proportion of edible mushrooms is substantially higher than in the original dataset.
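The percentages quoted above follow directly from the two `summary()` outputs. A short sketch of the arithmetic, using the reported counts (the variable names are chosen here for illustration):

```r
# Counts copied from the two summary() outputs in this report
subset_counts <- c(edible = 724, poisonous = 84)
full_counts   <- c(edible = 4208, poisonous = 3916)

# Share of edible mushrooms in each group, as a percentage rounded to one decimal
pct_subset <- round(100 * subset_counts[["edible"]] / sum(subset_counts), 1)
pct_full   <- round(100 * full_counts[["edible"]] / sum(full_counts), 1)

pct_subset  # 89.6
pct_full    # 51.8
```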