Data: Mushroom Dataset in the UCI repository. Data URL: https://archive.ics.uci.edu/ml/datasets/Mushroom
Let’s first study the data in its current state.
require(ggplot2)
## Loading required package: ggplot2
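# stringsAsFactors = TRUE keeps every column a factor so summary() reports counts per level (the default became FALSE in R 4.0.0)
mushrooms <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"), stringsAsFactors = TRUE)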
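The file has no header row, so the call above promotes the first record to column names (hence names like p, x and p.1) and drops it from the data. That leaves 8,123 of the 8,124 instances listed in the UCI description. Below is a minimal sketch of a header-free load. Treat the column names as an assumption: they are my own snake_case spellings of the attribute list in the UCI data set description. `read_mushrooms()` is a hypothetical helper, not part of any package. The sketch also maps "?" to NA, which the UCI description uses for missing stalk-root values.

```r
# Assumed column names, following the attribute order in the UCI description
mushroom_cols <- c(
  "class", "cap_shape", "cap_surface", "cap_color", "bruises", "odor",
  "gill_attachment", "gill_spacing", "gill_size", "gill_color",
  "stalk_shape", "stalk_root", "stalk_surface_above_ring",
  "stalk_surface_below_ring", "stalk_color_above_ring",
  "stalk_color_below_ring", "veil_type", "veil_color", "ring_number",
  "ring_type", "spore_print_color", "population", "habitat"
)

# Hypothetical helper: read the raw file without treating row 1 as a header
read_mushrooms <- function(src) {
  read.csv(src,
           header = FALSE,            # the file has no header row
           col.names = mushroom_cols,
           na.strings = "?",          # "?" marks missing stalk_root values
           stringsAsFactors = TRUE)
}

# Example (needs network access):
# full <- read_mushrooms(url("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"))
# nrow(full)
```

The rest of this analysis keeps the original load, so it still refers to the auto-generated names such as p.1 and u.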
Let’s look at the summary of the data frame.
summary(mushrooms)
##  p        x        s        n              t        p.1
##  e:4208   b: 452   f:2320   n      :2283   f:4748   n      :3528
##  p:3915   c:   4   g:   4   g      :1840   t:3375   f      :2160
##           f:3152   s:2555   e      :1500            s      : 576
##           k: 828   y:3244   y      :1072            y      : 576
##           s:  32            w      :1040            a      : 400
##           x:3655            b      : 168            l      : 400
##                             (Other): 220            (Other): 483
##  f        c        n.1      k              e        e.1      s.1
##  a: 210   c:6811   b:5612   b      :1728   e:3515   ?:2480   f: 552
##  f:7913   w:1312   n:2511   p      :1492   t:4608   b:3776   k:2372
##                             w      :1202            c: 556   s:5175
##                             n      :1048            e:1119   y:  24
##                             g      : 752            r: 192
##                             h      : 732
##                             (Other):1169
##  s.2      w              w.1            p.2      w.2      o
##  f: 600   w      :4463   w      :4383   p:8123   n:  96   n:  36
##  k:2304   p      :1872   p      :1872            o:  96   o:7487
##  s:4935   g      : 576   g      : 576            w:7923   t: 600
##  y: 284   n      : 448   n      : 512            y:   8
##           b      : 432   b      : 432
##           o      : 192   o      : 192
##           (Other): 140   (Other): 156
##  p.3      k.1            s.3      u
##  e:2776   w      :2388   a: 384   d:3148
##  f:  48   n      :1968   c: 340   g:2148
##  l:1296   k      :1871   n: 400   l: 832
##  n:  36   h      :1632   s:1247   m: 292
##  p:3967   r      :  72   v:4040   p:1144
##           b      :  48   y:1712   u: 367
##           (Other): 144            w: 192
Let’s look at the first few entries of the data frame.
head(mushrooms)
##   p x s n t p.1 f c n.1 k e e.1 s.1 s.2 w w.1 p.2 w.2 o p.3 k.1 s.3 u
## 1 e x s y t   a f c   b k e   c   s   s w   w   p   w o   p   n   n g
## 2 e b s w t   l f c   b n e   c   s   s w   w   p   w o   p   n   n m
## 3 p x y w t   p f c   n n e   e   s   s w   w   p   w o   p   k   s u
## 4 e x s g f   n f w   b k t   e   s   s w   w   p   w o   e   n   a g
## 5 e x y y t   a f c   b n e   c   s   s w   w   p   w o   p   k   n g
## 6 e b s w t   a f c   b g e   c   s   s w   w   p   w o   p   k   n m
Let’s keep only the four columns of interest and give them more descriptive names: Type, CapShape, Odor and Habitat.
mushrooms <- data.frame(Type = mushrooms$p, CapShape = mushrooms$x, Odor = mushrooms$p.1, Habitat = mushrooms$u)
head(mushrooms)
##   Type CapShape Odor Habitat
## 1    e        x    a       g
## 2    e        b    l       m
## 3    p        x    p       u
## 4    e        x    n       g
## 5    e        x    a       g
## 6    e        b    a       m
Let’s replace the one-letter codes in each factor with descriptive labels so the data are easier to read at a glance.
# factor() with levels and labels maps each code to the label in the same position.
# The labels are listed alphabetically, so the level order matches the summaries below.
# Replace codes in the Type column
mushrooms$Type <- factor(mushrooms$Type,
                         levels = c("e", "p"),
                         labels = c("Edible", "Poisonous"))
# Replace codes in the CapShape column
mushrooms$CapShape <- factor(mushrooms$CapShape,
                             levels = c("b", "c", "x", "f", "k", "s"),
                             labels = c("Bell", "Conical", "Convex", "Flat", "Knobbed", "Sunken"))
# Replace codes in the Odor column
mushrooms$Odor <- factor(mushrooms$Odor,
                         levels = c("a", "l", "c", "y", "f", "m", "n", "p", "s"),
                         labels = c("Almond", "Anise", "Creosote", "Fishy", "Foul", "Musty", "None", "Pungent", "Spicy"))
# Replace codes in the Habitat column
mushrooms$Habitat <- factor(mushrooms$Habitat,
                            levels = c("g", "l", "m", "p", "u", "w", "d"),
                            labels = c("Grasses", "Leaves", "Meadows", "Paths", "Urban", "Waste", "Woods"))
summary(mushrooms)
##  Type             CapShape       Odor           Habitat
##  Edible   :4208   Bell   : 452   None   :3528   Grasses:2148
##  Poisonous:3915   Conical:   4   Foul   :2160   Leaves : 832
##                   Convex :3655   Fishy  : 576   Meadows: 292
##                   Flat   :3152   Spicy  : 576   Paths  :1144
##                   Knobbed: 828   Almond : 400   Urban  : 367
##                   Sunken :  32   Anise  : 400   Waste  : 192
##                                  (Other): 483   Woods  :3148
head(mushrooms)
##        Type CapShape    Odor Habitat
## 1    Edible   Convex  Almond Grasses
## 2    Edible     Bell   Anise Meadows
## 3 Poisonous   Convex Pungent   Urban
## 4    Edible   Convex    None Grasses
## 5    Edible   Convex  Almond Grasses
## 6    Edible     Bell  Almond Meadows
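The relabelling can also be driven by a named lookup vector. This keeps each code-to-label pair together and makes it easy to reject codes that have no label. The sketch below is an alternative, not the author’s code. `recode_codes()` and `odor_labels` are hypothetical names, and the helper stops with an error when it meets an unmapped code.

```r
# Hypothetical helper: map single-letter codes to labels via a named vector
recode_codes <- function(x, lookup) {
  codes <- as.character(x)
  unknown <- setdiff(unique(codes), names(lookup))
  if (length(unknown) > 0) {
    stop("No label for code(s): ", paste(unknown, collapse = ", "))
  }
  # Index the lookup by code; levels follow alphabetical label order
  factor(unname(lookup[codes]), levels = sort(unname(lookup)))
}

# Odor codes as used earlier in this analysis
odor_labels <- c(a = "Almond", l = "Anise", c = "Creosote", y = "Fishy",
                 f = "Foul", m = "Musty", n = "None", p = "Pungent",
                 s = "Spicy")

# Usage on a column that still holds the raw codes:
# mushrooms$Odor <- recode_codes(mushrooms$Odor, odor_labels)
```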
To compare edible and poisonous mushrooms, I will create a subset for each class.
edib <- subset(mushrooms, Type == 'Edible')
pois <- subset(mushrooms, Type == 'Poisonous')
head(edib)
##     Type CapShape   Odor Habitat
## 1 Edible   Convex Almond Grasses
## 2 Edible     Bell  Anise Meadows
## 4 Edible   Convex   None Grasses
## 5 Edible   Convex Almond Grasses
## 6 Edible     Bell Almond Meadows
## 7 Edible     Bell  Anise Meadows
summary(edib)
##  Type             CapShape       Odor            Habitat
##  Edible   :4208   Bell   : 404   None    :3408   Grasses:1408
##  Poisonous:   0   Conical:   0   Almond  : 400   Leaves : 240
##                   Convex :1948   Anise   : 400   Meadows: 256
##                   Flat   :1596   Creosote:   0   Paths  : 136
##                   Knobbed: 228   Fishy   :   0   Urban  :  96
##                   Sunken :  32   Foul    :   0   Waste  : 192
##                                  (Other) :   0   Woods  :1880
head(pois)
##         Type CapShape    Odor Habitat
##  3 Poisonous   Convex Pungent   Urban
##  8 Poisonous   Convex Pungent Grasses
## 13 Poisonous   Convex Pungent   Urban
## 17 Poisonous   Convex Pungent Grasses
## 18 Poisonous   Convex Pungent   Urban
## 19 Poisonous   Convex Pungent   Urban
summary(pois)
##  Type             CapShape       Odor            Habitat
##  Edible   :   0   Bell   :  48   Foul    :2160   Grasses: 740
##  Poisonous:3915   Conical:   4   Fishy   : 576   Leaves : 592
##                   Convex :1707   Spicy   : 576   Meadows:  36
##                   Flat   :1556   Pungent : 255   Paths  :1008
##                   Knobbed: 600   Creosote: 192   Urban  : 271
##                   Sunken :   0   None    : 120   Waste  :   0
##                                  (Other) :  36   Woods  :1268
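Separate subsets work, but a single contingency table puts both classes side by side. Here is a minimal sketch using base R’s `table()` and `prop.table()`; the name `edible_share()` is hypothetical. With `margin = 1`, each row sums to 1, so the Edible column reads as the share of edible mushrooms at each level of the feature.

```r
# Hypothetical helper: class counts and row-wise class shares per feature level
edible_share <- function(feature, type) {
  counts <- table(feature, type)
  list(counts = counts,
       # margin = 1 normalises each row (feature level) to sum to 1
       shares = round(prop.table(counts, margin = 1), 3))
}

# Usage with the relabelled data frame:
# edible_share(mushrooms$CapShape, mushrooms$Type)
# edible_share(mushrooms$Odor, mushrooms$Type)
```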
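# Side-by-side bars: count of each Type for every cap shape (a boxplot needs a numeric axis)
ggplot(mushrooms, aes(x = CapShape, fill = Type)) + ggtitle("Cap Shapes Distribution of Mushrooms") + geom_bar(position = "dodge")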
barplot(table(edib$CapShape), main = "Cap Shapes Distribution of Edible Mushrooms")
barplot(table(pois$CapShape), main = "Cap Shapes Distribution of Poisonous Mushrooms")
ggplot(mushrooms, aes(x = Odor, fill = Type)) + ggtitle("Odor Distribution of Mushrooms") + geom_bar(position = "dodge")
barplot(table(edib$Odor), main = "Odor Distribution of Edible Mushrooms")
barplot(table(pois$Odor), main = "Odor Distribution of Poisonous Mushrooms")
ggplot(mushrooms, aes(x = Habitat, fill = Type)) + ggtitle("Habitat Distribution of Mushrooms") + geom_bar(position = "dodge")
barplot(table(edib$Habitat), main = "Habitat Distribution of Edible Mushrooms")
barplot(table(pois$Habitat), main = "Habitat Distribution of Poisonous Mushrooms")
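Each pair of base-R barplots above draws the two classes on separate y-axes, which makes their bar heights hard to compare. A minimal alternative draws grouped bars on one shared axis with base R’s `barplot(beside = TRUE)`; `grouped_barplot()` is a hypothetical helper name.

```r
# Hypothetical helper: one bar per class within each feature level, shared y-axis
grouped_barplot <- function(feature, type, main = "") {
  counts <- table(type, feature)  # rows = classes, columns = feature levels
  barplot(counts, beside = TRUE, legend.text = rownames(counts),
          main = main, ylab = "Count")
}

# Usage:
# grouped_barplot(mushrooms$CapShape, mushrooms$Type, "Cap Shapes by Type")
```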
These data show that edible and poisonous mushrooms share some cap shapes, odors and habitats, since their distributions overlap, but the distributions also differ in ways that help tell the two classes apart.
Cap Shape
The Cap Shapes Distribution of Mushrooms chart shows that both classes cover most cap shapes. The four conical-capped mushrooms are all poisonous and the 32 sunken-capped ones are all edible. A bell or sunken cap makes a mushroom much more likely to be edible (404 of the 452 bell-capped mushrooms are edible), while a knobbed cap leans poisonous (600 of 828). However, it is still important to look at the Cap Shapes Distribution of Poisonous Mushrooms and Cap Shapes Distribution of Edible Mushrooms charts, because they show that most mushrooms of both kinds have a convex or flat cap.
Odor
The Odor Distribution of Mushrooms chart shows that poisonous mushrooms have a wide range of odors, but none smell of anise or almond. The Odor Distribution of Poisonous Mushrooms chart shows that most poisonous mushrooms (2,160 of 3,915) have a foul odor. A few (120) are odorless, and the rest have other scents. The Odor Distribution of Edible Mushrooms chart shows that most edible mushrooms (3,408 of 4,208) have no odor, and the rest smell of almond or anise. In this data set, therefore, any odor other than almond, anise or none indicates a poisonous mushroom.
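The odor pattern suggests a simple rule: call a mushroom edible when it smells of almond or anise or has no odor, and poisonous otherwise. Below is a sketch of that rule and its accuracy. `odor_rule()` and `rule_report()` are hypothetical names, and the rule only summarises this data set; it is not a safe field test. Going by the summaries above, its only mistakes on this data should be the 120 odorless poisonous mushrooms.

```r
# Hypothetical rule distilled from the odor plots:
# almond, anise or no odor -> Edible; any other odor -> Poisonous
odor_rule <- function(odor) {
  factor(ifelse(odor %in% c("Almond", "Anise", "None"), "Edible", "Poisonous"),
         levels = c("Edible", "Poisonous"))
}

# Compare the rule's predictions with the true labels
rule_report <- function(odor, type) {
  predicted <- odor_rule(odor)
  confusion <- table(actual = type, predicted = predicted)
  list(confusion = confusion,
       accuracy = sum(diag(confusion)) / sum(confusion))
}

# Usage with the relabelled data frame:
# rule_report(mushrooms$Odor, mushrooms$Type)
```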
Habitat
The Habitat Distribution of Mushrooms chart shows that both kinds of mushrooms occupy a wide range of overlapping habitats. The main difference is that edible mushrooms are concentrated in grasses and woods (3,288 of 4,208), as the Habitat Distribution of Edible Mushrooms chart shows. Poisonous mushrooms are spread more evenly, with paths and leaves also common. Even so, grasses and woods hold 2,008 of the poisonous mushrooms as well.
Overall, if you’re lost in the grasses or woods and have to eat mushrooms, try to find a sunken cap mushroom with almond or anise odor.
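The advice above combines three conditions. A filter like the one below counts how many mushrooms in the data set meet all of them, and how many of those are edible. This is a minimal sketch: `matching_mushrooms()` is a hypothetical helper, and its default arguments simply restate the sentence above.

```r
# Hypothetical helper: rows matching all three conditions from the conclusion
matching_mushrooms <- function(df,
                               habitats = c("Grasses", "Woods"),
                               shapes = c("Sunken"),
                               odors = c("Almond", "Anise")) {
  keep <- df$Habitat %in% habitats &
    df$CapShape %in% shapes &
    df$Odor %in% odors
  df[keep, , drop = FALSE]
}

# Usage with the relabelled data frame:
# hits <- matching_mushrooms(mushrooms)
# table(hits$Type)
```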