Assignment - Loading Data into a Data Frame

Data: Mushroom Dataset in the UCI repository. Data URL: https://archive.ics.uci.edu/ml/datasets/Mushroom

Let’s first study the data in its current state.

require(ggplot2)
## Loading required package: ggplot2
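# The file has no header row, so the default header = TRUE uses the first
# record as the column names (p, x, s, ...).
# stringsAsFactors = TRUE keeps the columns as factors on R >= 4.0.
rawData <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"), stringsAsFactors = TRUE)
mushrooms <- data.frame(rawData)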
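How `read.csv()` treats the first line decides whether that record ends up as data or as column names. Here is a minimal sketch of the difference, using two records from the file as inline text so it runs without the URL. The names `raw_lines`, `with_header`, and `no_header` are made up for this illustration.

```r
# Two records from agaricus-lepiota.data, inlined so the example runs offline.
raw_lines <- c(
  "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u",
  "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g"
)

# Default behaviour: the first record is consumed as the header.
with_header <- read.csv(text = raw_lines, stringsAsFactors = TRUE)

# header = FALSE keeps every record; columns are named V1, V2, ...
no_header <- read.csv(text = raw_lines, header = FALSE, stringsAsFactors = TRUE)

nrow(with_header)        # one data row
nrow(no_header)          # two data rows
names(with_header)[1:6]  # names built from the first record
```

Reading the full file with `header = FALSE` would presumably keep that first record, at the cost of generic `V1`, `V2`, ... column names that then need renaming.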

Let’s look at the summary of the data frame.

summary(mushrooms)
##  p        x        s              n        t             p.1      
##  e:4208   b: 452   f:2320   n      :2283   f:4748   n      :3528  
##  p:3915   c:   4   g:   4   g      :1840   t:3375   f      :2160  
##           f:3152   s:2555   e      :1500            s      : 576  
##           k: 828   y:3244   y      :1072            y      : 576  
##           s:  32            w      :1040            a      : 400  
##           x:3655            b      : 168            l      : 400  
##                             (Other): 220            (Other): 483  
##  f        c        n.1            k        e        e.1      s.1     
##  a: 210   c:6811   b:5612   b      :1728   e:3515   ?:2480   f: 552  
##  f:7913   w:1312   n:2511   p      :1492   t:4608   b:3776   k:2372  
##                             w      :1202            c: 556   s:5175  
##                             n      :1048            e:1119   y:  24  
##                             g      : 752            r: 192           
##                             h      : 732                             
##                             (Other):1169                             
##  s.2            w             w.1       p.2      w.2      o       
##  f: 600   w      :4463   w      :4383   p:8123   n:  96   n:  36  
##  k:2304   p      :1872   p      :1872            o:  96   o:7487  
##  s:4935   g      : 576   g      : 576            w:7923   t: 600  
##  y: 284   n      : 448   n      : 512            y:   8           
##           b      : 432   b      : 432                             
##           o      : 192   o      : 192                             
##           (Other): 140   (Other): 156                             
##  p.3           k.1       s.3      u       
##  e:2776   w      :2388   a: 384   d:3148  
##  f:  48   n      :1968   c: 340   g:2148  
##  l:1296   k      :1871   n: 400   l: 832  
##  n:  36   h      :1632   s:1247   m: 292  
##  p:3967   r      :  72   v:4040   p:1144  
##           b      :  48   y:1712   u: 367  
##           (Other): 144            w: 192

Let’s look at the first few entries of the data frame.

head(mushrooms)
##   p x s n t p.1 f c n.1 k e e.1 s.1 s.2 w w.1 p.2 w.2 o p.3 k.1 s.3 u
## 1 e x s y t   a f c   b k e   c   s   s w   w   p   w o   p   n   n g
## 2 e b s w t   l f c   b n e   c   s   s w   w   p   w o   p   n   n m
## 3 p x y w t   p f c   n n e   e   s   s w   w   p   w o   p   k   s u
## 4 e x s g f   n f w   b k t   e   s   s w   w   p   w o   e   n   a g
## 5 e x y y t   a f c   b n e   c   s   s w   w   p   w o   p   k   n g
## 6 e b s w t   a f c   b g e   c   s   s w   w   p   w o   p   k   n m

Let’s take a subset of this data frame and give the columns more descriptive names.

mushrooms <- data.frame(Type = mushrooms$p, CapShape = mushrooms$x, Odor = mushrooms$p.1, Habitat = mushrooms$u)

head(mushrooms)
##   Type CapShape Odor Habitat
## 1    e        x    a       g
## 2    e        b    l       m
## 3    p        x    p       u
## 4    e        x    n       g
## 5    e        x    a       g
## 6    e        b    a       m

Let’s rename the values in the factors in order to make the data easier to understand at a glance.

#Replace values in Type column
levels(mushrooms$Type) <- c(levels(mushrooms$Type), "Edible", "Poisonous")
mushrooms$Type[mushrooms$Type == "e"] <- "Edible"
mushrooms$Type[mushrooms$Type == "p"] <- "Poisonous"
mushrooms$Type <- factor(mushrooms$Type)

#Replace values in CapShape column
levels(mushrooms$CapShape) <- c(levels(mushrooms$CapShape), "Bell", "Conical", "Convex", "Flat", "Knobbed", "Sunken")
mushrooms$CapShape[mushrooms$CapShape == "b"] <- "Bell"
mushrooms$CapShape[mushrooms$CapShape == "c"] <- "Conical"
mushrooms$CapShape[mushrooms$CapShape == "x"] <- "Convex"
mushrooms$CapShape[mushrooms$CapShape == "f"] <- "Flat"
mushrooms$CapShape[mushrooms$CapShape == "k"] <- "Knobbed"
mushrooms$CapShape[mushrooms$CapShape == "s"] <- "Sunken"
mushrooms$CapShape <- factor(mushrooms$CapShape)

#Replace values in Odor column
levels(mushrooms$Odor) <- c(levels(mushrooms$Odor), "Almond", "Anise", "Creosote", "Fishy", "Foul", "Musty", "None", "Pungent", "Spicy")
mushrooms$Odor[mushrooms$Odor == "a"] <- "Almond"
mushrooms$Odor[mushrooms$Odor == "l"] <- "Anise"
mushrooms$Odor[mushrooms$Odor == "c"] <- "Creosote"
mushrooms$Odor[mushrooms$Odor == "y"] <- "Fishy"
mushrooms$Odor[mushrooms$Odor == "f"] <- "Foul"
mushrooms$Odor[mushrooms$Odor == "m"] <- "Musty"
mushrooms$Odor[mushrooms$Odor == "n"] <- "None"
mushrooms$Odor[mushrooms$Odor == "p"] <- "Pungent"
mushrooms$Odor[mushrooms$Odor == "s"] <- "Spicy"
mushrooms$Odor <- factor(mushrooms$Odor)

#Replace values in Habitat column
levels(mushrooms$Habitat) <- c(levels(mushrooms$Habitat), "Grasses", "Leaves", "Meadows", "Paths", "Urban", "Waste", "Woods")
mushrooms$Habitat[mushrooms$Habitat == "g"] <- "Grasses"
mushrooms$Habitat[mushrooms$Habitat == "l"] <- "Leaves"
mushrooms$Habitat[mushrooms$Habitat == "m"] <- "Meadows"
mushrooms$Habitat[mushrooms$Habitat == "p"] <- "Paths"
mushrooms$Habitat[mushrooms$Habitat == "u"] <- "Urban"
mushrooms$Habitat[mushrooms$Habitat == "w"] <- "Waste"
mushrooms$Habitat[mushrooms$Habitat == "d"] <- "Woods"
mushrooms$Habitat <- factor(mushrooms$Habitat)

summary(mushrooms)
##         Type         CapShape         Odor         Habitat    
##  Edible   :4208   Bell   : 452   None   :3528   Grasses:2148  
##  Poisonous:3915   Conical:   4   Foul   :2160   Leaves : 832  
##                   Convex :3655   Fishy  : 576   Meadows: 292  
##                   Flat   :3152   Spicy  : 576   Paths  :1144  
##                   Knobbed: 828   Almond : 400   Urban  : 367  
##                   Sunken :  32   Anise  : 400   Waste  : 192  
##                                  (Other): 483   Woods  :3148
head(mushrooms)
##        Type CapShape    Odor Habitat
## 1    Edible   Convex  Almond Grasses
## 2    Edible     Bell   Anise Meadows
## 3 Poisonous   Convex Pungent   Urban
## 4    Edible   Convex    None Grasses
## 5    Edible   Convex  Almond Grasses
## 6    Edible     Bell  Almond Meadows
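The same kind of recoding can be written more compactly by handing `factor()` the codes and their labels in one call. This is only a sketch on a small hand-made vector, not a replacement for the steps above; `codes` and `type_labels` are hypothetical names.

```r
# A few one-letter codes like those in the Type column.
codes <- c("e", "e", "p", "e", "p")

# Map each code (levels) to a readable name (labels) in a single step.
type_labels <- factor(codes,
                      levels = c("e", "p"),
                      labels = c("Edible", "Poisonous"))

table(type_labels)
```

Because `levels` and `labels` are matched by position, the two vectors must list the codes and names in the same order.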

I want to compare edible and poisonous mushrooms, so I will create a subset for each.

edib <- subset(mushrooms, Type == 'Edible')
pois <- subset(mushrooms, Type == 'Poisonous')
head(edib)
##     Type CapShape   Odor Habitat
## 1 Edible   Convex Almond Grasses
## 2 Edible     Bell  Anise Meadows
## 4 Edible   Convex   None Grasses
## 5 Edible   Convex Almond Grasses
## 6 Edible     Bell Almond Meadows
## 7 Edible     Bell  Anise Meadows
summary(edib)
##         Type         CapShape          Odor         Habitat    
##  Edible   :4208   Bell   : 404   None    :3408   Grasses:1408  
##  Poisonous:   0   Conical:   0   Almond  : 400   Leaves : 240  
##                   Convex :1948   Anise   : 400   Meadows: 256  
##                   Flat   :1596   Creosote:   0   Paths  : 136  
##                   Knobbed: 228   Fishy   :   0   Urban  :  96  
##                   Sunken :  32   Foul    :   0   Waste  : 192  
##                                  (Other) :   0   Woods  :1880
head(pois)
##         Type CapShape    Odor Habitat
## 3  Poisonous   Convex Pungent   Urban
## 8  Poisonous   Convex Pungent Grasses
## 13 Poisonous   Convex Pungent   Urban
## 17 Poisonous   Convex Pungent Grasses
## 18 Poisonous   Convex Pungent   Urban
## 19 Poisonous   Convex Pungent   Urban
summary(pois)
##         Type         CapShape          Odor         Habitat    
##  Edible   :   0   Bell   :  48   Foul    :2160   Grasses: 740  
##  Poisonous:3915   Conical:   4   Fishy   : 576   Leaves : 592  
##                   Convex :1707   Spicy   : 576   Meadows:  36  
##                   Flat   :1556   Pungent : 255   Paths  :1008  
##                   Knobbed: 600   Creosote: 192   Urban  : 271  
##                   Sunken :   0   None    : 120   Waste  :   0  
##                                  (Other) :  36   Woods  :1268
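The summaries still list the other type with a count of 0, because `subset()` keeps every factor level even when no rows use it. If those empty levels get in the way, `droplevels()` removes them. A small sketch on a toy data frame follows; `toy` and `toy_edib` are illustrative names, and the rows are made up in the shape of the data above.

```r
# Made-up rows with the same column types as the mushroom subset.
toy <- data.frame(
  Type     = factor(c("Edible", "Poisonous", "Edible", "Poisonous")),
  CapShape = factor(c("Bell", "Conical", "Sunken", "Convex"))
)

toy_edib <- subset(toy, Type == "Edible")
levels(toy_edib$Type)       # both levels are still present

# Drop levels that no longer occur in any row.
toy_edib <- droplevels(toy_edib)
levels(toy_edib$Type)       # only "Edible" remains
levels(toy_edib$CapShape)   # only "Bell" and "Sunken" remain
```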

Let’s see the cap shapes of Poisonous and Edible mushrooms.

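# Both variables are categorical, so count rows per cap shape and type instead of drawing a boxplot.
ggplot(mushrooms, aes(x = CapShape, fill = Type)) + ggtitle("Cap Shapes Distribution of Mushrooms") + geom_bar(position = "dodge")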

barplot(table(edib$CapShape), main = "Cap Shapes Distribution of Edible Mushrooms")

barplot(table(pois$CapShape), main = "Cap Shapes Distribution of Poisonous Mushrooms")

Let’s see the odor of Poisonous and Edible mushrooms.

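# Both variables are categorical, so count rows per odor and type instead of drawing a boxplot.
ggplot(mushrooms, aes(x = Odor, fill = Type)) + ggtitle("Odor Distribution of Mushrooms") + geom_bar(position = "dodge")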

barplot(table(edib$Odor), main = "Odor Distribution of Edible Mushrooms")

barplot(table(pois$Odor), main = "Odor Distribution of Poisonous Mushrooms")

Let’s see the habitat of Poisonous and Edible mushrooms.

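# Both variables are categorical, so count rows per habitat and type instead of drawing a boxplot.
ggplot(mushrooms, aes(x = Habitat, fill = Type)) + ggtitle("Habitat Distribution of Mushrooms") + geom_bar(position = "dodge")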

barplot(table(edib$Habitat), main = "Habitat Distribution of Edible Mushrooms")

barplot(table(pois$Habitat), main = "Habitat Distribution of Poisonous Mushrooms")
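The separate edible and poisonous bar plots each pick their own y-axis scale, which makes them harder to compare by eye. One option is to draw both types side by side on one set of axes. The sketch below does this with base graphics, with the habitat counts re-entered by hand from the two `summary()` outputs above; `habitat_counts` and `bar_pos` are illustrative names.

```r
# Counts copied from summary(edib) and summary(pois).
habitat_counts <- rbind(
  Edible    = c(Grasses = 1408, Leaves = 240, Meadows = 256, Paths = 136,
                Urban = 96, Waste = 192, Woods = 1880),
  Poisonous = c(Grasses = 740, Leaves = 592, Meadows = 36, Paths = 1008,
                Urban = 271, Waste = 0, Woods = 1268)
)

# beside = TRUE draws one bar per type within each habitat group;
# legend.text = TRUE labels the bars with the row names.
bar_pos <- barplot(habitat_counts, beside = TRUE, legend.text = TRUE,
                   main = "Habitat Distribution of Mushrooms")
```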

Conclusion

The data show that poisonous and edible mushrooms overlap in cap shape, odor, and habitat, but their distributions also differ in places.

Cap Shape: Looking at the Cap Shapes Distribution of Mushrooms plot, we can see that poisonous mushrooms have a wide range of possible cap shapes, but that a mushroom with a bell or sunken cap is more likely to be edible. However, it is still important to look at the Cap Shapes Distribution of Poisonous Mushrooms and Cap Shapes Distribution of Edible Mushrooms plots, because they show that the majority of both kinds of mushrooms have a convex or flat cap.
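To put a number on "more likely to be edible", the counts from the two `summary()` outputs can be turned into per-shape proportions. The sketch below re-enters those counts by hand into a matrix and uses `prop.table()` to compute the share of each type within each cap shape. The names `cap_counts` and `cap_share` are made up here.

```r
# Counts copied from summary(edib) and summary(pois).
cap_counts <- rbind(
  Edible    = c(Bell = 404, Conical = 0, Convex = 1948, Flat = 1596,
                Knobbed = 228, Sunken = 32),
  Poisonous = c(Bell = 48, Conical = 4, Convex = 1707, Flat = 1556,
                Knobbed = 600, Sunken = 0)
)

# margin = 2 normalises each column, so every cap shape sums to 1.
cap_share <- prop.table(cap_counts, margin = 2)
round(cap_share, 2)
```

On these counts about 89% of bell-capped mushrooms are edible, while convex and flat caps are split close to evenly between the two types.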

Odor: Looking at the Odor Distribution of Mushrooms plot, we can see that poisonous mushrooms have a wide range of possible odors, although none of them in this data set smell of anise or almond. The Odor Distribution of Poisonous Mushrooms plot shows that most poisonous mushrooms have a foul odor, but they can also be odorless or have other scents. The Odor Distribution of Edible Mushrooms plot shows that most edible mushrooms have no odor.

Habitat: Looking at the Habitat Distribution of Mushrooms plot, we can see that both kinds of mushrooms have a wide range of overlapping habitats. The main difference is that edible mushrooms are mostly found in grasses or woods, as shown by the Habitat Distribution of Edible Mushrooms plot, whereas poisonous mushrooms are spread across more habitats.

Overall, if you’re lost in the grasses or woods and have to eat mushrooms, try to find a sunken cap mushroom with almond or anise odor.