I have chosen to work with the Mushroom Data Set from the UCI Machine Learning Repository for the problems below. The data were drawn from The Audubon Society Field Guide to North American Mushrooms and describe each mushroom's physical characteristics, including its classification as edible or poisonous.
library(bitops)
library(RCurl)
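# Read the raw data (the file has no header row), then rename the columns for ease of use.
# stringsAsFactors = TRUE keeps the single-letter codes as factors, which the levels()
# relabelling below relies on; since R 4.0.0, read.csv() returns character columns by default.
mushr_url = getURL("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data")
mushr = read.csv(text = mushr_url, header = FALSE, stringsAsFactors = TRUE)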
names(mushr) = c("Class", "CapShape", "CapSurface",
"CapColor", "Bruising", "Odor", "GillAttachment",
"GillSpacing", "GillSize", "GillColor", "StalkShape",
"StalkRoot", "HighStalk", "LowStalk", "ColorHigh",
"ColorLow", "VeilType", "VeilColor", "RingNumber",
"RingType", "SporeColor", "Population", "Habitat")
head(mushr)
## Class CapShape CapSurface CapColor Bruising Odor GillAttachment
## 1 p x s n t p f
## 2 e x s y t a f
## 3 e b s w t l f
## 4 p x y w t p f
## 5 e x s g f n f
## 6 e x y y t a f
## GillSpacing GillSize GillColor StalkShape StalkRoot HighStalk LowStalk
## 1 c n k e e s s
## 2 c b k e c s s
## 3 c b n e c s s
## 4 c n n e e s s
## 5 w b k t e s s
## 6 c b n e c s s
## ColorHigh ColorLow VeilType VeilColor RingNumber RingType SporeColor
## 1 w w p w o p k
## 2 w w p w o p n
## 3 w w p w o p n
## 4 w w p w o p k
## 5 w w p w o e n
## 6 w w p w o p k
## Population Habitat
## 1 s u
## 2 n g
## 3 n m
## 4 s u
## 5 a g
## 6 n g
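Because `read.csv()` has defaulted to `stringsAsFactors = FALSE` since R 4.0.0, the code columns come back as character vectors unless factors are requested explicitly, and `levels()` then has nothing to rename. The minimal sketch below shows the difference. So that it runs offline, it uses the first three data rows reconstructed from the `head()` output above instead of the download; `raw_rows`, `as_chr` and `as_fct` are illustrative names of my own.

```r
# First three rows of agaricus-lepiota.data, reconstructed from the head() output above,
# so this sketch runs without a network connection.
raw_rows <- "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m"

# R >= 4.0 default: text columns stay character, so they carry no levels
as_chr <- read.csv(text = raw_rows, header = FALSE)
print(class(as_chr$V1))   # character
print(levels(as_chr$V1))  # NULL

# With stringsAsFactors = TRUE each code column becomes a factor whose
# levels are the distinct codes sorted alphabetically
as_fct <- read.csv(text = raw_rows, header = FALSE, stringsAsFactors = TRUE)
print(class(as_fct$V1))   # factor
print(levels(as_fct$V1))  # "e" "p"
```

The alphabetical level order is what makes the positional `levels<-` assignments later in this section line up with the right codes.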
Below, I create a data frame containing a subset of the columns: whether the mushroom is edible or poisonous, the cap shape, the texture of the cap surface, whether the mushroom bruises, and its habitat. I then replace the single-letter codes with descriptive labels so the entries are easier to understand at a glance.
# Subsetting data
sub_mushr = subset(mushr, select = c("Class", "CapShape", "CapSurface", "Bruising", "Habitat"))
head(sub_mushr)
## Class CapShape CapSurface Bruising Habitat
## 1 p x s t u
## 2 e x s t g
## 3 e b s t m
## 4 p x y t u
## 5 e x s f g
## 6 e x y t g
# Replace the single-letter codes with descriptive labels. levels<- renames by
# position, so each label vector follows the alphabetical order of the codes.
levels(sub_mushr$Class) = c("edible", "poisonous")  # e, p
levels(sub_mushr$CapShape) = c("bell", "conical", "flat", "knobbed", "sunken", "convex")  # b, c, f, k, s, x
levels(sub_mushr$CapSurface) = c("fibrous", "grooves", "smooth", "scaly")  # f, g, s, y
levels(sub_mushr$Bruising) = c("none", "bruising")  # f, t
levels(sub_mushr$Habitat) = c("woods", "grasses", "leaves", "meadows", "paths", "urban", "waste")  # d, g, l, m, p, u, w
head(sub_mushr)
## Class CapShape CapSurface Bruising Habitat
## 1 poisonous convex smooth bruising urban
## 2 edible convex smooth bruising grasses
## 3 edible bell smooth bruising meadows
## 4 poisonous convex scaly bruising urban
## 5 edible convex smooth none grasses
## 6 edible convex scaly bruising grasses
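The `levels<-` calls above rename levels by position, so they rely on each column containing every code in the alphabetical order the label vectors assume. If a column lacked some codes (for example after filtering rows), the same vectors would silently attach labels to the wrong codes. The minimal sketch below contrasts the two approaches on the first six Class and Habitat codes from the `head()` output above. The helper `relabel()` and the lookup vectors are my own illustrative additions; the code-to-label pairs follow the mapping used in this section.

```r
# First six Class and Habitat codes, taken from the head() output above
class_codes   <- factor(c("p", "e", "e", "p", "e", "e"))
habitat_codes <- factor(c("u", "g", "m", "u", "g", "g"))

# Code-to-label lookups (same pairs as the levels<- assignments above)
class_map   <- c(e = "edible", p = "poisonous")
habitat_map <- c(d = "woods", g = "grasses", l = "leaves", m = "meadows",
                 p = "paths", u = "urban", w = "waste")

# Positional renaming: only g, m, u occur here, so the first three labels
# (woods, grasses, leaves) are attached to them, which is wrong
print(levels(habitat_codes))  # "g" "m" "u"
wrong <- habitat_codes
levels(wrong) <- unname(habitat_map)
print(as.character(wrong))

# Name-based relabelling: factor() matches each code to its label, keeps every
# documented category as a level, and turns unknown codes into NA
relabel <- function(x, map) {
  factor(as.character(x), levels = names(map), labels = unname(map))
}

relabelled <- data.frame(Class   = relabel(class_codes, class_map),
                         Habitat = relabel(habitat_codes, habitat_map))
print(relabelled)
```

Because `levels = names(map)` fixes the level set, all seven habitat categories remain levels even when only three occur, and an unexpected code becomes `NA` instead of being mislabelled.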